Playing With Legos

Do you remember playing with legos as a kid? This is what our query language Rel feels like. You don’t have to write 300 lines of SQL code and go absolutely insane trying to debug or handle it all in your head. You make progress with every little definition. You can divide and conquer the problem by building small pieces, combining them together and creating something that is greater than the sum of its parts.

When I used to write long SQL queries for reports or complex business logic, I would get so jealous that other developers could just refactor 10 lines into a method whenever they wanted but I had to keep this monstrosity of a query all in my head. They had the audacity to complain that somebody interrupted them and ruined their “flow”.

Do you know how long it takes to rebuild 300 lines of SQL code in your head after somebody interrupts you to ask if you want to try that new Italian/Sushi Fusion restaurant that just opened up last week? No Jim, I don’t want seaweed wrapped meatballs topped with fish roe…ever.

Anyway, Rel brings the advantages of procedural programming (opens in a new tab) into the query language. It is absolutely without a doubt, the greatest thing since sliced bread for query writers, and I’m going to show you why.

Our task will be to represent a graph dataset in Rel and ask some basic questions about it.

Zachary’s Karate Graph

In the world of Graphs, Zachary’s Karate Graph (opens in a new tab) is considered a classic.

It represents the social network of a university Karate Club studied by Wayne W. Zachary about 50 years ago. It captures the personal relationships of 34 members of the club and the split that occurred between them when the club administrator and the instructor had a falling out.

Let’s build up the graph, piece by piece. The members of the karate club will be represented by nodes. Since there are just 34 of them we could type them out, but instead we’ll use a method from the standard library (opens in a new tab) called range (opens in a new tab). It looks like:

// Generate a relation with integer values between low and high,
// inclusive, advancing by stride each time.
 
range(low, hi, stride, x)

Let’s create them and output the count just to make sure we are starting things off correctly.

// read query
 
def number_of_members = 34
def node = range[1, number_of_members, 1]
def output = count[node]

Output:

Now we need to create the relationships between these club members. I copied the dataset (opens in a new tab) on the Wikipedia page replacing spaces with commas, brackets with parentheses, added semicolons, and enclosed them inside curly brackets to match Rel’s syntax:

// model
 
// We install the relation, `edge`, as part of our model
// so we can query it later.
def edge = {
(2,1);
(3,1);(3,2);
(4,1);(4,2);(4,3);
(5,1);
(6,1);
(7,1);(7,5);(7,6);
(8,1);(8,2);(8,3);(8,4);
(9,1);(9,3);
(10,3);
(11,1);(11,5);(11,6);
(12,1);
(13,1);(13,4);
(14,1);(14,2);(14,3);(14,4);
(17,6);(17,7);
(18,1);(18,2);
(20,1);(20,2);
(22,1);(22,2);
(26,24);(26,25);
(28,3);(28,24);(28,25);
(29,3);
(30,24);(30,27);
(31,2);(31,9);
(32,1);(32,25);(32,26);(32,29);
(33,3);(33,9);(33,15);(33,16);(33,19);(33,21);(33,23);(33,24);(33,30);(33,31);(33,32);
(34,9);(34,10);(34,14);(34,15);(34,16);(34,19);(34,20);(34,21);(34,23);(34,24);(34,27);(34,28);(34,29);(34,30);(34,31);(34,32);(34,33)
}

// read query
 
def output = count[edge]

Output:

The count says there are 78 relationships and that is what we expected. Let’s take a look at the edges of one of the nodes.

Discovering Relationships

To get the social relations of the karate club member designated as Node 3 we simply ask for them using edge[3]:

// read query
 
def output = edge[3]

Output:

If we scroll back up to the edge definition, we can see that indeed Node 3 is related to Node 1 and Node 2. However, the relations between members are supposed to be friendships and should be undirected. Let’s fix that and try outputting the friendships of Node 3.

// model
 
def friendship(x, y) = edge(x, y) or edge(y, x)

// read query
 
def output = friendship[3]

Output:

Now we can see that Node 3 has 10 friendships with other members.

When talking about graphs, the idiom is to call these nodes “neighbors”, so we will stick with tradition and make an alias for neighbor that is just “friendship”.

// model
 
def neighbor(x, y) = friendship(x, y)

// read query
 
def output = neighbor[3]

Output:

We have the same 10 results. Remember that for a moment as we ask how many are outgoing relationships and how many are incoming relationships.

Degrees

We call the count of the relationships from and to each node the degree (opens in a new tab) of a node. We can define the “outdegree” of a node as the count of the number of edges that begin with that node, since these are going outwards to other nodes:

// model
 
def outdegree[x] = count[edge[x]]

// read query
 
def output = outdegree[3]

Output:

These are the edges connecting to Node 1 and Node 2 that we queried before. Now let’s try incoming relationships. That should just be the count of the edges coming to us from other nodes.

The right way to write this query is by using for (opens in a new tab). This is like a “for loop” in procedural languages. Now we get the edge relations including x and y and count those that have 3 for x:

// model
 
def indegree[x] = count[edge(y, x) for y]

// read query
 
def output = indegree[3]

Output:

There we go! That works as we expected.

How about the count of all the relationships regardless of direction? For that we just need the degree of a node, and we can simply add the outdegree and the indegree we wrote earlier:

// read query
 
def degree[x] = outdegree[x] + indegree[x]
def output = degree[3]

Output:

The way we can build definitions by combining other definitions is pretty awesome!

We could also take advantage of the neighbor definition we wrote earlier, which looks clearer.

The degree of a node is the count of the neighbors of a node:

// read query
 
def degree[x] = count[neighbor[x]]
def output = degree[3]

Output:

Two-Hop Relationships

So far, we’ve just been talking about single-hop relationships. Let’s try two hops and get the common neighbors between two nodes using intersect (opens in a new tab). It does exactly what you think it does. It takes two sets and returns the things that are in both of them.

// model
 
def common_neighbor[x, y] = intersect[neighbor[x], neighbor[y]]

// read query
 
def output = common_neighbor[3,18]

Output:

Node 1 and Node 2 are indeed the neighbors of both Node 3 and Node 18. I wonder which node has the most common neighbors with Node 3? Let’s first get all the common neighbors for Node 3.

// read query
 
def output = common_neighbor[3]

Output (scroll through the output to see all the results):

Wait a minute. We do not want to see Node 3 here (you may need to scroll through the above output to see rows for Node 3). Having common neighbors with ourselves doesn’t make sense, so let’s get rid of Node 3 as a possible other node y by adding a where condition to our query that says we don’t want common neighbors with ourselves.

// read query
 
def output = common_neighbor[3,y] for y where y != 3

Output (scroll through the output to see all the results):

That looks better but we shouldn’t have to do this, it should be taken care of by the definition of common neighbors, so let’s change it there.

Common Neighbors

When computing common neighbors, y cannot equal x. Let’s redefine common_neighbor:

// model
 
def common_neighbor[x,y] = intersect[neighbor[x], neighbor[y]], y != x

// read query
 
def output = common_neighbor[3]

Output (scroll through the output to see all the results):

Now let’s count the number of common neighbors:

// read query
 
def output = count[common_neighbor[3]]

Output:

56? Whoops! That is not what we want. We got the count of all of them, we really want them grouped by neighbor, so let’s count for y (the other node).

// read query
 
def output = count[common_neighbor[3, y]] for y

Output (scroll through the output to see all the results):

I think we may reuse that, so we should move it to a definition.

// model
 
def count_of_common_neighbors[x, y] = count[common_neighbor[x, y]]

// read query
 
def output = count_of_common_neighbors[3]

Output (scroll through the output to see all the results):

What if we just want to get the count of common neighbors from Node 3 and say Node 4?

// read query
 
def output  = count_of_common_neighbors[3, 4]

Output:

We get the count of four (opens in a new tab) since Node 1, Node 2, Node 8, and Node 14 are all neighbors of both Node 3 and Node 4.

Okay, but we still haven’t answered the question which karate club member has the most friends in common with member 3?

It looks like we can use the max (opens in a new tab) method to figure out the answer:

// read query
 
def output = max[common_neighbor[3]]

Output:

This output tells us that Node 3 has the most common neighbors with Node 33. Let’s see them:

// read query
 
def output = common_neighbor[3,33]

Output:

Hum… that just shows us Node 9, but we know from earlier that Node 4 had nodes 1, 2, 8, and 14 as common neighbors with Node 3, so Node 33 cannot be the node with the most common neighbors of Node 3.

We are actually asking for the highest node with common neighbors, not the maximum count of common neighbors. For that we can use the definition we created earlier:

// read query
 
def output = max[count_of_common_neighbors[3]]

Output:

The query actually gives us the count of the most neighbors shared by another node, but we want to know which node it is. We have a different method in the standard library for that, called argmax (opens in a new tab).

// read query
 
def output = argmax[count_of_common_neighbors[3]]

Output:

Node 34! Let’s see the common neighbors between them:

// read query
 
def output = common_neighbor[3,34]

Output:

There are indeed six of them. How do I get both the node ID and the count of the common neighbors in the output at the same time? We will find out next time.

Adding Value Incrementally

The ability to build queries the way you build lego blocks, adding value incrementally by dividing and conquering hard queries a little at a time, is the greatest thing since sliced bread for query writers.

In part two of this four-part series we’re going to see three amazing features of the Rel language that make it better than sliced bread with peanut butter and jelly.

Oh yeah, and we’re not stopping there. In part three, you’ll see one more feature of Rel that makes it better than sliced bread with a giant heap of Nutella spread all over it.

Is that making you hungry? Do you want to get that sweet taste of Rel? Subscribe to our newsletter to learn when guided previews and general availability are announced.

Tic-Tac-Toe: Knowledge Extraction From Imperative Code

Applications implemented in imperative programming languages encode vast amounts of human knowledge at their core, but the essence of the application logic is often obscured by the imperative nature of the implementation language.

Loopy Lattices and Recursion

Our semantic optimizer is a query optimizer that uses the knowledge of user constraints, the knowledge of the data at hand, and the knowledge of mathematics to find an algorithm to answer a query in the fastest way it can. Here we touch upon just one feature of the semantic optimizer: recursion.

Playing With Legos

Zachary’s Karate Graph

Discovering Relationships

Degrees

Two-Hop Relationships

Common Neighbors

Adding Value Incrementally

Related Posts

Tic-Tac-Toe: Knowledge Extraction From Imperative Code

Loopy Lattices and Recursion

Get Started!