THIS POST IS ARCHIVED

Playing With Legos

Do you remember playing with legos as a kid? This is what our query language Rel feels like. You don’t have to write 300 lines of SQL code and go absolutely insane trying to debug or handle it all in your head. You make progress with every little definition. You can divide and conquer the problem by building small pieces, combining them together and creating something that is greater than the sum of its parts.

When I used to write long SQL queries for reports or complex business logic, I would get so jealous that other developers could just refactor 10 lines into a method whenever they wanted but I had to keep this monstrosity of a query all in my head. They had the audacity to complain that somebody interrupted them and ruined their “flow”.

Do you know how long it takes to rebuild 300 lines of SQL code in your head after somebody interrupts you to ask if you want to try that new Italian/Sushi Fusion restaurant that just opened up last week? No Jim, I don’t want seaweed wrapped meatballs topped with fish roe…ever.

Anyway, Rel brings the advantages of procedural programming into the query language. It is absolutely without a doubt, the greatest thing since sliced bread for query writers, and I’m going to show you why.

Our task will be to represent a graph dataset in Rel and ask some basic questions about it.

Zachary's Karate Graph

In the world of Graphs, Zachary’s Karate Graph is considered a classic.

It represents the social network of a university Karate Club studied by Wayne W. Zachary about 50 years ago. It captures the personal relationships of 34 members of the club and the split that occurred between them when the club administrator and the instructor had a falling out.

Let’s build up the graph, piece by piece. The members of the karate club will be represented by nodes. Since there are just 34 of them we could type them out, but instead we’ll use a method from the standard library called range. It looks like:

// Generate a relation with integer values between low and high,
// inclusive, advancing by stride each time.

range(low, hi, stride, x)

Let’s create them and output the count just to make sure we are starting things off correctly.

def number_of_members = 34
def node = range[1, number_of_members, 1]
def output = count[node]

Output:

Now we need to create the relationships between these club members. I copied the dataset on the Wikipedia page replacing spaces with commas, brackets with parentheses, added semicolons, and enclosed them inside curly brackets to match Rel’s syntax:

// We install the relation, `edge`, as part of our model
// so we can query it later.
def edge = {
(2,1);
(3,1);(3,2);
(4,1);(4,2);(4,3);
(5,1);
(6,1);
(7,1);(7,5);(7,6);
(8,1);(8,2);(8,3);(8,4);
(9,1);(9,3);
(10,3);
(11,1);(11,5);(11,6);
(12,1);
(13,1);(13,4);
(14,1);(14,2);(14,3);(14,4);
(17,6);(17,7);
(18,1);(18,2);
(20,1);(20,2);
(22,1);(22,2);
(26,24);(26,25);
(28,3);(28,24);(28,25);
(29,3);
(30,24);(30,27);
(31,2);(31,9);
(32,1);(32,25);(32,26);(32,29);
(33,3);(33,9);(33,15);(33,16);(33,19);(33,21);(33,23);(33,24);(33,30);(33,31);(33,32);
(34,9);(34,10);(34,14);(34,15);(34,16);(34,19);(34,20);(34,21);(34,23);(34,24);(34,27);(34,28);(34,29);(34,30);(34,31);(34,32);(34,33)
}

def output = count[edge]

Output:

The count says there are 78 relationships and that is what we expected. Let’s take a look at the edges of one of the nodes.

Discovering Relationships

To get the social relations of the karate club member designated as Node 3 we simply ask for them using edge[3]:

def output = edge[3]

Output:

If we scroll back up to the edge definition, we can see that indeed Node 3 is related to Node 1 and Node 2. However, the relations between members are supposed to be friendships and should be undirected. Let’s fix that and try outputting the friendships of Node 3.

def friendship(x, y) = edge(x, y) or edge(y, x)

def output = friendship[3]

Output:

Now we can see that Node 3 has 10 friendships with other members.

When talking about graphs, the idiom is to call these nodes “neighbors”, so we will stick with tradition and make an alias for neighbor that is just “friendship”.

def neighbor(x, y) = friendship(x, y)

def output = neighbor[3]

Output:

We have the same 10 results. Remember that for a moment as we ask how many are outgoing relationships and how many are incoming relationships.

Degrees

We call the count of the relationships from and to each node the degree of a node. We can define the “outdegree” of a node as the count of the number of edges that begin with that node, since these are going outwards to other nodes:

def outdegree[x] = count[edge[x]]

def output = outdegree[3]

Output:

These are the edges connecting to Node 1 and Node 2 that we queried before. Now let’s try incoming relationships. That should just be the count of the edges coming to us from other nodes.

The right way to write this query is by using for. This is like a “for loop” in procedural languages. Now we get the edge relations including x and y and count those that have 3 for x:

def indegree[x] = count[edge(y, x) for y]

def output = indegree[3]

Output:

There we go! That works as we expected.

How about the count of all the relationships regardless of direction? For that we just need the degree of a node, and we can simply add the outdegree and the indegree we wrote earlier:

def degree[x] = outdegree[x] + indegree[x]
def output = degree[3]

Output:

The way we can build definitions by combining other definitions is pretty awesome!

We could also take advantage of the neighbor definition we wrote earlier, which looks clearer.

The degree of a node is the count of the neighbors of a node:

def degree[x] = count[neighbor[x]]
def output = degree[3]

Output:

Two-Hop Relationships

So far, we’ve just been talking about single-hop relationships. Let’s try two hops and get the common neighbors between two nodes using intersect. It does exactly what you think it does. It takes two sets and returns the things that are in both of them.

def common_neighbor[x, y] = intersect[neighbor[x], neighbor[y]]

def output = common_neighbor[3,18]

Output:

Node 1 and Node 2 are indeed the neighbors of both Node 3 and Node 18. I wonder which node has the most common neighbors with Node 3? Let's first get all the common neighbors for Node 3.

def output = common_neighbor[3]

Output (scroll through the output to see all the results):

Wait a minute. We do not want to see Node 3 here (you may need to scroll through the above output to see rows for Node 3). Having common neighbors with ourselves doesn’t make sense, so let's get rid of Node 3 as a possible other node y by adding a where condition to our query that says we don’t want common neighbors with ourselves.

def output = common_neighbor[3,y] for y where y != 3

Output (scroll through the output to see all the results):

That looks better but we shouldn't have to do this, it should be taken care of by the definition of common neighbors, so let's change it there.

Common Neighbors

When computing common neighbors, y cannot equal x. Let's redefine common_neighbor:

def delete:rel:catalog:model["id-common-neighbor-install"] = rel:catalog:model["id-common-neighbor-install"]

def common_neighbor[x,y] = intersect[neighbor[x], neighbor[y]], y != x

def output = common_neighbor[3]

Output (scroll through the output to see all the results):

Now let's count the number of common neighbors:

def output = count[common_neighbor[3]]

Output:

56? Whoops! That is not what we want. We got the count of all of them, we really want them grouped by neighbor, so let's count for y (the other node).

def output = count[common_neighbor[3, y]] for y

Output (scroll through the output to see all the results):

I think we may reuse that, so we should move it to a definition.

def count_of_common_neighbors[x, y] = count[common_neighbor[x, y]]

def output = count_of_common_neighbors[3]

Output (scroll through the output to see all the results):

What if we just want to get the count of common neighbors from Node 3 and say Node 4?

def output  = count_of_common_neighbors[3, 4]

Output:

We get the count of four since Node 1, Node 2, Node 8, and Node 14 are all neighbors of both Node 3 and Node 4.

Okay, but we still haven't answered the question which karate club member has the most friends in common with member 3?

It looks like we can use the max method to figure out the answer:

def output = max[common_neighbor[3]]

Output:

This output tells us that Node 3 has the most common neighbors with Node 33. Let's see them:

def output = common_neighbor[3,33]

Output:

Hum… that just shows us Node 9, but we know from earlier that Node 4 had nodes 1, 2, 8, and 14 as common neighbors with Node 3, so Node 33 cannot be the node with the most common neighbors of Node 3.

We are actually asking for the highest node with common neighbors, not the maximum count of common neighbors. For that we can use the definition we created earlier:

def output = max[count_of_common_neighbors[3]]

Output:

The query actually gives us the count of the most neighbors shared by another node, but we want to know which node it is. We have a different method in the standard library for that, called argmax.

def output = argmax[count_of_common_neighbors[3]]

Output:

Node 34! Let’s see the common neighbors between them:

def output = common_neighbor[3,34]

Output:

There are indeed six of them. How do I get both the node ID and the count of the common neighbors in the output at the same time? We will find out next time.

Adding Value Incrementally

The ability to build queries the way you build lego blocks, adding value incrementally by dividing and conquering hard queries a little at a time, is the greatest thing since sliced bread for query writers.

In part two of this four-part series we’re going to see three amazing features of the Rel language that make it better than sliced bread with peanut butter and jelly.

Oh yeah, and we’re not stopping there. In part three, you’ll see one more feature of Rel that makes it better than sliced bread with a giant heap of Nutella spread all over it.

Is that making you hungry? Do you want to get that sweet taste of Rel? Subscribe to our newsletter to learn when guided previews and general availability are announced.