Join us at Snowflake Summit June 26-29 in Las Vegas!

Playing With Legos

Max De Marzi

26 July 2022

5 min read

Do you remember playing with legos as a kid? This is what our query language Rel feels like. You don’t have to write 300 lines of SQL code and go absolutely insane trying to debug or handle it all in your head. You make progress with every little definition. You can divide and conquer the problem by building small pieces, combining them together and creating something that is greater than the sum of its parts.

When I used to write long SQL queries for reports or complex business logic, I would get so jealous that other developers could just refactor 10 lines into a method whenever they wanted but I had to keep this monstrosity of a query all in my head. They had the audacity to complain that somebody interrupted them and ruined their “flow”.

Do you know how long it takes to rebuild 300 lines of SQL code in your head after somebody interrupts you to ask if you want to try that new Italian/Sushi Fusion restaurant that just opened up last week? No Jim, I don’t want seaweed wrapped meatballs topped with fish roe…ever.

Anyway, Rel brings the advantages of procedural programming into the query language. It is absolutely without a doubt, the greatest thing since sliced bread for query writers, and I’m going to show you why.

Our task will be to represent a graph dataset in Rel and ask some basic questions about it.

Zachary's Karate Graph

In the world of Graphs, Zachary’s Karate Graph is considered a classic.

It represents the social network of a university Karate Club studied by Wayne W. Zachary about 50 years ago. It captures the personal relationships of 34 members of the club and the split that occurred between them when the club administrator and the instructor had a falling out.

Let’s build up the graph, piece by piece. The members of the karate club will be represented by nodes. Since there are just 34 of them we could type them out, but instead we’ll use a method from the standard library called range. It looks like:

// Generate a relation with integer values between low and high, inclusive, advancing by stride each time.

range(low, hi, stride, x)

Let’s create them and output the count just to make sure we are starting things off correctly.

def number_of_members = 34
def node = range[1, number_of_members, 1] 
def output = count[node] 

Output:

#Int64
134

Now we need to create the relationships between these club members. I copied the dataset on the Wikipedia page replacing spaces with commas, brackets with parentheses, added semicolons, and enclosed them inside curly brackets to match Rel’s syntax:

def edge = {
(2,1);
(3,1);(3,2);
(4,1);(4,2);(4,3);
(5,1);
(6,1);
(7,1);(7,5);(7,6);
(8,1);(8,2);(8,3);(8,4);
(9,1);(9,3);
(10,3);
(11,1);(11,5);(11,6);
(12,1);
(13,1);(13,4);
(14,1);(14,2);(14,3);(14,4);
(17,6);(17,7);
(18,1);(18,2);
(20,1);(20,2);
(22,1);(22,2);
(26,24);(26,25);
(28,3);(28,24);(28,25);
(29,3);
(30,24);(30,27);
(31,2);(31,9);
(32,1);(32,25);(32,26);(32,29);
(33,3);(33,9);(33,15);(33,16);(33,19);(33,21);(33,23);(33,24);(33,30);(33,31);(33,32);
(34,9);(34,10);(34,14);(34,15);(34,16);(34,19);(34,20);(34,21);(34,23);(34,24);(34,27);(34,28);(34,29);(34,30);(34,31);(34,32);(34,33)
}

def output = count[edge] 

Output:

#Int64
178

The count says there are 78 relationships and that is what we expected. Let’s take a look at the edges of one of the nodes.

Discovering Relationships

To get the social relations of the karate club member designated as Node 3 we simply ask for them using edge[3]:

def output = edge[3]

Output:

#Int64
11
22

If we scroll back up to the edge definition, we can see that indeed Node 3 is related to Node 1 and Node 2. However, the relations between members are supposed to be friendships and should be undirected. Let’s fix that and try outputting the friendships of Node 3.

def friendship(x, y) = edge(x, y) or edge(y, x) 
def output = friendship[3]

Output:

#Int64
11
22
34
48
59
610
714
828
929
1033

Now we can see that Node 3 has 10 friendships with other members.

When talking about graphs, the idiom is to call these nodes “neighbors”, so we will stick with tradition and make an alias for neighbor that is just “friendship”.

def neighbor(x, y) = friendship(x, y) 
def output = neighbor[3]

Output:

#Int64
11
22
34
48
59
610
714
828
929
1033

We have the same 10 results. Remember that for a moment as we ask how many are outgoing relationships and how many are incoming relationships.

Degrees

We call the count of the relationships from and to each node the degree of a node. We can define the “outdegree” of a node as the count of the number of edges that begin with that node, since these are going outwards to other nodes:

def outdegree[x] = count[edge[x]]
def output = outdegree[3]

Output:

#Int64
12

These are the edges connecting to Node 1 and Node 2 that we queried before. Now let’s try incoming relationships. That should just be the count of the edges coming to us from other nodes.

The right way to write this query is by using for. This is like a “for loop” in procedural languages. Now we get the edge relations including x and y and count those that have 3 for x:

def indegree[x] = count[edge(y, x) for y]
def output = indegree[3]

Output:

#Int64
18

There we go! That works as we expected.

How about the count of all the relationships regardless of direction? For that we just need the degree of a node, and we can simply add the outdegree and the indegree we wrote earlier:

def degree[x] = outdegree[x] + indegree[x]
def output = degree[3]

Output:

#Int64
110

The way we can build definitions by combining other definitions is pretty awesome!

We could also take advantage of the neighbor definition we wrote earlier, which looks clearer.

The degree of a node is the count of the neighbors of a node:

def degree[x] = count[neighbor[x]]
def output = degree[3]

Output:

#Int64
110

Two-Hop Relationships

So far, we’ve just been talking about single-hop relationships. Let’s try two hops and get the common neighbors between two nodes using intersect. It does exactly what you think it does. It takes two sets and returns the things that are in both of them.

def common_neighbor[x, y] = intersect[neighbor[x], neighbor[y]]
def output = common_neighbor[3,18]

Output:

#Int64
11
22

Node 1 and Node 2 are indeed the neighbors of both Node 3 and Node 18. I wonder which node has the most common neighbors with Node 3? Let's first get all the common neighbors for Node 3.

def output = common_neighbor[3]

Output (additional results omitted):

#Int64Int64
112
214
318
419
5114
621
724
828
9214
1031
1132

Wait a minute.  We do not want to see Node 3 here. Having common neighbors with ourselves doesn’t make sense, so let's get rid of Node 3 as a possible other node y by adding a where condition to our query that says we don’t want common neighbors with ourselves.

def output = common_neighbor[3,y] for y where y != 3

Output (additional results omitted):

#Int64Int64
112
214
318
419
5114
621
724
828
9214
1041
1142

That looks better but we shouldn't have to do this, it should be taken care of by the definition of common neighbors, so let's change it there.

Common Neighbors

When computing common neighbors, y cannot equal x:

def common_neighbor[x,y] = intersect[neighbor[x], neighbor[y]], y != x
def output = common_neighbor[3] 

Output (additional results omitted):

#Int64Int64
112
214
318
418
5114
621
724
828
9214
1041
1142

Now let's count the number of common neighbors:

def output = count[common_neighbor[3]]  

Output:

#Int64
156

56? Whoops! That is not what we want. We got the count of all of them, we really want them grouped by neighbor, so let's count for y (the other node).

def output = count[common_neighbor[3, y]] for y 

Output:

#Int64Int64
115
224
344
451

I think we may reuse that, so we should move it to a definition.

def count_of_common_neighbors[x, y] = count[common_neighbor[x, y]]
def output = count_of_common_neighbors[3]  

Output:

#Int64Int64
115
224
344
451

What if we just want to get the count of common neighbors from Node 3 and say Node 4?

def output  = count_of_common_neighbors[3, 4] 

Output:

#Int64
14

We get the count of four since Node 1, Node 2, Node 8, and Node 14 are all neighbors of both Node 3 and Node 4.

Okay, but we still haven't answered the question which karate club member has the most friends in common with member 3?

It looks like we can use the max method to figure out the answer:

def output = max[common_neighbor[3]]

Output:

#Int64
133

This output tells us that Node 3 has the most common neighbors with Node 33. Let's see them:

def output = common_neighbor[3,33]

Output:

#Int64
19

Hum… that just shows us Node 9, but we know from earlier that Node 4 had nodes 1, 2, 8, and 14 as common neighbors with Node 3, so Node 33 cannot be the node with the most common neighbors of Node 3.

We are actually asking for the highest node with common neighbors, not the maximum count of common neighbors. For that we can use the definition we created earlier:

def output = max[count_of_common_neighbors[3]] 

Output:

#Int64
16

The query actually gives us the count of the most neighbors shared by another node, but we want to know which node it is. We have a different method in the standard library for that, called argmax.

def output = argmax[count_of_common_neighbors[3]]

Output:

#Int64
134

Node 34! Let’s see the common neighbors between them:

def output = common_neighbor[3,34]

Output:

#Int64
19
210
314
428
529
633

There are indeed six of them. How do I get both the node ID and the count of the common neighbors in the output at the same time? We will find out next time.

Adding Value Incrementally

The ability to build queries the way you build lego blocks, adding value incrementally by dividing and conquering hard queries a little at a time, is the greatest thing since sliced bread for query writers.

In part two of this four-part series we’re going to see three amazing features of the Rel language that make it better than sliced bread with peanut butter and jelly.

Oh yeah, and we’re not stopping there. In part three, you’ll see one more feature of Rel that makes it better than sliced bread with a giant heap of Nutella spread all over it.

Is that making you hungry? Do you want to get that sweet taste of Rel? Subscribe to our newsletter to learn when guided previews and general availability are announced.


Related Posts

Get Early Access

Join our community, keep up to date with the latest developments in our monthly newsletter, and get early access to RelationalAI.