Join us at Snowflake Summit June 26-29 in Las Vegas!

Max De Marzi

26 July 2022

5 min read

Do you remember playing with legos as a kid? This is what our query language Rel feels like. You don’t have to write 300 lines of SQL code and go absolutely insane trying to debug or handle it all in your head. You make progress with every little definition. You can divide and conquer the problem by building small pieces, combining them together and creating something that is greater than the sum of its parts.

When I used to write long SQL queries for reports or complex business logic, I would get so jealous that other developers could just refactor 10 lines into a method whenever they wanted but I had to keep this monstrosity of a query all in my head. They had the audacity to complain that somebody interrupted them and ruined their “flow”.

Do you know how long it takes to rebuild 300 lines of SQL code in your head after somebody interrupts you to ask if you want to try that new Italian/Sushi Fusion restaurant that just opened up last week? No Jim, I don’t want seaweed wrapped meatballs topped with fish roe…ever.

Anyway, Rel brings the advantages of procedural programming into the query language. It is absolutely without a doubt, the greatest thing since sliced bread for query writers, and I’m going to show you why.

Our task will be to represent a graph dataset in Rel and ask some basic questions about it.

In the world of Graphs, Zachary’s Karate Graph is considered a classic.

It represents the social network of a university Karate Club studied by Wayne W. Zachary about 50 years ago. It captures the personal relationships of 34 members of the club and the split that occurred between them when the club administrator and the instructor had a falling out.

Let’s build up the graph, piece by piece. The members of the karate club will be represented by nodes. Since there are just 34 of them we could type them out, but instead we’ll use a method from the standard library called range. It looks like:

```
// Generate a relation with integer values between low and high, inclusive, advancing by stride each time.
range(low, hi, stride, x)
```

Let’s create them and output the count just to make sure we are starting things off correctly.

```
def number_of_members = 34
def node = range[1, number_of_members, 1]
def output = count[node]
```

Output:

# | Int64 |

1 | 34 |

Now we need to create the relationships between these club members. I copied the dataset on the Wikipedia page replacing spaces with commas, brackets with parentheses, added semicolons, and enclosed them inside curly brackets to match Rel’s syntax:

```
def edge = {
(2,1);
(3,1);(3,2);
(4,1);(4,2);(4,3);
(5,1);
(6,1);
(7,1);(7,5);(7,6);
(8,1);(8,2);(8,3);(8,4);
(9,1);(9,3);
(10,3);
(11,1);(11,5);(11,6);
(12,1);
(13,1);(13,4);
(14,1);(14,2);(14,3);(14,4);
(17,6);(17,7);
(18,1);(18,2);
(20,1);(20,2);
(22,1);(22,2);
(26,24);(26,25);
(28,3);(28,24);(28,25);
(29,3);
(30,24);(30,27);
(31,2);(31,9);
(32,1);(32,25);(32,26);(32,29);
(33,3);(33,9);(33,15);(33,16);(33,19);(33,21);(33,23);(33,24);(33,30);(33,31);(33,32);
(34,9);(34,10);(34,14);(34,15);(34,16);(34,19);(34,20);(34,21);(34,23);(34,24);(34,27);(34,28);(34,29);(34,30);(34,31);(34,32);(34,33)
}
def output = count[edge]
```

Output:

# | Int64 |

1 | 78 |

The count says there are 78 relationships and that is what we expected. Let’s take a look at the edges of one of the nodes.

To get the social relations of the karate club member designated as Node 3 we simply ask for them using edge[3]:

`def output = edge[3]`

Output:

# | Int64 |

1 | 1 |

2 | 2 |

If we scroll back up to the edge definition, we can see that indeed Node 3 is related to Node 1 and Node 2. However, the relations between members are supposed to be friendships and should be undirected. Let’s fix that and try outputting the friendships of Node 3.

```
def friendship(x, y) = edge(x, y) or edge(y, x)
def output = friendship[3]
```

Output:

# | Int64 |

1 | 1 |

2 | 2 |

3 | 4 |

4 | 8 |

5 | 9 |

6 | 10 |

7 | 14 |

8 | 28 |

9 | 29 |

10 | 33 |

Now we can see that Node 3 has 10 friendships with other members.

When talking about graphs, the idiom is to call these nodes “neighbors”, so we will stick with tradition and make an alias for neighbor that is just “friendship”.

```
def neighbor(x, y) = friendship(x, y)
def output = neighbor[3]
```

Output:

# | Int64 |

1 | 1 |

2 | 2 |

3 | 4 |

4 | 8 |

5 | 9 |

6 | 10 |

7 | 14 |

8 | 28 |

9 | 29 |

10 | 33 |

We have the same 10 results. Remember that for a moment as we ask how many are outgoing relationships and how many are incoming relationships.

We call the count of the relationships from and to each node the degree of a node. We can define the “outdegree” of a node as the count of the number of edges that begin with that node, since these are going outwards to other nodes:

```
def outdegree[x] = count[edge[x]]
def output = outdegree[3]
```

Output:

# | Int64 |

1 | 2 |

These are the edges connecting to Node 1 and Node 2 that we queried before. Now let’s try incoming relationships. That should just be the count of the edges coming to us from other nodes.

The right way to write this query is by using for. This is like a “for loop” in procedural languages. Now we get the edge relations including `x`

and `y`

and count those that have `3`

for `x`

:

```
def indegree[x] = count[edge(y, x) for y]
def output = indegree[3]
```

Output:

# | Int64 |

1 | 8 |

There we go! That works as we expected.

How about the count of all the relationships regardless of direction? For that we just need the degree of a node, and we can simply add the outdegree and the indegree we wrote earlier:

```
def degree[x] = outdegree[x] + indegree[x]
def output = degree[3]
```

Output:

# | Int64 |

1 | 10 |

The way we can build definitions by combining other definitions is pretty awesome!

We could also take advantage of the neighbor definition we wrote earlier, which looks clearer.

The degree of a node is the count of the neighbors of a node:

```
def degree[x] = count[neighbor[x]]
def output = degree[3]
```

Output:

# | Int64 |

1 | 10 |

So far, we’ve just been talking about single-hop relationships. Let’s try two hops and get the common neighbors between two nodes using intersect. It does exactly what you think it does. It takes two sets and returns the things that are in both of them.

```
def common_neighbor[x, y] = intersect[neighbor[x], neighbor[y]]
def output = common_neighbor[3,18]
```

Output:

# | Int64 |

1 | 1 |

2 | 2 |

Node 1 and Node 2 are indeed the neighbors of both Node 3 and Node 18. I wonder which node has the most common neighbors with Node 3? Let's first get all the common neighbors for Node 3.

`def output = common_neighbor[3]`

Output (additional results omitted):

# | Int64 | Int64 |

1 | 1 | 2 |

2 | 1 | 4 |

3 | 1 | 8 |

4 | 1 | 9 |

5 | 1 | 14 |

6 | 2 | 1 |

7 | 2 | 4 |

8 | 2 | 8 |

9 | 2 | 14 |

10 | 3 | 1 |

11 | 3 | 2 |

Wait a minute. We do not want to see Node 3 here. Having common neighbors with ourselves doesn’t make sense, so let's get rid of Node 3 as a possible other node `y`

by adding a `where`

condition to our query that says we don’t want common neighbors with ourselves.

`def output = common_neighbor[3,y] for y where y != 3`

Output (additional results omitted):

# | Int64 | Int64 |

1 | 1 | 2 |

2 | 1 | 4 |

3 | 1 | 8 |

4 | 1 | 9 |

5 | 1 | 14 |

6 | 2 | 1 |

7 | 2 | 4 |

8 | 2 | 8 |

9 | 2 | 14 |

10 | 4 | 1 |

11 | 4 | 2 |

That looks better but we shouldn't have to do this, it should be taken care of by the definition of common neighbors, so let's change it there.

When computing common neighbors, `y`

cannot equal `x`

:

```
def common_neighbor[x,y] = intersect[neighbor[x], neighbor[y]], y != x
def output = common_neighbor[3]
```

Output (additional results omitted):

# | Int64 | Int64 |

1 | 1 | 2 |

2 | 1 | 4 |

3 | 1 | 8 |

4 | 1 | 8 |

5 | 1 | 14 |

6 | 2 | 1 |

7 | 2 | 4 |

8 | 2 | 8 |

9 | 2 | 14 |

10 | 4 | 1 |

11 | 4 | 2 |

Now let's count the number of common neighbors:

`def output = count[common_neighbor[3]] `

Output:

# | Int64 |

1 | 56 |

56? Whoops! That is not what we want. We got the count of all of them, we really want them grouped by neighbor, so let's count for `y `

(the other node).

`def output = count[common_neighbor[3, y]] for y `

Output:

# | Int64 | Int64 |

1 | 1 | 5 |

2 | 2 | 4 |

3 | 4 | 4 |

4 | 5 | 1 |

I think we may reuse that, so we should move it to a definition.

```
def count_of_common_neighbors[x, y] = count[common_neighbor[x, y]]
def output = count_of_common_neighbors[3]
```

Output:

# | Int64 | Int64 |

1 | 1 | 5 |

2 | 2 | 4 |

3 | 4 | 4 |

4 | 5 | 1 |

What if we just want to get the count of common neighbors from Node 3 and say Node 4?

`def output = count_of_common_neighbors[3, 4] `

Output:

# | Int64 |

1 | 4 |

We get the count of four since Node 1, Node 2, Node 8, and Node 14 are all neighbors of both Node 3 and Node 4.

Okay, but we still haven't answered the question which karate club member has the most friends in common with member 3?

It looks like we can use the max method to figure out the answer:

`def output = max[common_neighbor[3]]`

Output:

# | Int64 |

1 | 33 |

This output tells us that Node 3 has the most common neighbors with Node 33. Let's see them:

`def output = common_neighbor[3,33]`

Output:

# | Int64 |

1 | 9 |

Hum… that just shows us Node 9, but we know from earlier that Node 4 had nodes 1, 2, 8, and 14 as common neighbors with Node 3, so Node 33 cannot be the node with the most common neighbors of Node 3.

We are actually asking for the highest node with common neighbors, not the maximum count of common neighbors. For that we can use the definition we created earlier:

`def output = max[count_of_common_neighbors[3]] `

Output:

# | Int64 |

1 | 6 |

The query actually gives us the count of the most neighbors shared by another node, but we want to know which node it is. We have a different method in the standard library for that, called argmax.

`def output = argmax[count_of_common_neighbors[3]]`

Output:

# | Int64 |

1 | 34 |

Node 34! Let’s see the common neighbors between them:

`def output = common_neighbor[3,34]`

Output:

# | Int64 |

1 | 9 |

2 | 10 |

3 | 14 |

4 | 28 |

5 | 29 |

6 | 33 |

There are indeed six of them. How do I get both the node ID and the count of the common neighbors in the output at the same time? We will find out next time.

The ability to build queries the way you build lego blocks, adding value incrementally by dividing and conquering hard queries a little at a time, is the greatest thing since sliced bread for query writers.

In part two of this four-part series we’re going to see three amazing features of the Rel language that make it better than sliced bread with peanut butter and jelly.

Oh yeah, and we’re not stopping there. In part three, you’ll see one more feature of Rel that makes it better than sliced bread with a giant heap of Nutella spread all over it.

Is that making you hungry? Do you want to get that sweet taste of Rel? Subscribe to our newsletter to learn when guided previews and general availability are announced.

RelationalAI's declarative modeling language Rel can be a powerful tool for machine learning data preprocessing. It is concise, readable, and facilitates testing and debugging in development. Rel can significantly simplify your machine learning data pipeline.

Read MoreIn this series of blog posts, we show how to implement neighborhood-based, graph-based, content-based, and hybrid recommender systems using RelationalAI’s declarative modeling language, Rel. The implementations of these algorithms demonstrate the efficiency of our Relational Knowledge Graph System (RKGS) for aggregating over paths on large and sparse graphs, computing similarities using our graph analytics library, and building compact, easy to read models, without needing to transfer data outside of our system.

Read More