Many take the phrase “good fences make good neighbors” to mean we should build barriers between us and the outside world. We build fences for protection, privacy and to establish boundaries. But that’s not at all the phrase means (opens in a new tab). It comes from the poem Mending Wall (opens in a new tab) by Robert Frost (opens in a new tab) where two neighbors meet every year to rebuild the fence on the borders of their properties.
It is not the mended fence that is important, but rather the coming together of the two neighbors to meet consistently and work together for a common purpose. It is the act of mending relations, not building fences, that matters.
It stands to reason then that the person with the most friends and neighbors has the opportunity and responsibility to maintain lots of relationships. Last time we left off after finding out which member of our example university Karate Club has the most friends in common with the member represented as Node 3, but before we jump in, lets recap the Rel code we’ve written so far:
// model def friendship(x, y) = edge(x, y) or edge(y, x) def neighbor(x, y) = friendship(x, y) def outdegree[x] = count[edge[x]] def indegree[x] = count[edge(y, x) for y] def degree[x] = count[neighbor[x]] def common_neighbor[x,y] = intersect[neighbor[x], neighbor[y]] , y != x def count_of_common_neighbors[x, y] = count[common_neighbor[x, y]]
We were able to see the node with the highest common neighbors using
argmax and the count of the common neighbors for that node using
Now, what we would like to see is the node ID and the count of common neighbors together. We can output both this way:
// read query def output = argmax[count_of_common_neighbors], max[count_of_common_neighbors]
That’s great, but having to repeat
count_of_common_neighbors feels wrong. How about relationship arguments? Why do I have to repeat myself? Why do I have to tell you multiple times? Why didn’t you listen to me the first time I told you? You never help out around the house. I am tired of your promises; you never change. Just who are you texting at 1am? This is all your fault. My mother was right about you! I hate you, I want a divorce.
Wait what? No, relationship arguments are bad! We don’t want relationship arguments, we want to be able to pass relations as arguments (opens in a new tab) so they can be reused. Turns out we can do that using an “inline” definition:
// read query @inline def arg_and_max[Relation] = argmax[Relation], max[Relation] def output = arg_and_max[count_of_common_neighbors]
[Sweet](https://www.youtube.com/watch?v=4QubDlidziA (opens in a new tab)! Pause here, take a breath. Wash away the flashbacks of fighting in your personal relationships. Now think about what this makes possible. Relations as arguments. We’re talking about meta programming (opens in a new tab) in a query language. This is amazing (opens in a new tab).
Ok, let’s try something else now. We can do two hops, but what about multiple and variable hops? Which nodes can we travel to starting from a node? We’ll use the power of recursion (opens in a new tab) to define “reachable” as being related to us by an edge or having an edge to something already reachable. Let’s see which nodes we can reach from Node 3:
// read query def reachable(x, y) = edge(x, y) or exists(t: edge(x, t) and reachable(t, y)) def output = reachable
That doesn’t look right. Oh of course! That’s because we have an undirected graph. We need to use
neighbor instead of
edge to travel from both incoming and outgoing edges!
// model def reachable_undirected(x, y) = neighbor(x, y) or exists(t: neighbor(x, t) and reachable_undirected(t, y))
// read query def output = reachable_undirected
There we go, now we have the right result. But we can actually shorten our definitions. In Rel we have two shortcuts for these types of queries. One is point-free syntax (opens in a new tab) which lets us omit the variables (
y, etc.) when their omission doesn’t lead to ambiguity… in other words, if they don’t make the query any clearer.
The second is relational composition (opens in a new tab), which lets us use the dot
. to join the last element of a relation with the first element of another relation so we can actually write our definitions like this:
def reachable = edge; reachable.edge def reachable_undirected = neighbor; reachable_undirected.neighbor
Take another pause here to appreciate this. We can define self-referencing recursive definitions. Both
reachable_undirected are just one line of code each. Recursion is incredibly easy to express in Rel. Think about all the graph queries and algorithms that can be easily expressed using recursion. It’s like being able to use the Five Point Palm Exploding Heart Technique (opens in a new tab) to defeat complex graph problems.
Now, back to the output of that query… it is a long list, how about we count them?
// read query def output = count[reachable_undirected]
We can reach every node in the graph as it is one big connected component (opens in a new tab), since the members were all in the same Karate Club. We should define that as well. Typically the connected component ID will be the lowest node ID in the set of connected nodes.
// model def connected_component[x] = min[reachable_undirected[x]]
// read query def output = connected_component
It is all 1s, since that is the lowest node ID of the entire graph. Here’s a visual of the Karate Club graph as one big component:
Let’s add some dummy nodes and a relationship between them to help us understand how the definition of connected components works. These nodes will connect with each other but not with any of the nodes in the real Karate Club graph.
// read query def node = 35;36;37;38 def edge = (35,36); (35,37); (36,38) def output = connected_component
This looks a little more interesting and helps us understand what’s going on. Node 1 through Node 34 are in component 1 and Node 35 through Node 38 are in component 35, because 1 and 35 are the lowest node ids in each set. Look back at how little code it took us to get here. We built it one step at a time.
Ok, what about similarities? Just how similar were the social relationships of the members of the Karate Club? Do we have any nodes that are practically mirror images of each other?
In the world of graphs, there are a bunch of similarity measures. We can start with cosine similarity (opens in a new tab), which measures the count of common neighbors divided by the square root of the product of the counts of each of the two nodes.
That’s a mouthful, and the Rel definition is almost the same as English:
// model def cosine_similarity[x, y] = count_of_common_neighbors[x, y] / sqrt[degree[x] * degree[y]]
// read query def output = cosine_similarity
This gives us the cosine similarity scores between each pair of nodes. We can also ask just for the scores of one node. Let’s try Node 3:
// read query def output = cosine_similarity
We can see that Node 3 is more similar to Node 4 than it is to Node 5. If we think about their common neighbors we can see that is true.
Let’s see which nodes are the most similar by sorting the similarity score in descending order. The documentation tells us we have a
reverse_sort method for that.
// read query def output = reverse_sort[cosine_similarity]
That doesn’t look right! It is sorted by the node ids in reverse order, not the similarity score. We need to tell it to do it by the last value in the relation. As luck would have it, Rel supports variable arguments (opens in a new tab) using
... which lets us define methods that take zero or more arguments.
We can also use an inline definition from another inlined definition. So instead of reverse sorting the relation as is, we will move the similarity score to the front, then we will reverse sort by it, then the first node, then the second node.
So our Rel code looks like this:
// model @inline def last_to_first[R](last_arg, other_args...) = R(other_args..., last_arg) @inline def reverse_sort_by_last[R](iterator, other_args..., last_arg) = reverse_sort[last_to_first[R]](iterator, last_arg, other_args...)
// read query def output = reverse_sort_by_last[cosine_similarity]
The order of the elements on the left side of the equal sign is the output, and the elements on the right side are the input. The
last_to_first definition takes the value in the last element and moves it to the front.
(1, 2, 3.5);(3, 8, 4.1) becomes
(3.5, 1, 2);(4.1, 3, 8).
reverse_sort_by_last definition applies
last_to_first to the relation being passed in (which we are calling
R and will have the values of cosine_similarity) and then performs the
reverse_sort on the output of last_to_first` and moves the values of the tuples again.
The values in the
last_arg get moved back to the end. The output is the iterator (created by
reverse_sort) followed by the first two values, which were captured by
other_args… and then the last value in
last_arg, which is our similarity score.
Now we can see Node 21 and Node 23 have a perfect cosine similarity score of 1.0. They are both related to Node 33 and Node 34 and nothing else.
The same goes for Nodes 15, 16, 18, and 19. Coming in last are Node 3 and Node 33 with a similarity score around 0.09 because they only have one neighbor in common: Node 9.
Let’s do one more, jaccard similarity (opens in a new tab). This is the ratio of the intersection over the union of the neighbors of two nodes. We need the count of the common neighbors divided by the sum of the counts of each node minus the count of common neighbors.
We already know that the count of neighbors is just the degree of a node, so let’s reuse that again. This time let’s just look at the similarity scores for Node 3:
// model def jaccard_sim[x, y] = count_of_common_neighbors[x, y] / (degree[x] + degree[y] - count_of_common_neighbors[x, y])
// read query def output = reverse_sort_by_last[jaccard_sim]
Turns out Node 3 has the highest similarity to Node 4, then Node 34 and then both Node 31 and Node 8.
To recap, today we learned that Rel supports relations as arguments, recursion, variable arguments and inlined definitions that reference other inlined definitions.
These features give you tremendous power so you can express your business logic clearly and concisely. Power that was typically reserved for procedural languages is now available in your models and queries. Rel is a pretty amazing language, and we’re not done yet.
Next time we will look at triangles and cliques and introduce more mind-blowing features of Rel. Be sure to subscribe to our newsletter to learn what they are!
Playing With Legos
Do you remember playing with legos as a kid? This is what our query language Rel feels like. You don’t have to write 300 lines of SQL code and go absolutely insane trying to debug or handle it all in your head. You make progress with every little definition. You can divide and conquer the problem by building small pieces, combining them together and creating something that is greater than the sum of its parts.
Tic-Tac-Toe: Knowledge Extraction From Imperative Code
Applications implemented in imperative programming languages encode vast amounts of human knowledge at their core, but the essence of the application logic is often obscured by the imperative nature of the implementation language.