THIS POST IS ARCHIVED

Wednesday, August 3, 2022

Good Neighbors

Many take the phrase “good fences make good neighbors” to mean we should build barriers between us and the outside world. We build fences for protection, privacy and to establish boundaries. But that’s not at all the phrase means. It comes from the poem Mending Wall by Robert Frost where two neighbors meet every year to rebuild the fence on the borders of their properties.

It is not the mended fence that is important, but rather the coming together of the two neighbors to meet consistently and work together for a common purpose. It is the act of mending relations, not building fences, that matters.

It stands to reason then that the person with the most friends and neighbors has the opportunity and responsibility to maintain lots of relationships. Last time we left off after finding out which member of our example university Karate Club has the most friends in common with the member represented as Node 3, but before we jump in, lets recap the Rel code we’ve written so far:

def edge = {
(2,1);
(3,1);(3,2);
(4,1);(4,2);(4,3);
(5,1);
(6,1);
(7,1);(7,5);(7,6);
(8,1);(8,2);(8,3);(8,4);
(9,1);(9,3);
(10,3);
(11,1);(11,5);(11,6);
(12,1);
(13,1);(13,4);
(14,1);(14,2);(14,3);(14,4);
(17,6);(17,7);
(18,1);(18,2);
(20,1);(20,2);
(22,1);(22,2);
(26,24);(26,25);
(28,3);(28,24);(28,25);
(29,3);
(30,24);(30,27);
(31,2);(31,9);
(32,1);(32,25);(32,26);(32,29);
(33,3);(33,9);(33,15);(33,16);(33,19);(33,21);(33,23);(33,24);(33,30);(33,31);(33,32);
(34,9);(34,10);(34,14);(34,15);(34,16);(34,19);(34,20);(34,21);(34,23);(34,24);(34,27);(34,28);(34,29);(34,30);(34,31);(34,32);(34,33)
}

def friendship(x, y) = edge(x, y) or edge(y, x)
def neighbor(x, y) = friendship(x, y)
def outdegree[x] = count[edge[x]]
def indegree[x] = count[edge(y, x) for y]
def degree[x] = count[neighbor[x]]
def common_neighbor[x,y] = intersect[neighbor[x], neighbor[y]] , y != x
def count_of_common_neighbors[x, y] = count[common_neighbor[x, y]]

We were able to see the node with the highest common neighbors using argmax and the count of the common neighbors for that node using max.

Now, what we would like to see is the node ID and the count of common neighbors together. We can output both this way:

def output = argmax[count_of_common_neighbors[3]], max[count_of_common_neighbors[3]]

That's great, but having to repeat count_of_common_neighbors[3] feels wrong. How about relationship arguments? Why do I have to repeat myself? Why do I have to tell you multiple times? Why didn’t you listen to me the first time I told you? You never help out around the house. I am tired of your promises; you never change. Just who are you texting at 1am? This is all your fault. My mother was right about you! I hate you, I want a divorce.

Wait what? No, relationship arguments are bad! We don’t want relationship arguments, we want to be able to pass relations as arguments so they can be reused. Turns out we can do that using an “inline” definition:

@inline
def arg_and_max[Relation] = argmax[Relation], max[Relation]

def output = arg_and_max[count_of_common_neighbors[3]]

[Sweet](https://www.youtube.com/watch?v=4QubDlidziA! Pause here, take a breath. Wash away the flashbacks of fighting in your personal relationships. Now think about what this makes possible. Relations as arguments. We’re talking about meta programming in a query language. This is amazing.

Ok, let's try something else now. We can do two hops, but what about multiple and variable hops? Which nodes can we travel to starting from a node? We’ll use the power of recursion to define “reachable” as being related to us by an edge or having an edge to something already reachable. Let’s see which nodes we can reach from Node 3:

def reachable(x, y) = edge(x, y)
                      or exists(t: edge(x, t) and reachable(t, y))
def output = reachable[3]

That doesn't look right. Oh of course! That’s because we have an undirected graph. We need to use neighbor instead of edge to travel from both incoming and outgoing edges!

def reachable_undirected(x, y) = neighbor(x, y) or
                                 exists(t: neighbor(x, t) and reachable_undirected(t, y))

def output = reachable_undirected[3]

There we go, now we have the right result. But we can actually shorten our definitions. In Rel we have two shortcuts for these types of queries. One is point-free syntax which lets us omit the variables (x, y, etc.) when their omission doesn’t lead to ambiguity… in other words, if they don’t make the query any clearer.

The second is relational composition, which lets us use the dot . to join the last element of a relation with the first element of another relation so we can actually write our definitions like this:

def reachable = edge; reachable.edge
def reachable_undirected = neighbor; reachable_undirected.neighbor

Take another pause here to appreciate this. We can define self-referencing recursive definitions. Both reachable and reachable_undirected are just one line of code each. Recursion is incredibly easy to express in Rel. Think about all the graph queries and algorithms that can be easily expressed using recursion. It’s like being able to use the Five Point Palm Exploding Heart Technique to defeat complex graph problems.

Now, back to the output of that query… it is a long list, how about we count them?

def output = count[reachable_undirected[3]]

We can reach every node in the graph as it is one big connected component, since the members were all in the same Karate Club. We should define that as well. Typically the connected component ID will be the lowest node ID in the set of connected nodes.

def connected_component[x] = min[reachable_undirected[x]]

def output = connected_component

It is all 1s, since that is the lowest node ID of the entire graph. Here’s a visual of the Karate Club graph as one big component:

Image courtesy of Wikimedia Foundation

Let's add some dummy nodes and a relationship between them to help us understand how the definition of connected components works. These nodes will connect with each other but not with any of the nodes in the real Karate Club graph.

def node = 35;36;37;38
def edge = (35,36); (35,37); (36,38)
def output = connected_component

This looks a little more interesting and helps us understand what's going on. Node 1 through Node 34 are in component 1 and Node 35 through Node 38 are in component 35, because 1 and 35 are the lowest node ids in each set. Look back at how little code it took us to get here. We built it one step at a time.

Ok, what about similarities? Just how similar were the social relationships of the members of the Karate Club? Do we have any nodes that are practically mirror images of each other?

In the world of graphs, there are a bunch of similarity measures. We can start with cosine similarity, which measures the count of common neighbors divided by the square root of the product of the counts of each of the two nodes.

That’s a mouthful, and the Rel definition is almost the same as English:

def cosine_similarity[x, y]  = count_of_common_neighbors[x, y] /
  sqrt[degree[x] * degree[y]]

def output = cosine_similarity

This gives us the cosine similarity scores between each pair of nodes. We can also ask just for the scores of one node. Let’s try Node 3:

def output = cosine_similarity[3]

We can see that Node 3 is more similar to Node 4 than it is to Node 5. If we think about their common neighbors we can see that is true.

Let’s see which nodes are the most similar by sorting the similarity score in descending order. The documentation tells us we have a reverse_sort method for that.

def output = reverse_sort[cosine_similarity]

That doesn’t look right! It is sorted by the node ids in reverse order, not the similarity score. We need to tell it to do it by the last value in the relation. As luck would have it, Rel supports variable arguments using ... which lets us define methods that take zero or more arguments.

We can also use an inline definition from another inlined definition. So instead of reverse sorting the relation as is, we will move the similarity score to the front, then we will reverse sort by it, then the first node, then the second node.

So our Rel code looks like this:

@inline
def last_to_first[R](last_arg, other_args...) = R(other_args..., last_arg)

@inline
def reverse_sort_by_last[R](iterator, other_args..., last_arg) =
    reverse_sort[last_to_first[R]](iterator, last_arg, other_args...)

def output = reverse_sort_by_last[cosine_similarity]

The order of the elements on the left side of the equal sign is the output, and the elements on the right side are the input. The last_to_first definition takes the value in the last element and moves it to the front.

So (1, 2, 3.5);(3, 8, 4.1) becomes (3.5, 1, 2);(4.1, 3, 8).

The reverse_sort_by_last definition applies last_to_first to the relation being passed in (which we are calling R and will have the values of cosine_similarity) and then performs the reverse_sort on the output of last_to_first` and moves the values of the tuples again.

The values in the last_arg get moved back to the end. The output is the iterator (created by reverse_sort) followed by the first two values, which were captured by other_args… and then the last value in last_arg, which is our similarity score.

Now we can see Node 21 and Node 23 have a perfect cosine similarity score of 1.0. They are both related to Node 33 and Node 34 and nothing else.

The same goes for Nodes 15, 16, 18, and 19. Coming in last are Node 3 and Node 33 with a similarity score around 0.09 because they only have one neighbor in common: Node 9.

Let’s do one more, jaccard similarity. This is the ratio of the intersection over the union of the neighbors of two nodes. We need the count of the common neighbors divided by the sum of the counts of each node minus the count of common neighbors.

We already know that the count of neighbors is just the degree of a node, so let's reuse that again. This time let’s just look at the similarity scores for Node 3:

def jaccard_sim[x, y] = count_of_common_neighbors[x, y] / (degree[x] + degree[y] - count_of_common_neighbors[x, y])

def output = reverse_sort_by_last[jaccard_sim[3]]

Turns out Node 3 has the highest similarity to Node 4, then Node 34 and then both Node 31 and Node 8.

To recap, today we learned that Rel supports relations as arguments, recursion, variable arguments and inlined definitions that reference other inlined definitions.

These features give you tremendous power so you can express your business logic clearly and concisely. Power that was typically reserved for procedural languages is now available in your models and queries. Rel is a pretty amazing language, and we’re not done yet.

Next time we will look at triangles and cliques and introduce more mind-blowing features of Rel. Be sure to subscribe to our newsletter to learn what they are!