THIS POST IS ARCHIVED

Wednesday, August 17, 2022

Understanding Our World

“Walking with a friend in the dark is better than walking alone in the light” is a saying attributed to Helen Keller… someone who spent her life in silent darkness as she was both deaf and blind. Anne Sullivan was her lifelong companion and Miracle Worker who taught her language, reading, and writing. Anne was the friend in the darkness we all hope to have, someone who can help us understand the world. Today we will continue to shine a light on the friendships of the Zachary Karate Club Graph to learn more about the Rel language.

Last time we found all the nodes reachable from any one node, found the connected components in our graph and calculated cosine and jaccard similarities.

Today we'll look at the nuances of the social relations of the members of the Karate Club. We will explore one simple and one complex structure in the graph.

Before we continue, I must warn you: You will stare at five lines of code for an awfully long time. It will melt your brain, as it did mine. I had to scrape it off the floor with an ice cream scoop…but once you understand this code, the hidden power of Rel will be revealed.

Finding Triangles

Let’s start off with triangles by finding three nodes that are all connected together. We can define this method as three sets of neighbor relations. We can also add a clause to force our triangles to be in node ID order and eliminate out of order “duplicates”:

def edge = {
(2,1);
(3,1);(3,2);
(4,1);(4,2);(4,3);
(5,1);
(6,1);
(7,1);(7,5);(7,6);
(8,1);(8,2);(8,3);(8,4);
(9,1);(9,3);
(10,3);
(11,1);(11,5);(11,6);
(12,1);
(13,1);(13,4);
(14,1);(14,2);(14,3);(14,4);
(17,6);(17,7);
(18,1);(18,2);
(20,1);(20,2);
(22,1);(22,2);
(26,24);(26,25);
(28,3);(28,24);(28,25);
(29,3);
(30,24);(30,27);
(31,2);(31,9);
(32,1);(32,25);(32,26);(32,29);
(33,3);(33,9);(33,15);(33,16);(33,19);(33,21);(33,23);(33,24);(33,30);(33,31);(33,32);
(34,9);(34,10);(34,14);(34,15);(34,16);(34,19);(34,20);(34,21);(34,23);(34,24);(34,27);(34,28);(34,29);(34,30);(34,31);(34,32);(34,33)
}
def friendship(x, y) = edge(x, y) or edge(y, x)
def neighbor(x, y) = friendship(x, y)
def outdegree[x] = count[edge[x]]
def indegree[x] = count[edge(y, x) for y]
def degree[x] = count[neighbor[x]]
def common_neighbor[x,y] = intersect[neighbor[x], neighbor[y]] , y != x
def count_of_common_neighbors[x, y] = count[common_neighbor[x, y]]

def triangle(a, b, c) = neighbor(a, b) and neighbor(a, c) and neighbor(b, c) and a < b < c

def output = triangle

It found 45 triangle friendships between our members.

What if we want to see all the triangles that include the member represented by Node 3?

We have three elements, a, b and c, that are part of a triangle. Using the logical operation for true () we can make sure that Node 3 is one of them. The Rel code is simpler than the written version, see for yourself:

def output = a, b, c : triangle(a,b,c) and {a;b;c}(3)

We can also turn this into a method “node triangle” by replacing the 3 with a placeholder x to be filled in later and get the same output.

def node_triangle(x, a, b, c) = triangle(a,b,c) and {a;b;c}(x)

def output = node_triangle[3]

Demand Transformation

Now you may be wondering… are we finding the node triangles for all of the nodes in the graph first and then filtering for Node 3, or are we only finding the node triangles for Node 3?

This leads us into the topic of demand transformation, which is a fancy term for turning infinite calculations into finite calculations. If you squint your eyes you can kind of think of node_triangle as a database “view”.

Then the question becomes, should you materialize this view or not? To materialize a view means to compute and store the results. It isn’t a big deal for our small karate graph, but think about what happens when you have 3 billion nodes and 20 billion relationships.

Demand transformation is equivalent to materializing only the parts of the view we want, and not the entire thing.

RAI is able to figure out this answer for some queries today, more with every release, and hopefully all very soon. However if you want to force the system to only compute what you ask for now, you can add the @ondemand annotation to our method:

def delete:rel:catalog:model["triangle3-x-install"] = rel:catalog:model["triangle3-x-install"]

@ondemand
def node_triangle(x, a, b, c) = triangle(a,b,c) and {a;b;c}(x)

def output = node_triangle[3]

These annotations are not a hint or suggestion, they are a command and will force the compiler to do what you ask.

We could also ask that a “view” is not created at all by using the @inline annotations we’ve seen before. This will simply copy and paste the code in the definition whenever it is used instead of actually building a definition.

As users, we typically don't want to worry about these kinds of details, but it is nice to know we have ways to tell the system what we want the compiler to do.

Cliques

Now how about Cliques? Do you remember high school or secondary school when you were a teenager? Where a bunch of small groups of kids with shared interests hung out together pretty much exclusively?

Think about the movie Mean Girls or The Craft. Think about Grease if you like musicals or Carrie if horror is more your thing. Those movies have great examples of cliques.

Let's find cliques with four members in our graph. We can define a “four clique” as four nodes that are all neighbors with each other:

def four_cliques(a, b, c, d) = neighbor(a, b) and
  neighbor(a, c) and
  neighbor(a, d) and
  neighbor(b, c) and
  neighbor(b, d) and
  neighbor(c, d) and
  a < b < c < d

def output = four_cliques

That definition is quite long. What if we reuse the triangles definition to shorten it a bit? We can say that a clique of four members starts as a triangle then adds a fourth member that is neighbors with all three members of the triangle like this:

def delete:rel:catalog:model["four-cliques-install"] = rel:catalog:model["four-cliques-install"]

def four_cliques(a, b, c, d) = triangle(a, b, c) and
  neighbor(a, d) and
  neighbor(b, d) and
  neighbor(c, d) and
  c < d

def output = four_cliques

We get the same output as above.

What about cliques with five members? Well, that's just a clique of four with an additional member who is friends with everyone in the group:

def five_cliques(a, b, c, d, e) =
  four_cliques(a, b, c, d) and
  neighbor(a, e) and
  neighbor(b, e) and
  neighbor(c, e) and
  neighbor(d, e) and
  d < e

def output = five_cliques

It’s starting to get a bit out of hand. What would be really nice is for the language to let us express what a clique is, and not have to add one level at a time.

Rel lets us do just that using entities.

Entities

Entities are not scary apparitions. They are a tool in the language that lets us define a “thing”.

So what exactly is this clique thing then? We can think of a clique as an entity in our graph where every node must have a relationship with all the other nodes in that clique.

We want to be able to build cliques of different sizes, so we will start at the very beginning with size zero. This is an empty clique, which we will allow by setting it to true.

Next, we need a way to add members to a clique. In order for this to be allowed, the new member must be connected to all the other members.

// model

entity Clique nil = true
entity Clique add_to_clique(member, members) = connected(member, members)

Next, we need to define what it means to be "connected".

Let’s start with the simplest case. We will allow a new node to be added to an empty clique so we can grow the size of our clique from zero members to one member.

So let’s restrict this baseline definition to force the new member to be a node and members to be nil. Since there is only one node, there aren’t any nodes to be related to, so our definition is just:

// model

def connected(new_member, members) = node(new_member) and nil(members)

With that out of the way, we will create another definition for connected when the clique is not empty.

First, the new member must be a neighbor of the last member to be added to the clique.

Second, the new member must have a node ID that is greater than the node ID of the last member (so we don’t get duplicate cliques in various orders).

Third, we need to perform this check recursively against the other members of this clique except the one we just checked.

So how do we do that exactly? We are going to request that the new member is connected to the other members. But where do we get the last_member and other_members from? We get them from add_to_clique in reverse.

// model

def connected(new_member, members) =
        neighbor(new_member, last_member)
    and new_member > last_member
    and connected(new_member, other_members)
    and add_to_clique(last_member, other_members, members) from (last_member, other_members)

When your head stops spinning from that definition, you have to come to the realization that Rel is not a procedural language. It is a relational declarative language with roots in logic programming.

The definitions are rules, or statements, that can be true given the right conditions. So if members is a list of nodes in a clique, we can walk backwards through add_to_clique to extract the last_member and other_members.

From the clique, we know the last node added to it must be the one with the highest ID and the other members those nodes with IDs below it.

Let me show you a bit more to see if it clicks. How would we know if a node is a member of a clique? Well two ways.

First, it can be the last member added to the clique. We can define that method using add_to_clique again, with member as the first element, the underscore (which means we don’t care what goes here) as the second value, and members as our clique.

// model

def clique_member(member, members) = add_to_clique(member, _, members)

But if it’s not the last member to be added to the clique, we need to check the other members in the clique.

How do we do that? Well, we need to get other members from the members using add_to_clique again, and then check to see if the first definition is true for our member and the other members:

// model

def clique_member(member, members) =
     add_to_clique(_, other_members, members)
     and clique_member(member, other_members) from other_members

Is it starting to make sense now? No? Not yet? Okay, one more.

How do we find the size of a clique? If the clique is empty, then the size is equal to zero. That’s easy right?

// model

def clique_size[members in Clique] = 0, nil(members)

But what if the clique is not empty ? The clique size is equal to one (for the current member) plus the clique size of the other members… and where do we get the other members from?

From going backwards through add_to_clique again and ignoring the current member which is the last one added:

// model

def clique_size[members in Clique] = 1 + clique_size[other_members]
  from other_members where members = add_to_clique[_, other_members]

If you're wondering why we created clique_member and clique_size it's because we're going to use them next.

Let’s see the members of our cliques for any clique of size five we find in our graph. The first value in our output will be the hash ID of the clique and the second value will be one of the member node IDs:

// read query

def output = member : clique_member(member, members), clique_size(members, 5)
  for members

Output:

#	Int64	Int64
1	100909350452976023874304331568223927482	1
2	100909350452976023874304331568223927482	2
3	100909350452976023874304331568223927482	3
4	100909350452976023874304331568223927482	4
5	100909350452976023874304331568223927482	8
6	217755388808321031956546083382655110160	1
7	217755388808321031956546083382655110160	2
8	217755388808321031956546083382655110160	3
9	217755388808321031956546083382655110160	4
10	217755388808321031956546083382655110160	14

Exactly the same as we had earlier in our definition of five_cliques. If we want to make the output pretty we can use table from the standard library.

What table does is take a relation that has three elements: the column name, the key, and the value and lays it out nicely. We can use enumerate to get a number as a column name for the position of the clique member.

// read query

def data[id, members] =
  enumerate[member : clique_member(member, members) ][id],
  clique_size(members, 5)

def output = table[data]

Output:

#		1	2	3	4	5
1	100909350452976023874304331568223927482	1	2	3	4	8
2	217755388808321031956546083382655110160	1	2	3	4	14

I hope you enjoyed this preview into Rel and its features. You have learned about the power to add annotations to force your will upon the compiler and the ability to define entities.

Rel goes far beyond a simple type system by allowing you to define the rules your creations have to obey. Finally, you learned that Rel definitions are not methods that turn input into output, but rather rules that can be evaluated in both directions.

Next time, we will follow the social paths of the Karate Club members using the Ford Fulkerson algorithm. Check out Part 1 and Part 2 if you haven't already, and subscribe to our newsletter to stay up to date with RelationalAI.