RelationalAI Simple Start Snowflake Notebook
Overview
This notebook provides a minimalistic starting point to help you get up and running with RelationalAI in Snowflake Notebooks.
To see examples that showcase RelationalAI's analytics capabilities, check out the other notebooks on the docs site.
What you will learn
- How to get RelationalAI's Python library running in your Snowflake account
- How to run a simple graph algorithm over your data.
Let's get started!
One preliminary step is required before you can run this notebook:
- Choose Notebook settings from the triple-dot (⋮) dropdown in the top-right corner of the Snowflake Notebooks window and switch to the External access tab. Turn on the
PYPI_ACCESS_INTEGRATION
toggle. This allows your notebook download the RelationalAI library from PyPI.
%pip install relationalai networkx matplotlib
Note: If you're running this notebook on a warehouse rather than a container, the
pip install
on the previous line won't work. Instead, follow these steps:
- Select
networkx
andmatplotlib
from the Packages dropdown in the top-right corner of the Snowflake Notebooks window. These packages are notrelationalai
dependencies, but they are used for graph visualization purposes in this notebook.- Upload the RelationalAI Python library as a ZIP file into your notebook filesystem. You can download the ZIP file from the RelationalAI website here.
- Uncomment and run the code cell below to make the packages in the ZIP file visible to the Python interpreter.
# import sys
# sys.path.append("./relationalai.zip")
Now we're ready to import the RelationalAI Python library and start using it.
import relationalai as rai
from relationalai.std import aggregates
from relationalai.std.graphs import Graph
from relationalai.std import alias
import networkx as nx
import matplotlib.pyplot as plt
Data Setup
When RelationalAI was installed into your account, the account administrator may have created the table RAI_DEMO.SIMPLE_START.CONNECTIONS
. If the table has not been created, then you can uncomment and run the SQL below from a Snowflake worksheet with account admin privileges.
# Note: you won't need this if the table already exists
# create database if not exists RAI_DEMO;
# create schema if not exists RAI_DEMO.SIMPLE_START;
# create or replace table RAI_DEMO.SIMPLE_START.CONNECTIONS (
# station_1 int,
# station_2 int
# );
# insert into RAI_DEMO.SIMPLE_START.CONNECTIONS (station_1, station_2) values
# (1, 2),
# (1, 3),
# (3, 4),
# (1, 4),
# (4, 5),
# (5, 7),
# (6, 7),
# (6, 8),
# (7, 8);
Define Model in RelationalAI
Let's define our model object. Models represent collections of objects. Objects, like Python objects, have types and properties, which we will define in a bit.
Note: Invoking the Model
constructor will create a new engine for you if it hasn't already been created, so running this line of code might take a few minutes:
model = rai.Model("SimpleStart", ensure_change_tracking=True)
Note: The ensure_change_tracking
flag is set to True
to ensure that change tracking is enabled for all Snowflake source tables and views consumed by the model. Alternatively, you may set this in your RAI configuration.
Referencing Snowflake Data
Because RelationalAI is tightly integrated with Snowflake, we can refer to data that we are streaming to our RelationalAI schema by simply referring to the source:
<my_database>.<my_schema>.<my_table>.
The data for this example consists of a single table called CONNECTIONS
whose columns are called station_1
and station_2
. These station values represent IDs of power stations, and a row in the table represents a connection (via transmission lines and substations) between two power stations.
Accordingly, we will introduce two types that represent the two kinds of objects in our model: Station
and Connection
.
Station = model.Type("Station")
Connection = model.Type(
"Connection",
source="RAI_DEMO.SIMPLE_START.CONNECTIONS"
)
Simple Queries
We can run a query to count the number of connections as follows:
# Count number of rows in the connections table:
with model.query() as select:
connection = Connection()
num_records = aggregates.count(connection)
res = select(num_records)
res.results
result |
---|
9 |
We haven't yet said what a Station
is. We can do that using a rule. In this rule we will also define the is_connected
property for stations:
with model.rule():
connection = Connection()
station_1 = Station.add(id=connection.station_1)
station_2 = Station.add(id=connection.station_2)
station_1.is_connected.extend([station_2])
station_2.is_connected.extend([station_1])
Now we can ask for a list of all the stations IDs:
with model.query() as select:
station = Station()
res = select(station.id)
res.results
id |
---|
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
Graph Analysis
The power stations and connections between them form a graph. We can model a graph in RelationalAI by wrapping the model in a Graph
object and associating data with its Node
and Edge
properties.
graph = Graph(model, undirected = True)
graph.Node.extend(Station, label=Station.id)
graph.Edge.extend(Station.is_connected)
def visualize_graph(graph_data):
G = nx.DiGraph()
for node, props in graph_data['nodes'].items():
G.add_node(node, **props)
for (a, b), props in graph_data['edges'].items():
G.add_edge(a, b, **props)
pos = nx.planar_layout(G)
plt.figure(figsize=(4, 2))
labels = nx.get_node_attributes(G, 'label')
nx.draw(G, pos, labels=labels, with_labels=True, font_color="white", edge_color="gray")
plt.show()
graph_data = graph.fetch()
visualize_graph(graph_data)
You can see from the figure that Stations 4, 5, and 7 are especially critical to this network because they provide the only connection between two larger components. It's important to be able to quantify and compute this kind of information because it would not be so visually apparent in a larger, real-world network.
We can do that using a graph analytics metric called betweenness centrality. This metric and others are available under the graph.compute
namespace:
with model.query() as select:
station = Station()
centrality = graph.compute.betweenness_centrality(station)
res = select(station.id, alias(centrality, "betweenness_centrality"))
res.results
id | betweenness_centrality |
---|---|
1 | 6.0 |
2 | 0.0 |
3 | 0.0 |
4 | 12.0 |
5 | 12.0 |
6 | 0.0 |
7 | 10.0 |
8 | 0.0 |
As predicted, Stations 4, 5, and 7 have the highest betweenness centrality scores!
Conclusion
The functionality demonstrated in this notebook barely scratches the surface of what's possible with RelationalAI. The Example Notebooks page on the docs site contains a variety of example notebooks, each of which explores a scenario and a set of analytics capabilities in greater depth.
Brought to you by RelationalAI & Snowflake Native Applications!