Boolean and Missing: What Are They Good For?
RelationalAI is built on knowledge graphs, which rarely use null and boolean values. And yet Rel, RelationalAI’s declarative modeling language, has a Missing data type to represent null values and a Bool type to represent the boolean values true and false.
Let's explore the role null and boolean values play in a dataset and learn when to use the Missing and Bool types in Rel.
Representing When Facts Are Not Present
Consider a database of invoices for a company.
You might want to store the purchase order associated with an invoice. But not every client uses purchase orders, so you record a missing purchase order as a null value, perhaps as NULL in a SQL table or as null in a JSON document. Maybe you also need to track whether the invoice has been paid. You record that as a true or false boolean value.
The resulting table of invoices in a SQL database might look something like this:
id | purchase_order | paid  |
1  | NULL           | true  |
2  | "PO-0006"      | false |
What do NULL and boolean values have in common?
Both concepts represent the presence, or lack thereof, of a fact.
NULL represents the absence of a purchase order. The values true and false represent whether or not a payment has been received. In each case, the presence or absence of a fact determines which value is assigned.
The question is: if a fact is not present, does it need to be in your data?
Modeling: Missing and Boolean (Usually) Aren’t Necessary
In a knowledge graph, nodes representing entities in a model are related by edges representing the various relationships between those entities.
For example, an Invoice entity can be related to a PurchaseOrder entity via an edge labeled has_purchase_order, and to a Payment entity via a has_payment edge.
In a knowledge graph, you don’t need to store NULL values or keep track of true and false. Missing data simply isn’t there. If an Invoice has no PurchaseOrder, there’s no has_purchase_order edge. If an Invoice hasn’t been paid, there’s no has_payment edge.
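As a rough illustration of this idea, the sketch below models the two edge types as plain Python sets of (source, target) pairs. It is not Rel and says nothing about how RelationalAI stores graphs; the node identifiers (invoice-1, invoice-2, payment-1) and the helper functions purchase_orders and is_paid are made up for the example.

```python
# Illustrative only: nodes are strings, and each edge type is a set of
# (source, target) pairs. All identifiers here are hypothetical.
invoices = {"invoice-1", "invoice-2"}

# invoice-1 has no purchase order, so it has no has_purchase_order edge.
has_purchase_order = {("invoice-2", "PO-0006")}

# invoice-2 is unpaid, so it has no has_payment edge.
has_payment = {("invoice-1", "payment-1")}


def purchase_orders(invoice):
    """Return the purchase orders linked to an invoice (possibly an empty set)."""
    return {target for source, target in has_purchase_order if source == invoice}


def is_paid(invoice):
    """An invoice counts as paid exactly when a has_payment edge exists for it."""
    return any(source == invoice for source, _ in has_payment)


for invoice in sorted(invoices):
    print(invoice, sorted(purchase_orders(invoice)), is_paid(invoice))
```

Nothing in this data holds a null or a boolean. "No purchase order" is an empty result, and "paid" is derived by asking whether an edge exists.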
With so much emphasis on knowledge graphs, it may seem contradictory to have Missing and Bool values in Rel, but sometimes you really do need them.
For more information on modeling graphs, see Data Modeling: Graph Normal Form.
Data Ingestion: Reconstructible Representations in Rel
When you ingest data into RelationalAI’s Relational Knowledge Graph System (RKGS), you create a representation of that data in Rel.
Representations of data need to be high-fidelity. That is, you should be able to reconstruct the original data from the Rel representation. This means that Rel needs a way to represent null and boolean values.
For example, an invoice might be ingested from a JSON document:
// json
{
"id": 1,
"purchase_order": null,
"paid": true
}
In Rel, this JSON document gets represented as the following relation:
// Rel
{
(:id, 1);
(:purchase_order, missing);
(:paid, boolean_true);
}
The missing value is the only primitive value of the Missing type. The Bool type has two primitive values: boolean_true and boolean_false.
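To see why an explicit missing value matters for reconstruction, here is a minimal Python sketch of the round trip. It is an analogy, not the RKGS ingestion code: the Missing class, the MISSING sentinel, and the to_relation and to_document helpers are all hypothetical names, and the sketch handles only flat JSON objects.

```python
import json


class Missing:
    """Hypothetical stand-in for a type with a single value, like Rel's missing."""

    def __repr__(self):
        return "missing"


MISSING = Missing()  # the one value of the Missing stand-in type


def to_relation(document):
    """Represent a flat JSON object as a set of (key, value) tuples."""
    return {
        (key, MISSING if value is None else value)
        for key, value in document.items()
    }


def to_document(relation):
    """Rebuild the JSON object from its set-of-tuples representation."""
    return {key: None if value is MISSING else value for key, value in relation}


original = json.loads('{"id": 1, "purchase_order": null, "paid": true}')

relation = to_relation(original)
restored = to_document(relation)
print(restored == original)

# If the null entry were simply dropped, the key could not be recovered.
lossy = {(key, value) for key, value in original.items() if value is not None}
print("purchase_order" in dict(lossy))
```

The first comparison succeeds because the null entry survives as an explicit tuple. The lossy variant shows what is lost when it does not: the purchase_order key disappears from the rebuilt document.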
Having values of type Missing and Bool in Rel allows you to export relations as JSON, if needed. And you can accurately reconstruct data that has been ingested. But you usually won’t use Missing and Bool types in your models.