Boolean and Missing: What Are They Good For?

David Amos

26 October 2022


RelationalAI is built on knowledge graphs, which rarely use null and boolean values. And yet Rel, RelationalAI’s declarative modeling language, has a Missing data type to represent null values and a Bool type to represent true and false boolean values.

Let's explore the role null and boolean values play in a dataset and learn when to use Missing and Bool types in Rel.

Representing When Facts Are Not Present

Consider a database of invoices for a company.

You might want to store the purchase order associated with each invoice. But not every client uses purchase orders, so for invoices without one you record a null value, perhaps as NULL in a SQL table or as null in a JSON document. You might also need to track whether or not the invoice has been paid, which you record as a true or false boolean value.

The resulting table of invoices in a SQL database might look something like this:

id    purchase_order    paid
1     NULL              true
1     "PO-0006"         false

What do NULL and boolean values have in common?

Both concepts represent the presence, or lack thereof, of a fact.

NULL represents the absence of a purchase order. True and false represent whether or not a payment has been received. The presence, or non-presence, of a fact determines which value is assigned.

The question is: if a fact is not present, does it need to be in your data?

Modeling: Missing and Boolean (Usually) Aren’t Necessary

In a knowledge graph, nodes representing entities in a model are related by edges representing the various relationships between those entities.

For example, an Invoice entity can be related to a PurchaseOrder entity via an edge labeled has_purchase_order, and to a Payment entity via an edge labeled has_payment.

In a knowledge graph, you don’t need to store NULL values or keep track of true and false. Missing data simply isn’t there. If an Invoice has no PurchaseOrder, there’s no has_purchase_order edge. If an Invoice hasn’t been paid, there’s no has_payment edge.
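To make the idea concrete, here is a minimal sketch in Python (not Rel) that models the two edge types as sets of pairs. The invoice ids, the payment identifier, and the helper function names are all hypothetical and chosen only for illustration. The point is that an absent fact is an absent tuple, so no NULL or false value is ever stored.

```python
# Hypothetical data: two invoices, modeled only by the edges that exist.
invoices = {1, 2}

# Invoice 1 has no purchase order, so it has no has_purchase_order edge.
has_purchase_order = {(2, "PO-0006")}

# Invoice 2 has not been paid, so it has no has_payment edge.
has_payment = {(1, "payment-001")}


def purchase_orders(invoice_id):
    """Return the purchase orders linked to an invoice (possibly none)."""
    return {po for (inv, po) in has_purchase_order if inv == invoice_id}


def is_paid(invoice_id):
    """An invoice counts as paid exactly when a has_payment edge exists."""
    return any(inv == invoice_id for (inv, _) in has_payment)


print(purchase_orders(1))  # set()
print(is_paid(2))          # False
```

Whether an invoice is paid is derived from the presence of an edge rather than read from a stored flag, which is why a boolean column is usually unnecessary in this style of model.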

With so much emphasis on knowledge graphs, it may seem contradictory to have Missing and Bool values in Rel, but sometimes you really do need them.

For more information on modeling graphs, see Data Modeling: Graph Normal Form.

Data Ingestion: Reconstructible Representations in Rel

When you ingest data into RelationalAI’s Relational Knowledge Graph Management System (RKGMS), you create a representation of that data in Rel.

Representations of data need to be high-fidelity. That is, you should be able to reconstruct the original data from the Rel representation. This means that Rel needs a way to represent null and boolean values.

For example, an invoice might be ingested from a JSON document:

//json:
{
    "id": 1,
    "purchase_order": null,
    "paid": true
}

In Rel, this JSON document gets represented as the following relation:

//rel:
{
    (:id, 1);
    (:purchase_order, missing);
    (:paid, boolean_true);
}

The missing value is the only primitive value of the Missing type. The Bool type has two primitive values: boolean_true and boolean_false.
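The fidelity requirement can be illustrated with a small round trip, sketched here in Python rather than Rel. The MISSING sentinel and the to_relation and to_json helpers are assumptions made for this example and are not part of any RelationalAI API. The sketch handles flat JSON objects only.

```python
import json


class Missing:
    """Hypothetical Python stand-in for Rel's `missing` value."""

    def __repr__(self):
        return "missing"


MISSING = Missing()


def to_relation(document):
    """Turn a flat JSON object into a set of (key, value) tuples."""
    data = json.loads(document)
    return {(key, MISSING if value is None else value)
            for key, value in data.items()}


def to_json(relation):
    """Rebuild the JSON text from the (key, value) tuples."""
    data = {key: (None if value is MISSING else value)
            for key, value in relation}
    return json.dumps(data, sort_keys=True)


original = '{"id": 1, "purchase_order": null, "paid": true}'
relation = to_relation(original)

# Keeping the null as an explicit value makes the round trip lossless.
assert json.loads(to_json(relation)) == json.loads(original)

# Dropping the null tuple loses the "purchase_order" key entirely.
lossy = {(key, value) for (key, value) in relation if value is not MISSING}
assert "purchase_order" not in json.loads(to_json(lossy))
```

The second check shows what would be lost without an explicit null value: the reconstructed document could no longer tell a key set to null apart from a key that was never present.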

Having values of type Missing and Bool in Rel allows you to export relations as JSON, if needed. And you can accurately reconstruct data that has been ingested. But you usually won’t use Missing and Bool types in your models.


