Parsing the Crowded World of Data Analytics: Highlights
Our board member Bob Muglia recently sat down with Sanjeev Mohan for an interview on the It Depends podcast. They discussed the challenges, trends, technologies, and general pulse of the ever-changing data analytics market.
Let’s look at some highlights from their discussion!
The Challenge Facing the Modern Data Stack
Bob and Sanjeev discussed the challenges that arise from centralized data. Bob explained why these challenges are driving the move to a new generation of relational architectures.
“We’ve moved from a world where companies had data scattered all around in different places, to a world where data is centralized into a single analytic database or data lake.
This exposes issues around governance, performance, and data modeling. Many products have been developed to try and fill the gaps that exist in these systems, but the problem is that they don’t interoperate well. It’s not a shared environment.
It's interesting that the tools being used around the modern data stack in general are not being built on the modern data stack. This is because when you begin to work with the metadata associated with data, the relationships associated with that metadata become incredibly complicated and exceed the conceptual complexity of what you can efficiently do in today's SQL databases. You can't run the queries you need to run on Snowflake or BigQuery.
To calculate the relationships between data dynamically, you are required to move to a new generation of relational architectures.”
Asking Interesting Questions
The conversation turned toward the nature of analytics problems, and how relational architectures expand the possibilities for data analytics.
“The most interesting questions lie in the relationships you never thought to ask about. It's just not feasible to build a system that precalculates all of those connections. You end up with ‘more atoms in a grain of sand than grains of sand on earth’ if you look at the full complexity of some of these models.
The current generation of SQL databases is built on algorithms that are 40 years old. That's true whether you go back to DB2 in the early Oracle days or you go to Snowflake and BigQuery and Redshift; they're all built on similar algorithms of a binary join where you take tables and join them together.
That's just inefficient on some of these complex relationships, particularly if recursion is involved.
We need new algorithms. And they are being built right now. There’s a whole new generation of exciting relational algorithms called worst-case optimal join algorithms, which allow you to solve some of these complicated problems.”
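To make the binary-join point concrete, here is a minimal Python sketch of our own; it is not from the interview. It answers a "triangle" query over three hypothetical tables R(a, b), S(b, c), and T(a, c) in two ways: a pairwise plan that joins R with S and then filters against T, and a variable-at-a-time plan that intersects candidates for each variable across all tables. The function names and sample data are illustrative assumptions. The second function only follows the general idea behind worst-case optimal joins; it is not a production algorithm or any particular engine's implementation.

```python
from collections import defaultdict


def index(pairs):
    """Group a two-column table by its first column: key -> set of values."""
    idx = defaultdict(set)
    for key, value in pairs:
        idx[key].add(value)
    return dict(idx)


def triangles_binary(R, S, T):
    """Pairwise plan: join R(a, b) with S(b, c), then filter with T(a, c)."""
    s_idx = index(S)
    # This intermediate result can be far larger than the final answer.
    intermediate = [(a, b, c) for a, b in R for c in s_idx.get(b, ())]
    t_rows = set(T)
    result = sorted(row for row in intermediate if (row[0], row[2]) in t_rows)
    return result, len(intermediate)


def triangles_multiway(R, S, T):
    """Variable-at-a-time plan: bind a, then b, then c, using every table
    that mentions the variable to narrow the candidates before moving on."""
    r_idx, s_idx, t_idx = index(R), index(S), index(T)
    result = []
    for a in set(r_idx) & set(t_idx):          # a must appear in R and T
        for b in r_idx[a]:                     # b must pair with a in R ...
            if b not in s_idx:                 # ... and appear in S
                continue
            for c in s_idx[b] & t_idx[a]:      # c must satisfy S and T
                result.append((a, b, c))
    return sorted(result)


# Hypothetical data: many R and S rows share b = 0, but T allows one pair.
n = 100
R = [(a, 0) for a in range(1, n + 1)]
S = [(0, c) for c in range(1, n + 1)]
T = [(1, 1)]

result, intermediate_rows = triangles_binary(R, S, T)
print(result, intermediate_rows)
print(triangles_multiway(R, S, T))
```

On this sample data, the pairwise plan materializes 10,000 intermediate rows to find a single triangle, while the variable-at-a-time plan never builds that intermediate table.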
Modeling with Relational Knowledge Graphs
Finally, Bob discussed the work we’re doing here at RelationalAI.
“RelationalAI is the company driving relational knowledge graphs. The CEO, Molham Aref, is coordinating research across a number of universities. The company works with 20 universities across the world, with an extensive network of researchers who are building these next-generation algorithms.
Effectively, it's a way of explicitly modeling your business.
The expressiveness of these systems is dramatically greater than the way we do it today.
This way, all the code can be incorporated in one place, written as declarative statements that the semantic optimizer decides when to execute. That goes back to Codd’s theorem from the very beginning of the relational paradigm.”
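As a rough illustration of what a declarative, recursive statement looks like, here is a small Python sketch of our own. It is not RelationalAI's language or engine. It takes a hypothetical `edge` relation and derives `reach` from two rules: `reach(x, y)` holds if `edge(x, y)` holds, and `reach(x, z)` holds if `reach(x, y)` and `edge(y, z)` both hold. The rules state what is true; the loop below is just one simple way to evaluate them until nothing new can be derived.

```python
def transitive_closure(edges):
    """Derive reach from edge by applying the recursive rule to a fixpoint.

    Only facts found in the previous round (delta) are extended each time,
    so already-known facts are not recomputed from scratch.
    """
    edges = set(edges)
    reach = set(edges)   # rule 1: every edge is a reach fact
    delta = set(edges)   # facts discovered in the most recent round
    while delta:
        # rule 2: extend each newly found path by one more edge
        new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        new -= reach     # keep only facts we have not seen before
        reach |= new
        delta = new
    return reach


# Hypothetical reporting chain: ann -> bob -> cy
reports_to = {("ann", "bob"), ("bob", "cy")}
print(sorted(transitive_closure(reports_to)))
```

The loop stops even when the data contains cycles, because each round only keeps facts that were not already derived.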
These are just a few teasers from this fascinating talk! Want to hear more? Check out the video for the full interview.