RelationalAI
01 January 2014
less than a minute read
In this paper we establish that leapfrog triejoin is also worst-case optimal, up to a log factor, in the sense of NPRR
Author: Todd L. Veldhuizen. 2014.
In Proceedings of the 17th International Conference on Database Theory (ICDT ‘14).
Recent years have seen exciting developments in join algorithms.In 2008, Atserias, Grohe and Marx (henceforth AGM) proved a tight bound on the maximum result size of a full conjunctive query, given constraints on the input relation sizes. In 2012, Ngo, Porat, R«e and Rudra (henceforth NPRR) devised a join algorithm with worst-case running time proportional to the AGM bound [8]. Our commercial database system LogicBlox employs a novel join algorithm, leapfrog triejoin, which compared conspicuously well to the
NPRR algorithm in preliminary benchmarks. This spurred us to analyze the complexity of leapfrog triejoin. In this paper we establish that leapfrog triejoin is also worst-case optimal, up to a log factor, in the sense of NPRR. We improve on the results of NPRR by proving that leapfrog triejoin achieves worst-case optimality for finer-grained classes of database instances, such as those defined by constraints on projection cardinalities. We show that NPRR is not worst-case optimal for such classes, giving a counterexample where
leapfrog triejoin runs inO(nlogn)time and NPRR runs in Θ(n1.375)time. On a practical note, leapfrog triejoin can be implemented using conventional data structures such as B-trees, and extends naturally to ∃1 queries. We believe our algorithm offers a useful addition to the existing toolbox of join algorithms, being easy to absorb, simple to implement, and having a concise optimality proof.
We defined named entity recognition (NER) in the legal domain and presented our approach towards generating ground truth data. In what follows, we go over the state-of-the-art in the NER domain and elaborate on the experiments we ran and the lessons we learned.
Read MoreNamed entity recognition is a difficult challenge to solve, particularly in the legal domain. Extracting ground truth labels from long, hierarchical documents is often slow and prone to error. RelationalAI proposes a new, scalable algorithm based on the principles of data-centric AI, designed to meet this challenge and generate high-quality annotations with minimal supervision.
Read MoreMolham shares some history of relational databases, trends in modern cloud-native database systems, and the innovations pioneered at RelationalAI to bring deep learning with relations from idea to reality.
Read More