RelationalAI Research Shines at SIGMOD/PODS: 15 Papers, 3 Awards, and Talks on the Future of Graphs, Queries, and Optimization

Friday, June 27, 2025

RelationalAI will have a big presence at this year's SIGMOD/PODS International Conference on Management of Data, with at least 15 accepted papers coauthored by RAI researchers, scientists, engineers, and faculty in our research network. SIGMOD/PODS is a top conference for new research in database theory and applications, and it also hosts at least 12 workshops and 14 tutorials, during which sub-communities can engage more deeply with specific topics. RelationalAI works at the cutting edge of problems in database systems and theory, programming languages, machine learning, and query optimization, and insights from our research directly shape our Relational Knowledge Graph Management System. We are excited to share an overview of our accepted works and our participation in SIGMOD/PODS here!

Come meet us at SIGMOD/PODS or check out our website to learn more!

Awards. Several RAI affiliates received awards for their work in the proceedings this year, while others were recognized for their long-term impact in the database community. These awards include:

A SIGMOD Best Paper Award to Senior Computer Scientist Mahmoud Abo Khamis, members of our faculty research network Dan Olteanu (University of Zürich) and Dan Suciu (University of Washington), and Haozhe Zhang and Christoph Mayer (both from University of Zürich), for developing LpBound, a new cardinality estimator for multijoin queries.
A PODS Distinguished Paper Award to Abo Khamis and Suciu, in joint work with Xiao Hu (University of Waterloo), for their framework that unifies combinatorial techniques with fast matrix multiplication to answer Boolean conjunctive queries.
The PODS Test of Time Award to Abo Khamis and VP of Research and Query Optimizer Lead Hung Ngo, for developing a geometric framework for computing joins. This work was originally published in the PODS 2015 proceedings and was joint with faculty Atri Rudra (University at Buffalo) and Christopher Ré (Stanford).

Invited Talks. Since 2016, the Gems of PODS event has featured results in PODS that have been influential to the PODS community and beyond. Wim Martens (RelationalAI researcher and professor at the University of Bayreuth) will give this year's Gems of PODS talk, “Querying graph data: Where We Are and Where to Go”, on his line of work developing the theory of query languages for graph databases. Much of this work is joint with other RAI affiliates, including Leonid Libkin, Filip Murlak, Liat Peterfreund, and Domagoj Vrgoč. In addition, Leonid Libkin (RelationalAI researcher and professor at the University of Edinburgh) will give a keynote talk, “GQL in academia: a progress report”, at the GRADES-NDA workshop co-located with SIGMOD/PODS. This talk will focus on PGQ and GQL, the recent standards for query languages for property graphs. Libkin will give an overview of the formal models, expressiveness, and limitations of these languages, and propose future research directions for improving graph query standards.

Papers. The following papers will appear in the SIGMOD and PODS proceedings. In the list, we include all authors on a paper and mark those affiliated with RelationalAI with a *.

SIGMOD

Rel: A Programming Language for Relational Data Aref*, Guagliardo*, Kastrinis*, Libkin*, Marsault*, Martens*, McGrath*, Murlak*, Nystrom*, Peterfreund*, Rogers*, Sirangelo*, Vrgoč*, Zhao*, Zreika*

Rel is the programming language currently implemented as part of RelationalAI's relational knowledge graph management system, which is available as a co-processor to Snowflake. This paper details the main technical innovations of Rel and will appear in SIGMOD's Industry track.

LpBound: Pessimistic Cardinality Estimation using $\ell_p$-Norms of Degree Sequences Zhang, Mayer, Abo Khamis*, Olteanu*, Suciu*

A central problem in query optimization is cardinality estimation, where the goal is to use statistics about a database to estimate the size of a query's output. Tight estimates of the output size are essential, as they inform many decisions in query planning. This paper introduces LpBound, an estimator that uses a new type of database statistic that is compact and quick to compute, yet still informative.
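To make the idea of bounding a join's size from degree statistics concrete, here is a minimal Python sketch for a single two-way join R(a, b) ⋈ S(b, c). It is not LpBound itself, which handles general multijoin queries; it only illustrates that norms of degree sequences yield guaranteed upper bounds. The function names and the toy relations are assumptions made for this example.

```python
from collections import Counter

def degree_sequence(relation, column):
    """Count how often each value occurs in the given column."""
    return Counter(row[column] for row in relation)

def lp_norm(degrees, p):
    """l_p norm of a degree sequence; p = float('inf') gives the maximum degree."""
    values = list(degrees.values())
    if not values:
        return 0.0
    if p == float("inf"):
        return float(max(values))
    return sum(d ** p for d in values) ** (1.0 / p)

def join_size_upper_bounds(r, s):
    """Upper bounds on |R(a,b) JOIN S(b,c)| from norms of the degree sequences on b."""
    deg_r = degree_sequence(r, 1)  # b is the second column of R
    deg_s = degree_sequence(s, 0)  # b is the first column of S
    inf = float("inf")
    return {
        # |R| times the maximum degree of b in S
        "l1*linf": lp_norm(deg_r, 1) * lp_norm(deg_s, inf),
        # maximum degree of b in R times |S|
        "linf*l1": lp_norm(deg_r, inf) * lp_norm(deg_s, 1),
        # Cauchy-Schwarz applied to sum_b deg_R(b) * deg_S(b)
        "l2*l2": lp_norm(deg_r, 2) * lp_norm(deg_s, 2),
    }

def true_join_size(r, s):
    """Exact join size, computed as sum_b deg_R(b) * deg_S(b)."""
    deg_r = degree_sequence(r, 1)
    deg_s = degree_sequence(s, 0)
    return sum(deg_r[b] * deg_s[b] for b in deg_r)

# Hypothetical toy relations.
R = [(1, "x"), (2, "x"), (3, "y")]
S = [("x", 10), ("x", 11), ("y", 12), ("z", 13)]
bounds = join_size_upper_bounds(R, S)
exact = true_join_size(R, S)
```

Every entry of `bounds` is at least `exact`, so the estimate can be too high but never too low. On this toy input the bound built from the two $\ell_2$ norms is the tightest of the three, which hints at why statistics beyond plain cardinalities and maximum degrees can pay off.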

LpBound in Action: Cardinality Estimation with One-Sided Guarantees Zhang, Mayer, Abo Khamis*, Olteanu*, Suciu*

This paper demonstrates LpBound and will appear in SIGMOD's Demo track. LpBound can be more accurate than several commercial and open-source estimators, while remaining fast to compute and using small amounts of memory. Experiments show that the estimates from LpBound lead to query plans whose runtime is orders of magnitude lower than those produced by open-source systems.

Dangers of List Processing in Querying Property Graphs Gheerbrant, Libkin*, Rogova

This work studies the addition of lists and list processing to property graph query languages (e.g., Cypher and GQL). The authors show that the expressiveness gained from these additions can overwhelm the query optimizer, and they propose ways to use lists and list processing safely in these languages.

Moving on From Group Commit: Autonomous Commit Enables High Throughput and Low Latency on NVMe SSDs Nguyen, Alhomssi*, Ziegler, Leis

Modern NVMe SSDs support parallelism and have very low write latency, while also being widely available. This work proposes a new commit processing protocol for Database Management Systems that uses these key features of NVMe SSDs to achieve low latency while maintaining high throughput.

HoneyComb: A Parallel Worst-Case Optimal Join on Multicores Wu, Suciu*

Worst-case optimal join (WCOJ) algorithms are used to compute multijoin queries in applications such as graph databases, social network analyses, and querying knowledge graphs. Moreover, WCOJ algorithms are theoretically optimal, in the sense that their runtime is bounded by roughly the largest output the query could produce on inputs of the same size. This paper introduces HoneyComb, a parallelized version of WCOJ for large multicore systems with shared memory.
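To give a flavor of the variable-at-a-time evaluation style that WCOJ algorithms are known for, here is a minimal, sequential Python sketch for the triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c). It is only an illustration: the function names and the choice of query are assumptions made for this example, it is not HoneyComb, and it makes no attempt at parallelism or at the careful intersection strategy that real implementations rely on for their guarantees.

```python
from collections import defaultdict

def index(pairs):
    """Map each first component to the set of second components paired with it."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return idx

def triangles(r, s, t):
    """Enumerate (a, b, c) with (a, b) in r, (b, c) in s, and (a, c) in t.

    Variables are bound one at a time, and each candidate set is the
    intersection of all relations that mention the variable being bound.
    """
    r_ab, s_bc, t_ac = index(r), index(s), index(t)
    out = []
    for a in r_ab.keys() & t_ac.keys():       # a must appear in both R and T
        for b in s_bc.keys() & r_ab[a]:       # b must extend a in R and start an S edge
            for c in s_bc[b] & t_ac[a]:       # c must close the triangle in S and T
                out.append((a, b, c))
    return out

# Hypothetical toy input.
R = [(1, 2), (2, 3), (1, 3)]
S = [(2, 3), (3, 1)]
T = [(1, 3), (2, 1)]
result = sorted(triangles(R, S, T))
```

The contrast with a traditional plan is that no intermediate pairwise join, such as all of R ⋈ S, is ever materialized; candidates are pruned by every relevant relation as soon as a variable is bound.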

Galley: Modern Query Optimization for Sparse Tensor Programs Deeds, Ahrens, Balazinska, Suciu*

Made popular by applications in deep learning, tensor programming models are abstractions suitable for problems that can be expressed as operations on tensors. Extensions of these models to sparse tensors force users to make complicated optimization choices that drastically impact efficiency. This work introduces Galley, a sparse tensor programming model that has a built-in optimization strategy.

Using Process Calculus for Optimizing Data and Computation Sharing in Complex Stateful Parallel Computations Tian, Koch, Olteanu*

Complex stateful parallel computations have broad applications for simulations in economics and epidemiology. This work introduces behavioral equations, a formalism based on $\pi$-calculus for expressing and optimizing these computations. Further, this work expresses several optimizations using behavioral equations, and builds a system implementing behavioral equations and their optimizations.

Computing Inconsistency Measures Under Differential Privacy Mohapatra, Gilad, He, Kimelfeld*

Differential privacy (DP) is a criterion that aims to ensure sensitive data remains private. Data is often perturbed in some manner to ensure DP, but in doing so its quality may be compromised. This work studies inconsistency measurements (a type of quality assessment) in a database protected by DP.

PODS

Fast Matrix Multiplication meets the Submodular Width Abo Khamis*, Hu, Suciu*

Answering Boolean Conjunctive Queries is one of the most fundamental problems in database theory. This paper unifies known algorithmic techniques for the problem, including fast matrix multiplication, into a single information-theoretic framework that derives the best known algorithm for any query.

Towards Tractability of the Diversity of Query Answers: Ultrametrics to the Rescue Arenas*, Merkl, Pichler*, Riveros

When the set of answers to a query is very large, some systems may prefer to return a reasonably sized, yet diverse (by some metric), subset to the user instead of the whole output. This paper finds a tight characterization for the diversity metric so that computing such a subset of query answers is tractable.

Efficient Algorithms for Cardinality Estimation and Conjunctive Query Evaluation With Simple Degree Constraints Im*, Moseley*, Ngo*, Pruhs*

This work studies cardinality estimation when the database statistics describe the conditional entropy of a pair of sets of variables. The authors provide an algorithm for the case where these database statistics have a particular structure, along with lower bounds showing that relaxing this structure results in a setting as difficult as the general one.

Polynomial Time Convergence of the Iterative Evaluation of Datalog Programs Im*, Moseley*, Ngo*, Pruhs*

This work shows that the iterative evaluation of programs in Datalog$^\circ$, a more expressive extension of Datalog, converges in time polynomial in natural parameters of the program and the underlying semiring. One consequence of these results is that the worst-case convergence of general programs is not much worse than the worst-case convergence of linear programs.
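For readers unfamiliar with iterative evaluation over a semiring, the following Python sketch may help. It repeatedly applies a single recursive rule over the (min, +) semiring, which computes shortest path lengths, and counts how many rounds pass before nothing changes. The choice of semiring, the rule, and all names are assumptions made for this illustration; the sketch is not taken from the paper and assumes non-negative edge weights so that the iteration converges.

```python
import math

def naive_fixpoint(edges, nodes):
    """Iterate D(x, y) = min(E(x, y), min_z D(x, z) + E(z, y)) until nothing changes.

    `edges` maps (x, y) to a non-negative weight. Returns the converged
    distances and the number of iterations that were performed.
    """
    inf = math.inf
    weight = {(x, y): inf for x in nodes for y in nodes}
    for (x, y), w in edges.items():
        weight[(x, y)] = min(weight[(x, y)], w)
    dist = {pair: inf for pair in weight}  # start from the semiring's "zero"
    iterations = 0
    while True:
        new = {}
        for x in nodes:
            for y in nodes:
                best = weight[(x, y)]
                for z in nodes:
                    best = min(best, dist[(x, z)] + weight[(z, y)])
                new[(x, y)] = best
        iterations += 1
        if new == dist:  # fixpoint reached
            return dist, iterations
        dist = new

# Hypothetical toy graph.
nodes = ["a", "b", "c"]
edges = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}
dist, iterations = naive_fixpoint(edges, nodes)
```

The rule here has a single recursive atom in its body, so it is a linear program in the Datalog sense. The question of how large `iterations` can get, as a function of the program and the semiring, is the kind of convergence question the paper addresses.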

Insert-Only versus Insert-Delete in Dynamic Query Evaluation Abo Khamis*, Kara, Olteanu*, Suciu*

In dynamic query evaluation, the goal is to maintain query answers while the database is changing. This paper connects query evaluation in the static and dynamic settings. For instance, this work shows that a certain class of algorithms for static query evaluation can be used for dynamic query evaluation, and how lower bounds in the static setting can be used to obtain lower bounds in the dynamic setting.
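As a small illustration of what maintaining a query answer under updates can look like, here is a Python sketch that keeps the size of the join R(a, b) ⋈ S(b, c) up to date under single-tuple inserts and deletes, without recomputing the join. This is a textbook delta-style technique chosen for brevity, not an algorithm from the paper; the class and method names are hypothetical.

```python
from collections import Counter

class JoinCount:
    """Maintain |R(a, b) JOIN S(b, c)| under single-tuple inserts and deletes."""

    def __init__(self):
        self.r, self.s = set(), set()
        self.deg_r, self.deg_s = Counter(), Counter()  # degree of each b-value
        self.count = 0

    def update(self, relation, tup, insert=True):
        """Apply one insert or delete to relation "R" or "S" and adjust the count."""
        if relation == "R":
            rel, own, other, b = self.r, self.deg_r, self.deg_s, tup[1]
        else:
            rel, own, other, b = self.s, self.deg_s, self.deg_r, tup[0]
        if insert == (tup in rel):
            return  # inserting an existing tuple or deleting a missing one: no change
        sign = 1 if insert else -1
        if insert:
            rel.add(tup)
        else:
            rel.remove(tup)
        own[b] += sign
        # Each matching tuple on the other side gains or loses one join partner.
        self.count += sign * other[b]

jc = JoinCount()
jc.update("R", (1, "x"))
jc.update("S", ("x", 10))
jc.update("S", ("x", 11))
jc.update("R", (2, "x"))
after_inserts = jc.count
jc.update("S", ("x", 10), insert=False)
after_delete = jc.count
```

Each update costs constant time here because only a count is maintained. Supporting deletes as well as inserts is what forces the extra bookkeeping, which is the distinction the paper's title refers to.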

Output-Sensitive Evaluation of Regular Path Queries Abo Khamis*, Kara, Olteanu*, Suciu*

The Product Graph algorithm is a simple algorithm for computing regular path queries that is widely used in database systems, including that of RelationalAI. This work introduces a refinement of the Product Graph algorithm that has better runtime when the size of the query output is small.
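For context, the baseline Product Graph approach can be sketched in a few lines of Python: pair every graph node with a state of an automaton for the regular expression, and search the resulting product graph from each source. This sketch shows only the classical baseline, not the refinement introduced in the paper, and the function name, the automaton encoding, and the toy data are assumptions made for this example.

```python
from collections import defaultdict, deque

def evaluate_rpq(edges, nfa_delta, start, accepting):
    """Return all (source, target) pairs connected by a path whose labels the NFA accepts.

    `edges` is a list of (u, label, v) triples; `nfa_delta` maps
    (state, label) to a set of successor states.
    """
    graph = defaultdict(list)
    nodes = set()
    for u, label, v in edges:
        graph[u].append((label, v))
        nodes.update((u, v))

    answers = set()
    for source in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(source, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((source, node))
            for label, nxt in graph.get(node, ()):
                for nstate in nfa_delta.get((state, label), ()):
                    if (nxt, nstate) not in seen:
                        seen.add((nxt, nstate))
                        queue.append((nxt, nstate))
    return answers

# Hypothetical toy graph and an NFA for the expression a b*.
edges = [(1, "a", 2), (2, "b", 3), (3, "b", 4), (4, "a", 5)]
nfa_delta = {(0, "a"): {1}, (1, "b"): {1}}
answers = evaluate_rpq(edges, nfa_delta, start=0, accepting={1})
```

Note that the search from each source may explore the whole product graph even when few pairs end up in the answer, which is the inefficiency an output-sensitive refinement aims to avoid.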

Workshops. Throughout the course of the SIGMOD/PODS conference there are a number of workshops and tutorials that bring together different sub-communities and highlight emerging technologies.

One of the workshops during SIGMOD/PODS is the 8th annual Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA). The GRADES-NDA workshop focuses on problems in large-scale graph data management and analytics, including domain-specific challenges, data mining, and graph algorithms. The following papers will appear at GRADES-NDA:

A Compendium of Regular Expression Shapes in SPARQL Queries Hammerer, Martens*

This work studies and classifies over 148 million regular path queries (RPQs), which are regular expressions over the labels of a graph database. RPQs are ubiquitous in graph databases. Among other findings, this work classifies the syntactic shapes, operator frequency, and degree of non-determinism of the RPQs.

Extending Pattern Matching Queries in Property Graphs with Interpreted Predicates Libkin*, Sirangelo*, Yilmaz

Property graph query languages, like GQL and SQL/PGQ, are not yet fully expressive. They struggle with complex conditions on data or computations over infinite domains (e.g., numerical comparisons, constraints over real numbers, or variables not in the database). This work begins formalizing more expressive query languages of this kind, drawing inspiration from constraint databases.

The 19th International Symposium on Database Programming Languages (DBPL) will also be co-located with SIGMOD. DBPL is an interdisciplinary symposium that aims to bring together researchers and practitioners who work at the intersection of programming languages and data management. This year's DBPL is co-organized by Amir Shaikhha, a member of our faculty research network and a professor at the University of Edinburgh.

Dan Olteanu will also talk about LpBound and the potential impact of this work during the “Warmup: Query Optimization Unleashed” session (chaired by Dan Suciu) on techniques in query optimization.