Michelle Yi & Nikolaos Vasiloglou
26 October 2021
2 min read
Language models have become prevalent in almost every way we interact with technology today. Any time an ask is made of Siri, Alexa, Google Assistant, search engines, or other methods using language, language models are invoked to help us do everything from answer questions to translate bodies of text.
Because of the astounding success of technologies such as GPT-3, language models are dominating not only our daily interactions with technology via over 300+ known apps today, but the academic field as well: almost any major publication you can think of has run something on them.
Despite all this popularity, we actually don’t know why language models are so effective. Recent scholarship has offered mathematical explanations for the effectiveness of language models. But in a more intuitive sense, language models store knowledge inside a network that can do efficient inference. With very few labels, a program can create an application. This is also referred to as few shot learning, or learning from only a handful of examples—humans are good at this.
In this series, we are going to talk about these trends and more in artificial intelligence spotlighted at ICLR 2021, with a specific focus on massive language models and related areas. Language models and data programming are the two crucial concepts we’re discussing here; language models are manifested through two main technologies called transformers and self-supervision. Transformers are an architecture for converting (transforming) one sequence of text into another using two parts, an encoder and decoder. These concepts will be the basis for everything we cover.
At RelationalAI, we are particularly interested in how we can augment these foundational concepts and make them both more powerful and more accessible, as language models can be expensive and limiting in who can leverage or develop them.
Key areas we can complement language models include:
To this end, we will focus the first part of this series on the intersection of language models and increasing their accessibility without relying on massive amounts of data and through new methods of computation. Recent advances in this area, in addition to the use of transformers and self-supervision mentioned previously, make us challenge the use of compute versus memory, GPUs as a standard for deep learning, and even the way we think about the sparse nature of updates in large networks.
We will close Part I with an information retrieval example that puts several of these concepts together.
The second half of this series will cover “Questioning the state of the art (SOTA)”, where we will examine and challenge the latest architectures and techniques used in language models today. Here, we explore everything from pruning neural networks to the distillation of models and even adding logic to enhance the text generation of transformers.
At RelationalAI, we believe that transformers should live in-database. Part II will conclude with an example that is the culmination of multiple concepts covered and shows why we believe transformers-in-database can both reduce the resources needed and supercharge models for all to use.
The job of a database administrator is evolving, just as almost all careers in IT. DBAs are often the ones closest to the task of capturing the real world. This is arguably the base of the entire application stack. If your model is wrong, your data will be wrong, your predictions will be wrong, your decisions will be wrong. It all falls apart without the right model. It falls apart without those people who know what to capture and in what detail, what to connect and in what ways. That’s the DBA.Read More
Our semantic optimizer is a query optimizer that uses the knowledge of user constraints, the knowledge of the data at hand, and the knowledge of mathematics to find an algorithm to answer a query in the fastest way it can. Here we touch upon just one feature of the semantic optimizer: recursion.Read More
About a decade ago, Marko Rodriguez wrote a blog post on Loopy Lattices. It became infamous amongst graph database practitioners as it taught them a very important lesson: Never give in to the temptation to ask for all potential paths between two nodes.Read More