Primary goals
- Type safety
- Expressiveness
- Composability
- Familiarity
The internals are designed to map the Ibis API to the backend.
The main user-facing component of Ibis is expressions. The base class of all expressions in Ibis is the Expr class.
Expressions provide the user-facing API, most of which is defined in ibis/expr/api.py.
Ibis’s type system consists of a set of rules for specifying the types of inputs to ibis.expr.types.Node subclasses. Upon construction of a Node subclass, Ibis validates every input to the node against the rule that was used to declare that input.
Rules are defined in ibis.expr.rules.
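The idea of validating every input against its declared rule at construction time can be illustrated with a small, self-contained sketch. This is a toy model only: instance_of, optional, ToyNode, and ToyLog are hypothetical names invented for illustration and are not part of ibis.expr.rules.

```python
# Toy model of rule-based input validation; none of these names are Ibis APIs.

def instance_of(expected_type):
    """Build a rule that accepts only values of ``expected_type``."""
    def rule(value):
        if not isinstance(value, expected_type):
            raise TypeError(
                f"expected {expected_type.__name__}, got {type(value).__name__}"
            )
        return value
    return rule


def optional(rule):
    """Wrap a rule so that ``None`` is also accepted."""
    def wrapped(value):
        return None if value is None else rule(value)
    return wrapped


class ToyNode:
    """Base class: subclasses declare ``rules``, mapping argument name to rule."""
    rules = {}

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(self.rules)
        if unknown:
            raise TypeError(f"unexpected arguments: {sorted(unknown)}")
        # Validate every input against the rule that declared it.
        for name, rule in self.rules.items():
            setattr(self, name, rule(kwargs.get(name)))


class ToyLog(ToyNode):
    rules = {"arg": instance_of(float), "base": optional(instance_of(float))}


node = ToyLog(arg=8.0, base=2.0)   # passes validation
try:
    ToyLog(arg="eight")            # rejected at construction time
except TypeError as exc:
    error_message = str(exc)
```

Because the check runs in the constructor, an invalid operation can never exist as an object, which is the property the type system relies on.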
Expr class
Expressions are a thin but important abstraction over operations, containing only type information and shape information, i.e., whether they are tables, columns, or scalars.
Examples of expression types include StringValue and Table.
ibis.expr.types.Node class
Node subclasses make up the core set of operations of Ibis. Each node corresponds to a particular operation.
Most nodes are defined in the ibis.expr.operations module.
Examples of nodes include ibis.expr.operations.Add and ibis.expr.operations.Sum.
Nodes (transitively) inherit from a class that allows node authors to define their node’s input arguments directly in the class body.
Additionally, the output_type member of the class is a rule or method that defines the shape (scalar or column) and element type of the operation. In the example below, the dtype and shape members fill this role.
An example of usage is a node that represents a logarithm operation:
    import ibis.expr.rules as rlz
    from ibis.expr.operations import Value


    class Log(Value):
        # A double scalar or column
        arg = rlz.double
        # Optional argument, defaults to None
        base = rlz.optional(rlz.double)
        # Output expression's datatype will correspond to arg's datatype
        dtype = rlz.dtype_like('arg')
        # Output expression will be scalar if arg is scalar, column otherwise
        shape = rlz.shape_like('arg')
This class describes an operation called Log that takes one required argument, a double scalar or column named arg, and one optional argument, a double scalar or column named base. The base argument defaults to None when it is not provided, so that the expression behaves as the underlying database does.
Similar objects are instantiated when you use Ibis APIs.
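A minimal sketch of how an API call can instantiate operation objects behind the scenes is shown here. It is a toy model, not Ibis's code: the Column, Add, Log, and FloatingColumn classes are simplified stand-ins defined in the example itself, and only the general shape (an expression wrapping an operation, reachable through an op() method) is meant to mirror the design described above.

```python
# Toy model of the expression/operation split; these classes are not Ibis's.
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Operation: a reference to a named column."""
    name: str


@dataclass(frozen=True)
class Add:
    """Operation: addition of two operands."""
    left: object
    right: object


@dataclass(frozen=True)
class Log:
    """Operation: logarithm with an optional base."""
    arg: object
    base: object = None


class FloatingColumn:
    """Expression: a thin wrapper exposing the API for floating columns."""

    def __init__(self, op):
        self._op = op

    def op(self):
        return self._op

    def __radd__(self, other):
        # ``1 + col`` instantiates an Add node behind the scenes.
        return FloatingColumn(Add(other, self._op))

    def log(self, base=None):
        # ``col.log()`` instantiates a Log node behind the scenes.
        return FloatingColumn(Log(self._op, base))


a = FloatingColumn(Column("a"))
log_1p = (1 + a).log()  # an Add and a Log are instantiated here
```

The user only ever touches FloatingColumn objects; the tree of Log, Add, and Column nodes is built as a side effect of calling the API.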
Separating expressions from their underlying operations makes it easy to generically describe and validate the inputs to particular nodes. In the log example, it doesn’t matter which operation (node) the double-valued arguments come from; they only need to satisfy the requirement denoted by the rule.
Separation of the ibis.expr.types.Node and ibis.expr.types.Expr classes also allows the API to be tied to the physical type of the expression rather than the particular operation, making it easy to define the API in terms of types rather than specific operations.
Furthermore, operations often have an output type that depends on the input type. An example of this is the greatest function, which takes the maximum of all of its arguments. Another example is CASE statements, whose THEN expressions determine the output type of the expression.
This allows Ibis to provide only the APIs that make sense for a particular type, even when an operation yields a different output type depending on its input. Concretely, this means that you cannot perform operations that don’t make sense, like computing the average of a string column.
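The following sketch illustrates both points under simplifying assumptions: the output type of a greatest-like function is derived from its inputs, and the resulting expression class determines which methods exist. All classes and the promotion logic here are hypothetical toy stand-ins, not Ibis's actual type hierarchy or promotion rules.

```python
# Toy model: the expression class, chosen from the output type, decides the API.

class StringColumn:
    def upper(self):
        return StringColumn()


class IntegerColumn:
    def mean(self):
        return "mean(int64) -> float64"


class FloatingColumn:
    def mean(self):
        return "mean(float64) -> float64"


def greatest(*columns):
    """Return an expression whose type is derived from the argument types."""
    if not columns:
        raise TypeError("greatest() needs at least one argument")
    kinds = {type(col) for col in columns}
    if kinds == {StringColumn}:
        return StringColumn()
    if kinds <= {IntegerColumn, FloatingColumn}:
        # Any floating argument promotes the result to floating.
        return FloatingColumn() if FloatingColumn in kinds else IntegerColumn()
    raise TypeError("arguments have no common type")


mixed = greatest(IntegerColumn(), FloatingColumn())
names = greatest(StringColumn(), StringColumn())

can_average_numbers = hasattr(mixed, "mean")
can_average_strings = hasattr(names, "mean")
```

In this sketch, asking for the average of a string column fails immediately with an AttributeError, because StringColumn simply has no mean method.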
The next major component of Ibis is the compilers.
The first few versions of Ibis directly generated strings, but the compiler infrastructure was generalized to support compilation of SQLGlot-based expressions.
The compiler works by translating the different pieces of a SQL expression into a string or SQLGlot expression.
The main pieces of a SELECT statement are:
- The set of column expressions (select_set)
- WHERE clauses (where)
- GROUP BY clauses (group_by)
- HAVING clauses (having)
- LIMIT clauses (limit)
- ORDER BY clauses (order_by)
- DISTINCT clauses (distinct)
Each of these pieces is translated into a SQL string and finally assembled by the instance of the ibis.sql.compiler.ExprTranslator subclass specific to the backend being compiled. For example, the ibis.impala.compiler.ImpalaExprTranslator is one of the subclasses that will perform this translation.
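As a rough illustration of the final assembly step, the sketch below joins already-translated fragments into one SELECT string, emitting the clauses in standard SQL order. It is a toy function, not Ibis's compiler: assemble_select and its table parameter are assumptions made for this example, and only the other keyword names are borrowed from the list above.

```python
# Toy assembler for the SELECT pieces listed above; not Ibis's compiler.

def assemble_select(table, select_set, where=(), group_by=(), having=(),
                    order_by=(), limit=None, distinct=False):
    """Join already-translated SQL fragments into a single SELECT string."""
    head = "SELECT DISTINCT" if distinct else "SELECT"
    lines = [f"{head} {', '.join(select_set)}", f"FROM {table}"]
    if where:
        lines.append("WHERE " + " AND ".join(where))
    if group_by:
        lines.append("GROUP BY " + ", ".join(group_by))
    if having:
        lines.append("HAVING " + " AND ".join(having))
    if order_by:
        lines.append("ORDER BY " + ", ".join(order_by))
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


sql = assemble_select(
    "t",
    select_set=["a", "sum(b) AS total"],
    where=["b > 0"],
    group_by=["a"],
    order_by=["a"],
    limit=10,
)
```

A backend-specific translator would differ mainly in how each fragment is produced (quoting, function names, dialect quirks), while the assembly of the pieces stays broadly the same.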
While Ibis was designed with an explicit goal of first-class SQL support, Ibis can target other systems such as pandas or Polars.
Presumably we want to do something with our compiled expressions. This is where execution comes in.
This is the least complex part of Ibis, mostly requiring only that Ibis correctly handle whatever the database hands back.
By and large, the execution of compiled SQL is handled by the database to which SQL is sent from Ibis.
However, once the data arrives from the database we need to convert that data to a pandas DataFrame.
The Query class, with its ibis.sql.client.Query._fetch method, provides a way for ibis.sql.client.SQLClient objects to do any additional processing necessary after the database returns results to the client.
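The post-processing step can be sketched with the standard library alone. This example is a stand-in, not the Query._fetch implementation: it uses sqlite3 as the database, a hypothetical fetch helper, and a plain column-oriented dict in place of a pandas DataFrame (with pandas installed, such a mapping could be passed to pandas.DataFrame).

```python
# Toy stand-in for the fetch step; uses sqlite3 and plain dicts, not Ibis or pandas.
import sqlite3


def fetch(cursor):
    """Turn a DB-API cursor's rows into a column-oriented mapping."""
    names = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a TEXT, b INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [("x", 1), ("y", 2), ("x", 3)])

# The database executes the compiled SQL; the client only post-processes results.
cursor = con.execute("SELECT a, sum(b) AS total FROM t GROUP BY a ORDER BY a")
columns = fetch(cursor)
con.close()
```

All of the query work (filtering, grouping, ordering) happens inside the database; the client-side code only reshapes the rows it receives.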