Skip to content

Table Expressions

Table expressions form the basis for most Ibis expressions.

Table (Expr)

Methods

aggregate(self, metrics=None, by=None, having=None, **kwargs)

Aggregate a table with a given set of reductions grouping by by.

Parameters:

Name Type Description Default
metrics Sequence[ir.Scalar] | None

Aggregate expressions

None
by Sequence[ir.Value] | None

Grouping expressions

None
having Sequence[ir.BooleanValue] | None

Post-aggregation filters

None
kwargs ir.Value

Named aggregate expressions

{}

Returns:

Type Description
Table

An aggregate table expression

alias(self, alias)

Create a table expression with a specific name alias.

This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql method.

.alias will create a temporary view

.alias creates a temporary view in the database.

This side effect will be removed in a future version of ibis and is not part of the public API.

Parameters:

Name Type Description Default
alias str

Name of the child expression

required

Examples:

>>> con = ibis.duckdb.connect("ci/ibis-testing-data/ibis_testing.ddb")
>>> t = con.table("functional_alltypes")
>>> expr = t.alias("my_t").sql("SELECT sum(double_col) FROM my_t")
>>> expr
r0 := AlchemyTable: functional_alltypes
  index           int64
    
  month           int32
r1 := View[r0]: my_t
  schema:
    index           int64
      
    month           int32
SQLStringView[r1]: _ibis_view_0
  query: 'SELECT sum(double_col) FROM my_t'
  schema:
    sum(double_col) float64

Returns:

Type Description
ir.Table

An table expression

asof_join(left, right, predicates=(), by=(), tolerance=None, *, suffixes=('_x', '_y'))

Perform an "as-of" join between left and right.

Similar to a left join except that the match is done on nearest key rather than equal keys.

Optionally, match keys with by before joining with predicates.

Parameters:

Name Type Description Default
left Table

Table expression

required
right Table

Table expression

required
predicates str | ir.BooleanColumn | Sequence[str | ir.BooleanColumn]

Join expressions

()
by str | ir.Column | Sequence[str | ir.Column]

column to group by before joining

()
tolerance str | ir.IntervalScalar | None

Amount of time to look behind when joining

None
suffixes tuple[str, str]

Left and right suffixes that will be used to rename overlapping columns.

('_x', '_y')

Returns:

Type Description
Table

Table expression

count(self)

Compute the number of rows in the table.

Returns:

Type Description
ir.IntegerScalar

Number of rows in the table

cross_join(left, right, *rest, *, suffixes=('_x', '_y'))

Compute the cross join of a sequence of tables.

Parameters:

Name Type Description Default
left Table

Left table

required
right Table

Right table

required
rest Table

Additional tables to cross join

()
suffixes tuple[str, str]

Left and right suffixes that will be used to rename overlapping columns.

('_x', '_y')

Examples:

>>> import ibis
>>> schemas = [(name, 'int64') for name in 'abcde']
>>> a, b, c, d, e = [
...     ibis.table([(name, type)], name=name) for name, type in schemas
... ]
>>> joined1 = ibis.cross_join(a, b, c, d, e)
>>> joined1
r0 := UnboundTable[e]
  e int64
r1 := UnboundTable[d]
  d int64
r2 := UnboundTable[c]
  c int64
r3 := UnboundTable[b]
  b int64
r4 := UnboundTable[a]
  a int64
r5 := CrossJoin[r3, r2]
r6 := CrossJoin[r5, r1]
r7 := CrossJoin[r6, r0]
CrossJoin[r4, r7]

Returns:

Type Description
Table

Cross join of left, right and rest

difference(self, *tables, *, distinct=True, **kwargs)

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters:

Name Type Description Default
*tables Table

One or more table expressions

()
distinct bool

Only diff distinct rows not occurring in the calling table

True

Returns:

Type Description
Table

The rows present in self that are not present in tables.

distinct(self)

Compute the set of unique rows in the table.

drop(self, fields)

Remove fields from a table.

Parameters:

Name Type Description Default
fields str | Sequence[str]

Fields to drop

required

Returns:

Type Description
Table

Expression without fields

dropna(self, subset=None, how='any')

Remove rows with null values from the table.

Parameters:

Name Type Description Default
subset Sequence[str] | None

Columns names to consider when dropping nulls. By default all columns are considered.

None
how Literal['any', 'all']

Determine whether a row is removed if there is at least one null value in the row ('any'), or if all row values are null ('all'). Options are 'any' or 'all'. Default is 'any'.

'any'

Examples:

>>> import ibis
>>> t = ibis.table(dict(a='int64', b='string'), name='t')
>>> t = t.dropna()  # Drop all rows where any values are null
>>> t
r0 := UnboundTable: t
  a int64
  b string
DropNa[r0]
  how: 'any'
>>> t.dropna(how='all')  # Only drop rows where all values are null
r0 := UnboundTable: t
  a int64
  b string
r1 := DropNa[r0]
  how: 'all'
>>> t.dropna(subset=['a'], how='all')  # Only drop rows where all values in column 'a' are null  # noqa: E501
r0 := UnboundTable: t
  a int64
  b string
DropNa[r0]
  how: 'all'
  subset:
    r0.a

Returns:

Type Description
Table

Table expression

fillna(self, replacements)

Fill null values in a table expression.

Parameters:

Name Type Description Default
replacements ir.Scalar | Mapping[str, ir.Scalar]

Value with which to fill the nulls. If passed as a mapping, the keys are column names that map to their replacement value. If passed as a scalar, all columns are filled with that value.

required

Examples:

>>> import ibis
>>> import ibis.expr.datatypes as dt
>>> t = ibis.table([('a', 'int64'), ('b', 'string')])
>>> t = t.fillna(0.0)  # Replace nulls in all columns with 0.0
>>> t.fillna({c: 0.0 for c, t in t.schema().items() if t == dt.float64})
r0 := UnboundTable[unbound_table_...]
  a int64
  b string
r1 := FillNa[r0]
  replacements:
    0.0
FillNa[r1]
  replacements:
    frozendict({})

Returns:

Type Description
Table

Table expression

filter(self, predicates)

Select rows from table based on predicates.

Parameters:

Name Type Description Default
predicates ir.BooleanValue | Sequence[ir.BooleanValue]

Boolean value expressions used to select rows in table.

required

Returns:

Type Description
Table

Filtered table expression

get_column(self, name)

Get a reference to a single column from the table

Returns:

Type Description
Column

A column named name.

get_columns(self, iterable)

Get multiple columns from the table

Examples:

>>> import ibis
>>> table = ibis.table(
...    [
...        ('a', 'int64'),
...        ('b', 'string'),
...        ('c', 'timestamp'),
...        ('d', 'float'),
...    ],
...    name='t'
... )
>>> a, b, c = table.get_columns(['a', 'b', 'c'])

Returns:

Type Description
list[Column]

List of column expressions

group_by(self, by=None, **additional_grouping_expressions)

Create a grouped table expression.

Parameters:

Name Type Description Default
by None

Grouping expressions

None
additional_grouping_expressions Any

Named grouping expressions

{}

Examples:

>>> import ibis
>>> from ibis import _
>>> t = ibis.table(dict(a='int32', b='timestamp', c='double'), name='t')
>>> t.group_by([_.a, _.b]).aggregate(sum_of_c=_.c.sum())
r0 := UnboundTable: t
  a int32
  b timestamp
  c float64
Aggregation[r0]
  metrics:
    sum_of_c: Sum(r0.c)
  by:
    a: r0.a
    b: r0.b

Returns:

Type Description
GroupedTable

A grouped table expression

groupby(self, by=None, **additional_grouping_expressions)

Create a grouped table expression.

Parameters:

Name Type Description Default
by None

Grouping expressions

None
additional_grouping_expressions Any

Named grouping expressions

{}

Examples:

>>> import ibis
>>> from ibis import _
>>> t = ibis.table(dict(a='int32', b='timestamp', c='double'), name='t')
>>> t.group_by([_.a, _.b]).aggregate(sum_of_c=_.c.sum())
r0 := UnboundTable: t
  a int32
  b timestamp
  c float64
Aggregation[r0]
  metrics:
    sum_of_c: Sum(r0.c)
  by:
    a: r0.a
    b: r0.b

Returns:

Type Description
GroupedTable

A grouped table expression

head(self, n=5)

Select the first n rows of a table.

The result set is not deterministic without a sort.

Parameters:

Name Type Description Default
n int

Number of rows to include, defaults to 5

5

Returns:

Type Description
Table

table limited to n rows

info(self, buf=None)

Show column names, types and null counts.

Parameters:

Name Type Description Default
buf IO[str] | None

A writable buffer, defaults to stdout

None

intersect(self, *tables, *, distinct=True, **kwargs)

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters:

Name Type Description Default
*tables Table

One or more table expressions

()
distinct bool

Only return distinct rows

True

Returns:

Type Description
Table

A new table containing the intersection of all input tables.

join(left, right, predicates=(), how='inner', *, suffixes=('_x', '_y'))

Perform a join between two tables.

Parameters:

Name Type Description Default
left Table

Left table to join

required
right Table

Right table to join

required
predicates str | Sequence[str | tuple[str | ir.Column, str | ir.Column] | ir.BooleanColumn]

Boolean or column names to join on

()
how Literal['inner', 'left', 'outer', 'right', 'semi', 'anti', 'any_inner', 'any_left', 'left_semi']

Join method

'inner'
suffixes tuple[str, str]

Left and right suffixes that will be used to rename overlapping columns.

('_x', '_y')

limit(self, n, offset=0)

Select the first n rows starting at offset.

Parameters:

Name Type Description Default
n int

Number of rows to include

required
offset int

Number of rows to skip first

0

Returns:

Type Description
Table

The first n rows of table starting at offset

materialize(self)

DEPRECATED: materialize is deprecated as of v3.0; remove the .materialize() call, it has no effect

mutate(self, exprs=None, **mutations)

Add columns to a table expression.

Parameters:

Name Type Description Default
exprs Sequence[ir.Expr] | None

List of named expressions to add as columns

None
mutations ir.Value

Named expressions using keyword arguments

{}

Examples:

Using keywords arguments to name the new columns

>>> import ibis
>>> table = ibis.table(
...     [('foo', 'double'), ('bar', 'double')],
...     name='t'
... )
>>> expr = table.mutate(qux=table.foo + table.bar, baz=5)
>>> expr
r0 := UnboundTable[t]
  foo float64
  bar float64
Selection[r0]
  selections:
    r0
    baz: 5
    qux: r0.foo + r0.bar

Use the name method to name the new columns.

>>> new_columns = [ibis.literal(5).name('baz',),
...                (table.foo + table.bar).name('qux')]
>>> expr2 = table.mutate(new_columns)
>>> expr.equals(expr2)
True

Returns:

Type Description
Table

Table expression with additional columns

prevent_rewrite(self, client=None)

Prevent optimization from happening below this expression.

DEPRECATED: prevent_rewrite is deprecated as of v4.0;

Only valid on SQL-string generating backends.

Parameters:

Name Type Description Default
client None

A client to use to create the SQLQueryResult operation. This can be useful if you're compiling an expression that derives from an UnboundTable operation.

None

Returns:

Type Description
Table

An opaque SQL query

projection(self, *exprs, **named_exprs)

Compute a new table expression using exprs.

Passing an aggregate function to this method will broadcast the aggregate's value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.

For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.

Parameters:

Name Type Description Default
exprs ir.Value | str | Iterable[ir.Value | str]

Column expression, string, or list of column expressions and strings.

()

Examples:

Simple projection

>>> import ibis
>>> t = ibis.table(dict(a="int64", b="double"), name='t')
>>> proj = t.select(t.a, b_plus_1=t.b + 1)
>>> proj
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    a:        r0.a
    b_plus_1: r0.b + 1
>>> proj2 = t.select("a", b_plus_1=t.b + 1)
>>> proj.equals(proj2)
True

Aggregate projection

>>> agg_proj = t.select(sum_a=t.a.sum(), mean_b=t.b.mean())
>>> agg_proj
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    sum_a:  Window(Sum(r0.a), window=Window(how='rows'))
    mean_b: Window(Mean(r0.b), window=Window(how='rows'))

Note the Window objects here.

Their existence means that the result of the aggregation will be broadcast across the number of rows in the input column. The purpose of this expression rewrite is to make it easy to write column/scalar-aggregate operations like

>>> t.select(demeaned_a=t.a - t.a.mean())
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    demeaned_a: r0.a - Window(Mean(r0.a), window=Window(how='rows'))

Returns:

Type Description
Table

Table expression

relabel(self, substitutions)

Change table column names, otherwise leaving table unaltered.

Parameters:

Name Type Description Default
substitutions Mapping[str, str]

Name mapping

required

Returns:

Type Description
Table

A relabeled table expression

rowid(self)

A numbering expression representing the row number of the results.

It can be 0 or 1 indexed depending on the backend. Check the backend documentation for specifics.

Examples:

>>> my_table[my_table.rowid(), my_table.name].execute()
1|Ibis
2|pandas
3|Dask

Returns:

Type Description
ir.IntegerValue

An integer column

schema(self)

Get the schema for this table (if one is known)

Returns:

Type Description
sch.Schema

The table's schema.

select(self, *exprs, **named_exprs)

Compute a new table expression using exprs.

Passing an aggregate function to this method will broadcast the aggregate's value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.

For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.

Parameters:

Name Type Description Default
exprs ir.Value | str | Iterable[ir.Value | str]

Column expression, string, or list of column expressions and strings.

()

Examples:

Simple projection

>>> import ibis
>>> t = ibis.table(dict(a="int64", b="double"), name='t')
>>> proj = t.select(t.a, b_plus_1=t.b + 1)
>>> proj
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    a:        r0.a
    b_plus_1: r0.b + 1
>>> proj2 = t.select("a", b_plus_1=t.b + 1)
>>> proj.equals(proj2)
True

Aggregate projection

>>> agg_proj = t.select(sum_a=t.a.sum(), mean_b=t.b.mean())
>>> agg_proj
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    sum_a:  Window(Sum(r0.a), window=Window(how='rows'))
    mean_b: Window(Mean(r0.b), window=Window(how='rows'))

Note the Window objects here.

Their existence means that the result of the aggregation will be broadcast across the number of rows in the input column. The purpose of this expression rewrite is to make it easy to write column/scalar-aggregate operations like

>>> t.select(demeaned_a=t.a - t.a.mean())
r0 := UnboundTable[t]
  a int64
  b float64
Selection[r0]
  selections:
    demeaned_a: r0.a - Window(Mean(r0.a), window=Window(how='rows'))

Returns:

Type Description
Table

Table expression

set_column(self, name, expr)

Replace an existing column with a new expression.

Parameters:

Name Type Description Default
name str

Column name to replace

required
expr ir.Value

New data for column

required

Returns:

Type Description
Table

Table expression with new columns

sort_by(self, sort_exprs)

Sort table by sort_exprs

Parameters:

Name Type Description Default
sort_exprs str | ir.Column | ir.SortKey | tuple[str | ir.Column, bool] | Sequence[tuple[str | ir.Column, bool]]

Sort specifications

required

Examples:

>>> import ibis
>>> t = ibis.table(dict(a='int64', b='string'))
>>> t.sort_by(['a', ibis.desc('b')])
r0 := UnboundTable: unbound_table_0
  a int64
  b string
Selection[r0]
  sort_keys:
     asc|r0.a
    desc|r0.b

Returns:

Type Description
Table

Sorted table

sql(self, query)

Run a SQL query against a table expression.

The SQL string is backend specific

query must be valid SQL for the execution backend the expression will run against

See Table.alias for details on using named table expressions in a SQL string.

Parameters:

Name Type Description Default
query str

Query string

required

Examples:

>>> con = ibis.duckdb.connect("ci/ibis-testing-data/ibis_testing.ddb")
>>> t = con.table("functional_alltypes")
>>> expr = t.sql("SELECT sum(double_col) FROM functional_alltypes")
>>> expr
r0 := AlchemyTable: functional_alltypes
  index           int64
    
  month           int32
SQLStringView[r0]: _ibis_view_1
  query: 'SELECT sum(double_col) FROM functional_alltypes'
  schema:
    sum(double_col) float64

Returns:

Type Description
ir.Table

An opaque table expression

to_array(self)

View a single column table as an array.

Returns:

Type Description
ir.Column

A single column view of a table

union(self, *tables, *, distinct=False, **kwargs)

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters:

Name Type Description Default
*tables Table

One or more table expressions

()
distinct bool

Only return distinct rows

False

Returns:

Type Description
Table

A new table containing the union of all input tables.

unpack(self, *columns)

Project the struct fields of each of columns into self.

Existing fields are retained in the projection.

Parameters:

Name Type Description Default
columns str

String column names to project into self.

()

Examples:

>>> schema = dict(a="struct<b: float, c: string>", d="string")
>>> t = ibis.table(schema, name="t")
>>> t
UnboundTable: t
  a struct<b: float64, c: string>
  d string
>>> t.unpack("a")
r0 := UnboundTable: t
  a struct<b: float64, c: string>
  d string

Selection[r0] selections: b: StructField(r0.a, field='b') c: StructField(r0.a, field='c') d: r0.d

Returns:

Type Description
Table

The child table with struct fields of each of columns projected.

view(self)

Create a new table expression distinct from the current one.

Use this API for any self-referencing operations like a self-join.

Returns:

Type Description
Table

Table expression

GroupedTable

An intermediate table expression to hold grouping information.

Methods

count(self, metric_name='count')

Computing the number of rows per group.

Parameters:

Name Type Description Default
metric_name str

Name to use for the row count metric

'count'

Returns:

Type Description
ir.Table

The aggregated table

having(self, expr)

Add a post-aggregation result filter expr.

Parameters:

Name Type Description Default
expr ir.BooleanScalar

An expression that filters based on an aggregate value.

required

Returns:

Type Description
GroupedTable

A grouped table expression

mutate(self, exprs=None, **kwds)

Return a table projection with window functions applied.

Any arguments can be functions.

Parameters:

Name Type Description Default
exprs ir.Value | Sequence[ir.Value] | None

List of expressions

None
kwds ir.Value

Expressions

{}

Examples:

>>> import ibis
>>> t = ibis.table([
...     ('foo', 'string'),
...     ('bar', 'string'),
...     ('baz', 'double'),
... ], name='t')
>>> t
UnboundTable[t]
  foo string
  bar string
  baz float64
>>> expr = (t.group_by('foo')
...          .order_by(ibis.desc('bar'))
...          .mutate(qux=lambda x: x.baz.lag(),
...                  qux2=t.baz.lead()))
>>> print(expr)
r0 := UnboundTable[t]
  foo string
  bar string
  baz float64
Selection[r0]
  selections:
    r0
    qux:  Window(Lag(r0.baz), window=Window(group_by=[r0.foo], order_by=[desc|r0.bar], how='rows'))
    qux2: Window(Lead(r0.baz), window=Window(group_by=[r0.foo], order_by=[desc|r0.bar], how='rows'))

Returns:

Type Description
Table

A table expression with window functions applied

order_by(self, expr)

Sort a grouped table expression by expr.

Parameters:

Name Type Description Default
expr ir.Value | Iterable[ir.Value]

Expressions to order the results by

required

Returns:

Type Description
GroupedTable

A sorted grouped GroupedTable

over(self, window)

Add a window frame clause to be applied to child analytic

expressions.

Parameters:

Name Type Description Default
window _window.Window

Window to add to child analytic expressions

required

Returns:

Type Description
GroupedTable

A new grouped table expression

projection(self, exprs)

Project new columns out of the grouped table.

size(self, metric_name='count')

Computing the number of rows per group.

Parameters:

Name Type Description Default
metric_name str

Name to use for the row count metric

'count'

Returns:

Type Description
ir.Table

The aggregated table


Last update: August 5, 2022