Table Expressions¶
Table expressions form the basis for most Ibis expressions.
relations
¶
Classes¶
TableExpr (Expr)
¶
Methods¶
aggregate(self, metrics=None, by=None, having=None, **kwargs)
¶
Aggregate a table with a given set of reductions grouping by by
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metrics |
Sequence[ir.ScalarExpr] | None |
Aggregate expressions |
None |
by |
Sequence[ir.ValueExpr] | None |
Grouping expressions |
None |
having |
Sequence[ir.BooleanValue] | None |
Post-aggregation filters |
None |
kwargs |
ir.ValueExpr |
Named aggregate expressions |
{} |
Returns:
Type | Description |
---|---|
TableExpr |
An aggregate table expression |
alias(self, alias)
¶
Create a table expression with a specific name alias
.
This method is useful for exposing an ibis expression to the underlying
backend for use in the
TableExpr.sql
method.
.alias
will create a temporary view
.alias
creates a temporary view in the database.
This side effect will be removed in a future version of ibis and is not part of the public API.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
alias |
str |
Name of the child expression |
required |
Examples:
>>> con = ibis.duckdb.connect("ci/ibis-testing-data/ibis_testing.ddb")
>>> t = con.table("functional_alltypes")
>>> expr = t.alias("my_t").sql("SELECT sum(double_col) FROM my_t")
>>> expr
r0 := AlchemyTable: functional_alltypes
index int64
⋮
month int32
r1 := View[r0]: my_t
schema:
index int64
⋮
month int32
SQLStringView[r1]: _ibis_view_0
query: 'SELECT sum(double_col) FROM my_t'
schema:
sum(double_col) float64
Returns:
Type | Description |
---|---|
ir.TableExpr |
An table expression |
asof_join(left, right, predicates=(), by=(), tolerance=None, *, suffixes=('_x', '_y'))
¶
Perform an "as-of" join between left
and right
.
Similar to a left join except that the match is done on nearest key rather than equal keys.
Optionally, match keys with by
before joining with predicates
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
left |
TableExpr |
Table expression |
required |
right |
TableExpr |
Table expression |
required |
predicates |
str | ir.BooleanColumn | Sequence[str | ir.BooleanColumn] |
Join expressions |
() |
by |
str | ir.ColumnExpr | Sequence[str | ir.ColumnExpr] |
column to group by before joining |
() |
tolerance |
str | ir.IntervalScalar | None |
Amount of time to look behind when joining |
None |
suffixes |
tuple[str, str] |
Left and right suffixes that will be used to rename overlapping columns. |
('_x', '_y') |
Returns:
Type | Description |
---|---|
TableExpr |
Table expression |
count(self)
¶
Compute the number of rows in the table.
Returns:
Type | Description |
---|---|
ir.IntegerScalar |
Number of rows in the table |
cross_join(left, right, *rest, *, suffixes=('_x', '_y'))
¶
Compute the cross join of a sequence of tables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
left |
TableExpr |
Left table |
required |
right |
TableExpr |
Right table |
required |
rest |
TableExpr |
Additional tables to cross join |
() |
suffixes |
tuple[str, str] |
Left and right suffixes that will be used to rename overlapping columns. |
('_x', '_y') |
Examples:
>>> import ibis
>>> schemas = [(name, 'int64') for name in 'abcde']
>>> a, b, c, d, e = [
... ibis.table([(name, type)], name=name) for name, type in schemas
... ]
>>> joined1 = ibis.cross_join(a, b, c, d, e)
>>> joined1
r0 := UnboundTable[e]
e int64
r1 := UnboundTable[d]
d int64
r2 := UnboundTable[c]
c int64
r3 := UnboundTable[b]
b int64
r4 := UnboundTable[a]
a int64
r5 := CrossJoin[r3, r2]
r6 := CrossJoin[r5, r1]
r7 := CrossJoin[r6, r0]
CrossJoin[r4, r7]
Returns:
Type | Description |
---|---|
TableExpr |
Cross join of |
difference(self, right)
¶
Compute the set difference of two table expressions.
The input tables must have identical schemas.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
right |
TableExpr |
Table expression |
required |
Returns:
Type | Description |
---|---|
TableExpr |
The rows present in |
distinct(self)
¶
Compute the set of unique rows in the table.
drop(self, fields)
¶
Remove fields from a table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fields |
str | Sequence[str] |
Fields to drop |
required |
Returns:
Type | Description |
---|---|
TableExpr |
Expression without |
dropna(self, subset=None, how='any')
¶
Remove rows with null values from the table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subset |
Sequence[str] | None |
Columns names to consider when dropping nulls. By default all columns are considered. |
None |
how |
Literal['any', 'all'] |
Determine whether a row is removed if there is at least one null value in the row ('any'), or if all row values are null ('all'). Options are 'any' or 'all'. Default is 'any'. |
'any' |
Examples:
>>> import ibis
>>> t = ibis.table([('a', 'int64'), ('b', 'string')])
>>> t = t.dropna() # Drop all rows where any values are null
>>> t = t.dropna(how='all') # Only drop rows where all values are null
>>> t = t.dropna(subset=['a'], how='all') # Only drop rows where all values in column 'a' are null # noqa: E501
Returns:
Type | Description |
---|---|
TableExpr |
Table expression |
fillna(self, replacements)
¶
Fill null values in a table expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
replacements |
ir.ScalarExpr | Mapping[str, ir.ScalarExpr] |
Value with which to fill the nulls. If passed as a mapping, the keys are column names that map to their replacement value. If passed as a scalar, all columns are filled with that value. |
required |
Examples:
>>> import ibis
>>> import ibis.expr.datatypes as dt
>>> t = ibis.table([('a', 'int64'), ('b', 'string')])
>>> t = t.fillna(0.0) # Replace nulls in all columns with 0.0
>>> t.fillna({c: 0.0 for c, t in t.schema().items() if t == dt.float64})
r0 := UnboundTable[unbound_table_...]
a int64
b string
r1 := FillNa[r0]
replacements:
0.0
FillNa[r1]
replacements:
frozendict({})
Returns:
Type | Description |
---|---|
TableExpr |
Table expression |
filter(self, predicates)
¶
Select rows from table
based on predicates
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predicates |
ir.BooleanValue | Sequence[ir.BooleanValue] |
Boolean value expressions used to select rows in |
required |
Returns:
Type | Description |
---|---|
TableExpr |
Filtered table expression |
get_column(self, name)
¶
Get a reference to a single column from the table
Returns:
Type | Description |
---|---|
ColumnExpr |
A column named |
get_columns(self, iterable)
¶
Get multiple columns from the table
Examples:
>>> import ibis
>>> table = ibis.table(
... [
... ('a', 'int64'),
... ('b', 'string'),
... ('c', 'timestamp'),
... ('d', 'float'),
... ],
... name='t'
... )
>>> a, b, c = table.get_columns(['a', 'b', 'c'])
Returns:
Type | Description |
---|---|
list[ColumnExpr] |
List of column expressions |
group_by(self, by=None, **additional_grouping_expressions)
¶
Create a grouped table expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
by |
None |
Grouping expressions |
None |
additional_grouping_expressions |
Any |
Named grouping expressions |
{} |
Examples:
>>> import ibis
>>> pairs = [('a', 'int32'), ('b', 'timestamp'), ('c', 'double')]
>>> t = ibis.table(pairs)
>>> b1, b2 = t.a, t.b
>>> result = t.group_by([b1, b2]).aggregate(sum_of_c=t.c.sum())
Returns:
Type | Description |
---|---|
GroupedTableExpr |
A grouped table expression |
groupby(self, by=None, **additional_grouping_expressions)
¶
Create a grouped table expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
by |
None |
Grouping expressions |
None |
additional_grouping_expressions |
Any |
Named grouping expressions |
{} |
Examples:
>>> import ibis
>>> pairs = [('a', 'int32'), ('b', 'timestamp'), ('c', 'double')]
>>> t = ibis.table(pairs)
>>> b1, b2 = t.a, t.b
>>> result = t.group_by([b1, b2]).aggregate(sum_of_c=t.c.sum())
Returns:
Type | Description |
---|---|
GroupedTableExpr |
A grouped table expression |
head(self, n=5)
¶
Select the first n
rows of a table.
The result set is not deterministic without a sort.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n |
int |
Number of rows to include, defaults to 5 |
5 |
Returns:
Type | Description |
---|---|
TableExpr |
|
info(self, buf=None)
¶
Show column names, types and null counts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
buf |
IO[str] | None |
A writable buffer, defaults to stdout |
None |
intersect(self, right)
¶
Compute the set intersection of two table expressions.
The input tables must have identical schemas.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
right |
TableExpr |
Table expression |
required |
Returns:
Type | Description |
---|---|
TableExpr |
The rows common amongst |
join(left, right, predicates=(), how='inner', *, suffixes=('_x', '_y'))
¶
Perform a join between two tables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
left |
TableExpr |
Left table to join |
required |
right |
TableExpr |
Right table to join |
required |
predicates |
str | Sequence[str | tuple[str | ir.ColumnExpr, str | ir.ColumnExpr] | ir.BooleanColumn] |
Boolean or column names to join on |
() |
how |
Literal['inner', 'left', 'outer', 'right', 'semi', 'anti', 'any_inner', 'any_left', 'left_semi'] |
Join method |
'inner' |
suffixes |
tuple[str, str] |
Left and right suffixes that will be used to rename overlapping columns. |
('_x', '_y') |
limit(self, n, offset=0)
¶
Select the first n
rows at beginning of table starting at offset
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n |
int |
Number of rows to include |
required |
offset |
int |
Number of rows to skip first |
0 |
Returns:
Type | Description |
---|---|
TableExpr |
The first |
materialize(self)
¶
DEPRECATED: materialize
is deprecated as of v3.0; remove the .materialize()
call, it has no effect
mutate(self, exprs=None, **mutations)
¶
Add columns to a table expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exprs |
Sequence[ir.Expr] | None |
List of named expressions to add as columns |
None |
mutations |
ir.ValueExpr |
Named expressions using keyword arguments |
{} |
Examples:
Using keywords arguments to name the new columns
>>> import ibis
>>> table = ibis.table(
... [('foo', 'double'), ('bar', 'double')],
... name='t'
... )
>>> expr = table.mutate(qux=table.foo + table.bar, baz=5)
>>> expr
r0 := UnboundTable[t]
foo float64
bar float64
Selection[r0]
selections:
r0
baz: 5
qux: r0.foo + r0.bar
Use the name
method to name
the new columns.
>>> new_columns = [ibis.literal(5).name('baz',),
... (table.foo + table.bar).name('qux')]
>>> expr2 = table.mutate(new_columns)
>>> expr.equals(expr2)
True
Returns:
Type | Description |
---|---|
TableExpr |
Table expression with additional columns |
prevent_rewrite(self, client=None)
¶
Prevent optimization from happening below this expression.
Only valid on SQL-string generating backends.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
client |
None |
A client to use to create the SQLQueryResult operation. This can be
useful if you're compiling an expression that derives from an
|
None |
Returns:
Type | Description |
---|---|
TableExpr |
An opaque SQL query |
projection(self, exprs)
¶
Compute a new table expression using exprs
.
Passing an aggregate function to this method will broadcast the aggregate's value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exprs |
ir.ValueExpr | str | Sequence[ir.ValueExpr | str] |
Column expression, string, or list of column expressions and strings. |
required |
Examples:
Simple projection
>>> import ibis
>>> fields = [('a', 'int64'), ('b', 'double')]
>>> t = ibis.table(fields, name='t')
>>> proj = t.projection([t.a, (t.b + 1).name('b_plus_1')])
>>> proj
r0 := UnboundTable[t]
a int64
b float64
Selection[r0]
selections:
a: r0.a
b_plus_1: r0.b + 1
>>> proj2 = t[t.a, (t.b + 1).name('b_plus_1')]
>>> proj.equals(proj2)
True
Aggregate projection
>>> agg_proj = t[t.a.sum().name('sum_a'), t.b.mean().name('mean_b')]
>>> agg_proj
r0 := UnboundTable[t]
a int64
b float64
Selection[r0]
selections:
sum_a: WindowOp(Sum(r0.a), window=Window(how='rows'))
mean_b: WindowOp(Mean(r0.b), window=Window(how='rows'))
Note the Window
objects here.
Their existence means that the result of the aggregation will be broadcast across the number of rows in the input column. The purpose of this expression rewrite is to make it easy to write column/scalar-aggregate operations like
>>> t[(t.a - t.a.mean()).name('demeaned_a')]
r0 := UnboundTable[t]
a int64
b float64
Selection[r0]
selections:
demeaned_a: r0.a - WindowOp(Mean(r0.a), window=Window(how='rows'))
Returns:
Type | Description |
---|---|
TableExpr |
Table expression |
relabel(self, substitutions)
¶
Change table column names, otherwise leaving table unaltered.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
substitutions |
Mapping[str, str] |
Name mapping |
required |
Returns:
Type | Description |
---|---|
TableExpr |
A relabeled table expression |
rowid(self)
¶
A numbering expression representing the row number of the results.
It can be 0 or 1 indexed depending on the backend. Check the backend documentation for specifics.
Examples:
>>> my_table[my_table.rowid(), my_table.name].execute()
1|Ibis
2|pandas
3|Dask
Returns:
Type | Description |
---|---|
ir.IntegerValue |
An integer column |
schema(self)
¶
Get the schema for this table (if one is known)
Returns:
Type | Description |
---|---|
sch.Schema |
The table's schema. |
select(self, exprs)
¶
Compute a new table expression using exprs
.
Passing an aggregate function to this method will broadcast the aggregate's value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exprs |
ir.ValueExpr | str | Sequence[ir.ValueExpr | str] |
Column expression, string, or list of column expressions and strings. |
required |
Examples:
Simple projection
>>> import ibis
>>> fields = [('a', 'int64'), ('b', 'double')]
>>> t = ibis.table(fields, name='t')
>>> proj = t.projection([t.a, (t.b + 1).name('b_plus_1')])
>>> proj
r0 := UnboundTable[t]
a int64
b float64
Selection[r0]
selections:
a: r0.a
b_plus_1: r0.b + 1
>>> proj2 = t[t.a, (t.b + 1).name('b_plus_1')]
>>> proj.equals(proj2)
True
Aggregate projection
>>> agg_proj = t[t.a.sum().name('sum_a'), t.b.mean().name('mean_b')]
>>> agg_proj
r0 := UnboundTable[t]
a int64
b float64
Selection[r0]
selections:
sum_a: WindowOp(Sum(r0.a), window=Window(how='rows'))
mean_b: WindowOp(Mean(r0.b), window=Window(how='rows'))
Note the Window
objects here.
Their existence means that the result of the aggregation will be broadcast across the number of rows in the input column. The purpose of this expression rewrite is to make it easy to write column/scalar-aggregate operations like
>>> t[(t.a - t.a.mean()).name('demeaned_a')]
r0 := UnboundTable[t]
a int64
b float64
Selection[r0]
selections:
demeaned_a: r0.a - WindowOp(Mean(r0.a), window=Window(how='rows'))
Returns:
Type | Description |
---|---|
TableExpr |
Table expression |
set_column(self, name, expr)
¶
Replace an existing column with a new expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name |
str |
Column name to replace |
required |
expr |
ir.ValueExpr |
New data for column |
required |
Returns:
Type | Description |
---|---|
TableExpr |
Table expression with new columns |
sort_by(self, sort_exprs)
¶
Sort table by sort_exprs
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sort_exprs |
str | ir.ColumnExpr | ir.SortKey | tuple[str | ir.ColumnExpr, bool] | Sequence[tuple[str | ir.ColumnExpr, bool]] |
Sort specifications |
required |
Examples:
>>> import ibis
>>> t = ibis.table([('a', 'int64'), ('b', 'string')])
>>> ab_sorted = t.sort_by([('a', True), ('b', False)])
Returns:
Type | Description |
---|---|
TableExpr |
Sorted table |
sql(self, query)
¶
Run a SQL query against a table expression.
The SQL string is backend specific
query
must be valid SQL for the execution backend the expression
will run against
See TableExpr.alias
for
details on using named table expressions in a SQL string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query |
str |
Query string |
required |
Examples:
>>> con = ibis.duckdb.connect("ci/ibis-testing-data/ibis_testing.ddb")
>>> t = con.table("functional_alltypes")
>>> expr = t.sql("SELECT sum(double_col) FROM functional_alltypes")
>>> expr
r0 := AlchemyTable: functional_alltypes
index int64
⋮
month int32
SQLStringView[r0]: _ibis_view_1
query: 'SELECT sum(double_col) FROM functional_alltypes'
schema:
sum(double_col) float64
Returns:
Type | Description |
---|---|
ir.TableExpr |
An opaque table expression |
to_array(self)
¶
View a single column table as an array.
Returns:
Type | Description |
---|---|
ir.ColumnExpr |
A single column view of a table |
union(self, right, distinct=False)
¶
Compute the set union of two table expressions.
The input tables must have identical schemas.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
right |
TableExpr |
Table expression |
required |
distinct |
bool |
Only union distinct rows not occurring in the calling table (this can be very expensive, be careful) |
False |
Returns:
Type | Description |
---|---|
TableExpr |
Union of table and |
view(self)
¶
Create a new table expression distinct from the current one.
Use this API for any self-referencing operations like a self-join.
Returns:
Type | Description |
---|---|
TableExpr |
Table expression |
groupby
¶
User API for grouping operations.
Classes¶
GroupedTableExpr
¶
An intermediate table expression to hold grouping information.
Methods¶
count(self, metric_name='count')
¶
Computing the number of rows per group.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metric_name |
str |
Name to use for the row count metric |
'count' |
Returns:
Type | Description |
---|---|
ir.TableExpr |
The aggregated table |
having(self, expr)
¶
Add a post-aggregation result filter expr
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr |
ir.BooleanScalar |
An expression that filters based on an aggregate value. |
required |
Returns:
Type | Description |
---|---|
GroupedTableExpr |
A grouped table expression |
mutate(self, exprs=None, **kwds)
¶
Return a table projection with window functions applied.
Any arguments can be functions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exprs |
ir.ValueExpr | Sequence[ir.ValueExpr] | None |
List of expressions |
None |
kwds |
ir.ValueExpr |
Expressions |
{} |
Examples:
>>> import ibis
>>> t = ibis.table([
... ('foo', 'string'),
... ('bar', 'string'),
... ('baz', 'double'),
... ], name='t')
>>> t
UnboundTable[t]
foo string
bar string
baz float64
>>> expr = (t.group_by('foo')
... .order_by(ibis.desc('bar'))
... .mutate(qux=lambda x: x.baz.lag(),
... qux2=t.baz.lead()))
>>> print(expr)
r0 := UnboundTable[t]
foo string
bar string
baz float64
Selection[r0]
selections:
r0
qux: WindowOp(Lag(r0.baz), window=Window(group_by=[r0.foo], order_by=[desc|r0.bar], how='rows'))
qux2: WindowOp(Lead(r0.baz), window=Window(group_by=[r0.foo], order_by=[desc|r0.bar], how='rows'))
Returns:
Type | Description |
---|---|
TableExpr |
A table expression with window functions applied |
order_by(self, expr)
¶
Sort a grouped table expression by expr
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr |
ir.ValueExpr | Iterable[ir.ValueExpr] |
Expressions to order the results by |
required |
Returns:
Type | Description |
---|---|
GroupedTableExpr |
A sorted grouped GroupedTableExpr |
over(self, window)
¶
Add a window frame clause to be applied to child analytic expressions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window |
_window.Window |
Window to add to child analytic expressions |
required |
Returns:
Type | Description |
---|---|
GroupedTableExpr |
A new grouped table expression |
projection(self, exprs)
¶
Project new columns out of the grouped table.
size(self, metric_name='count')
¶
Computing the number of rows per group.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metric_name |
str |
Name to use for the row count metric |
'count' |
Returns:
Type | Description |
---|---|
ir.TableExpr |
The aggregated table |