Skip to content

Generic Expression APIs

These expressions are available on scalars and columns of any element type.

Value

Bases: Expr

Base class for a data generating expression having a known type.

Functions

as_table()

Promote the expression to a table.

Returns:

Type Description
Table

A table expression

Examples:

>>> t = ibis.table(dict(a="str"), name="t")
>>> expr = t.a.length().name("len").as_table()
>>> expected = t.select(len=t.a.length())
>>> expr.equals(expected)
True

asc()

Sort an expression ascending.

between(lower, upper)

Check if this expression is between lower and upper, inclusive.

Parameters:

Name Type Description Default
lower Value

Lower bound

required
upper Value

Upper bound

required

Returns:

Type Description
BooleanValue

Expression indicating membership in the provided range

case()

Create a SimpleCaseBuilder to chain multiple if-else statements.

Add new search expressions with the .when() method. These must be comparable with this column expression. Conclude by calling .end()

Returns:

Type Description
SimpleCaseBuilder

A case builder

Examples:

>>> import ibis
>>> t = ibis.table([('string_col', 'string')], name='t')
>>> expr = t.string_col
>>> case_expr = (expr.case()
...              .when('a', 'an a')
...              .when('b', 'a b')
...              .else_('null or (not a and not b)')
...              .end())
>>> case_expr
r0 := UnboundTable: t
  string_col string
SimpleCase(...)

cases(case_result_pairs, default=None)

Create a case expression in one shot.

Parameters:

Name Type Description Default
case_result_pairs Iterable[tuple[ir.BooleanValue, Value]]

Conditional-result pairs

required
default Value | None

Value to return if none of the case conditions are true

None

Returns:

Type Description
Value

Value expression

cast(target_type)

Cast expression to indicated data type.

Parameters:

Name Type Description Default
target_type dt.DataType

Type to cast to

required

Returns:

Type Description
Value

Casted expression

coalesce(*args)

Return the first non-null value from args.

Parameters:

Name Type Description Default
args Value

Arguments from which to choose the first non-null value

()

Returns:

Type Description
Value

Coalesced expression

Examples:

>>> import ibis
>>> ibis.coalesce(None, 4, 5).name("x")
x: Coalesce(...)

collect(where=None)

Aggregate this expression's elements into an array.

This function is called array_agg, list_agg, or list in other systems.

Parameters:

Name Type Description Default
where ir.BooleanValue | None

Filter to apply before aggregation

None

Returns:

Type Description
ArrayScalar

Collected array

Examples:

Basic collect usage

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"key": list("aaabb"), "value": [1, 2, 3, 4, 5]})
>>> t
┏━━━━━━━━┳━━━━━━━┓
┃ key    ┃ value ┃
┡━━━━━━━━╇━━━━━━━┩
│ string │ int64 │
├────────┼───────┤
│ a      │     1 │
│ a      │     2 │
│ a      │     3 │
│ b      │     4 │
│ b      │     5 │
└────────┴───────┘
>>> t.value.collect()
[1, 2, 3, 4, 5]
>>> type(t.value.collect())
<class 'ibis.expr.types.arrays.ArrayScalar'>

Collect elements per group

>>> t.group_by("key").agg(v=lambda t: t.value.collect())
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ key    ┃ v                    ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ array<int64>         │
├────────┼──────────────────────┤
│ a      │ [1, 2, ... +1]       │
│ b      │ [4, 5]               │
└────────┴──────────────────────┘

Collect elements per group using a filter

>>> t.group_by("key").agg(v=lambda t: t.value.collect(where=t.value > 1))
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ key    ┃ v                    ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ array<int64>         │
├────────┼──────────────────────┤
│ a      │ [2, 3]               │
│ b      │ [4, 5]               │
└────────┴──────────────────────┘

desc()

Sort an expression descending.

fillna(fill_value)

Replace any null values with the indicated fill value.

Parameters:

Name Type Description Default
fill_value Scalar

Value with which to replace NA values in self

required

Examples:

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.sex
┏━━━━━━━━┓
┃ sex    ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ male   │
│ female │
│ female │
│ ∅      │
│ female │
│ male   │
│ female │
│ male   │
│ ∅      │
│ ∅      │
│ …      │
└────────┘
>>> t.sex.fillna("unrecorded").name("sex")
┏━━━━━━━━━━━━┓
┃ sex        ┃
┡━━━━━━━━━━━━┩
│ string     │
├────────────┤
│ male       │
│ female     │
│ female     │
│ unrecorded │
│ female     │
│ male       │
│ female     │
│ male       │
│ unrecorded │
│ unrecorded │
│ …          │
└────────────┘

Returns:

Type Description
Value

self filled with fill_value where it is NA

greatest(*args)

Compute the largest value among the supplied arguments.

Parameters:

Name Type Description Default
args ir.Value

Arguments to choose from

()

Returns:

Type Description
Value

Maximum of the passed arguments

group_concat(sep=',', where=None)

Concatenate values using the indicated separator to produce a string.

Parameters:

Name Type Description Default
sep str

Separator will be used to join strings

','
where ir.BooleanValue | None

Filter expression

None

Returns:

Type Description
StringScalar

Concatenated string expression

hash(how='fnv')

Compute an integer hash value.

Parameters:

Name Type Description Default
how str

Hash algorithm to use

'fnv'

Returns:

Type Description
IntegerValue

The hash value of self

identical_to(other)

Return whether this expression is identical to other.

Corresponds to IS NOT DISTINCT FROM in SQL.

Parameters:

Name Type Description Default
other Value

Expression to compare to

required

Returns:

Type Description
BooleanValue

Whether this expression is not distinct from other

isin(values)

Check whether this expression's values are in values.

Parameters:

Name Type Description Default
values Value | Sequence[Value]

Values or expression to check for membership

required

Returns:

Type Description
BooleanValue

Expression indicating membership

Examples:

Check whether a column's values are contained in a sequence

>>> import ibis
>>> table = ibis.table(dict(string_col='string'), name="t")
>>> table.string_col.isin(['foo', 'bar', 'baz'])
r0 := UnboundTable: t
  string_col string
Contains(string_col): Contains(...)

Check whether a column's values are contained in another table's column

>>> table2 = ibis.table(dict(other_string_col='string'), name="t2")
>>> table.string_col.isin(table2.other_string_col)
r0 := UnboundTable: t
  string_col string
r1 := UnboundTable: t2
  other_string_col string
Contains(string_col, other_string_col): Contains(...)

isnull()

Return whether this expression is NULL.

least(*args)

Compute the smallest value among the supplied arguments.

Parameters:

Name Type Description Default
args ir.Value

Arguments to choose from

()

Returns:

Type Description
Value

Minimum of the passed arguments

name(name)

Rename an expression to name.

Parameters:

Name Type Description Default
name

The new name of the expression

required

Returns:

Type Description
Value

self with name name

Examples:

>>> import ibis
>>> t = ibis.table(dict(a="int64"), name="t")
>>> t.a.name("b")
r0 := UnboundTable: t
  a int64
b: r0.a

notin(values)

Check whether this expression's values are not in values.

Parameters:

Name Type Description Default
values Value | Sequence[Value]

Values or expression to check for lack of membership

required

Returns:

Type Description
BooleanValue

Whether self's values are not contained in values

notnull()

Return whether this expression is not NULL.

nullif(null_if_expr)

Set values to null if they equal the values null_if_expr.

Commonly use to avoid divide-by-zero problems by replacing zero with NULL in the divisor.

Parameters:

Name Type Description Default
null_if_expr Value

Expression indicating what values should be NULL

required

Returns:

Type Description
Value

Value expression

over(window=None, *, rows=None, range=None, group_by=None, order_by=None)

Construct a window expression.

Parameters:

Name Type Description Default
window

Window specification

None
rows

Whether to use the ROWS window clause

None
range

Whether to use the RANGE window clause

None
group_by

Grouping key

None
order_by

Ordering key

None

Returns:

Type Description
Value

A window function expression

substitute(value, replacement=None, else_=None)

Replace values given in values with replacement.

This is similar to the pandas replace method.

Parameters:

Name Type Description Default
value Value | dict

Expression or dict.

required
replacement Value | None

If an expression is passed to value, this must be passed.

None
else_ Value | None

If an original value does not match value, then else_ is used. The default of None means leave the original value unchanged.

None

Returns:

Type Description
Value

Replaced values

to_pandas(**kwargs)

Convert a column expression to a pandas Series or scalar object.

Parameters:

Name Type Description Default
kwargs

Same as keyword arguments to execute

{}

type()

Return the [DataType] of this expression.

typeof()

Return the data type of the expression.

The values of the returned strings are necessarily backend dependent.

Returns:

Type Description
StringValue

A string indicating the type of the value

Column

Bases: Value, _FixedTextJupyterMixin

Functions

approx_median(where=None)

Return an approximate of the median of self.

The result may or may not be exact

Whether the result is an approximation depends on the backend.

Do not depend on the results being exact

Parameters:

Name Type Description Default
where ir.BooleanValue | None

Filter in values when where is True

None

Returns:

Type Description
Scalar

An approximation of the median of self

approx_nunique(where=None)

Return the approximate number of distinct elements in self.

The result may or may not be exact

Whether the result is an approximation depends on the backend.

Do not depend on the results being exact

Parameters:

Name Type Description Default
where ir.BooleanValue | None

Filter in values when where is True

None

Returns:

Type Description
Scalar

An approximate count of the distinct elements of self

arbitrary(where=None, how='first')

Select an arbitrary value in a column.

Parameters:

Name Type Description Default
where ir.BooleanValue | None

A filter expression

None
how Literal['first', 'last', 'heavy']

The method to use for selecting the element.

  • "first": Select the first non-NULL element
  • "last": Select the last non-NULL element
  • "heavy": Select a frequently occurring value using the heavy hitters algorithm. "heavy" is only supported by Clickhouse backend.
'first'

Returns:

Type Description
Scalar

An expression

argmax(key, where=None)

Return the value of self that maximizes key.

argmin(key, where=None)

Return the value of self that minimizes key.

count(where=None)

Compute the number of rows in an expression.

Parameters:

Name Type Description Default
where ir.BooleanValue | None

Filter expression

None

Returns:

Type Description
IntegerScalar

Number of elements in an expression

cume_dist()

Return the cumulative distribution over a window.

cummax()

Return the cumulative max over a window.

cummin()

Return the cumulative min over a window.

dense_rank()

Position of first element within each group of equal values.

Values are returned in sorted order and duplicate values are ignored.

Equivalent to SQL's DENSE_RANK().

Examples:

values ranks 1 0 1 0 2 1 2 1 2 1 3 2

Returns:

Type Description
IntegerColumn

The rank

first()

Return the first value of a column.

Equivalent to SQL's FIRST_VALUE window function.

lag(offset=None, default=None)

Return the row located at offset rows before the current row.

Parameters:

Name Type Description Default
offset int | ir.IntegerValue | None

Index of row to select

None
default Value | None

Value used if no row exists at offset

None

last()

Return the last value of a column.

Equivalent to SQL's LAST_VALUE window function.

lead(offset=None, default=None)

Return the row located at offset rows after the current row.

Parameters:

Name Type Description Default
offset int | ir.IntegerValue | None

Index of row to select

None
default Value | None

Value used if no row exists at offset

None

max(where=None)

Return the maximum of a column.

min(where=None)

Return the minimum of a column.

mode(where=None)

Return the mode of a column.

nth(n)

Return the nth value over a window.

Parameters:

Name Type Description Default
n int | ir.IntegerValue

Desired rank value

required

Returns:

Type Description
Column

The nth value over a window

ntile(buckets)

Return the integer number of a partitioning of the column values.

Parameters:

Name Type Description Default
buckets int | ir.IntegerValue

Number of buckets to partition into

required

nunique(where=None)

Compute the number of distinct rows in an expression.

Parameters:

Name Type Description Default
where ir.BooleanValue | None

Filter expression

None

Returns:

Type Description
IntegerScalar

Number of distinct elements in an expression

percent_rank()

Return the relative rank of the values in the column.

rank()

Compute position of first element within each equal-value group in sorted order.

Equivalent to SQL's RANK() window function.

Examples:

values ranks 1 0 1 0 2 2 2 2 2 2 3 5

Returns:

Type Description
Int64Column

The min rank

summary(exact_nunique=False, prefix='', suffix='')

Compute a set of summary metrics.

Parameters:

Name Type Description Default
exact_nunique bool

Compute the exact number of distinct values. Typically slower if True.

False
prefix str

String prefix for metric names

''
suffix str

String suffix for metric names

''

Returns:

Type Description
list[NumericScalar]

Metrics list

topk(k, by=None)

Return a "top k" expression.

Parameters:

Name Type Description Default
k int

Return this number of rows

required
by ir.Value | None

An expression. Defaults to count.

None

Returns:

Type Description
TableExpr

A top-k expression

value_counts()

Compute a frequency table.

Returns:

Type Description
Table

Frequency table expression

Examples:

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars": char} for char in "aabcddd")
>>> t
┏━━━━━━━━┓
┃ chars  ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ a      │
│ a      │
│ b      │
│ c      │
│ d      │
│ d      │
│ d      │
└────────┘
>>> t.chars.value_counts()
┏━━━━━━━━┳━━━━━━━━━━━━━┓
┃ chars  ┃ chars_count ┃
┡━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ int64       │
├────────┼─────────────┤
│ a      │           2 │
│ b      │           1 │
│ c      │           1 │
│ d      │           3 │
└────────┴─────────────┘

Scalar

Bases: Value

Functions

as_table()

Promote the scalar expression to a table.

Returns:

Type Description
Table

A table expression

Examples:

Promote an aggregation to a table

>>> import ibis
>>> import ibis.expr.types as ir
>>> t = ibis.table(dict(a="str"), name="t")
>>> expr = t.a.length().sum().name("len").as_table()
>>> isinstance(expr, ir.Table)
True

Promote a literal value to a table

>>> import ibis.expr.types as ir
>>> lit = ibis.literal(1).name("a").as_table()
>>> isinstance(lit, ir.Table)
True

Last update: August 5, 2022