>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 2, 3, None]})
>>> t.order_by(t.a.asc())
┏━━━━━━━━━┓ ┃ a ┃ ┡━━━━━━━━━┩ │ float64 │ ├─────────┤ │ 1.0 │ │ 2.0 │ │ 3.0 │ │ NULL │ └─────────┘
Scalars and columns of any element type.
Base class for a data generating expression having a known type.
Name | Description |
---|---|
asc | Sort an expression ascending. |
between | Check if this expression is between lower and upper , inclusive. |
case | DEPRECATED: use value.cases() or ibis.cases() instead. |
cases | Create a multi-branch if-else expression. |
cast | Cast expression to indicated data type. |
coalesce | Return the first non-null value from args . |
collect | Aggregate this expression’s elements into an array. |
desc | Sort an expression descending. |
fill_null | Replace any null values with the indicated fill value. |
fillna | DEPRECATED: use fill_null instead. |
group_concat | Concatenate values using the indicated separator to produce a string. |
hash | Compute an integer hash value. |
identical_to | Return whether this expression is identical to other. |
isin | Check whether this expression’s values are in values . |
isnull | Return whether this expression is NULL. |
name | Rename an expression to name . |
notin | Check whether this expression’s values are not in values . |
notnull | Return whether this expression is not NULL. |
nullif | Set values to null if they equal the values null_if_expr . |
over | Construct a window expression. |
substitute | Replace values given in values with replacement . |
to_pandas | Convert an expression to a pandas or scalar object. |
try_cast | Try cast expression to indicated data type. |
type | Return the DataType of self . |
typeof | Return the string name of the datatype of self. |
Sort an expression ascending.
Name | Type | Description | Default |
---|---|---|---|
nulls_first | bool | Whether to sort NULL values first |
False |
Name | Type | Description |
---|---|---|
Value | Sorted expression |
Check if this expression is between lower
and upper
, inclusive.
Name | Type | Description | Default |
---|---|---|---|
lower | Value | Lower bound, inclusive | required |
upper | Value | Upper bound, inclusive | required |
Name | Type | Description |
---|---|---|
BooleanValue | Expression indicating membership in the provided range |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch().limit(5)
>>> t.bill_length_mm.between(35, 38)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Between(bill_length_mm, 35, 38) ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ boolean │ ├─────────────────────────────────┤ │ False │ │ False │ │ False │ │ NULL │ │ True │ └─────────────────────────────────┘
DEPRECATED: use value.cases()
or ibis.cases()
instead.
Create a multi-branch if-else expression.
Equivalent to a SQL CASE
statement.
Name | Type | Description | Default |
---|---|---|---|
branch | tuple[Value, Value] | First (condition , result ) pair. Required. |
required |
branches | tuple[Value, Value] | Additional (condition , result ) pairs. We look through the test values in order and return the result corresponding to the first test value that matches self . If none match, we return else_ . |
() |
else_ | Value | None | Value to return if none of the case conditions evaluate to True . Defaults to NULL . |
None |
Name | Type | Description |
---|---|---|
Value | A value expression |
Note that we never hit the None
case, because x = NULL
is always NULL
, which is not truthy. If you want to replace NULL
s, you should use .fill_null(some_value)
prior to cases()
.
>>> t.mutate(
... result=(
... t.symbol.cases(
... ("+", t.left + t.right),
... ("-", t.left - t.right),
... ("*", t.left * t.right),
... ("/", t.left / t.right),
... (None, -999),
... )
... )
... )
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━┓ ┃ left ┃ symbol ┃ right ┃ result ┃ ┡━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━┩ │ int64 │ string │ int64 │ float64 │ ├───────┼────────┼───────┼─────────┤ │ 5 │ + │ 1 │ 6.0 │ │ 6 │ - │ 2 │ 4.0 │ │ 7 │ * │ 3 │ 21.0 │ │ 8 │ / │ 4 │ 2.0 │ │ 9 │ bogus │ 5 │ NULL │ │ 10 │ NULL │ 6 │ NULL │ └───────┴────────┴───────┴─────────┘
Cast expression to indicated data type.
Similar to pandas.Series.astype
.
Name | Type | Description | Default |
---|---|---|---|
target_type | Any | Type to cast to. Anything accepted by ibis.dtype() |
required |
Name | Type | Description |
---|---|---|
Value | Casted expression |
>>> import ibis
>>> ibis.options.interactive = True
>>> x = ibis.examples.penguins.fetch()["bill_depth_mm"]
>>> x
┏━━━━━━━━━━━━━━━┓ ┃ bill_depth_mm ┃ ┡━━━━━━━━━━━━━━━┩ │ float64 │ ├───────────────┤ │ 18.7 │ │ 17.4 │ │ 18.0 │ │ NULL │ │ 19.3 │ │ 20.6 │ │ 17.8 │ │ 19.6 │ │ 18.1 │ │ 20.2 │ │ … │ └───────────────┘
python’s built-in types can be used
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Cast(bill_depth_mm, int64) ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ int64 │ ├────────────────────────────┤ │ 19 │ │ 17 │ │ 18 │ │ NULL │ │ 19 │ │ 21 │ │ 18 │ │ 20 │ │ 18 │ │ 20 │ │ … │ └────────────────────────────┘
or string names
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Cast(bill_depth_mm, uint16) ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ uint16 │ ├─────────────────────────────┤ │ 19 │ │ 17 │ │ 18 │ │ NULL │ │ 19 │ │ 21 │ │ 18 │ │ 20 │ │ 18 │ │ 20 │ │ … │ └─────────────────────────────┘
If you make an illegal cast, you won’t know until the backend actually executes it. Consider .try_cast()
.
--------------------------------------------------------------------------- ConversionException Traceback (most recent call last) File /nix/store/jxkxg5dl75hkhka41qg06w3jz3p95a6v-ibis-3.11/lib/python3.11/site-packages/IPython/core/formatters.py:770, in PlainTextFormatter.__call__(self, obj) 763 stream = StringIO() 764 printer = pretty.RepresentationPrinter(stream, self.verbose, 765 self.max_width, self.newline, 766 max_seq_length=self.max_seq_length, 767 singleton_pprinters=self.singleton_printers, 768 type_pprinters=self.type_printers, 769 deferred_pprinters=self.deferred_printers) --> 770 printer.pretty(obj) 771 printer.flush() 772 return stream.getvalue() File /nix/store/jxkxg5dl75hkhka41qg06w3jz3p95a6v-ibis-3.11/lib/python3.11/site-packages/IPython/lib/pretty.py:419, in RepresentationPrinter.pretty(self, obj) 408 return meth(obj, self, cycle) 409 if ( 410 cls is not object 411 # check if cls defines __repr__ (...) 417 and callable(_safe_getattr(cls, "__repr__", None)) 418 ): --> 419 return _repr_pprint(obj, self, cycle) 421 return _default_pprint(obj, self, cycle) 422 finally: File /nix/store/jxkxg5dl75hkhka41qg06w3jz3p95a6v-ibis-3.11/lib/python3.11/site-packages/IPython/lib/pretty.py:794, in _repr_pprint(obj, p, cycle) 792 """A pprint that just redirects to the normal repr function.""" 793 # Find newlines and replace them with p.break_() --> 794 output = repr(obj) 795 lines = output.splitlines() 796 with p.group(): File ~/work/ibis/ibis/ibis/expr/types/core.py:83, in Expr.__repr__(self) 81 def __repr__(self) -> str: 82 if ibis.options.interactive: ---> 83 return _capture_rich_renderable(self) 84 else: 85 return self._noninteractive_repr() File ~/work/ibis/ibis/ibis/expr/types/core.py:63, in _capture_rich_renderable(renderable) 61 console = Console(force_terminal=False) 62 with console.capture() as capture: ---> 63 console.print(renderable) 64 return capture.get().rstrip() File /nix/store/jxkxg5dl75hkhka41qg06w3jz3p95a6v-ibis-3.11/lib/python3.11/site-packages/rich/console.py:1705, in Console.print(self, sep, end, style, justify, overflow, no_wrap, emoji, markup, highlight, width, height, crop, soft_wrap, new_line_start, *objects) 1703 if style is None: 1704 for renderable in renderables: -> 1705 extend(render(renderable, render_options)) 1706 else: 1707 for renderable in renderables: File /nix/store/jxkxg5dl75hkhka41qg06w3jz3p95a6v-ibis-3.11/lib/python3.11/site-packages/rich/console.py:1306, in Console.render(self, renderable, options) 1304 renderable = rich_cast(renderable) 1305 if hasattr(renderable, "__rich_console__") and not isclass(renderable): -> 1306 render_iterable = renderable.__rich_console__(self, _options) 1307 elif isinstance(renderable, str): 1308 text_renderable = self.render_str( 1309 renderable, highlight=_options.highlight, markup=_options.markup 1310 ) File ~/work/ibis/ibis/ibis/expr/types/core.py:106, in Expr.__rich_console__(self, console, options) 103 if opts.interactive: 104 from ibis.expr.types.pretty import to_rich --> 106 rich_object = to_rich(self, console_width=console_width) 107 else: 108 rich_object = Text(self._noninteractive_repr()) File ~/work/ibis/ibis/ibis/expr/types/pretty.py:275, in to_rich(expr, max_rows, max_columns, max_length, max_string, max_depth, console_width) 272 from ibis.expr.types import Scalar 274 if isinstance(expr, Scalar): --> 275 return _to_rich_scalar( 276 expr, max_length=max_length, max_string=max_string, max_depth=max_depth 277 ) 278 else: 279 return _to_rich_table( 280 expr, 281 max_rows=max_rows, (...) 286 console_width=console_width, 287 ) File ~/work/ibis/ibis/ibis/expr/types/pretty.py:299, in _to_rich_scalar(expr, max_length, max_string, max_depth) 290 def _to_rich_scalar( 291 expr: Scalar, 292 *, (...) 295 max_depth: int | None = None, 296 ) -> Pretty: 297 value = format_values( 298 expr.type(), --> 299 [expr.to_pyarrow().as_py()], 300 max_length=max_length or ibis.options.repr.interactive.max_length, 301 max_string=max_string or ibis.options.repr.interactive.max_string, 302 max_depth=max_depth or ibis.options.repr.interactive.max_depth, 303 )[0] 304 return Panel(value, expand=False, box=box.SQUARE) File ~/work/ibis/ibis/ibis/expr/types/core.py:577, in Expr.to_pyarrow(self, params, limit, **kwargs) 549 @experimental 550 def to_pyarrow( 551 self, (...) 555 **kwargs: Any, 556 ) -> pa.Table: 557 """Execute expression and return results in as a pyarrow table. 558 559 This method is eager and will execute the associated expression (...) 575 A pyarrow table holding the results of the executed expression. 576 """ --> 577 return self._find_backend(use_default=True).to_pyarrow( 578 self, params=params, limit=limit, **kwargs 579 ) File ~/work/ibis/ibis/ibis/backends/duckdb/__init__.py:1304, in Backend.to_pyarrow(self, expr, params, limit, **_) 1296 def to_pyarrow( 1297 self, 1298 expr: ir.Expr, (...) 1302 **_: Any, 1303 ) -> pa.Table: -> 1304 table = self._to_duckdb_relation(expr, params=params, limit=limit).arrow() 1305 return expr.__pyarrow_result__(table) ConversionException: Conversion Error: Could not convert string 'a string' to INT64 LINE 1: SELECT CAST('a string' AS BIGINT) AS "Cast('a ... ^
Return the first non-null value from args
.
Name | Type | Description | Default |
---|---|---|---|
args | Value | Arguments from which to choose the first non-null value | () |
Name | Type | Description |
---|---|---|
Value | Coalesced expression |
Aggregate this expression’s elements into an array.
This function is called array_agg
, list_agg
, or list
in other systems.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
An optional filter expression. If provided, only rows where where is True will be included in the aggregate. |
None |
order_by | Any | An ordering key (or keys) to use to order the rows before aggregating. If not provided, the order of the items in the result is undefined and backend specific. | None |
include_null | bool | Whether to include null values when performing this aggregation. Set to True to include nulls in the result. |
False |
distinct | bool | Whether to collect only distinct elements. | False |
Name | Type | Description |
---|---|---|
ArrayScalar |
An array of all the collected elements. |
Basic collect usage
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"key": list("aaabb"), "value": [1, 1, 2, 3, 5]})
>>> t
┏━━━━━━━━┳━━━━━━━┓ ┃ key ┃ value ┃ ┡━━━━━━━━╇━━━━━━━┩ │ string │ int64 │ ├────────┼───────┤ │ a │ 1 │ │ a │ 1 │ │ a │ 2 │ │ b │ 3 │ │ b │ 5 │ └────────┴───────┘
Collect all elements into an array scalar:
Collect only unique elements:
Collect elements in a specified order:
Collect elements per group, filtering out values <= 1:
Sort an expression descending.
Name | Type | Description | Default |
---|---|---|---|
nulls_first | bool | Whether to sort NULL values first. |
False |
Name | Type | Description |
---|---|---|
Value | Sorted expression |
Replace any null values with the indicated fill value.
Name | Type | Description | Default |
---|---|---|---|
fill_value | Scalar | Value with which to replace NULL values in self |
required |
Name | Type | Description |
---|---|---|
Value | self filled with fill_value where it is NULL |
DEPRECATED: use fill_null
instead.
Concatenate values using the indicated separator to produce a string.
Name | Type | Description | Default |
---|---|---|---|
sep | str | The separator to use to join strings. | ',' |
where | ir .BooleanValue | None |
An optional filter expression. If provided, only rows where where is True will be included in the aggregate. |
None |
order_by | Any | An ordering key (or keys) to use to order the rows before aggregating. If not provided, the order of the items in the result is undefined and backend specific. | None |
Name | Type | Description |
---|---|---|
StringScalar |
Concatenated string expression |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch().limit(5)
>>> t[["bill_length_mm", "bill_depth_mm"]]
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ bill_length_mm ┃ bill_depth_mm ┃ ┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ float64 │ float64 │ ├────────────────┼───────────────┤ │ 39.1 │ 18.7 │ │ 39.5 │ 17.4 │ │ 40.3 │ 18.0 │ │ NULL │ NULL │ │ 36.7 │ 19.3 │ └────────────────┴───────────────┘
┌─────────────────────┐
│ 39.1,39.5,40.3,36.7 │
└─────────────────────┘
┌────────────────────────┐
│ 39.1: 39.5: 40.3: 36.7 │
└────────────────────────┘
Compute an integer hash value.
Name | Type | Description |
---|---|---|
IntegerValue | The hash value of self |
Return whether this expression is identical to other.
Corresponds to IS NOT DISTINCT FROM
in SQL.
Name | Type | Description | Default |
---|---|---|---|
other | Value | Expression to compare to | required |
Name | Type | Description |
---|---|---|
BooleanValue | Whether this expression is not distinct from other |
Check whether this expression’s values are in values
.
NULL
values are propagated in the output. See examples for details.
Name | Type | Description | Default |
---|---|---|---|
values | Value | Sequence[Value] | Values or expression to check for membership | required |
Name | Type | Description |
---|---|---|
BooleanValue | Expression indicating membership |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 2, 3], "b": [2, 3, 4]})
>>> t
┏━━━━━━━┳━━━━━━━┓ ┃ a ┃ b ┃ ┡━━━━━━━╇━━━━━━━┩ │ int64 │ int64 │ ├───────┼───────┤ │ 1 │ 2 │ │ 2 │ 3 │ │ 3 │ 4 │ └───────┴───────┘
Check against a literal sequence of values
┏━━━━━━━━━━━━━━━━━━━━━┓ ┃ InValues(a, (1, 2)) ┃ ┡━━━━━━━━━━━━━━━━━━━━━┩ │ boolean │ ├─────────────────────┤ │ True │ │ True │ │ False │ └─────────────────────┘
Check against a derived expression
┏━━━━━━━━━━━━━━━┓ ┃ InSubquery(a) ┃ ┡━━━━━━━━━━━━━━━┩ │ boolean │ ├───────────────┤ │ False │ │ False │ │ True │ └───────────────┘
Check against a column from a different table
┏━━━━━━━━━━━━━━━┓ ┃ InSubquery(a) ┃ ┡━━━━━━━━━━━━━━━┩ │ boolean │ ├───────────────┤ │ False │ │ True │ │ False │ └───────────────┘
NULL
behavior
┏━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ InValues(x, (1, None)) ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━┩ │ boolean │ ├────────────────────────┤ │ True │ │ NULL │ └────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━┓ ┃ InValues(x, (1,)) ┃ ┡━━━━━━━━━━━━━━━━━━━┩ │ boolean │ ├───────────────────┤ │ True │ │ NULL │ │ False │ └───────────────────┘
Return whether this expression is NULL.
Rename an expression to name
.
Name | Type | Description | Default |
---|---|---|---|
name | The new name of the expression | required |
Name | Type | Description |
---|---|---|
Value | self with name name |
Check whether this expression’s values are not in values
.
Opposite of Value.isin()
.
Name | Type | Description | Default |
---|---|---|---|
values | Value | Sequence[Value] | Values or expression to check for lack of membership | required |
Name | Type | Description |
---|---|---|
BooleanValue | Whether self ’s values are not contained in values |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch().limit(5)
>>> t.bill_depth_mm
┏━━━━━━━━━━━━━━━┓ ┃ bill_depth_mm ┃ ┡━━━━━━━━━━━━━━━┩ │ float64 │ ├───────────────┤ │ 18.7 │ │ 17.4 │ │ 18.0 │ │ NULL │ │ 19.3 │ └───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Not(InValues(bill_depth_mm, (18.7, 18.1))) ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ boolean │ ├────────────────────────────────────────────┤ │ False │ │ True │ │ True │ │ NULL │ │ True │ └────────────────────────────────────────────┘
Return whether this expression is not NULL.
Set values to null if they equal the values null_if_expr
.
Commonly used to avoid divide-by-zero problems by replacing zero with NULL
in the divisor.
Equivalent to (self == null_if_expr).ifelse(ibis.null(), self)
.
Name | Type | Description | Default |
---|---|---|---|
null_if_expr | Value | Expression indicating what values should be NULL | required |
Name | Type | Description |
---|---|---|
Value | Value expression |
Construct a window expression.
Name | Type | Description | Default |
---|---|---|---|
window | Window specification | None |
|
rows | Whether to use the ROWS window clause |
None |
|
range | Whether to use the RANGE window clause |
None |
|
group_by | Grouping key | None |
|
order_by | Ordering key | None |
Name | Type | Description |
---|---|---|
Value | A window function expression |
Replace values given in values
with replacement
.
This is similar to the pandas replace
method.
Name | Type | Description | Default |
---|---|---|---|
value | Value | dict | Expression or dict. | required |
replacement | Value | None | If an expression is passed to value, this must be passed. | None |
else_ | Value | None | If an original value does not match value , then else_ is used. The default of None means leave the original value unchanged. |
None |
Name | Type | Description |
---|---|---|
Value | Replaced values | |
[Value.cases() ](./expression-generic.qmd#ibis.expr.types.generic.Value.case) |
||
[ibis.cases() ](./expression-generic.qmd#ibis.cases) |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.island.value_counts().order_by("island")
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┓ ┃ island ┃ island_count ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━┩ │ string │ int64 │ ├───────────┼──────────────┤ │ Biscoe │ 168 │ │ Dream │ 124 │ │ Torgersen │ 52 │ └───────────┴──────────────┘
>>> t.island.substitute({"Torgersen": "torg", "Biscoe": "bisc"}).name(
... "island"
... ).value_counts().order_by("island")
┏━━━━━━━━┳━━━━━━━━━━━━━━┓ ┃ island ┃ island_count ┃ ┡━━━━━━━━╇━━━━━━━━━━━━━━┩ │ string │ int64 │ ├────────┼──────────────┤ │ Dream │ 124 │ │ bisc │ 168 │ │ torg │ 52 │ └────────┴──────────────┘
Convert an expression to a pandas or scalar object.
Name | Type | Description | Default |
---|---|---|---|
kwargs | Same as keyword arguments to execute |
{} |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.to_pandas(limit=5)
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year | |
---|---|---|---|---|---|---|---|---|
0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | male | 2007 |
1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | female | 2007 |
2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | female | 2007 |
3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | None | 2007 |
4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | female | 2007 |
Try cast expression to indicated data type.
If the cast fails for a row, the value is returned as null or NaN depending on target_type and backend behavior.
Name | Type | Description | Default |
---|---|---|---|
target_type | Any | Type to try cast to. Anything accepted by ibis.dtype() |
required |
Name | Type | Description |
---|---|---|
Value | Casted expression |
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"numbers": [1, 2, 3, 4], "strings": ["1.0", "2", "hello", "world"]})
>>> t
┏━━━━━━━━━┳━━━━━━━━━┓ ┃ numbers ┃ strings ┃ ┡━━━━━━━━━╇━━━━━━━━━┩ │ int64 │ string │ ├─────────┼─────────┤ │ 1 │ 1.0 │ │ 2 │ 2 │ │ 3 │ hello │ │ 4 │ world │ └─────────┴─────────┘
>>> t = t.mutate(numbers_to_strings=_.numbers.try_cast("string"))
>>> t = t.mutate(strings_to_numbers=_.strings.try_cast("int"))
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓ ┃ numbers ┃ strings ┃ numbers_to_strings ┃ strings_to_numbers ┃ ┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩ │ int64 │ string │ string │ int64 │ ├─────────┼─────────┼────────────────────┼────────────────────┤ │ 1 │ 1.0 │ 1 │ 1 │ │ 2 │ 2 │ 2 │ 2 │ │ 3 │ hello │ 3 │ NULL │ │ 4 │ world │ 4 │ NULL │ └─────────┴─────────┴────────────────────┴────────────────────┘
Return the DataType of self
.
Return the string name of the datatype of self.
The values of the returned strings are necessarily backend dependent. e.g. duckdb may say “DOUBLE”, while sqlite may say “real”.
Name | Type | Description |
---|---|---|
StringValue | A string indicating the type of the value |
>>> import ibis
>>> ibis.options.interactive = True
>>> vals = ibis.examples.penguins.fetch().head(5).bill_length_mm
>>> vals
┏━━━━━━━━━━━━━━━━┓ ┃ bill_length_mm ┃ ┡━━━━━━━━━━━━━━━━┩ │ float64 │ ├────────────────┤ │ 39.1 │ │ 39.5 │ │ 40.3 │ │ NULL │ │ 36.7 │ └────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ TypeOf(bill_length_mm) ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━┩ │ string │ ├────────────────────────┤ │ DOUBLE │ │ DOUBLE │ │ DOUBLE │ │ DOUBLE │ │ DOUBLE │ └────────────────────────┘
Different backends have different names for their native types
Name | Description |
---|---|
approx_median | Return an approximate of the median of self . |
approx_nunique | Return the approximate number of distinct elements in self . |
arbitrary | Select an arbitrary value in a column. |
argmax | Return the value of self that maximizes key . |
argmin | Return the value of self that minimizes key . |
as_scalar | Inform ibis that the expression should be treated as a scalar. |
as_table | Promote the expression to a Table. |
count | Compute the number of rows in an expression. |
cume_dist | Return the cumulative distribution over a window. |
cummax | Return the cumulative max over a window. |
cummin | Return the cumulative min over a window. |
dense_rank | Position of first element within each group of equal values. |
first | Return the first value of a column. |
lag | Return the row located at offset rows before the current row. |
last | Return the last value of a column. |
lead | Return the row located at offset rows after the current row. |
max | Return the maximum of a column. |
median | Return the median of the column. |
min | Return the minimum of a column. |
mode | Return the mode of a column. |
nth | Return the n th value (0-indexed) over a window. |
ntile | Return the integer number of a partitioning of the column values. |
nunique | Compute the number of distinct rows in an expression. |
percent_rank | Return the relative rank of the values in the column. |
preview | Print a subset as a single-column Rich Table. |
quantile | Return value at the given quantile. |
rank | Compute position of first element within each equal-value group in sorted order. |
to_list | Convert a column expression to a list. |
topk | Return a “top k” expression. |
value_counts | Compute a frequency table. |
Return an approximate of the median of self
.
Whether the result is an approximation depends on the backend.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
Filter in values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | An approximation of the median of self |
Return the approximate number of distinct elements in self
.
Whether the result is an approximation depends on the backend.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
Filter in values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | An approximate count of the distinct elements of self |
Select an arbitrary value in a column.
Returns an arbitrary (nondeterministic, backend-specific) value from the column. The value will be non-NULL, except if the column is empty or all values are NULL.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
A filter expression | None |
Name | Type | Description |
---|---|---|
Scalar | An expression |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 2, 2], "b": list("aaa"), "c": [4.0, 4.1, 4.2]})
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓ ┃ a ┃ b ┃ c ┃ ┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩ │ int64 │ string │ float64 │ ├───────┼────────┼─────────┤ │ 1 │ a │ 4.0 │ │ 2 │ a │ 4.1 │ │ 2 │ a │ 4.2 │ └───────┴────────┴─────────┘
Return the value of self
that maximizes key
.
If more than one value maximizes key
, the returned value is backend specific. The result may be NULL
.
Name | Type | Description | Default |
---|---|---|---|
key | ir .Value |
Key to use for max computation. |
required |
where | ir .BooleanValue | None |
Keep values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | The value of self that maximizes key |
Return the value of self
that minimizes key
.
If more than one value minimizes key
, the returned value is backend specific. The result may be NULL
.
Name | Type | Description | Default |
---|---|---|---|
key | ir .Value |
Key to use for min computation. |
required |
where | ir .BooleanValue | None |
Keep values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | The value of self that minimizes key |
Inform ibis that the expression should be treated as a scalar.
Creates a scalar subquery from the column expression. Since ibis cannot be sure that the column expression contains only one value, the column expression is wrapped in a scalar subquery and treated as a scalar.
Note that the execution of the scalar subquery will fail if the column expression contains more than one value.
Name | Type | Description |
---|---|---|
Scalar | A scalar subquery |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.island.as_scalar())
>>> from_that_island.species.value_counts().order_by("species")
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ species ┃ species_count ┃ ┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ string │ int64 │ ├─────────┼───────────────┤ │ Adelie │ 44 │ │ Gentoo │ 124 │ └─────────┴───────────────┘
Promote the expression to a Table.
Name | Type | Description |
---|---|---|
Table |
A table expression |
Compute the number of rows in an expression.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
Filter expression | None |
Name | Type | Description |
---|---|---|
IntegerScalar |
Number of elements in an expression |
Return the cumulative distribution over a window.
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(cume_dist=t.values.cume_dist())
┏━━━━━━━━┳━━━━━━━━━━━┓ ┃ values ┃ cume_dist ┃ ┡━━━━━━━━╇━━━━━━━━━━━┩ │ int64 │ float64 │ ├────────┼───────────┤ │ 1 │ 0.333333 │ │ 1 │ 0.333333 │ │ 2 │ 0.833333 │ │ 2 │ 0.833333 │ │ 2 │ 0.833333 │ │ 3 │ 1.000000 │ └────────┴───────────┘
Return the cumulative max over a window.
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "id": [1, 2, 3, 4, 5, 6],
... "grouper": ["a", "a", "a", "b", "b", "c"],
... "values": [3, 2, 1, 2, 3, 2],
... }
... )
>>> t.mutate(cummax=t.values.cummax())
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓ ┃ id ┃ grouper ┃ values ┃ cummax ┃ ┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩ │ int64 │ string │ int64 │ int64 │ ├───────┼─────────┼────────┼────────┤ │ 1 │ a │ 3 │ 3 │ │ 2 │ a │ 2 │ 3 │ │ 3 │ a │ 1 │ 3 │ │ 4 │ b │ 2 │ 3 │ │ 5 │ b │ 3 │ 3 │ │ 6 │ c │ 2 │ 3 │ └───────┴─────────┴────────┴────────┘
>>> t.mutate(cummax=t.values.cummax(where=t.grouper != "c", group_by=t.grouper)).order_by(
... t.id
... )
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓ ┃ id ┃ grouper ┃ values ┃ cummax ┃ ┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩ │ int64 │ string │ int64 │ int64 │ ├───────┼─────────┼────────┼────────┤ │ 1 │ a │ 3 │ 3 │ │ 2 │ a │ 2 │ 3 │ │ 3 │ a │ 1 │ 3 │ │ 4 │ b │ 2 │ 2 │ │ 5 │ b │ 3 │ 3 │ │ 6 │ c │ 2 │ NULL │ └───────┴─────────┴────────┴────────┘
Return the cumulative min over a window.
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "id": [1, 2, 3, 4, 5, 6],
... "grouper": ["a", "a", "a", "b", "b", "c"],
... "values": [3, 2, 1, 2, 3, 2],
... }
... )
>>> t.mutate(cummin=t.values.cummin())
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓ ┃ id ┃ grouper ┃ values ┃ cummin ┃ ┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩ │ int64 │ string │ int64 │ int64 │ ├───────┼─────────┼────────┼────────┤ │ 1 │ a │ 3 │ 3 │ │ 2 │ a │ 2 │ 2 │ │ 3 │ a │ 1 │ 1 │ │ 4 │ b │ 2 │ 1 │ │ 5 │ b │ 3 │ 1 │ │ 6 │ c │ 2 │ 1 │ └───────┴─────────┴────────┴────────┘
>>> t.mutate(cummin=t.values.cummin(where=t.grouper != "c", group_by=t.grouper)).order_by(
... t.id
... )
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓ ┃ id ┃ grouper ┃ values ┃ cummin ┃ ┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩ │ int64 │ string │ int64 │ int64 │ ├───────┼─────────┼────────┼────────┤ │ 1 │ a │ 3 │ 3 │ │ 2 │ a │ 2 │ 2 │ │ 3 │ a │ 1 │ 1 │ │ 4 │ b │ 2 │ 2 │ │ 5 │ b │ 3 │ 2 │ │ 6 │ c │ 2 │ NULL │ └───────┴─────────┴────────┴────────┘
Position of first element within each group of equal values.
Values are returned in sorted order and duplicate values are ignored.
Equivalent to SQL’s DENSE_RANK()
.
Name | Type | Description |
---|---|---|
IntegerColumn | The rank |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=t.values.dense_rank())
┏━━━━━━━━┳━━━━━━━┓ ┃ values ┃ rank ┃ ┡━━━━━━━━╇━━━━━━━┩ │ int64 │ int64 │ ├────────┼───────┤ │ 1 │ 0 │ │ 1 │ 0 │ │ 2 │ 1 │ │ 2 │ 1 │ │ 2 │ 1 │ │ 3 │ 2 │ └────────┴───────┘
Return the first value of a column.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
An optional filter expression. If provided, only rows where where is True will be included in the aggregate. |
None |
order_by | Any | An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of first is undefined and will be backend specific. |
None |
include_null | bool | Whether to include null values when performing this aggregation. Set to True to include nulls in the result. |
False |
Return the row located at offset
rows before the current row.
Name | Type | Description | Default |
---|---|---|---|
offset | int | ir .IntegerValue | None |
Index of row to select | None |
default | Value | None | Value used if no row exists at offset |
None |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {"year": [2007, 2008, 2009, 2010], "total": [1899.6, 1928.2, 2037.9, 1955.2]}
... )
>>> t.mutate(total_lead=t.total.lag())
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓ ┃ year ┃ total ┃ total_lead ┃ ┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩ │ int64 │ float64 │ float64 │ ├───────┼─────────┼────────────┤ │ 2007 │ 1899.6 │ NULL │ │ 2008 │ 1928.2 │ 1899.6 │ │ 2009 │ 2037.9 │ 1928.2 │ │ 2010 │ 1955.2 │ 2037.9 │ └───────┴─────────┴────────────┘
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓ ┃ year ┃ total ┃ total_lead ┃ ┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩ │ int64 │ float64 │ float64 │ ├───────┼─────────┼────────────┤ │ 2007 │ 1899.6 │ 0.0 │ │ 2008 │ 1928.2 │ 0.0 │ │ 2009 │ 2037.9 │ 1899.6 │ │ 2010 │ 1955.2 │ 1928.2 │ └───────┴─────────┴────────────┘
Return the last value of a column.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
An optional filter expression. If provided, only rows where where is True will be included in the aggregate. |
None |
order_by | Any | An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of last is undefined and will be backend specific. |
None |
include_null | bool | Whether to include null values when performing this aggregation. Set to True to include nulls in the result. |
False |
Return the row located at offset
rows after the current row.
Name | Type | Description | Default |
---|---|---|---|
offset | int | ir .IntegerValue | None |
Index of row to select | None |
default | Value | None | Value used if no row exists at offset |
None |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {"year": [2007, 2008, 2009, 2010], "total": [1899.6, 1928.2, 2037.9, 1955.2]}
... )
>>> t.mutate(total_lead=t.total.lead())
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓ ┃ year ┃ total ┃ total_lead ┃ ┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩ │ int64 │ float64 │ float64 │ ├───────┼─────────┼────────────┤ │ 2007 │ 1899.6 │ 1928.2 │ │ 2008 │ 1928.2 │ 2037.9 │ │ 2009 │ 2037.9 │ 1955.2 │ │ 2010 │ 1955.2 │ NULL │ └───────┴─────────┴────────────┘
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓ ┃ year ┃ total ┃ total_lead ┃ ┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩ │ int64 │ float64 │ float64 │ ├───────┼─────────┼────────────┤ │ 2007 │ 1899.6 │ 2037.9 │ │ 2008 │ 1928.2 │ 1955.2 │ │ 2009 │ 2037.9 │ 0.0 │ │ 2010 │ 1955.2 │ 0.0 │ └───────┴─────────┴────────────┘
Return the maximum of a column.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
Filter in values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | The maximum value in self |
Return the median of the column.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
Optional boolean expression. If given, only the values where where evaluates to true will be considered for the median. |
None |
Name | Type | Description |
---|---|---|
Scalar | Median of the column |
Compute the median of bill_depth_mm
>>> t.group_by(t.species).agg(median_bill_depth=t.bill_depth_mm.median()).order_by(
... ibis.desc("median_bill_depth")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓ ┃ species ┃ median_bill_depth ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩ │ string │ float64 │ ├───────────┼───────────────────┤ │ Chinstrap │ 18.45 │ │ Adelie │ 18.40 │ │ Gentoo │ 15.00 │ └───────────┴───────────────────┘
In addition to numeric types, any orderable non-numeric types such as strings and dates work with median
.
>>> t.group_by(t.island).agg(median_species=t.species.median()).order_by(
... ibis.desc("median_species")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓ ┃ island ┃ median_species ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩ │ string │ string │ ├───────────┼────────────────┤ │ Biscoe │ Gentoo │ │ Dream │ Chinstrap │ │ Torgersen │ Adelie │ └───────────┴────────────────┘
Return the minimum of a column.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
Filter in values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | The minimum value in self |
Return the mode of a column.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
Filter in values when where is True |
None |
Name | Type | Description |
---|---|---|
Scalar | The mode of self |
Return the n
th value (0-indexed) over a window.
.nth(0)
is equivalent to .first()
. Negative will result in NULL
. If the value of n
is greater than the number of rows in the window, NULL
will be returned.
Name | Type | Description | Default |
---|---|---|---|
n | int | ir .IntegerValue |
Desired rank value | required |
Name | Type | Description |
---|---|---|
Column | The nth value over a window |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 3, 4, 5, 6]})
>>> t.mutate(nth=t.values.nth(2))
┏━━━━━━━━┳━━━━━━━┓ ┃ values ┃ nth ┃ ┡━━━━━━━━╇━━━━━━━┩ │ int64 │ int64 │ ├────────┼───────┤ │ 1 │ 3 │ │ 2 │ 3 │ │ 3 │ 3 │ │ 4 │ 3 │ │ 5 │ 3 │ │ 6 │ 3 │ └────────┴───────┘
Return the integer number of a partitioning of the column values.
Name | Type | Description | Default |
---|---|---|---|
buckets | int | ir .IntegerValue |
Number of buckets to partition into | required |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(ntile=t.values.ntile(3))
┏━━━━━━━━┳━━━━━━━┓ ┃ values ┃ ntile ┃ ┡━━━━━━━━╇━━━━━━━┩ │ int64 │ int64 │ ├────────┼───────┤ │ 1 │ 0 │ │ 1 │ 0 │ │ 2 │ 1 │ │ 2 │ 1 │ │ 2 │ 2 │ │ 3 │ 2 │ └────────┴───────┘
Compute the number of distinct rows in an expression.
Name | Type | Description | Default |
---|---|---|---|
where | ir .BooleanValue | None |
Filter expression | None |
Name | Type | Description |
---|---|---|
IntegerScalar |
Number of distinct elements in an expression |
Return the relative rank of the values in the column.
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(percent_rank=t.values.percent_rank())
┏━━━━━━━━┳━━━━━━━━━━━━━━┓ ┃ values ┃ percent_rank ┃ ┡━━━━━━━━╇━━━━━━━━━━━━━━┩ │ int64 │ float64 │ ├────────┼──────────────┤ │ 1 │ 0.0 │ │ 1 │ 0.0 │ │ 2 │ 0.4 │ │ 2 │ 0.4 │ │ 2 │ 0.4 │ │ 3 │ 1.0 │ └────────┴──────────────┘
Print a subset as a single-column Rich Table.
This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive
.
Name | Type | Description | Default |
---|---|---|---|
max_rows | int | None | Maximum number of rows to display | None |
max_length | int | None | Maximum length for pretty-printed arrays and maps. | None |
max_string | int | None | Maximum length for pretty-printed strings. | None |
max_depth | int | None | Maximum depth for nested data types. | None |
console_width | int | float | None | Width of the console in characters. If not specified, the width will be inferred from the console. | None |
Return value at the given quantile.
The output of this method is a continuous quantile if the input is numeric, otherwise the output is a discrete quantile.
Name | Type | Description | Default |
---|---|---|---|
quantile | float | ir .NumericValue | Sequence[ir .NumericValue | float] |
0 <= quantile <= 1 , or an array of such values indicating the quantile or quantiles to compute |
required |
where | ir .BooleanValue | None |
Boolean filter for input values | None |
Name | Type | Description |
---|---|---|
Scalar | Quantile of the input |
Compute the 99th percentile of bill_depth
>>> t.group_by(t.species).agg(p99_bill_depth=t.bill_depth_mm.quantile(0.99)).order_by(
... ibis.desc("p99_bill_depth")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓ ┃ species ┃ p99_bill_depth ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩ │ string │ float64 │ ├───────────┼────────────────┤ │ Adelie │ 21.200 │ │ Chinstrap │ 20.733 │ │ Gentoo │ 17.256 │ └───────────┴────────────────┘
In addition to numeric types, any orderable non-numeric types such as strings and dates work with quantile
.
Let’s compute the 99th percentile of the species
column
>>> t.group_by(t.island).agg(p99_species=t.species.quantile(0.99)).order_by(
... ibis.desc("p99_species")
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ island ┃ p99_species ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━┩ │ string │ string │ ├───────────┼─────────────┤ │ Biscoe │ Gentoo │ │ Dream │ Chinstrap │ │ Torgersen │ Adelie │ └───────────┴─────────────┘
Compute position of first element within each equal-value group in sorted order.
Equivalent to SQL’s RANK()
window function.
Name | Type | Description |
---|---|---|
Int64Column |
The min rank |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=t.values.rank())
┏━━━━━━━━┳━━━━━━━┓ ┃ values ┃ rank ┃ ┡━━━━━━━━╇━━━━━━━┩ │ int64 │ int64 │ ├────────┼───────┤ │ 1 │ 0 │ │ 1 │ 0 │ │ 2 │ 2 │ │ 2 │ 2 │ │ 2 │ 2 │ │ 3 │ 5 │ └────────┴───────┘
Convert a column expression to a list.
Name | Type | Description | Default |
---|---|---|---|
kwargs | Same as keyword arguments to to_pyarrow |
{} |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.bill_length_mm.to_list(limit=5)
[39.1, 39.5, 40.3, None, 36.7]
Return a “top k” expression.
Computes a Table containing the top k
values by a certain metric (defaults to count).
Added name
parameter.
Name | Type | Description | Default |
---|---|---|---|
k | int | The number of rows to return. | required |
by | ir .Value | None |
The metric to compute “top” by. Defaults to count . |
None |
name | str | None | The name to use for the metric column. A suitable name will be automatically generated if not provided. | None |
Name | Type | Description |
---|---|---|
Table |
The top k values. |
Compute the top 3 diamond colors by frequency:
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓ ┃ color ┃ CountStar(diamonds) ┃ ┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩ │ string │ int64 │ ├────────┼─────────────────────┤ │ G │ 11292 │ │ E │ 9797 │ │ F │ 9542 │ └────────┴─────────────────────┘
Compute the top 3 diamond colors by mean price:
┏━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ color ┃ Mean(price) ┃ ┡━━━━━━━━╇━━━━━━━━━━━━━┩ │ string │ float64 │ ├────────┼─────────────┤ │ J │ 5323.818020 │ │ I │ 5091.874954 │ │ H │ 4486.669196 │ └────────┴─────────────┘
Compute the top 2 diamond colors by max carat:
Compute a frequency table.
Name | Type | Description | Default |
---|---|---|---|
name | str | None | The name to use for the frequency column. A suitable name will be automatically generated if not provided. | None |
Name | Type | Description |
---|---|---|
Table |
The frequency table. |
Compute the count of each unique value in “chars”, ordered by “chars”:
┏━━━━━━━━┳━━━━━━━━━━━━━┓ ┃ chars ┃ chars_count ┃ ┡━━━━━━━━╇━━━━━━━━━━━━━┩ │ string │ int64 │ ├────────┼─────────────┤ │ a │ 2 │ │ b │ 1 │ │ c │ 3 │ │ d │ 4 │ └────────┴─────────────┘
Compute the count of each unique value in “chars” as a column named “freq”, ordered by “freq”:
┏━━━━━━━━┳━━━━━━━┓ ┃ chars ┃ freq ┃ ┡━━━━━━━━╇━━━━━━━┩ │ string │ int64 │ ├────────┼───────┤ │ b │ 1 │ │ a │ 2 │ │ c │ 3 │ │ d │ 4 │ └────────┴───────┘
The user facing wrapper object providing syntactic sugar for deferreds.
Provides a natural-like syntax for constructing deferred expressions by overloading all of the available dunder methods including the equality operator.
Its sole purpose is to provide a nicer syntax for constructing deferred expressions, thus it gets unwrapped to the underlying deferred expression when used by the rest of the library.
Name | Type | Description | Default |
---|---|---|---|
obj | The deferred object to provide syntax sugar for. | required | |
repr | An optional fixed string to use when repr-ing the deferred expression, instead of the default. This is useful for complex deferred expressions where the arguments don’t necessarily make sense to be user facing in the repr. | None |
Name | Description |
---|---|
as_scalar | Inform ibis that the expression should be treated as a scalar. |
as_table | Promote the scalar expression to a table. |
Inform ibis that the expression should be treated as a scalar.
If the expression is a literal, it will be returned as is. If it depends on a table, it will be turned to a scalar subquery.
Name | Type | Description |
---|---|---|
Scalar | A scalar subquery or a literal |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> max_gentoo_weight = t.filter(t.species == "Gentoo").body_mass_g.max()
>>> light_penguins = t.filter(t.body_mass_g < max_gentoo_weight / 2)
>>> light_penguins.species.value_counts().order_by("species")
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ species ┃ species_count ┃ ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ string │ int64 │ ├───────────┼───────────────┤ │ Adelie │ 15 │ │ Chinstrap │ 2 │ └───────────┴───────────────┘
Promote the scalar expression to a table.
Name | Type | Description |
---|---|---|
Table |
A table expression |
Promote an aggregation to a table
>>> import ibis
>>> import ibis.expr.types as ir
>>> t = ibis.table(dict(a="str"), name="t")
>>> expr = t.a.length().sum().name("len").as_table()
>>> isinstance(expr, ir.Table)
True
Promote a literal value to a table
deferred
Deferred expression object.
Use this object to refer to a previous table expression in a chain of expressions.
_
may conflict with other idioms in Python
See https://github.com/ibis-project/ibis/issues/4704 for details.
Use from ibis import deferred as <NAME>
to assign a different name to the deferred object builder.
Another option is to use ibis._
directly.
Create a scalar expression from a Python value.
Ibis supports literal construction of arrays using the following functions:
Constructing these types using literal
will be deprecated in a future release.
Name | Type | Description | Default |
---|---|---|---|
value | Any | A Python value | required |
type | dt .DataType | str | None |
An instance of DataType or a string indicating the ibis type of value . This parameter can be used in cases where ibis’s type inference isn’t sufficient for discovering the type of value . |
None |
Name | Type | Description |
---|---|---|
Scalar | An expression representing a literal value |
Construct an integer literal
Construct a float64
literal from an int
Ibis checks for invalid types
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) File ~/work/ibis/ibis/ibis/expr/datatypes/value.py:268, in normalize(typ, value) 267 try: --> 268 value = int(value) 269 except ValueError: ValueError: invalid literal for int() with base 10: 'foobar' During handling of the above exception, another exception occurred: TypeError Traceback (most recent call last) Cell In[365], line 1 ----> 1 ibis.literal("foobar", type="int64") File ~/work/ibis/ibis/ibis/common/deferred.py:613, in deferrable.<locals>.wrapper.<locals>.inner(*args, **kwargs) 611 builder = Call(func, *args, **kwargs) 612 return Deferred(builder, repr=repr) --> 613 return func(*args, **kwargs) File ~/work/ibis/ibis/ibis/expr/types/generic.py:2936, in literal(value, type) 2931 raise TypeError( 2932 f"Ibis literal {value!r} cannot be safely coerced to datatype {type}" 2933 ) 2935 dtype = dt.infer(value) if type is None else dt.dtype(type) -> 2936 return ops.Literal(value, dtype=dtype).to_expr() File ~/work/ibis/ibis/ibis/common/bases.py:72, in AbstractMeta.__call__(cls, *args, **kwargs) 52 def __call__(cls, *args, **kwargs): 53 """Create a new instance of the class. 54 55 The subclass may override the `__create__` classmethod to change the (...) 70 71 """ ---> 72 return cls.__create__(*args, **kwargs) File ~/work/ibis/ibis/ibis/common/grounds.py:120, in Annotable.__create__(cls, *args, **kwargs) 116 @classmethod 117 def __create__(cls, *args: Any, **kwargs: Any) -> Self: 118 # construct the instance by passing only validated keyword arguments 119 kwargs = cls.__signature__.validate(cls, args, kwargs) --> 120 return super().__create__(**kwargs) File ~/work/ibis/ibis/ibis/expr/operations/generic.py:147, in Literal.__init__(self, value, dtype) 145 def __init__(self, value, dtype): 146 # normalize ensures that the value is a valid value for the given dtype --> 147 value = dt.normalize(dtype, value) 148 super().__init__(value=value, dtype=dtype) File ~/work/ibis/ibis/ibis/expr/datatypes/value.py:270, in normalize(typ, value) 268 value = int(value) 269 except ValueError: --> 270 raise TypeError(f"Unable to normalize {value!r} to {dtype!r}") 271 if value not in dtype.bounds: 272 raise TypeError( 273 f"Value {value} is out of bounds for type {dtype!r} " 274 f"(bounds: {dtype.bounds})" 275 ) TypeError: Unable to normalize 'foobar' to Int64(nullable=True)
Literals can also be used in a deferred context.
Here’s an example of constructing a table of a column’s type repeated for every row:
>>> from ibis import _, selectors as s
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.select(s.across(s.all(), ibis.literal(_.type(), type=str).name(_.get_name()))).head(1)
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓ ┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃ ┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩ │ string │ string │ string │ string │ string │ string │ string │ string │ ├─────────┼────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼────────┤ │ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │ └─────────┴────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴────────┘
Create a deferred parameter of a given type.
Name | Type | Description | Default |
---|---|---|---|
type | dt .DataType |
The type of the unbound parameter, e.g., double, int64, date, etc. | required |
Name | Type | Description |
---|---|---|
Scalar |
A scalar expression backend by a parameter |
>>> from datetime import date
>>> import ibis
>>> start = ibis.param("date")
>>> t = ibis.memtable(
... {
... "date_col": [date(2013, 1, 1), date(2013, 1, 2), date(2013, 1, 3)],
... "value": [1.0, 2.0, 3.0],
... },
... )
>>> expr = t.filter(t.date_col >= start).value.sum()
>>> expr.execute(params={start: date(2013, 1, 1)})
6.0
Create a NULL scalar.
NULL
s with an unspecified type are castable and comparable to values, but lack datatype-specific methods:
>>> import ibis
>>> ibis.options.interactive = True
>>> ibis.null().upper() # quartodoc: +EXPECTED_FAILURE
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[380], line 3 1 import ibis 2 ibis.options.interactive = True ----> 3 ibis.null().upper() # quartodoc: +EXPECTED_FAILURE AttributeError: 'NullScalar' object has no attribute 'upper'
Generate a range of values.
Integer ranges are supported, as well as timestamp ranges.
start
is inclusive and stop
is exclusive, just like Python’s builtin range
.
When step
equals 0, however, this function will return an empty array.
Python’s range
will raise an exception when step
is zero.
Name | Type | Description | Default |
---|---|---|---|
start | Lower bound of the range, inclusive. | required | |
stop | Upper bound of the range, exclusive. | required | |
step | Step value. Optional, defaults to 1. | required |
Name | Type | Description |
---|---|---|
ArrayValue |
An array of values |
Range using only a stop argument
Simple range using start and stop
Generate an empty range
Negative step values are supported
ibis.range
behaves the same as Python’s range …
… except when the step is zero, in which case ibis.range
returns an empty array
Because the resulting expression is array, you can unnest the values
┏━━━━━━━━━┓ ┃ numbers ┃ ┡━━━━━━━━━┩ │ int8 │ ├─────────┤ │ 0 │ │ 1 │ │ 2 │ │ 3 │ │ 4 │ └─────────┘
Timestamp ranges are also supported
┌─────────────────────────────────────────────────────────────────────────────────────┐ │ [datetime.datetime(2002, 1, 1, 0, 0), datetime.datetime(2002, 1, 3, 0, 0), ... +14] │ └─────────────────────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━┓ ┃ ts ┃ ┡━━━━━━━━━━━━━━━━━━━━━┩ │ timestamp │ ├─────────────────────┤ │ 2002-01-01 00:00:00 │ │ 2002-01-03 00:00:00 │ │ 2002-01-05 00:00:00 │ │ 2002-01-07 00:00:00 │ │ 2002-01-09 00:00:00 │ │ 2002-01-11 00:00:00 │ │ 2002-01-13 00:00:00 │ │ 2002-01-15 00:00:00 │ │ 2002-01-17 00:00:00 │ │ 2002-01-19 00:00:00 │ │ … │ └─────────────────────┘
Return the first non-null value from args
.
Name | Type | Description | Default |
---|---|---|---|
args | Any | Arguments from which to choose the first non-null value | () |
Name | Type | Description |
---|---|---|
Value |
Coalesced expression |
Compute the smallest value among the supplied arguments.
Name | Type | Description | Default |
---|---|---|---|
args | Any | Arguments to choose from | () |
Name | Type | Description |
---|---|---|
Value |
Minimum of the passed arguments |
Compute the largest value among the supplied arguments.
Name | Type | Description | Default |
---|---|---|---|
args | Any | Arguments to choose from | () |
Name | Type | Description |
---|---|---|
Value |
Maximum of the passed arguments |
Construct a ternary conditional expression.
Name | Type | Description | Default |
---|---|---|---|
condition | Any | A boolean expression | required |
true_expr | Any | Expression to return if condition evaluates to True |
required |
false_expr | Any | Expression to return if condition evaluates to False or NULL |
required |
Name | Type | Description |
---|---|---|
Value | ir .Value |
The value of true_expr if condition is True else false_expr |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"condition": [True, False, True, None]})
>>> ibis.ifelse(t.condition, "yes", "no")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ IfElse(condition, 'yes', 'no') ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ string │ ├────────────────────────────────┤ │ yes │ │ no │ │ yes │ │ no │ └────────────────────────────────┘
Create a multi-branch if-else expression.
Equivalent to a SQL CASE
statement.
Name | Type | Description | Default |
---|---|---|---|
branch | tuple[Any, Any] | First (condition , result ) pair. Required. |
required |
branches | tuple[Any, Any] | Additional (condition , result ) pairs. We look through the test values in order and return the result corresponding to the first test value that matches self . If none match, we return else_ . |
() |
else_ | Any | None | Value to return if none of the case conditions evaluate to True . Defaults to NULL . |
None |
Name | Type | Description |
---|---|---|
Value |
A value expression |
>>> import ibis
>>> ibis.options.interactive = True
>>> v = ibis.memtable({"values": [1, 2, 1, 2, 3, 2, 4]}).values
>>> ibis.cases((v == 1, "a"), (v > 2, "b"), else_="unk").name("cases")
┏━━━━━━━━┓ ┃ cases ┃ ┡━━━━━━━━┩ │ string │ ├────────┤ │ a │ │ unk │ │ a │ │ unk │ │ b │ │ unk │ │ b │ └────────┘
>>> ibis.cases(
... (v % 2 == 0, "divisible by 2"),
... (v % 3 == 0, "divisible by 3"),
... (v % 4 == 0, "shadowed by the 2 case"),
... ).name("cases")
┏━━━━━━━━━━━━━━━━┓ ┃ cases ┃ ┡━━━━━━━━━━━━━━━━┩ │ string │ ├────────────────┤ │ NULL │ │ divisible by 2 │ │ NULL │ │ divisible by 2 │ │ divisible by 3 │ │ divisible by 2 │ │ divisible by 2 │ └────────────────┘
Create a ascending sort key from asc
or column name.
Name | Type | Description | Default |
---|---|---|---|
expr | ir .Column | str |
The expression or column name to use for sorting | required |
nulls_first | bool | Bool to indicate whether to put NULL values first or not. | False |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t[["species", "year"]].order_by(ibis.asc("year")).head()
┏━━━━━━━━━┳━━━━━━━┓ ┃ species ┃ year ┃ ┡━━━━━━━━━╇━━━━━━━┩ │ string │ int64 │ ├─────────┼───────┤ │ Adelie │ 2007 │ │ Adelie │ 2007 │ │ Adelie │ 2007 │ │ Adelie │ 2007 │ │ Adelie │ 2007 │ └─────────┴───────┘
Name | Type | Description |
---|---|---|
ir .ValueExpr |
An expression |
Create a descending sort key from expr
or column name.
Name | Type | Description | Default |
---|---|---|---|
expr | ir .Column | str |
The expression or column name to use for sorting | required |
nulls_first | bool | Bool to indicate whether to put NULL values first or not. | False |
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t[["species", "year"]].order_by(ibis.desc("year")).head()
┏━━━━━━━━━┳━━━━━━━┓ ┃ species ┃ year ┃ ┡━━━━━━━━━╇━━━━━━━┩ │ string │ int64 │ ├─────────┼───────┤ │ Adelie │ 2009 │ │ Adelie │ 2009 │ │ Adelie │ 2009 │ │ Adelie │ 2009 │ │ Adelie │ 2009 │ └─────────┴───────┘
Name | Type | Description |
---|---|---|
ir .ValueExpr |
An expression |
Return the formatted SQL string for an expression.
Name | Type | Description | Default |
---|---|---|---|
expr | ir .Expr |
Ibis expression. | required |
dialect | str | None | SQL dialect to use for compilation. | None |
pretty | bool | Whether to use pretty formatting. | True |
kwargs | Scalar parameters | {} |
Name | Type | Description |
---|---|---|
str | Formatted SQL string |
You can also specify the SQL dialect to use for compilation: