Methods
approx_median
Return an approximate of the median of self.
approx_nunique
Return the approximate number of distinct elements in self.
arbitrary
Select an arbitrary value in a column.
argmax
Return the value of self that maximizes key.
argmin
Return the value of self that minimizes key.
as_scalar
Inform ibis to treat this Column as a scalar.
as_table
Promote the expression to a Table .
asc
Sort an expression in ascending order.
between
Check if this expression is between lower and upper, inclusive.
case
DEPRECATED: use value.cases() or ibis.cases() instead.
cases
Create a multi-branch if-else expression.
cast
Cast expression to indicated data type.
coalesce
Return the first non-null value from args.
collect
Aggregate this expression’s elements into an array.
compile
Compile expr to a SQL string (for SQL backends) or a LazyFrame (for the polars backend).
count
Compute the number of rows in an expression.
cume_dist
Return the cumulative distribution over a window.
cummax
Return the cumulative max over a window.
cummin
Return the cumulative min over a window.
dense_rank
Position of first element within each group of equal values.
desc
Sort an expression in descending order.
equals
Return whether this expression is structurally equivalent to other.
execute
Execute an expression against its backend if one exists.
fill_null
Replace NULLs with the given value. Does NOT affect NaN and inf values.
fillna
DEPRECATED: use fill_null instead, which acts exactly the same.
first
Return the first value of a column.
get_backend
Get the current Ibis backend of the expression.
get_name
Return the name of this expression.
group_concat
Concatenate values using the indicated separator to produce a string.
hash
Compute an integer hash value.
identical_to
Like ==, but always returns non-NULL, even when comparing NULLs.
isin
Check whether this expression is in values.
isnull
Whether this expression is NULL. Does NOT detect NaN and inf values.
lag
Return the row located at offset rows before the current row.
last
Return the last value of a column.
lead
Return the row located at offset rows after the current row.
max
Return the maximum of a column.
median
Return the median of the column.
min
Return the minimum of a column.
mode
Return the mode of a column.
name
Rename an expression to name.
notin
Check whether this expression is not in values.
notnull
Return whether this expression is not NULL.
nth
Return the nth value (0-indexed) over a window.
ntile
Return the integer number of a partitioning of the column values.
nullif
Set values to null if they equal the values null_if_expr.
nunique
Compute the number of distinct rows in an expression.
over
Construct a window expression.
percent_rank
Return the relative rank of the values in the column.
pipe
Compose f with self.
preview
Print a subset as a single-column Rich Table.
quantile
Return value at the given quantile.
rank
Compute position of first element within each equal-value group in sorted order.
substitute
Replace values given in values with replacement.
to_csv
Write the results of executing the given expression to a CSV file.
to_delta
Write the results of executing the given expression to a Delta Lake table.
to_json
Write the results of expr to a json file of [{column -> value}, …] objects.
to_list
Convert a column expression to a list.
to_pandas
Convert a table expression to a pandas DataFrame.
to_pandas_batches
Execute expression and return an iterator of pandas DataFrames.
to_parquet
Write the results of executing the given expression to a parquet file.
to_parquet_dir
Write the results of executing the given expression to a parquet file in a directory.
to_polars
Execute expression and return results as a polars dataframe.
to_pyarrow_batches
Execute expression and return a RecordBatchReader.
to_sql
Compile to a formatted SQL string.
to_torch
Execute an expression and return results as a dictionary of torch tensors.
to_xlsx
Write a table to an Excel file.
topk
Computes a Table of the top values by a metric (defaults to count).
try_cast
Try cast expression to indicated data type.
type
Return the DataType of self.
typeof
Return the string name of the datatype of self.
unbind
Return an expression built on UnboundTable instead of backend-specific objects.
value_counts
Compute a frequency table.
visualize
Visualize an expression as a GraphViz graph in the browser.
approx_nunique
approx_nunique(where= None )
Return the approximate number of distinct elements in self.
Whether the result is an approximation depends on the backend.
Parameters
where
ir.BooleanValue | None
Filter in values when where is True
None
Returns
Scalar
An approximate count of the distinct elements of self
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.approx_nunique()
>>> t.body_mass_g.approx_nunique(where= t.species == "Adelie" )
arbitrary
Select an arbitrary value in a column.
Returns an arbitrary (nondeterministic, backend-specific) value from the column. The value will be non-NULL, except if the column is empty or all values are NULL.
Parameters
where
ir.BooleanValue | None
A filter expression
None
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : [1 , 2 , 2 ], "b" : list ("aaa" ), "c" : [4.0 , 4.1 , 4.2 ]})
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ int64 │ string │ float64 │
├───────┼────────┼─────────┤
│ 1 │ a │ 4.0 │
│ 2 │ a │ 4.1 │
│ 2 │ a │ 4.2 │
└───────┴────────┴─────────┘
>>> t.group_by("a" ).agg(arb= t.b.arbitrary(), c= t.c.sum ()).order_by("a" )
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ a ┃ arb ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ int64 │ string │ float64 │
├───────┼────────┼─────────┤
│ 1 │ a │ 4.0 │
│ 2 │ a │ 8.3 │
└───────┴────────┴─────────┘
argmax
argmax(key, * , where= None )
Return the value of self that maximizes key.
If more than one value maximizes key, the returned value is backend specific. The result may be NULL.
Parameters
key
ir.Value
Key to use for max computation.
required
where
ir.BooleanValue | None
Keep values when where is True
None
Returns
Scalar
The value of self that maximizes key
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.species.argmax(t.body_mass_g)
┌────────┐
│ Gentoo │
└────────┘
>>> t.species.argmax(t.body_mass_g, where= t.island == "Dream" )
┌───────────┐
│ Chinstrap │
└───────────┘
argmin
argmin(key, * , where= None )
Return the value of self that minimizes key.
If more than one value minimizes key, the returned value is backend specific. The result may be NULL.
Parameters
key
ir.Value
Key to use for min computation.
required
where
ir.BooleanValue | None
Keep values when where is True
None
Returns
Scalar
The value of self that minimizes key
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.species.argmin(t.body_mass_g)
┌───────────┐
│ Chinstrap │
└───────────┘
>>> t.species.argmin(t.body_mass_g, where= t.island == "Biscoe" )
┌────────┐
│ Adelie │
└────────┘
as_scalar
Inform ibis to treat this Column as a scalar.
Ibis cannot know until execution time whether a column expression contains only one value or many values.
This method is a way to explicitly tell ibis to trust you that this column expression will only contain one value at execution time. This allows you to use this column expression with other tables.
Note that execution will fail if the column DOES contain more than one value.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_gentoo = t.filter (t.species == "Gentoo" , t.body_mass_g > 6200 )
We know from domain knowledge that there is only one Gentoo penguin with body mass greater than 6200g, so heavy_gentoo is a table with exactly one row.
If we try to use t.island == heavy_gentoo.island directly in a filter, we will get an error because we are trying to compare columns from two tables to each other, which we don’t know know to align without a specific join:
>>> t.filter (t.island == heavy_gentoo.island)
---------------------------------------------------------------------------
BinderException Traceback (most recent call last)
File /nix/store/1w0931lgp02pk2yqql58b10fnfmvcf9x-ibis-3.12/lib/python3.12/site-packages/IPython/core/formatters.py:770 , in PlainTextFormatter.__call__ (self, obj)
763 stream = StringIO()
764 printer = pretty.RepresentationPrinter(stream, self .verbose,
765 self .max_width, self .newline,
766 max_seq_length=self .max_seq_length,
767 singleton_pprinters=self .singleton_printers,
768 type_pprinters=self .type_printers,
769 deferred_pprinters=self .deferred_printers)
--> 770 printer . pretty ( obj )
771 printer.flush()
772 return stream.getvalue()
File /nix/store/1w0931lgp02pk2yqql58b10fnfmvcf9x-ibis-3.12/lib/python3.12/site-packages/IPython/lib/pretty.py:420 , in RepresentationPrinter.pretty (self, obj)
409 return meth(obj, self , cycle)
410 if (
411 cls is not object
412 # check if cls defines __repr__
(...) 418 and callable (_safe_getattr(cls , " __repr__ " , None ))
419 ):
--> 420 return _repr_pprint ( obj , self , cycle )
422 return _default_pprint(obj, self , cycle)
423 finally :
File /nix/store/1w0931lgp02pk2yqql58b10fnfmvcf9x-ibis-3.12/lib/python3.12/site-packages/IPython/lib/pretty.py:795 , in _repr_pprint (obj, p, cycle)
793 """A pprint that just redirects to the normal repr function."""
794 # Find newlines and replace them with p.break_()
--> 795 output = repr ( obj )
796 lines = output.splitlines()
797 with p.group():
File ~/work/ibis/ibis/ibis/expr/types/core.py:54 , in Expr.__repr__ (self)
52 def __repr__ (self ) -> str :
53 if ibis.options.interactive:
---> 54 return capture_rich_renderable ( self )
55 else :
56 return self ._noninteractive_repr()
File ~/work/ibis/ibis/ibis/expr/types/rich.py:48 , in capture_rich_renderable (renderable)
46 console = Console(force_terminal=False )
47 with _with_rich_display_disabled(), console.capture() as capture:
---> 48 console . print ( renderable )
49 return capture.get().rstrip()
File /nix/store/1w0931lgp02pk2yqql58b10fnfmvcf9x-ibis-3.12/lib/python3.12/site-packages/rich/console.py:1724 , in Console.print (self, sep, end, style, justify, overflow, no_wrap, emoji, markup, highlight, width, height, crop, soft_wrap, new_line_start, *objects)
1722 if style is None :
1723 for renderable in renderables:
-> 1724 extend ( render ( renderable , render_options ) )
1725 else :
1726 render_style = self .get_style(style)
File /nix/store/1w0931lgp02pk2yqql58b10fnfmvcf9x-ibis-3.12/lib/python3.12/site-packages/rich/console.py:1325 , in Console.render (self, renderable, options)
1323 renderable = rich_cast(renderable)
1324 if hasattr (renderable, " __rich_console__ " ) and not isclass(renderable):
-> 1325 render_iterable = renderable . __rich_console__ ( self , _options )
1326 elif isinstance (renderable, str ):
1327 text_renderable = self .render_str(
1328 renderable, highlight=_options.highlight, markup=_options.markup
1329 )
File ~/work/ibis/ibis/ibis/expr/types/core.py:75 , in Expr.__rich_console__ (self, console, options)
73 try :
74 if opts.interactive:
---> 75 rich_object = to_rich ( self , console_width = console_width )
76 else :
77 rich_object = Text(self ._noninteractive_repr())
File ~/work/ibis/ibis/ibis/expr/types/rich.py:70 , in to_rich (expr, max_rows, max_columns, max_length, max_string, max_depth, console_width)
66 return to_rich_scalar(
67 expr, max_length=max_length, max_string=max_string, max_depth=max_depth
68 )
69 else :
---> 70 return to_rich_table (
71 expr ,
72 max_rows = max_rows ,
73 max_columns = max_columns ,
74 max_length = max_length ,
75 max_string = max_string ,
76 max_depth = max_depth ,
77 console_width = console_width ,
78 )
File ~/work/ibis/ibis/ibis/expr/types/_rich.py:331 , in to_rich_table (tablish, max_rows, max_columns, max_length, max_string, max_depth, console_width)
328 if orig_ncols > len (computed_cols):
329 table = table.select(*computed_cols)
--> 331 result = table . limit ( max_rows + 1 ) . to_pyarrow ( )
332 # Now format the columns in order, stopping if the console width would
333 # be exceeded.
334 col_info = []
File ~/work/ibis/ibis/ibis/expr/types/relations.py:628 , in Table.to_pyarrow (self, params, limit, **kwargs)
620 @experimental
621 def to_pyarrow (
622 self ,
(...) 626 **kwargs: Any,
627 ) -> pa.Table:
--> 628 return super ( ) . to_pyarrow ( params = params , limit = limit , * * kwargs )
File ~/work/ibis/ibis/ibis/expr/types/core.py:604 , in Expr.to_pyarrow (self, params, limit, **kwargs)
574 @experimental
575 def to_pyarrow (
576 self ,
(...) 580 **kwargs: Any,
581 ) -> pa.Table | pa.Array | pa.Scalar:
582 """Execute expression to a pyarrow object.
583
584 This method is eager and will execute the associated expression
(...) 602 If the passed expression is a Scalar, a pyarrow scalar is returned.
603 """
--> 604 return self . _find_backend ( use_default = True ) . to_pyarrow (
605 self , params = params , limit = limit , * * kwargs
606 )
File ~/work/ibis/ibis/ibis/backends/duckdb/__init__.py:1398 , in Backend.to_pyarrow (self, expr, params, limit, **kwargs)
1387 def to_pyarrow (
1388 self ,
1389 expr: ir.Expr,
(...) 1394 **kwargs: Any,
1395 ) -> pa.Table:
1396 from ibis . backends . duckdb . converter import DuckDBPyArrowData
-> 1398 table = self . _to_duckdb_relation (
1399 expr , params = params , limit = limit , * * kwargs
1400 ) .to_arrow_table()
1401 return expr.__pyarrow_result__(table, data_mapper=DuckDBPyArrowData)
File ~/work/ibis/ibis/ibis/backends/duckdb/__init__.py:1342 , in Backend._to_duckdb_relation (self, expr, params, limit, **kwargs)
1340 if table_expr.schema().geospatial:
1341 self ._load_extensions([" spatial " ])
-> 1342 return self . con . sql ( sql )
BinderException : Binder Error: Referenced table "t1" not found!
Candidate tables: "t0"
Instead, we should use as_scalar() to tell ibis that we know heavy_gentoo.island contains exactly one value:
>>> from_that_island = t.filter (t.island == heavy_gentoo.island.as_scalar())
>>> from_that_island.species.value_counts().order_by("species" )
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species ┃ species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string │ int64 │
├─────────┼───────────────┤
│ Adelie │ 44 │
│ Gentoo │ 124 │
└─────────┴───────────────┘
as_table
Promote the expression to a Table .
Examples
>>> import ibis
>>> t = ibis.table(dict (a= "str" ), name= "t" )
>>> expr = t.a.length().name("len" ).as_table()
>>> expected = t.select(len = t.a.length())
>>> expr.equals(expected)
asc
Sort an expression in ascending order.
Parameters
nulls_first
bool
Whether to sort NULL values first
False
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : [1 , 2 , 3 , None ]})
>>> t.order_by(t.a.asc())
┏━━━━━━━━━┓
┃ a ┃
┡━━━━━━━━━┩
│ float64 │
├─────────┤
│ 1.0 │
│ 2.0 │
│ 3.0 │
│ NULL │
└─────────┘
>>> t.order_by(t.a.asc(nulls_first= True ))
┏━━━━━━━━━┓
┃ a ┃
┡━━━━━━━━━┩
│ float64 │
├─────────┤
│ NULL │
│ 1.0 │
│ 2.0 │
│ 3.0 │
└─────────┘
between
Check if this expression is between lower and upper, inclusive.
Parameters
lower
Value
Lower bound, inclusive
required
upper
Value
Upper bound, inclusive
required
Returns
BooleanValue
Expression indicating membership in the provided range
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch().limit(5 )
>>> t.bill_length_mm.between(35 , 38 )
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Between(bill_length_mm, 35, 38) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ boolean │
├─────────────────────────────────┤
│ False │
│ False │
│ False │
│ NULL │
│ True │
└─────────────────────────────────┘
case
DEPRECATED: use value.cases() or ibis.cases() instead.
cases
cases(branch, / , * branches, else_= None )
Create a multi-branch if-else expression.
Equivalent to a SQL CASE statement.
Parameters
branch
tuple [Value , Value ]
First (condition, result) pair. Required.
required
branches
tuple [Value , Value ]
Additional (condition, result) pairs. We look through the test values in order and return the result corresponding to the first test value that matches self. If none match, we return else_.
()
else_
Value | None
Value to return if none of the case conditions evaluate to True. Defaults to NULL.
None
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "left" : [5 , 6 , 7 , 8 , 9 , 10 ],
... "symbol" : ["+" , "-" , "*" , "/" , "bogus" , None ],
... "right" : [1 , 2 , 3 , 4 , 5 , 6 ],
... }
... )
Note that we never hit the None case, because x = NULL is always NULL, which is not truthy. If you want to replace NULLs, you should use .fill_null(some_value) prior to cases().
>>> t.mutate(
... result= (
... t.symbol.cases(
... ("+" , t.left + t.right),
... ("-" , t.left - t.right),
... ("*" , t.left * t.right),
... ("/" , t.left / t.right),
... (None , - 999 ),
... )
... )
... )
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━┓
┃ left ┃ symbol ┃ right ┃ result ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━┩
│ int64 │ string │ int64 │ float64 │
├───────┼────────┼───────┼─────────┤
│ 5 │ + │ 1 │ 6.0 │
│ 6 │ - │ 2 │ 4.0 │
│ 7 │ * │ 3 │ 21.0 │
│ 8 │ / │ 4 │ 2.0 │
│ 9 │ bogus │ 5 │ NULL │
│ 10 │ NULL │ 6 │ NULL │
└───────┴────────┴───────┴─────────┘
cast
Cast expression to indicated data type.
Similar to pandas.Series.astype.
Parameters
target_type
Any
Type to cast to. Anything accepted by ibis.dtype()
required
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> x = ibis.examples.penguins.fetch()["bill_depth_mm" ]
>>> x
┏━━━━━━━━━━━━━━━┓
┃ bill_depth_mm ┃
┡━━━━━━━━━━━━━━━┩
│ float64 │
├───────────────┤
│ 18.7 │
│ 17.4 │
│ 18.0 │
│ NULL │
│ 19.3 │
│ 20.6 │
│ 17.8 │
│ 19.6 │
│ 18.1 │
│ 20.2 │
│ … │
└───────────────┘
python’s built-in types can be used
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Cast(bill_depth_mm, int64) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │
├────────────────────────────┤
│ 19 │
│ 17 │
│ 18 │
│ NULL │
│ 19 │
│ 21 │
│ 18 │
│ 20 │
│ 18 │
│ 20 │
│ … │
└────────────────────────────┘
or string names
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Cast(bill_depth_mm, uint16) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ uint16 │
├─────────────────────────────┤
│ 19 │
│ 17 │
│ 18 │
│ NULL │
│ 19 │
│ 21 │
│ 18 │
│ 20 │
│ 18 │
│ 20 │
│ … │
└─────────────────────────────┘
If you make an illegal cast, you won’t know until the backend actually executes it. Consider .try_cast() .
>>> ibis.literal("a string" ).cast("int64" )
---------------------------------------------------------------------------
ConversionException Traceback (most recent call last)
File /nix/store/1w0931lgp02pk2yqql58b10fnfmvcf9x-ibis-3.12/lib/python3.12/site-packages/IPython/core/formatters.py:770 , in PlainTextFormatter.__call__ (self, obj)
763 stream = StringIO()
764 printer = pretty.RepresentationPrinter(stream, self .verbose,
765 self .max_width, self .newline,
766 max_seq_length=self .max_seq_length,
767 singleton_pprinters=self .singleton_printers,
768 type_pprinters=self .type_printers,
769 deferred_pprinters=self .deferred_printers)
--> 770 printer . pretty ( obj )
771 printer.flush()
772 return stream.getvalue()
File /nix/store/1w0931lgp02pk2yqql58b10fnfmvcf9x-ibis-3.12/lib/python3.12/site-packages/IPython/lib/pretty.py:420 , in RepresentationPrinter.pretty (self, obj)
409 return meth(obj, self , cycle)
410 if (
411 cls is not object
412 # check if cls defines __repr__
(...) 418 and callable (_safe_getattr(cls , " __repr__ " , None ))
419 ):
--> 420 return _repr_pprint ( obj , self , cycle )
422 return _default_pprint(obj, self , cycle)
423 finally :
File /nix/store/1w0931lgp02pk2yqql58b10fnfmvcf9x-ibis-3.12/lib/python3.12/site-packages/IPython/lib/pretty.py:795 , in _repr_pprint (obj, p, cycle)
793 """A pprint that just redirects to the normal repr function."""
794 # Find newlines and replace them with p.break_()
--> 795 output = repr ( obj )
796 lines = output.splitlines()
797 with p.group():
File ~/work/ibis/ibis/ibis/expr/types/core.py:54 , in Expr.__repr__ (self)
52 def __repr__ (self ) -> str :
53 if ibis.options.interactive:
---> 54 return capture_rich_renderable ( self )
55 else :
56 return self ._noninteractive_repr()
File ~/work/ibis/ibis/ibis/expr/types/rich.py:48 , in capture_rich_renderable (renderable)
46 console = Console(force_terminal=False )
47 with _with_rich_display_disabled(), console.capture() as capture:
---> 48 console . print ( renderable )
49 return capture.get().rstrip()
File /nix/store/1w0931lgp02pk2yqql58b10fnfmvcf9x-ibis-3.12/lib/python3.12/site-packages/rich/console.py:1724 , in Console.print (self, sep, end, style, justify, overflow, no_wrap, emoji, markup, highlight, width, height, crop, soft_wrap, new_line_start, *objects)
1722 if style is None :
1723 for renderable in renderables:
-> 1724 extend ( render ( renderable , render_options ) )
1725 else :
1726 render_style = self .get_style(style)
File /nix/store/1w0931lgp02pk2yqql58b10fnfmvcf9x-ibis-3.12/lib/python3.12/site-packages/rich/console.py:1325 , in Console.render (self, renderable, options)
1323 renderable = rich_cast(renderable)
1324 if hasattr (renderable, " __rich_console__ " ) and not isclass(renderable):
-> 1325 render_iterable = renderable . __rich_console__ ( self , _options )
1326 elif isinstance (renderable, str ):
1327 text_renderable = self .render_str(
1328 renderable, highlight=_options.highlight, markup=_options.markup
1329 )
File ~/work/ibis/ibis/ibis/expr/types/core.py:75 , in Expr.__rich_console__ (self, console, options)
73 try :
74 if opts.interactive:
---> 75 rich_object = to_rich ( self , console_width = console_width )
76 else :
77 rich_object = Text(self ._noninteractive_repr())
File ~/work/ibis/ibis/ibis/expr/types/rich.py:66 , in to_rich (expr, max_rows, max_columns, max_length, max_string, max_depth, console_width)
63 from ibis . expr . types . _rich import to_rich_scalar, to_rich_table
65 if isinstance (expr, ir.Scalar):
---> 66 return to_rich_scalar (
67 expr , max_length = max_length , max_string = max_string , max_depth = max_depth
68 )
69 else :
70 return to_rich_table(
71 expr,
72 max_rows=max_rows,
(...) 77 console_width=console_width,
78 )
File ~/work/ibis/ibis/ibis/expr/types/_rich.py:270 , in to_rich_scalar (expr, max_length, max_string, max_depth)
263 def to_rich_scalar (
264 expr: Scalar,
265 *,
(...) 268 max_depth: int | None = None ,
269 ) -> Panel:
--> 270 value = expr . to_pyarrow ( ) .as_py()
272 if value is None :
273 formatted_value = Text.styled(" NULL " , style=" dim " )
File ~/work/ibis/ibis/ibis/expr/types/generic.py:1452 , in Scalar.to_pyarrow (self, params, limit, **kwargs)
1444 @experimental
1445 def to_pyarrow (
1446 self ,
(...) 1450 **kwargs: Any,
1451 ) -> pa.Scalar:
-> 1452 return super ( ) . to_pyarrow ( params = params , limit = limit , * * kwargs )
File ~/work/ibis/ibis/ibis/expr/types/core.py:604 , in Expr.to_pyarrow (self, params, limit, **kwargs)
574 @experimental
575 def to_pyarrow (
576 self ,
(...) 580 **kwargs: Any,
581 ) -> pa.Table | pa.Array | pa.Scalar:
582 """Execute expression to a pyarrow object.
583
584 This method is eager and will execute the associated expression
(...) 602 If the passed expression is a Scalar, a pyarrow scalar is returned.
603 """
--> 604 return self . _find_backend ( use_default = True ) . to_pyarrow (
605 self , params = params , limit = limit , * * kwargs
606 )
File ~/work/ibis/ibis/ibis/backends/duckdb/__init__.py:1400 , in Backend.to_pyarrow (self, expr, params, limit, **kwargs)
1387 def to_pyarrow (
1388 self ,
1389 expr: ir.Expr,
(...) 1394 **kwargs: Any,
1395 ) -> pa.Table:
1396 from ibis . backends . duckdb . converter import DuckDBPyArrowData
1398 table = self . _to_duckdb_relation (
1399 expr , params = params , limit = limit , * * kwargs
-> 1400 ) . to_arrow_table ( )
1401 return expr.__pyarrow_result__(table, data_mapper=DuckDBPyArrowData)
ConversionException : Conversion Error: Could not convert string 'a string' to INT64
LINE 1: SELECT CAST('a string' AS BIGINT) AS "Cast('a string', int64)"
^
coalesce
Return the first non-null value from args.
Parameters
args
Value
Arguments from which to choose the first non-null value
()
Returns
Value
Coalesced expression
Examples
>>> import ibis
>>> ibis.coalesce(None , 4 , 5 ).name("x" )
collect
collect(where= None , order_by= None , include_null= False , distinct= False )
Aggregate this expression’s elements into an array.
This function is called array_agg, list_agg, or list in other systems.
Parameters
where
ir.BooleanValue | None
An optional filter expression. If provided, only rows where where is True will be included in the aggregate.
None
order_by
Any
An ordering key (or keys) to use to order the rows before aggregating. If not provided, the order of the items in the result is undefined and backend specific.
None
include_null
bool
Whether to include null values when performing this aggregation. Set to True to include nulls in the result.
False
distinct
bool
Whether to collect only distinct elements.
False
Returns
ArrayScalar
An array of all the collected elements.
Examples
Basic collect usage
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"key" : list ("aaabb" ), "value" : [1 , 1 , 2 , 3 , 5 ]})
>>> t
┏━━━━━━━━┳━━━━━━━┓
┃ key ┃ value ┃
┡━━━━━━━━╇━━━━━━━┩
│ string │ int64 │
├────────┼───────┤
│ a │ 1 │
│ a │ 1 │
│ a │ 2 │
│ b │ 3 │
│ b │ 5 │
└────────┴───────┘
Collect all elements into an array scalar:
>>> t.value.collect().to_pandas()
Collect only unique elements:
>>> t.value.collect(distinct= True ).to_pandas()
Collect elements in a specified order:
>>> t.value.collect(order_by= _.value.desc()).to_pandas()
Collect elements per group, filtering out values <= 1:
>>> t.group_by("key" ).agg(v= t.value.collect(where= _.value > 1 )).order_by("key" )
┏━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ key ┃ v ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━┩
│ string │ array<int64> │
├────────┼──────────────┤
│ a │ [ 2 ] │
│ b │ [ 3 , 5 ] │
└────────┴──────────────┘
compile
compile (limit= None , params= None , pretty= False )
Compile expr to a SQL string (for SQL backends) or a LazyFrame (for the polars backend).
Parameters
limit
int | None
An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py.
None
params
Mapping [ir.Value, Any ] | None
Mapping of scalar parameter expressions to value
None
pretty
bool
In case of SQL backends, return a pretty formatted SQL query.
False
Returns
str | pl.LazyFrame
A SQL string or a LazyFrame object, depending on the backend of self.
Examples
>>> import ibis
>>> d = {"a" : [1 , 2 , 3 ], "b" : [4 , 5 , 6 ]}
>>> con = ibis.duckdb.connect ()
>>> t = con.create_table("t" , d)
>>> expr = t.mutate(c= t.a + t.b)
>>> expr.compile ()
'SELECT "t0"."a", "t0"."b", "t0"."a" + "t0"."b" AS "c" FROM "memory"."main"."t" AS "t0"'
If you want to see the pretty formatted SQL query, set pretty to True.
>>> expr.compile (pretty= True )
'SELECT\n "t0"."a",\n "t0"."b",\n "t0"."a" + "t0"."b" AS "c"\nFROM "memory"."main"."t" AS "t0"'
If the expression does not have a backend, an error will be raised.
>>> t = ibis.memtable(d)
>>> expr = t.mutate(c= t.a + t.b)
>>> expr.compile () # quartodoc: +EXPECTED_FAILURE
---------------------------------------------------------------------------
IbisError Traceback (most recent call last)
Cell In[308] , line 3
1 t = ibis.memtable(d)
2 expr = t.mutate(c=t.a + t.b)
----> 3 expr . compile ( ) # quartodoc: +EXPECTED_FAILURE
File ~/work/ibis/ibis/ibis/expr/types/core.py:531 , in Expr.compile (self, limit, params, pretty)
478 def compile (
479 self ,
480 *,
(...) 483 pretty: bool = False ,
484 ) -> str | pl.LazyFrame:
485 r """Compile `expr` to a SQL string (for SQL backends) or a LazyFrame (for the polars backend).
486
487 Parameters
(...) 529 [`Table.to_sql()`](./expression-tables.qmd#ibis.expr.types.relations.Table.to_sql)
530 """
--> 531 return self . _find_backend ( ) .compile(
532 self , limit=limit, params=params, pretty=pretty
533 )
File ~/work/ibis/ibis/ibis/expr/types/core.py:337 , in Expr._find_backend (self, use_default)
335 default = _default_backend() if use_default else None
336 if default is None :
--> 337 raise IbisError(
338 " Expression depends on no backends, and found no default "
339 )
340 return default
342 if len (backends) > 1 :
IbisError : Expression depends on no backends, and found no default
count
Compute the number of rows in an expression.
Parameters
where
ir.BooleanValue | None
Filter expression
None
Returns
IntegerScalar
Number of elements in an expression
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "id" : [1 , 2 , 3 , 4 , 5 , 6 ],
... "color" : ["Red" , "Green" , "Blue" , "Blue" , "Red" , "Blue" ],
... }
... )
>>> t.count()
>>> t.count(where= t.color == "Blue" )
cume_dist
Return the cumulative distribution over a window.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(cume_dist= t.values.cume_dist())
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ values ┃ cume_dist ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ int64 │ float64 │
├────────┼───────────┤
│ 1 │ 0.333333 │
│ 1 │ 0.333333 │
│ 2 │ 0.833333 │
│ 2 │ 0.833333 │
│ 2 │ 0.833333 │
│ 3 │ 1.000000 │
└────────┴───────────┘
cummax
cummax(where= None , group_by= None , order_by= None )
Return the cumulative max over a window.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "id" : [1 , 2 , 3 , 4 , 5 , 6 ],
... "grouper" : ["a" , "a" , "a" , "b" , "b" , "c" ],
... "values" : [3 , 2 , 1 , 2 , 3 , 2 ],
... }
... )
>>> t.mutate(cummax= t.values.cummax())
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ id ┃ grouper ┃ values ┃ cummax ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ string │ int64 │ int64 │
├───────┼─────────┼────────┼────────┤
│ 1 │ a │ 3 │ 3 │
│ 2 │ a │ 2 │ 3 │
│ 3 │ a │ 1 │ 3 │
│ 4 │ b │ 2 │ 3 │
│ 5 │ b │ 3 │ 3 │
│ 6 │ c │ 2 │ 3 │
└───────┴─────────┴────────┴────────┘
>>> t.mutate(cummax= t.values.cummax(where= t.grouper != "c" , group_by= t.grouper)).order_by(
... t.id
... )
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ id ┃ grouper ┃ values ┃ cummax ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ string │ int64 │ int64 │
├───────┼─────────┼────────┼────────┤
│ 1 │ a │ 3 │ 3 │
│ 2 │ a │ 2 │ 3 │
│ 3 │ a │ 1 │ 3 │
│ 4 │ b │ 2 │ 2 │
│ 5 │ b │ 3 │ 3 │
│ 6 │ c │ 2 │ NULL │
└───────┴─────────┴────────┴────────┘
cummin
cummin(where= None , group_by= None , order_by= None )
Return the cumulative min over a window.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "id" : [1 , 2 , 3 , 4 , 5 , 6 ],
... "grouper" : ["a" , "a" , "a" , "b" , "b" , "c" ],
... "values" : [3 , 2 , 1 , 2 , 3 , 2 ],
... }
... )
>>> t.mutate(cummin= t.values.cummin())
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ id ┃ grouper ┃ values ┃ cummin ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ string │ int64 │ int64 │
├───────┼─────────┼────────┼────────┤
│ 1 │ a │ 3 │ 3 │
│ 2 │ a │ 2 │ 2 │
│ 3 │ a │ 1 │ 1 │
│ 4 │ b │ 2 │ 1 │
│ 5 │ b │ 3 │ 1 │
│ 6 │ c │ 2 │ 1 │
└───────┴─────────┴────────┴────────┘
>>> t.mutate(cummin= t.values.cummin(where= t.grouper != "c" , group_by= t.grouper)).order_by(
... t.id
... )
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ id ┃ grouper ┃ values ┃ cummin ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ string │ int64 │ int64 │
├───────┼─────────┼────────┼────────┤
│ 1 │ a │ 3 │ 3 │
│ 2 │ a │ 2 │ 2 │
│ 3 │ a │ 1 │ 1 │
│ 4 │ b │ 2 │ 2 │
│ 5 │ b │ 3 │ 2 │
│ 6 │ c │ 2 │ NULL │
└───────┴─────────┴────────┴────────┘
dense_rank
Position of first element within each group of equal values.
Values are returned in sorted order and duplicate values are ignored.
Equivalent to SQL’s DENSE_RANK().
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(rank= t.values.dense_rank())
┏━━━━━━━━┳━━━━━━━┓
┃ values ┃ rank ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├────────┼───────┤
│ 1 │ 0 │
│ 1 │ 0 │
│ 2 │ 1 │
│ 2 │ 1 │
│ 2 │ 1 │
│ 3 │ 2 │
└────────┴───────┘
desc
Sort an expression in descending order.
Parameters
nulls_first
bool
Whether to sort NULL values first.
False
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : [1 , 2 , 3 , None ]})
>>> t.order_by(t.a.desc())
┏━━━━━━━━━┓
┃ a ┃
┡━━━━━━━━━┩
│ float64 │
├─────────┤
│ 3.0 │
│ 2.0 │
│ 1.0 │
│ NULL │
└─────────┘
>>> t.order_by(t.a.desc(nulls_first= True ))
┏━━━━━━━━━┓
┃ a ┃
┡━━━━━━━━━┩
│ float64 │
├─────────┤
│ NULL │
│ 3.0 │
│ 2.0 │
│ 1.0 │
└─────────┘
equals
Return whether this expression is structurally equivalent to other.
If you want to produce an equality expression, use == syntax.
Parameters
other
Another expression
required
Examples
>>> import ibis
>>> t1 = ibis.table(dict (a= "int" ), name= "t" )
>>> t2 = ibis.table(dict (a= "int" ), name= "t" )
>>> t1.equals(t2)
>>> v = ibis.table(dict (a= "string" ), name= "v" )
>>> t1.equals(v)
execute
execute(limit= 'default' , params= None , ** kwargs)
Execute an expression against its backend if one exists.
Parameters
limit
int | str | None
An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py.
'default'
params
Mapping [ir.Value, Any ] | None
Mapping of scalar parameter expressions to value
None
kwargs
Any
Keyword arguments
{}
Examples
>>> import ibis
>>> t = ibis.examples.penguins.fetch()
>>> t.execute()
0
Adelie
Torgersen
39.1
18.7
181.0
3750.0
male
2007
1
Adelie
Torgersen
39.5
17.4
186.0
3800.0
female
2007
2
Adelie
Torgersen
40.3
18.0
195.0
3250.0
female
2007
3
Adelie
Torgersen
NaN
NaN
NaN
NaN
None
2007
4
Adelie
Torgersen
36.7
19.3
193.0
3450.0
female
2007
...
...
...
...
...
...
...
...
...
339
Chinstrap
Dream
55.8
19.8
207.0
4000.0
male
2009
340
Chinstrap
Dream
43.5
18.1
202.0
3400.0
female
2009
341
Chinstrap
Dream
49.6
18.2
193.0
3775.0
male
2009
342
Chinstrap
Dream
50.8
19.0
210.0
4100.0
male
2009
343
Chinstrap
Dream
50.2
18.7
198.0
3775.0
female
2009
344 rows × 8 columns
Scalar parameters can be supplied dynamically during execution.
>>> species = ibis.param("string" )
>>> expr = t.filter (t.species == species).order_by(t.bill_length_mm)
>>> expr.execute(limit= 3 , params= {species: "Gentoo" })
0
Gentoo
Biscoe
40.9
13.7
214
4650
female
2007
1
Gentoo
Biscoe
41.7
14.7
210
4700
female
2009
2
Gentoo
Biscoe
42.0
13.5
210
4150
female
2007
fill_null
Replace NULLs with the given value. Does NOT affect NaN and inf values.
This only replaces genuine NULL values, it does NOT affect NaN and inf values for floating point types.
Parameters
fill_value
Scalar
Value with which to replace NULL values in self
required
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"f" : [None , "-inf" , "3.0" , "inf" , "nan" ]})
>>> t = t.mutate(f= ibis._.f.cast(float ))
>>> t = t.mutate(filled= t.f.fill_null(99 ))
>>> t
┏━━━━━━━━━┳━━━━━━━━━┓
┃ f ┃ filled ┃
┡━━━━━━━━━╇━━━━━━━━━┩
│ float64 │ float64 │
├─────────┼─────────┤
│ NULL │ 99.0 │
│ -inf │ -inf │
│ 3.0 │ 3.0 │
│ inf │ inf │
│ nan │ nan │
└─────────┴─────────┘
If you want to fill all NaN and inf values as well, use something like the following:
>>> t.mutate(filled2= ibis.or_(t.f.isnull(), t.f.isnan(), t.f.isinf()).ifelse(99 , t.f))
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ f ┃ filled ┃ filled2 ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ float64 │ float64 │ float64 │
├─────────┼─────────┼─────────┤
│ NULL │ 99.0 │ 99.0 │
│ -inf │ -inf │ 99.0 │
│ 3.0 │ 3.0 │ 3.0 │
│ inf │ inf │ 99.0 │
│ nan │ nan │ 99.0 │
└─────────┴─────────┴─────────┘
Returns
Value
self filled with fill_value where it is NULL
fillna
DEPRECATED: use fill_null instead, which acts exactly the same.
first
first(where= None , order_by= None , include_null= False )
Return the first value of a column.
Parameters
where
ir.BooleanValue | None
An optional filter expression. If provided, only rows where where is True will be included in the aggregate.
None
order_by
Any
An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of first is undefined and will be backend specific.
None
include_null
bool
Whether to include null values when performing this aggregation. Set to True to include nulls in the result.
False
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars" : ["a" , "b" , "c" , "d" ]})
>>> t
┏━━━━━━━━┓
┃ chars ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ a │
│ b │
│ c │
│ d │
└────────┘
>>> t.chars.first(where= t.chars != "a" )
get_backend
Get the current Ibis backend of the expression.
Returns
BaseBackend
The Ibis backend.
Examples
>>> import ibis
>>> con = ibis.duckdb.connect ()
>>> t = con.create_table("t" , {"id" : [1 , 2 , 3 ]})
>>> t.get_backend()
<ibis.backends.duckdb.Backend at 0x7fff107cf200>
get_name
Return the name of this expression.
group_concat
group_concat(sep= ',' , where= None , order_by= None )
Concatenate values using the indicated separator to produce a string.
Parameters
sep
str
The separator to use to join strings.
','
where
ir.BooleanValue | None
An optional filter expression. If provided, only rows where where is True will be included in the aggregate.
None
order_by
Any
An ordering key (or keys) to use to order the rows before aggregating. If not provided, the order of the items in the result is undefined and backend specific.
None
Returns
StringScalar
Concatenated string expression
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch().limit(5 )
>>> t[["bill_length_mm" , "bill_depth_mm" ]]
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ float64 │ float64 │
├────────────────┼───────────────┤
│ 39.1 │ 18.7 │
│ 39.5 │ 17.4 │
│ 40.3 │ 18.0 │
│ NULL │ NULL │
│ 36.7 │ 19.3 │
└────────────────┴───────────────┘
>>> t.bill_length_mm.group_concat()
┌─────────────────────┐
│ 39.1,39.5,40.3,36.7 │
└─────────────────────┘
>>> t.bill_length_mm.group_concat(sep= ": " )
┌────────────────────────┐
│ 39.1: 39.5: 40.3: 36.7 │
└────────────────────────┘
>>> t.bill_length_mm.group_concat(sep= ": " , where= t.bill_depth_mm > 18 )
┌────────────┐
│ 39.1: 36.7 │
└────────────┘
hash
Compute an integer hash value.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> ibis.literal("hello" ).hash ()
┌─────────────────────┐
│ 8628213525773890682 │
└─────────────────────┘
identical_to
Like ==, but always returns non-NULL, even when comparing NULLs.
With regular equality, NULL == <anything> is always NULL, even for NULL == NULL.
In contrast, identical_to treats NULL as a singleton value, so that
NULL.identical_to(NULL) is True.
NULL.identical_to(<non-null>) is False.
This corresponds to IS NOT DISTINCT FROM in SQL.
Parameters
other
Value
Expression to compare to
required
Returns
BooleanValue
Whether this expression is not distinct from other
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... [
... (1 , 1 ),
... (1 , 2 ),
... (1 , None ),
... (None , None ),
... ],
... schema= {"a" : "int64" , "b" : "int64" },
... )
>>> t.mutate(
... eq= _.a == _.b,
... neq= _.a != _.b,
... identical= _.a.identical_to(_.b),
... not_identical=~ _.a.identical_to(_.b),
... )
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ a ┃ b ┃ eq ┃ neq ┃ identical ┃ not_identical ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ boolean │ boolean │ boolean │ boolean │
├───────┼───────┼─────────┼─────────┼───────────┼───────────────┤
│ 1 │ 1 │ True │ False │ True │ False │
│ 1 │ 2 │ False │ True │ False │ True │
│ 1 │ NULL │ NULL │ NULL │ False │ True │
│ NULL │ NULL │ NULL │ NULL │ True │ False │
└───────┴───────┴─────────┴─────────┴───────────┴───────────────┘
isin
Check whether this expression is in values.
NULL values in the input are propagated in the output. If the values argument contains any NULL values, then ibis follows the SQL behavior of returning NULL (not False) when self is not present. See examples below for details.
Parameters
values
ir.ArrayValue | ir.Column | Iterable [Value ]
Values or expression to check for membership.
required
Returns
BooleanValue
True if self is contained in values, False otherwise.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "a" : [1 , 2 , 3 , None ],
... "b" : [1 , 2 , 9 , None ],
... },
... schema= {"a" : int , "b" : int },
... )
Checking for values in literals:
>>> t.mutate(
... a_in_12= t.a.isin([1 , 2 ]),
... a_in_12None= t.a.isin([1 , 2 , None ]),
... )
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ a ┃ b ┃ a_in_12 ┃ a_in_12None ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━┩
│ int64 │ int64 │ boolean │ boolean │
├───────┼───────┼─────────┼─────────────┤
│ 1 │ 1 │ True │ True │
│ 2 │ 2 │ True │ True │
│ 3 │ 9 │ False │ NULL │
│ NULL │ NULL │ NULL │ NULL │
└───────┴───────┴─────────┴─────────────┘
Checking for values in columns of the same table:
>>> t.mutate(
... a_in_b= t.a.isin(t.b),
... a_in_b_no_null= t.a.isin(t.b.fill_null(0 )),
... a_in_b_plus_1= t.a.isin(t.b + 1 ),
... )
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ a ┃ b ┃ a_in_b ┃ a_in_b_no_null ┃ a_in_b_plus_1 ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ boolean │ boolean │ boolean │
├───────┼───────┼─────────┼────────────────┼───────────────┤
│ 1 │ 1 │ True │ True │ NULL │
│ 2 │ 2 │ True │ True │ True │
│ 3 │ 9 │ NULL │ False │ True │
│ NULL │ NULL │ NULL │ NULL │ NULL │
└───────┴───────┴─────────┴────────────────┴───────────────┘
Checking for values in a column from a different table:
>>> t2 = ibis.memtable({"x" : [1 , 2 , 99 ], "y" : [1 , 2 , None ]})
>>> t.mutate(
... a_in_x= t.a.isin(t2.x),
... a_in_y= t.a.isin(t2.y),
... a_in_y_plus_1= t.a.isin(t2.y + 1 ),
... )
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ a ┃ b ┃ a_in_x ┃ a_in_y ┃ a_in_y_plus_1 ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ boolean │ boolean │ boolean │
├───────┼───────┼─────────┼─────────┼───────────────┤
│ 1 │ 1 │ True │ True │ NULL │
│ 2 │ 2 │ True │ True │ True │
│ 3 │ 9 │ False │ NULL │ True │
│ NULL │ NULL │ NULL │ NULL │ NULL │
└───────┴───────┴─────────┴─────────┴───────────────┘
isnull
Whether this expression is NULL. Does NOT detect NaN and inf values.
For FloatingValue types, use FloatingValue.isnan() and FloatingValue.isinf() to detect NaN and inf values.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"f" : [None , "-inf" , "3.0" , "inf" , "nan" ]})
>>> t = t.mutate(f= ibis._.f.cast(float ))
>>> t.mutate(
... isnull= t.f.isnull(),
... isnan= t.f.isnan(),
... isinf= t.f.isinf(),
... )
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ f ┃ isnull ┃ isnan ┃ isinf ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ float64 │ boolean │ boolean │ boolean │
├─────────┼─────────┼─────────┼─────────┤
│ NULL │ True │ NULL │ NULL │
│ -inf │ False │ False │ True │
│ 3.0 │ False │ False │ False │
│ inf │ False │ False │ True │
│ nan │ False │ True │ False │
└─────────┴─────────┴─────────┴─────────┘
lag
lag(offset= None , default= None )
Return the row located at offset rows before the current row.
Parameters
offset
int | ir.IntegerValue | None
Index of row to select
None
default
Value | None
Value used if no row exists at offset
None
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {"year" : [2007 , 2008 , 2009 , 2010 ], "total" : [1899.6 , 1928.2 , 2037.9 , 1955.2 ]}
... )
>>> t.mutate(total_lead= t.total.lag())
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ year ┃ total ┃ total_lead ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ float64 │ float64 │
├───────┼─────────┼────────────┤
│ 2007 │ 1899.6 │ NULL │
│ 2008 │ 1928.2 │ 1899.6 │
│ 2009 │ 2037.9 │ 1928.2 │
│ 2010 │ 1955.2 │ 2037.9 │
└───────┴─────────┴────────────┘
>>> t.mutate(total_lead= t.total.lag(2 , 0 ))
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ year ┃ total ┃ total_lead ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ float64 │ float64 │
├───────┼─────────┼────────────┤
│ 2007 │ 1899.6 │ 0.0 │
│ 2008 │ 1928.2 │ 0.0 │
│ 2009 │ 2037.9 │ 1899.6 │
│ 2010 │ 1955.2 │ 1928.2 │
└───────┴─────────┴────────────┘
last
last(where= None , order_by= None , include_null= False )
Return the last value of a column.
Parameters
where
ir.BooleanValue | None
An optional filter expression. If provided, only rows where where is True will be included in the aggregate.
None
order_by
Any
An ordering key (or keys) to use to order the rows before aggregating. If not provided, the meaning of last is undefined and will be backend specific.
None
include_null
bool
Whether to include null values when performing this aggregation. Set to True to include nulls in the result.
False
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars" : ["a" , "b" , "c" , "d" ]})
>>> t
┏━━━━━━━━┓
┃ chars ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ a │
│ b │
│ c │
│ d │
└────────┘
>>> t.chars.last(where= t.chars != "d" )
lead
lead(offset= None , default= None )
Return the row located at offset rows after the current row.
Parameters
offset
int | ir.IntegerValue | None
Index of row to select
None
default
Value | None
Value used if no row exists at offset
None
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {"year" : [2007 , 2008 , 2009 , 2010 ], "total" : [1899.6 , 1928.2 , 2037.9 , 1955.2 ]}
... )
>>> t.mutate(total_lead= t.total.lead())
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ year ┃ total ┃ total_lead ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ float64 │ float64 │
├───────┼─────────┼────────────┤
│ 2007 │ 1899.6 │ 1928.2 │
│ 2008 │ 1928.2 │ 2037.9 │
│ 2009 │ 2037.9 │ 1955.2 │
│ 2010 │ 1955.2 │ NULL │
└───────┴─────────┴────────────┘
>>> t.mutate(total_lead= t.total.lead(2 , 0 ))
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ year ┃ total ┃ total_lead ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ float64 │ float64 │
├───────┼─────────┼────────────┤
│ 2007 │ 1899.6 │ 2037.9 │
│ 2008 │ 1928.2 │ 1955.2 │
│ 2009 │ 2037.9 │ 0.0 │
│ 2010 │ 1955.2 │ 0.0 │
└───────┴─────────┴────────────┘
max
Return the maximum of a column.
Parameters
where
ir.BooleanValue | None
Filter in values when where is True
None
Returns
Scalar
The maximum value in self
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.max ()
┌──────┐
│ 6300 │
└──────┘
>>> t.body_mass_g.max (where= t.species == "Chinstrap" )
┌──────┐
│ 4800 │
└──────┘
min
Return the minimum of a column.
Parameters
where
ir.BooleanValue | None
Filter in values when where is True
None
Returns
Scalar
The minimum value in self
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.min ()
┌──────┐
│ 2700 │
└──────┘
>>> t.body_mass_g.min (where= t.species == "Adelie" )
┌──────┐
│ 2850 │
└──────┘
mode
Return the mode of a column.
Parameters
where
ir.BooleanValue | None
Filter in values when where is True
None
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.mode()
┌──────┐
│ 3800 │
└──────┘
>>> t.body_mass_g.mode(where= (t.species == "Gentoo" ) & (t.sex == "male" ))
┌──────┐
│ 5550 │
└──────┘
name
Rename an expression to name.
Parameters
name
str
The new name of the expression
required
Returns
Value
self with name name
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : [1 , 2 ]})
>>> t.a
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
┏━━━━━━━┓
┃ b ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
notin
Check whether this expression is not in values.
Opposite of Value.isin() .
NULL values in the input are propagated in the output. If the values argument contains any NULL values, then ibis follows the SQL behavior of returning NULL (not False) when self is present. See examples below for details.
Parameters
values
ir.ArrayValue | ir.Column | Iterable [Value ]
Values or expression to check for lack of membership.
required
Returns
BooleanValue
True if self is not in values, False otherwise.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "a" : [1 , 2 , 3 , None ],
... "b" : [1 , 2 , 9 , None ],
... },
... schema= {"a" : int , "b" : int },
... )
Checking for values in literals:
>>> t.mutate(
... a_notin_12= t.a.notin([1 , 2 ]),
... a_notin_12None= t.a.notin([1 , 2 , None ]),
... )
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ a ┃ b ┃ a_notin_12 ┃ a_notin_12None ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ boolean │ boolean │
├───────┼───────┼────────────┼────────────────┤
│ 1 │ 1 │ False │ False │
│ 2 │ 2 │ False │ False │
│ 3 │ 9 │ True │ NULL │
│ NULL │ NULL │ NULL │ NULL │
└───────┴───────┴────────────┴────────────────┘
Checking for values in columns of the same table:
>>> t.mutate(
... a_notin_b= t.a.notin(t.b),
... a_notin_b_no_null= t.a.notin(t.b.fill_null(0 )),
... a_notin_b_plus_1= t.a.notin(t.b + 1 ),
... )
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ a ┃ b ┃ a_notin_b ┃ a_notin_b_no_null ┃ a_notin_b_plus_1 ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ boolean │ boolean │ boolean │
├───────┼───────┼───────────┼───────────────────┼──────────────────┤
│ 1 │ 1 │ False │ False │ NULL │
│ 2 │ 2 │ False │ False │ False │
│ 3 │ 9 │ NULL │ True │ False │
│ NULL │ NULL │ NULL │ NULL │ NULL │
└───────┴───────┴───────────┴───────────────────┴──────────────────┘
Checking for values in a column from a different table:
>>> t2 = ibis.memtable({"x" : [1 , 2 , 99 ], "y" : [1 , 2 , None ]})
>>> t.mutate(
... a_notin_x= t.a.notin(t2.x),
... a_notin_y= t.a.notin(t2.y),
... a_notin_y_plus_1= t.a.notin(t2.y + 1 ),
... )
┏━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ a ┃ b ┃ a_notin_x ┃ a_notin_y ┃ a_notin_y_plus_1 ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ boolean │ boolean │ boolean │
├───────┼───────┼───────────┼───────────┼──────────────────┤
│ 1 │ 1 │ False │ False │ NULL │
│ 2 │ 2 │ False │ False │ False │
│ 3 │ 9 │ True │ NULL │ False │
│ NULL │ NULL │ NULL │ NULL │ NULL │
└───────┴───────┴───────────┴───────────┴──────────────────┘
notnull
Return whether this expression is not NULL.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch().limit(5 )
>>> t.bill_depth_mm
┏━━━━━━━━━━━━━━━┓
┃ bill_depth_mm ┃
┡━━━━━━━━━━━━━━━┩
│ float64 │
├───────────────┤
│ 18.7 │
│ 17.4 │
│ 18.0 │
│ NULL │
│ 19.3 │
└───────────────┘
>>> t.bill_depth_mm.notnull()
┏━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ NotNull(bill_depth_mm) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━┩
│ boolean │
├────────────────────────┤
│ True │
│ True │
│ True │
│ False │
│ True │
└────────────────────────┘
nth
Return the nth value (0-indexed) over a window.
.nth(0) is equivalent to .first(). Negative will result in NULL. If the value of n is greater than the number of rows in the window, NULL will be returned.
Parameters
n
int | ir.IntegerValue
Desired rank value
required
Returns
Column
The nth value over a window
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 3 , 4 , 5 , 6 ]})
>>> t.mutate(nth= t.values.nth(2 ))
┏━━━━━━━━┳━━━━━━━┓
┃ values ┃ nth ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├────────┼───────┤
│ 1 │ 3 │
│ 2 │ 3 │
│ 3 │ 3 │
│ 4 │ 3 │
│ 5 │ 3 │
│ 6 │ 3 │
└────────┴───────┘
>>> t.mutate(nth= t.values.nth(7 ))
┏━━━━━━━━┳━━━━━━━┓
┃ values ┃ nth ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├────────┼───────┤
│ 1 │ NULL │
│ 2 │ NULL │
│ 3 │ NULL │
│ 4 │ NULL │
│ 5 │ NULL │
│ 6 │ NULL │
└────────┴───────┘
ntile
Return the integer number of a partitioning of the column values.
Parameters
buckets
int | ir.IntegerValue
Number of buckets to partition into
required
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(ntile= t.values.ntile(3 ))
┏━━━━━━━━┳━━━━━━━┓
┃ values ┃ ntile ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├────────┼───────┤
│ 1 │ 0 │
│ 1 │ 0 │
│ 2 │ 1 │
│ 2 │ 1 │
│ 2 │ 2 │
│ 3 │ 2 │
└────────┴───────┘
nullif
Set values to null if they equal the values null_if_expr.
Commonly used to avoid divide-by-zero problems by replacing zero with NULL in the divisor.
Equivalent to (self == null_if_expr).ifelse(ibis.null(), self).
Parameters
null_if_expr
Value
Expression indicating what values should be NULL
required
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> vals = ibis.examples.penguins.fetch().head(5 ).sex
>>> vals
┏━━━━━━━━┓
┃ sex ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ male │
│ female │
│ female │
│ NULL │
│ female │
└────────┘
┏━━━━━━━━━━━━━━━━━━━━━┓
┃ NullIf(sex, 'male') ┃
┡━━━━━━━━━━━━━━━━━━━━━┩
│ string │
├─────────────────────┤
│ NULL │
│ female │
│ female │
│ NULL │
│ female │
└─────────────────────┘
nunique
Compute the number of distinct rows in an expression.
Parameters
where
ir.BooleanValue | None
Filter expression
None
Returns
IntegerScalar
Number of distinct elements in an expression
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.body_mass_g.nunique()
>>> t.body_mass_g.nunique(where= t.species == "Adelie" )
over
over(window= None , * , rows= None , range = None , group_by= None , order_by= None )
Construct a window expression.
Parameters
window
Window specification
None
rows
Whether to use the ROWS window clause
None
range
Whether to use the RANGE window clause
None
group_by
Grouping key
None
order_by
Ordering key
None
Returns
Value
A window function expression
percent_rank
Return the relative rank of the values in the column.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(percent_rank= t.values.percent_rank())
┏━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ values ┃ percent_rank ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━┩
│ int64 │ float64 │
├────────┼──────────────┤
│ 1 │ 0.0 │
│ 1 │ 0.0 │
│ 2 │ 0.4 │
│ 2 │ 0.4 │
│ 2 │ 0.4 │
│ 3 │ 1.0 │
└────────┴──────────────┘
pipe
pipe(f, / , * args, ** kwargs)
Compose f with self.
Parameters
f
If the expression needs to be passed as anything other than the first argument to the function, pass a tuple with the argument name. For example, (f, ‘data’) if the function f expects a ‘data’ keyword
required
args
Any
Positional arguments to f
()
kwargs
Any
Keyword arguments to f
{}
Examples
>>> import ibis
>>> t = ibis.memtable(
... {
... "a" : [5 , 10 , 15 ],
... "b" : ["a" , "b" , "c" ],
... }
... )
>>> f = lambda a: (a + 1 ).name("a" )
>>> g = lambda a: (a * 2 ).name("a" )
>>> result1 = t.a.pipe(f).pipe(g)
>>> result1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 12 │
│ 22 │
│ 32 │
└───────┘
>>> result2 = g(f(t.a)) # equivalent to the above
>>> result1.equals(result2)
Returns
Expr
Result type of passed function
preview
preview(
max_rows= None ,
max_length= None ,
max_string= None ,
max_depth= None ,
console_width= None ,
)
Print a subset as a single-column Rich Table.
This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive.
Parameters
max_rows
int | None
Maximum number of rows to display
None
max_length
int | None
Maximum length for pretty-printed arrays and maps.
None
max_string
int | None
Maximum length for pretty-printed strings.
None
max_depth
int | None
Maximum depth for nested data types.
None
console_width
int | float | None
Width of the console in characters. If not specified, the width will be inferred from the console.
None
Examples
>>> import ibis
>>> t = ibis.examples.penguins.fetch()
>>> t.island.preview(max_rows= 3 , max_string= 5 )
┏━━━━━━━━┓
┃ island ┃
┡━━━━━━━━┩
│ stri… │
├────────┤
│ Torg… │
│ Torg… │
│ Torg… │
│ … │
└────────┘
quantile
quantile(quantile, * , where= None )
Return value at the given quantile.
The output of this method is a continuous quantile if the input is numeric, otherwise the output is a discrete quantile.
Parameters
quantile
float | ir.NumericValue | Sequence [ir.NumericValue | float ]
0 <= quantile <= 1, or an array of such values indicating the quantile or quantiles to compute
required
where
ir.BooleanValue | None
Boolean filter for input values
None
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
Compute the 99th percentile of bill_depth
>>> t.bill_depth_mm.quantile(0.99 )
┌──────┐
│ 21.1 │
└──────┘
>>> t.group_by(t.species).agg(p99_bill_depth= t.bill_depth_mm.quantile(0.99 )).order_by(
... ibis.desc("p99_bill_depth" )
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ p99_bill_depth ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼────────────────┤
│ Adelie │ 21.200 │
│ Chinstrap │ 20.733 │
│ Gentoo │ 17.256 │
└───────────┴────────────────┘
In addition to numeric types, any orderable non-numeric types such as strings and dates work with quantile.
Let’s compute the 99th percentile of the species column
>>> t.group_by(t.island).agg(p99_species= t.species.quantile(0.99 )).order_by(
... ibis.desc("p99_species" )
... )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ island ┃ p99_species ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ string │
├───────────┼─────────────┤
│ Biscoe │ Gentoo │
│ Dream │ Chinstrap │
│ Torgersen │ Adelie │
└───────────┴─────────────┘
rank
Compute position of first element within each equal-value group in sorted order.
Equivalent to SQL’s RANK() window function.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values" : [1 , 2 , 1 , 2 , 3 , 2 ]})
>>> t.mutate(rank= t.values.rank())
┏━━━━━━━━┳━━━━━━━┓
┃ values ┃ rank ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├────────┼───────┤
│ 1 │ 0 │
│ 1 │ 0 │
│ 2 │ 2 │
│ 2 │ 2 │
│ 2 │ 2 │
│ 3 │ 5 │
└────────┴───────┘
substitute
substitute(value, replacement= None , else_= None )
Replace values given in values with replacement.
This is similar to the pandas replace method.
Parameters
value
Value | dict
Expression or dict.
required
replacement
Value | None
If an expression is passed to value, this must be passed.
None
else_
Value | None
If an original value does not match value, then else_ is used. The default of None means leave the original value unchanged.
None
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.island.value_counts().order_by("island" )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ island ┃ island_count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ string │ int64 │
├───────────┼──────────────┤
│ Biscoe │ 168 │
│ Dream │ 124 │
│ Torgersen │ 52 │
└───────────┴──────────────┘
>>> t.island.substitute({"Torgersen" : "torg" , "Biscoe" : "bisc" }).name(
... "island"
... ).value_counts().order_by("island" )
┏━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ island ┃ island_count ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━┩
│ string │ int64 │
├────────┼──────────────┤
│ Dream │ 124 │
│ bisc │ 168 │
│ torg │ 52 │
└────────┴──────────────┘
to_csv
to_csv(path, / , * , params= None , ** kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
See https://arrow.apache.org/docs/python/generated/pyarrow.csv.CSVWriter.html for details.
Parameters
path
str | Path
The data target. A string or Path where the CSV file will be written.
required
params
Mapping [ir.Scalar, Any ] | None
Mapping of scalar parameter expressions to value.
None
**kwargs
Any
Additional keyword arguments passed to pyarrow.csv.CSVWriter
{}
to_delta
to_delta(path, / , * , params= None , ** kwargs)
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
path
str | Path
The data target. A string or Path to the Delta Lake table directory.
required
params
Mapping [ir.Scalar, Any ] | None
Mapping of scalar parameter expressions to value.
None
**kwargs
Any
Additional keyword arguments passed to deltalake.writer.write_deltalake method
{}
to_json
to_json(path, / , ** kwargs)
Write the results of expr to a json file of [{column -> value}, …] objects.
This method is eager and will execute the associated expression immediately.
Parameters
path
str | Path
The data target. A string or Path where the JSON file will be written.
required
kwargs
Any
Additional, backend-specific keyword arguments.
{}
to_list
Convert a column expression to a list.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.bill_length_mm.to_list(limit= 5 )
[39.1, 39.5, 40.3, None, 36.7]
to_pandas
to_pandas(params= None , limit= None , ** kwargs)
Convert a table expression to a pandas DataFrame.
Parameters
params
Mapping [ir.Scalar, Any ] | None
Mapping of scalar parameter expressions to value.
None
limit
int | str | None
An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py.
None
kwargs
Any
Keyword arguments
{}
Returns
DataFrame
The result of executing the expression as a pandas DataFrame
to_pandas_batches
to_pandas_batches(limit= None , params= None , chunk_size= 1000000 , ** kwargs)
Execute expression and return an iterator of pandas DataFrames.
This method is eager and will execute the associated expression immediately.
Parameters
limit
int | str | None
An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py.
None
params
Mapping [ir.Value, Any ] | None
Mapping of scalar parameter expressions to value.
None
chunk_size
int
Maximum number of rows in each returned DataFrame.
1000000
kwargs
Any
Keyword arguments
{}
to_parquet
to_parquet(path, / , * , params= None , ** kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
See https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html for details.
Parameters
path
str | Path
A string or Path where the Parquet file will be written.
required
params
Mapping [ir.Scalar, Any ] | None
Mapping of scalar parameter expressions to value.
None
**kwargs
Any
Additional keyword arguments passed to pyarrow.parquet.ParquetWriter
{}
Examples
Write out an expression to a single parquet file.
>>> import ibis
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> penguins.to_parquet(tempfile.mktemp())
Partition on a single column.
>>> penguins.to_parquet(tempfile.mkdtemp(), partition_by= "year" )
Partition on multiple columns.
>>> penguins.to_parquet(tempfile.mkdtemp(), partition_by= ("year" , "island" ))
to_parquet_dir
to_parquet_dir(directory, / , * , params= None , ** kwargs)
Write the results of executing the given expression to a parquet file in a directory.
This method is eager and will execute the associated expression immediately.
See https://arrow.apache.org/docs/python/generated/pyarrow.dataset.write_dataset.html for details.
Parameters
directory
str | Path
The data target. A string or Path to the directory where the parquet file will be written.
required
params
Mapping [ir.Scalar, Any ] | None
Mapping of scalar parameter expressions to value.
None
**kwargs
Any
Additional keyword arguments passed to pyarrow.dataset.write_dataset
{}
to_polars
to_polars(params= None , limit= None , ** kwargs)
Execute expression and return results as a polars dataframe.
This method is eager and will execute the associated expression immediately.
Parameters
params
Mapping [ir.Scalar, Any ] | None
Mapping of scalar parameter expressions to value.
None
limit
int | str | None
An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py.
None
kwargs
Any
Keyword arguments
{}
Returns
DataFrame
A polars dataframe holding the results of the executed expression.
to_pyarrow_batches
to_pyarrow_batches(limit= None , params= None , chunk_size= 1000000 , ** kwargs)
Execute expression and return a RecordBatchReader.
This method is eager and will execute the associated expression immediately.
Parameters
limit
int | str | None
An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py.
None
params
Mapping [ir.Value, Any ] | None
Mapping of scalar parameter expressions to value.
None
chunk_size
int
Maximum number of rows in each returned record batch.
1000000
kwargs
Any
Keyword arguments
{}
Returns
results
RecordBatchReader
to_sql
to_sql(dialect= None , pretty= True , ** kwargs)
Compile to a formatted SQL string.
Parameters
dialect
str | None
SQL dialect to use for compilation. Uses the dialect bound to self if not specified, or the default dialect if no dialect is bound.
None
pretty
bool
Whether to use pretty formatting.
True
kwargs
Scalar parameters
{}
Examples
>>> import ibis
>>> t = ibis.table({"a" : "int" , "b" : "int" }, name= "t" )
>>> expr = t.mutate(c= t.a + t.b)
>>> expr.to_sql()
SELECT
"t0" ."a" ,
"t0" ."b" ,
"t0" ."a" + "t0" ."b" AS "c"
FROM "t" AS "t0"
You can also specify the SQL dialect to use for compilation:
>>> expr.to_sql(dialect= "mysql" )
SELECT
`t0`.`a`,
`t0`.`b`,
`t0`.`a` + `t0`.`b` AS `c`
FROM `t` AS `t0`
to_torch
to_torch(params= None , limit= None , ** kwargs)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
params
Mapping [ir.Scalar, Any ] | None
Parameters to substitute into the expression.
None
limit
int | str | None
An integer to effect a specific row limit. A value of None means no limit.
None
kwargs
Any
Keyword arguments passed into the backend’s to_torch implementation.
{}
Returns
dict [str , torch.Tensor]
A dictionary of torch tensors, keyed by column name.
to_xlsx
to_xlsx(path, / , * , sheet= 'Sheet1' , header= False , params= None , ** kwargs)
Write a table to an Excel file.
Parameters
path
str | Path
Excel output path.
required
sheet
str
The name of the sheet to write to, eg ‘Sheet3’.
'Sheet1'
header
bool
Whether to include the column names as the first row.
False
params
Mapping [ir.Scalar, Any ] | None
Additional Ibis expression parameters to pass to the backend’s write function.
None
kwargs
Any
Additional arguments passed to the backend’s write function.
{}
Notes
Requires DuckDB >= 1.2.0.
Examples
>>> import os
>>> import ibis
>>> con = ibis.duckdb.connect ()
>>> t = con.create_table(
... "t" ,
... ibis.memtable({"a" : [1 , 2 , 3 ], "b" : ["a" , "b" , "c" ]}),
... temp= True ,
... )
>>> t.to_xlsx("/tmp/test.xlsx" )
>>> os.path.exists("/tmp/test.xlsx" )
topk
topk(k= None , by= None , * , name= None )
Computes a Table of the top values by a metric (defaults to count).
Parameters
k
int | None
The number of rows to return. If None, all values are returned in descending order.
None
by
ir.Value | None
The metric to compute “top” by. Defaults to count.
None
name
str | None
The name to use for the metric column. If not provided, a suitable name will be generated.
None
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.diamonds.fetch()
Compute the top 3 diamond colors by frequency:
┏━━━━━━━━┳━━━━━━━━━━━━━┓
┃ color ┃ color_count ┃
┡━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ int64 │
├────────┼─────────────┤
│ G │ 11292 │
│ E │ 9797 │
│ F │ 9542 │
└────────┴─────────────┘
Compute the top 3 diamond colors by mean price:
>>> t.color.topk(3 , by= t.price.mean())
┏━━━━━━━━┳━━━━━━━━━━━━━┓
┃ color ┃ Mean(price) ┃
┡━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────────┤
│ J │ 5323.818020 │
│ I │ 5091.874954 │
│ H │ 4486.669196 │
└────────┴─────────────┘
Rank all the colors by max carat:
>>> t.color.topk(k= 5 , by= t.carat.max (), name= "max_carat" )
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ color ┃ max_carat ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ string │ float64 │
├────────┼───────────┤
│ J │ 5.01 │
│ H │ 4.13 │
│ I │ 4.01 │
│ D │ 3.40 │
│ E │ 3.05 │
└────────┴───────────┘
try_cast
Try cast expression to indicated data type.
If the cast fails for a row, the value is returned as null or NaN depending on target_type and backend behavior.
Parameters
target_type
Any
Type to try cast to. Anything accepted by ibis.dtype()
required
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"numbers" : [1 , 2 , 3 , 4 ], "strings" : ["1.0" , "2" , "hello" , "world" ]})
>>> t
┏━━━━━━━━━┳━━━━━━━━━┓
┃ numbers ┃ strings ┃
┡━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ string │
├─────────┼─────────┤
│ 1 │ 1.0 │
│ 2 │ 2 │
│ 3 │ hello │
│ 4 │ world │
└─────────┴─────────┘
>>> t = t.mutate(numbers_to_strings= _.numbers.try_cast("string" ))
>>> t = t.mutate(strings_to_numbers= _.strings.try_cast("int" ))
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ numbers ┃ strings ┃ numbers_to_strings ┃ strings_to_numbers ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ string │ int64 │
├─────────┼─────────┼────────────────────┼────────────────────┤
│ 1 │ 1.0 │ 1 │ 1 │
│ 2 │ 2 │ 2 │ 2 │
│ 3 │ hello │ 3 │ NULL │
│ 4 │ world │ 4 │ NULL │
└─────────┴─────────┴────────────────────┴────────────────────┘
type
Return the DataType of self.
Examples
>>> from datetime import datetime
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "int_col" : [1 ],
... "timestamp_col" : [
... datetime(2024 , 11 , 2 , 10 , 5 , 2 ),
... ],
... "string_col" : ["a" ],
... }
... )
>>> t.timestamp_col.type ()
Timestamp(timezone=None, scale=None, nullable=True)
String(length=None, nullable=True)
typeof
Return the string name of the datatype of self.
The values of the returned strings are necessarily backend dependent. e.g. duckdb may say “DOUBLE”, while sqlite may say “real”.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> vals = ibis.examples.penguins.fetch().head(5 ).bill_length_mm
>>> vals
┏━━━━━━━━━━━━━━━━┓
┃ bill_length_mm ┃
┡━━━━━━━━━━━━━━━━┩
│ float64 │
├────────────────┤
│ 39.1 │
│ 39.5 │
│ 40.3 │
│ NULL │
│ 36.7 │
└────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ TypeOf(bill_length_mm) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │
├────────────────────────┤
│ DOUBLE │
│ DOUBLE │
│ DOUBLE │
│ DOUBLE │
│ DOUBLE │
└────────────────────────┘
Different backends have different names for their native types
>>> ibis.duckdb.connect ().execute(ibis.literal(5.4 ).typeof())
>>> ibis.sqlite.connect ().execute(ibis.literal(5.4 ).typeof())
unbind
Return an expression built on UnboundTable instead of backend-specific objects.
Examples
>>> import ibis
>>> import pandas as pd
>>> duckdb_con = ibis.duckdb.connect ()
>>> polars_con = ibis.polars.connect ()
>>> for backend in (duckdb_con, polars_con):
... t = backend.create_table("t" , pd.DataFrame({"a" : [1 , 2 , 3 ]}))
>>> bound_table = duckdb_con.table("t" )
>>> bound_table.get_backend().name
>>> unbound_table = bound_table.unbind()
>>> polars_con.execute(unbound_table)
value_counts
Compute a frequency table.
Parameters
name
str | None
The name to use for the frequency column. If not provided, a suitable name will be generated.
None
Returns
Table
The frequency table.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"chars" : ["a" , "a" , "b" , "c" , "c" , "c" , "d" , "d" , "d" , "d" ]})
Compute the count of each unique value in “chars”, ordered by “chars”:
>>> t.chars.value_counts().order_by("chars" )
┏━━━━━━━━┳━━━━━━━━━━━━━┓
┃ chars ┃ chars_count ┃
┡━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ int64 │
├────────┼─────────────┤
│ a │ 2 │
│ b │ 1 │
│ c │ 3 │
│ d │ 4 │
└────────┴─────────────┘
Compute the count of each unique value in “chars” as a column named “freq”, ordered by “freq”:
>>> t.chars.value_counts(name= "freq" ).order_by("freq" )
┏━━━━━━━━┳━━━━━━━┓
┃ chars ┃ freq ┃
┡━━━━━━━━╇━━━━━━━┩
│ string │ int64 │
├────────┼───────┤
│ b │ 1 │
│ a │ 2 │
│ c │ 3 │
│ d │ 4 │
└────────┴───────┘
visualize
visualize(
format = 'svg' ,
* ,
label_edges= False ,
verbose= False ,
node_attr= None ,
node_attr_getter= None ,
edge_attr= None ,
edge_attr_getter= None ,
)
Visualize an expression as a GraphViz graph in the browser.
Parameters
format
str
Image output format. These are specified by the graphviz Python library.
'svg'
label_edges
bool
Show operation input names as edge labels
False
verbose
bool
Print the graphviz DOT code to stderr if True
False
node_attr
Mapping [str , str ] | None
Mapping of (attribute, value) pairs set for all nodes. Options are specified by the graphviz Python library.
None
node_attr_getter
NodeAttributeGetter | None
Callback taking a node and returning a mapping of (attribute, value) pairs for that node. Options are specified by the graphviz Python library.
None
edge_attr
Mapping [str , str ] | None
Mapping of (attribute, value) pairs set for all edges. Options are specified by the graphviz Python library.
None
edge_attr_getter
EdgeAttributeGetter | None
Callback taking two adjacent nodes and returning a mapping of (attribute, value) pairs for the edge between those nodes. Options are specified by the graphviz Python library.
None
Examples
Open the visualization of an expression in default browser:
>>> import ibis
>>> import ibis.expr.operations as ops
>>> left = ibis.table(dict (a= "int64" , b= "string" ), name= "left" )
>>> right = ibis.table(dict (b= "string" , c= "int64" , d= "string" ), name= "right" )
>>> expr = left.inner_join(right, "b" ).select(left.a, b= right.c, c= right.d)
>>> expr.visualize(
... format = "svg" ,
... label_edges= True ,
... node_attr= {"fontname" : "Roboto Mono" , "fontsize" : "10" },
... node_attr_getter= lambda node: isinstance (node, ops.Field) and {"shape" : "oval" },
... edge_attr= {"fontsize" : "8" },
... edge_attr_getter= lambda u, v: isinstance (u, ops.Field) and {"color" : "red" },
... ) # quartodoc: +SKIP