Table expressions

Tables are one of the core data structures in Ibis.

Table

Table(self, arg)

An immutable and lazy dataframe.

Analogous to a SQL table or a pandas DataFrame. A table expression contains an ordered set of named columns, each with a single known type. Unless explicitly ordered with an .order_by(), the order of rows is undefined.

Table immutability means that the data underlying an Ibis Table cannot be modified: every method on a Table returns a new Table with those changes. Laziness means that an Ibis Table expression does not run your computation every time you call one of its methods. Instead, it is a symbolic expression that represents a set of operations to be performed, which typically is translated into a SQL query. That SQL query is then executed on a backend, where the data actually lives. The result (now small enough to be manageable) can then be materialized back into python as a pandas/pyarrow/python DataFrame/Column/scalar.

You will not create Table objects directly. Instead, you will create one

See the user guide for more info.

Attributes

Name Description
columns The list of column names in this table.

Methods

Name Description
aggregate Aggregate a table with a given set of reductions grouping by by.
alias Create a table expression with a specific name alias.
as_scalar Inform ibis that the table expression should be treated as a scalar.
as_table Promote the expression to a table.
asof_join Perform an “as-of” join between left and right.
cache Cache the provided expression.
cast Cast the columns of a table.
count Compute the number of rows in the table.
cross_join Compute the cross join of a sequence of tables.
difference Compute the set difference of multiple table expressions.
distinct Return a Table with duplicate rows removed.
drop Remove fields from a table.
dropna Remove rows with null values from the table.
fillna Fill null values in a table expression.
filter Select rows from table based on predicates.
get_name Return the fully qualified name of the table.
group_by Create a grouped table expression.
head Select the first n rows of a table.
info Return summary information about a table.
intersect Compute the set intersection of multiple table expressions.
join Perform a join between two tables.
limit Select n rows from self starting at offset.
mutate Add columns to a table expression.
nunique Compute the number of unique rows in the table.
order_by Sort a table by one or more expressions.
pivot_longer Transform a table from wider to longer.
pivot_wider Pivot a table to a wider format.
relabel Deprecated in favor of Table.rename.
relocate Relocate columns before or after other specified columns.
rename Rename columns in the table.
rowid A unique integer per row.
sample Sample a fraction of rows from a table.
schema Return the Schema for this table.
select Compute a new table expression using exprs and named_exprs.
sql Run a SQL query against a table expression.
to_array View a single column table as an array.
to_pandas Convert a table expression to a pandas DataFrame.
try_cast Cast the columns of a table.
union Compute the set union of multiple table expressions.
unpack Project the struct fields of each of columns into self.
view Create a new table expression distinct from the current one.
window_by Create a windowing table-valued function (TVF) expression.

aggregate

aggregate(metrics=(), by=(), having=(), **kwargs)

Aggregate a table with a given set of reductions grouping by by.

Parameters

Name Type Description Default
metrics Sequence[ir.Scalar] | None Aggregate expressions. These can be any scalar-producing expression, including aggregation functions like sum or literal values like ibis.literal(1). ()
by Sequence[ir.Value] | None Grouping expressions. ()
having Sequence[ir.BooleanValue] | None Post-aggregation filters. The shape requirements are the same metrics, but the output type for having is boolean. ::: {.callout-warning} ## Expressions like x is None return bool and will not generate a SQL comparison to NULL ::: ()
kwargs ir.Value Named aggregate expressions {}

Returns

Type Description
Table An aggregate table expression

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
...     {
...         "fruit": ["apple", "apple", "banana", "orange"],
...         "price": [0.5, 0.5, 0.25, 0.33],
...     }
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit   price   ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ stringfloat64 │
├────────┼─────────┤
│ apple 0.50 │
│ apple 0.50 │
│ banana0.25 │
│ orange0.33 │
└────────┴─────────┘
>>> t.aggregate(
...     by=["fruit"],
...     total_cost=_.price.sum(),
...     avg_cost=_.price.mean(),
...     having=_.price.sum() < 0.5,
... )
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit   total_cost  avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringfloat64float64  │
├────────┼────────────┼──────────┤
│ banana0.250.25 │
│ orange0.330.33 │
└────────┴────────────┴──────────┘

alias

alias(alias)

Create a table expression with a specific name alias.

This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql method.

.alias will create a temporary view

.alias creates a temporary view in the database.

This side effect will be removed in a future version of ibis and is not part of the public API.

Parameters

Name Type Description Default
alias str Name of the child expression required

Returns

Type Description
Table An table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> expr = t.alias("pingüinos").sql('SELECT * FROM "pingüinos" LIMIT 5')
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

as_scalar

as_scalar()

Inform ibis that the table expression should be treated as a scalar.

Note that the table must have exactly one column and one row for this to work. If the table has more than one column an error will be raised in expression construction time. If the table has more than one row an error will be raised by the backend when the expression is executed.

Returns

Type Description
Scalar A scalar subquery

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.select("island").as_scalar())
>>> from_that_island.group_by("species").count()
┏━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ species  CountStar() ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━┩
│ stringint64       │
├─────────┼─────────────┤
│ Adelie 44 │
│ Gentoo 124 │
└─────────┴─────────────┘

as_table

as_table()

Promote the expression to a table.

This method is a no-op for table expressions.

Returns

Type Description
Table A table expression

Examples

>>> t = ibis.table(dict(a="int"), name="t")
>>> s = t.as_table()
>>> t is s
True

asof_join

asof_join(left, right, on, predicates=(), by=(), tolerance=None, *, lname='', rname='{name}_right')

Perform an “as-of” join between left and right.

Similar to a left join except that the match is done on nearest key rather than equal keys.

Optionally, match keys with by before joining with predicates.

Parameters

Name Type Description Default
left Table Table expression required
right Table Table expression required
on str | ir.BooleanColumn Closest match inequality condition required
predicates str | ir.Column | Sequence[str | ir.Column] Additional join predicates ()
by str | ir.Column | Sequence[str | ir.Column] Additional equality join predicates ()
tolerance str | ir.IntervalScalar | None Amount of time to look behind when joining None
lname str A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). ''
rname str A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). '{name}_right'

Returns

Type Description
Table Table expression

cache

cache()

Cache the provided expression.

All subsequent operations on the returned expression will be performed on the cached data. Use the with statement to limit the lifetime of a cached table.

This method is idempotent: calling it multiple times in succession will return the same value as the first call.

This method eagerly evaluates the expression prior to caching

Subsequent evaluations will not recompute the expression so method chaining will not incur the overhead of caching more than once.

Returns

Type Description
Table Cached table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> cached_penguins = t.mutate(computation="Heavy Computation").cache()
>>> cached_penguins
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year   computation       ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64string            │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼───────────────────┤
│ Adelie Torgersen39.118.71813750male  2007Heavy Computation │
│ Adelie Torgersen39.517.41863800female2007Heavy Computation │
│ Adelie Torgersen40.318.01953250female2007Heavy Computation │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007Heavy Computation │
│ Adelie Torgersen36.719.31933450female2007Heavy Computation │
│ Adelie Torgersen39.320.61903650male  2007Heavy Computation │
│ Adelie Torgersen38.917.81813625female2007Heavy Computation │
│ Adelie Torgersen39.219.61954675male  2007Heavy Computation │
│ Adelie Torgersen34.118.11933475NULL2007Heavy Computation │
│ Adelie Torgersen42.020.21904250NULL2007Heavy Computation │
│                  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴───────────────────┘

Explicit cache cleanup

>>> with t.mutate(computation="Heavy Computation").cache() as cached_penguins:
...     cached_penguins

cast

cast(schema)

Cast the columns of a table.

Similar to pandas.DataFrame.astype.

If you need to cast columns to a single type, use selectors.

Parameters

Name Type Description Default
schema SchemaLike Mapping, schema or iterable of pairs to use for casting required

Returns

Type Description
Table Casted table

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.schema()
ibis.Schema {
  species            string
  island             string
  bill_length_mm     float64
  bill_depth_mm      float64
  flipper_length_mm  int64
  body_mass_g        int64
  sex                string
  year               int64
}
>>> cols = ["body_mass_g", "bill_length_mm"]
>>> t[cols].head()
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g  bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ int64float64        │
├─────────────┼────────────────┤
│        375039.1 │
│        380039.5 │
│        325040.3 │
│        NULLNULL │
│        345036.7 │
└─────────────┴────────────────┘

Columns not present in the input schema will be passed through unchanged

>>> t.columns
['species',
 'island',
 'bill_length_mm',
 'bill_depth_mm',
 'flipper_length_mm',
 'body_mass_g',
 'sex',
 'year']
>>> expr = t.cast({"body_mass_g": "float64", "bill_length_mm": "int"})
>>> expr.select(*cols).head()
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g  bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ float64int64          │
├─────────────┼────────────────┤
│      3750.039 │
│      3800.040 │
│      3250.040 │
│        NULLNULL │
│      3450.037 │
└─────────────┴────────────────┘

Columns that are in the input schema but not in the table raise an error

>>> t.cast({"foo": "string"})  
IbisError: Cast schema has fields that are not in the table: ['foo']

count

count(where=None)

Compute the number of rows in the table.

Parameters

Name Type Description Default
where ir.BooleanValue | None Optional boolean expression to filter rows when counting. None

Returns

Type Description
IntegerScalar Number of rows in the table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": ["foo", "bar", "baz"]})
>>> t
┏━━━━━━━━┓
┃ a      ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo    │
│ bar    │
│ baz    │
└────────┘
>>> t.count()

3
>>> t.count(t.a != "foo")

2
>>> type(t.count())
ibis.expr.types.numeric.IntegerScalar

cross_join

cross_join(left, right, *rest, lname='', rname='{name}_right')

Compute the cross join of a sequence of tables.

Parameters

Name Type Description Default
left Table Left table required
right Table Right table required
rest Table Additional tables to cross join ()
lname str A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). ''
rname str A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). '{name}_right'

Returns

Type Description
Table Cross join of left, right and rest

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.count()

344
>>> agg = t.drop("year").agg(s.across(s.numeric(), _.mean()))
>>> expr = t.cross_join(agg)
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year   bill_length_mm_right  bill_depth_mm_right  flipper_length_mm_right  body_mass_g_right ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64float64float64float64float64           │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼──────────────────────┼─────────────────────┼─────────────────────────┼───────────────────┤
│ Adelie Torgersen39.118.71813750male  200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen39.517.41863800female200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen40.318.01953250female200743.9219317.15117200.9152054201.754386 │
│ Adelie TorgersenNULLNULLNULLNULLNULL200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen36.719.31933450female200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen39.320.61903650male  200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen38.917.81813625female200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen39.219.61954675male  200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen34.118.11933475NULL200743.9219317.15117200.9152054201.754386 │
│ Adelie Torgersen42.020.21904250NULL200743.9219317.15117200.9152054201.754386 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴──────────────────────┴─────────────────────┴─────────────────────────┴───────────────────┘
>>> expr.columns
['species',
 'island',
 'bill_length_mm',
 'bill_depth_mm',
 'flipper_length_mm',
 'body_mass_g',
 'sex',
 'year',
 'bill_length_mm_right',
 'bill_depth_mm_right',
 'flipper_length_mm_right',
 'body_mass_g_right']
>>> expr.count()

344

difference

difference(table, *rest, distinct=True)

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table Table A table expression required
*rest Table Additional table expressions ()
distinct bool Only diff distinct rows not occurring in the calling table True

See Also

ibis.difference

Returns

Type Description
Table The rows present in self that are not present in tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> t1.difference(t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
└───────┘

distinct

distinct(on=None, keep='first')

Return a Table with duplicate rows removed.

Similar to pandas.DataFrame.drop_duplicates().

Some backends do not support keep='last'

Parameters

Name Type Description Default
on str | Iterable[str] | s.Selector | None Only consider certain columns for identifying duplicates. By default deduplicate all of the columns. None
keep Literal[‘first’, ‘last’] | None Determines which duplicates to keep. - "first": Drop duplicates except for the first occurrence. - "last": Drop duplicates except for the last occurrence. - None: Drop all duplicates 'first'

Examples

>>> import ibis
>>> import ibis.examples as ex
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ex.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Compute the distinct rows of a subset of columns

>>> t[["species", "island"]].distinct().order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species    island    ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ stringstring    │
├───────────┼───────────┤
│ Adelie   Biscoe    │
│ Adelie   Dream     │
│ Adelie   Torgersen │
│ ChinstrapDream     │
│ Gentoo   Biscoe    │
└───────────┴───────────┘

Drop all duplicate rows except the first

>>> t.distinct(on=["species", "island"], keep="first").order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species    island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie   Biscoe   37.818.31743400female2007 │
│ Adelie   Dream    39.516.71783250female2007 │
│ Adelie   Torgersen39.118.71813750male  2007 │
│ ChinstrapDream    46.517.91923500female2007 │
│ Gentoo   Biscoe   46.113.22114500female2007 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop all duplicate rows except the last

>>> t.distinct(on=["species", "island"], keep="last").order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species    island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie   Biscoe   42.718.31964075male  2009 │
│ Adelie   Dream    41.518.52014000male  2009 │
│ Adelie   Torgersen43.119.21973500male  2009 │
│ ChinstrapDream    50.218.71983775female2009 │
│ Gentoo   Biscoe   49.916.12135400male  2009 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop all duplicated rows

>>> expr = t.distinct(on=["species", "island", "year", "bill_length_mm"], keep=None)
>>> expr.count()

273
>>> t.count()

344

You can pass selectors to on

>>> t.distinct(on=~s.numeric())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species    island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie   Biscoe   37.718.71803600male  2007 │
│ Gentoo   Biscoe   50.016.32305700male  2007 │
│ Gentoo   Biscoe   44.514.32164100NULL2007 │
│ Adelie   Biscoe   37.818.31743400female2007 │
│ Gentoo   Biscoe   46.113.22114500female2007 │
│ Adelie   Torgersen39.517.41863800female2007 │
│ Adelie   Dream    39.516.71783250female2007 │
│ ChinstrapDream    50.019.51963900male  2007 │
│ Adelie   Torgersen39.118.71813750male  2007 │
│ Adelie   TorgersenNULLNULLNULLNULLNULL2007 │
│  │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

The only valid values of keep are "first", "last" and [`None][None]

>>> t.distinct(on="species", keep="second")  
IbisError: Invalid value for `keep`: 'second', must be 'first', 'last' or None

drop

drop(*fields)

Remove fields from a table.

Parameters

Name Type Description Default
fields str | Selector Fields to drop. Strings and selectors are accepted. ()

Returns

Type Description
Table A table with all columns matching fields removed.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop one or more columns

>>> t.drop("species").head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringfloat64float64int64int64stringint64 │
├───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen39.118.71813750male  2007 │
│ Torgersen39.517.41863800female2007 │
│ Torgersen40.318.01953250female2007 │
│ TorgersenNULLNULLNULLNULLNULL2007 │
│ Torgersen36.719.31933450female2007 │
└───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.drop("species", "bill_length_mm").head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island     bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringfloat64int64int64stringint64 │
├───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen18.71813750male  2007 │
│ Torgersen17.41863800female2007 │
│ Torgersen18.01953250female2007 │
│ TorgersenNULLNULLNULLNULL2007 │
│ Torgersen19.31933450female2007 │
└───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop with selectors, mix and match

>>> import ibis.selectors as s
>>> t.drop("species", s.startswith("bill_")).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island     flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringint64int64stringint64 │
├───────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen1813750male  2007 │
│ Torgersen1863800female2007 │
│ Torgersen1953250female2007 │
│ TorgersenNULLNULLNULL2007 │
│ Torgersen1933450female2007 │
└───────────┴───────────────────┴─────────────┴────────┴───────┘

dropna

dropna(subset=None, how='any')

Remove rows with null values from the table.

Parameters

Name Type Description Default
subset Sequence[str] | str | None Columns names to consider when dropping nulls. By default all columns are considered. None
how Literal[‘any’, ‘all’] Determine whether a row is removed if there is at least one null value in the row ('any'), or if all row values are null ('all'). 'any'

Returns

Type Description
Table Table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.count()

344
>>> t.dropna(["bill_length_mm", "body_mass_g"]).count()

342
>>> t.dropna(how="all").count()  # no rows where all columns are null

344

fillna

fillna(replacements)

Fill null values in a table expression.

There is potential lack of type stability with the fillna API

For example, different library versions may impact whether a given backend promotes integer replacement values to floats.

Parameters

Name Type Description Default
replacements ir.Scalar | Mapping[str, ir.Scalar] Value with which to fill nulls. If replacements is a mapping, the keys are column names that map to their replacement value. If passed as a scalar all columns are filled with that value. required

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.sex
┏━━━━━━━━┓
┃ sex    ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ male   │
│ female │
│ female │
│ NULL   │
│ female │
│ male   │
│ female │
│ male   │
│ NULL   │
│ NULL   │
│       │
└────────┘
>>> t.fillna({"sex": "unrecorded"}).sex
┏━━━━━━━━━━━━┓
┃ sex        ┃
┡━━━━━━━━━━━━┩
│ string     │
├────────────┤
│ male       │
│ female     │
│ female     │
│ unrecorded │
│ female     │
│ male       │
│ female     │
│ male       │
│ unrecorded │
│ unrecorded │
│           │
└────────────┘

Returns

Type Description
Table Table expression

filter

filter(*predicates)

Select rows from table based on predicates.

Parameters

Name Type Description Default
predicates ir.BooleanValue | Sequence[ir.BooleanValue] | IfAnyAll Boolean value expressions used to select rows in table. ()

Returns

Type Description
Table Filtered table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.filter([t.species == "Adelie", t.body_mass_g > 3500]).sex.value_counts().dropna(
...     "sex"
... ).order_by("sex")
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ sex     sex_count ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ stringint64     │
├────────┼───────────┤
│ female22 │
│ male  68 │
└────────┴───────────┘

get_name

get_name()

Return the fully qualified name of the table.

group_by

group_by(by=(), **key_exprs)

Create a grouped table expression.

Similar to SQL’s GROUP BY statement, or pandas .groupby() method.

Parameters

Name Type Description Default
by str | ir.Value | Iterable[str] | Iterable[ir.Value] | None Grouping expressions ()
key_exprs str | ir.Value | Iterable[str] | Iterable[ir.Value] Named grouping expressions {}

Returns

Type Description
GroupedTable A grouped table expression

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
...     {
...         "fruit": ["apple", "apple", "banana", "orange"],
...         "price": [0.5, 0.5, 0.25, 0.33],
...     }
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit   price   ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ stringfloat64 │
├────────┼─────────┤
│ apple 0.50 │
│ apple 0.50 │
│ banana0.25 │
│ orange0.33 │
└────────┴─────────┘
>>> t.group_by("fruit").agg(total_cost=_.price.sum(), avg_cost=_.price.mean()).order_by(
...     "fruit"
... )
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit   total_cost  avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringfloat64float64  │
├────────┼────────────┼──────────┤
│ apple 1.000.50 │
│ banana0.250.25 │
│ orange0.330.33 │
└────────┴────────────┴──────────┘

head

head(n=5)

Select the first n rows of a table.

The result set is not deterministic without a call to order_by.

Parameters

Name Type Description Default
n int Number of rows to include 5

Returns

Type Description
Table self limited to n rows

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1c      │
│     1a      │
│     2a      │
└───────┴────────┘
>>> t.head(2)
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1c      │
│     1a      │
└───────┴────────┘

See Also

Table.limit Table.order_by

info

info()

Return summary information about a table.

Returns

Type Description
Table Summary of self

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.info()
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┓
┃ name               type     nullable  nulls  non_nulls  null_frac  pos  ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━┩
│ stringstringbooleanint64int64float64int8 │
├───────────────────┼─────────┼──────────┼───────┼───────────┼───────────┼──────┤
│ species          string  │ True     │     03440.0000000 │
│ island           string  │ True     │     03440.0000001 │
│ bill_length_mm   float64 │ True     │     23420.0058142 │
│ bill_depth_mm    float64 │ True     │     23420.0058143 │
│ flipper_length_mmint64   │ True     │     23420.0058144 │
│ body_mass_g      int64   │ True     │     23420.0058145 │
│ sex              string  │ True     │    113330.0319776 │
│ year             int64   │ True     │     03440.0000007 │
└───────────────────┴─────────┴──────────┴───────┴───────────┴───────────┴──────┘

intersect

intersect(table, *rest, distinct=True)

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table Table A table expression required
*rest Table Additional table expressions ()
distinct bool Only return distinct rows True

Returns

Type Description
Table A new table containing the intersection of all input tables.

See Also

ibis.intersect

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> t1.intersect(t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
└───────┘

join

join(left, right, predicates=(), how='inner', *, lname='', rname='{name}_right')

Perform a join between two tables.

Parameters

Name Type Description Default
left Table Left table to join required
right Table Right table to join required
predicates str | Sequence[str | ir.BooleanColumn | Literal[True] | Literal[False] | tuple[str | ir.Column | ir.Deferred, str | ir.Column | ir.Deferred]] Condition(s) to join on. See examples for details. ()
how JoinKind Join method, e.g. "inner" or "left". 'inner'
lname str A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). ''
rname str A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). '{name}_right'

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> movies = ibis.examples.ml_latest_small_movies.fetch()
>>> movies.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ movieId  title                               genres                                      ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64stringstring                                      │
├─────────┼────────────────────────────────────┼─────────────────────────────────────────────┤
│       1Toy Story (1995)                  Adventure|Animation|Children|Comedy|Fantasy │
│       2Jumanji (1995)                    Adventure|Children|Fantasy                  │
│       3Grumpier Old Men (1995)           Comedy|Romance                              │
│       4Waiting to Exhale (1995)          Comedy|Drama|Romance                        │
│       5Father of the Bride Part II (1995)Comedy                                      │
└─────────┴────────────────────────────────────┴─────────────────────────────────────────────┘
>>> ratings = ibis.examples.ml_latest_small_ratings.fetch().drop("timestamp")
>>> ratings.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ userId  movieId  rating  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64int64float64 │
├────────┼─────────┼─────────┤
│      114.0 │
│      134.0 │
│      164.0 │
│      1475.0 │
│      1505.0 │
└────────┴─────────┴─────────┘

Equality left join on the shared movieId column. Note the _right suffix added to all overlapping columns from the right table (in this case only the “movieId” column).

>>> ratings.join(movies, "movieId", how="left").head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId  movieId  rating   movieId_right  title                        genres                                      ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64int64float64int64stringstring                                      │
├────────┼─────────┼─────────┼───────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│      114.01Toy Story (1995)           Adventure|Animation|Children|Comedy|Fantasy │
│      134.03Grumpier Old Men (1995)    Comedy|Romance                              │
│      164.06Heat (1995)                Action|Crime|Thriller                       │
│      1475.047Seven (a.k.a. Se7en) (1995)Mystery|Thriller                            │
│      1505.050Usual Suspects, The (1995) Crime|Mystery|Thriller                      │
└────────┴─────────┴─────────┴───────────────┴─────────────────────────────┴─────────────────────────────────────────────┘

Explicit equality join using the default how value of "inner". Note how there is no _right suffix added to the movieId column since this is an inner join and the movieId column is part of the join condition.

>>> ratings.join(movies, ratings.movieId == movies.movieId).head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId  movieId  rating   title                        genres                                      ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64int64float64stringstring                                      │
├────────┼─────────┼─────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│      114.0Toy Story (1995)           Adventure|Animation|Children|Comedy|Fantasy │
│      134.0Grumpier Old Men (1995)    Comedy|Romance                              │
│      164.0Heat (1995)                Action|Crime|Thriller                       │
│      1475.0Seven (a.k.a. Se7en) (1995)Mystery|Thriller                            │
│      1505.0Usual Suspects, The (1995) Crime|Mystery|Thriller                      │
└────────┴─────────┴─────────┴─────────────────────────────┴─────────────────────────────────────────────┘
>>> tags = ibis.examples.ml_latest_small_tags.fetch()
>>> tags.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ userId  movieId  tag              timestamp  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64int64stringint64      │
├────────┼─────────┼─────────────────┼────────────┤
│      260756funny          1445714994 │
│      260756Highly quotable1445714996 │
│      260756will ferrell   1445714992 │
│      289774Boxing story   1445715207 │
│      289774MMA            1445715200 │
└────────┴─────────┴─────────────────┴────────────┘

You can join on multiple columns/conditions by passing in a sequence. Find all instances where a user both tagged and rated a movie:

>>> tags.join(ratings, ["userId", "movieId"]).head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ userId  movieId  tag              timestamp   rating  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ int64int64stringint64float64 │
├────────┼─────────┼─────────────────┼────────────┼─────────┤
│      260756will ferrell   14457149925.0 │
│      289774Tom Hardy      14457152055.0 │
│      2106782Martin Scorsese14457150565.0 │
│      748516way too long   11696873251.0 │
│     18431mafia          14621387554.0 │
└────────┴─────────┴─────────────────┴────────────┴─────────┘

To self-join a table with itself, you need to call .view() on one of the arguments so the two tables are distinct from each other.

For crafting more complex join conditions, a valid form of a join condition is a 2-tuple like ({left_key}, {right_key}), where each key can be

  • a Column
  • Deferred expression
  • lambda of the form (Table) -> Column

For example, to find all movies pairings that received the same (ignoring case) tags:

>>> movie_tags = tags["movieId", "tag"]
>>> view = movie_tags.view()
>>> movie_tags.join(
...     view,
...     [
...         movie_tags.movieId != view.movieId,
...         (_.tag.lower(), lambda t: t.tag.lower()),
...     ],
... ).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ movieId  tag                movieId_right  tag_right         ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64stringint64string            │
├─────────┼───────────────────┼───────────────┼───────────────────┤
│   60756funny            1732funny             │
│   60756Highly quotable  1732Highly quotable   │
│   89774Tom Hardy        139385tom hardy         │
│  106782drugs            1732drugs             │
│  106782Leonardo DiCaprio5989Leonardo DiCaprio │
└─────────┴───────────────────┴───────────────┴───────────────────┘

limit

limit(n, offset=0)

Select n rows from self starting at offset.

The result set is not deterministic without a call to order_by.

Parameters

Name Type Description Default
n int | None Number of rows to include. If None, the entire table is selected starting from offset. required
offset int Number of rows to skip first 0

Returns

Type Description
Table The first n rows of self starting at offset

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1c      │
│     1a      │
│     2a      │
└───────┴────────┘
>>> t.limit(2)
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1c      │
│     1a      │
└───────┴────────┘

You can use None with offset to slice starting from a particular row

>>> t.limit(None, offset=1)
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1a      │
│     2a      │
└───────┴────────┘

See Also

Table.order_by

mutate

mutate(*exprs, **mutations)

Add columns to a table expression.

Parameters

Name Type Description Default
exprs Sequence[ir.Expr] | None List of named expressions to add as columns ()
mutations ir.Value Named expressions using keyword arguments {}

Returns

Type Description
Table Table expression with additional columns

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch().select("species", "year", "bill_length_mm")
>>> t
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species  year   bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringint64float64        │
├─────────┼───────┼────────────────┤
│ Adelie 200739.1 │
│ Adelie 200739.5 │
│ Adelie 200740.3 │
│ Adelie 2007NULL │
│ Adelie 200736.7 │
│ Adelie 200739.3 │
│ Adelie 200738.9 │
│ Adelie 200739.2 │
│ Adelie 200734.1 │
│ Adelie 200742.0 │
│  │
└─────────┴───────┴────────────────┘

Add a new column from a per-element expression

>>> t.mutate(next_year=_.year + 1).head()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species  year   bill_length_mm  next_year ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ stringint64float64int64     │
├─────────┼───────┼────────────────┼───────────┤
│ Adelie 200739.12008 │
│ Adelie 200739.52008 │
│ Adelie 200740.32008 │
│ Adelie 2007NULL2008 │
│ Adelie 200736.72008 │
└─────────┴───────┴────────────────┴───────────┘

Add a new column based on an aggregation. Note the automatic broadcasting.

>>> t.select("species", bill_demean=_.bill_length_mm - _.bill_length_mm.mean()).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ species  bill_demean ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━┩
│ stringfloat64     │
├─────────┼─────────────┤
│ Adelie -4.82193 │
│ Adelie -4.42193 │
│ Adelie -3.62193 │
│ Adelie NULL │
│ Adelie -7.22193 │
└─────────┴─────────────┘

Mutate across multiple columns

>>> t.mutate(s.across(s.numeric() & ~s.c("year"), _ - _.mean())).head()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species  year   bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringint64float64        │
├─────────┼───────┼────────────────┤
│ Adelie 2007-4.82193 │
│ Adelie 2007-4.42193 │
│ Adelie 2007-3.62193 │
│ Adelie 2007NULL │
│ Adelie 2007-7.22193 │
└─────────┴───────┴────────────────┘

nunique

nunique(where=None)

Compute the number of unique rows in the table.

Parameters

Name Type Description Default
where ir.BooleanValue | None Optional boolean expression to filter rows when counting. None

Returns

Type Description
IntegerScalar Number of unique rows in the table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": ["foo", "bar", "bar"]})
>>> t
┏━━━━━━━━┓
┃ a      ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo    │
│ bar    │
│ bar    │
└────────┘
>>> t.nunique()

2
>>> t.nunique(t.a != "foo")

1

order_by

order_by(*by)

Sort a table by one or more expressions.

Similar to pandas.DataFrame.sort_values().

Parameters

Name Type Description Default
by str | ir.Column | s.Selector | Sequence[str] | Sequence[ir.Column] | Sequence[s.Selector] | None Expressions to sort the table by. ()

Returns

Type Description
Table Sorted table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
...     {
...         "a": [3, 2, 1, 3],
...         "b": ["a", "B", "c", "D"],
...         "c": [4, 6, 5, 7],
...     }
... )
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     3a     4 │
│     2B     6 │
│     1c     5 │
│     3D     7 │
└───────┴────────┴───────┘

Sort by b. Default is ascending. Note how capital letters come before lowercase

>>> t.order_by("b")
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     2B     6 │
│     3D     7 │
│     3a     4 │
│     1c     5 │
└───────┴────────┴───────┘

Sort in descending order

>>> t.order_by(ibis.desc("b"))
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     1c     5 │
│     3a     4 │
│     3D     7 │
│     2B     6 │
└───────┴────────┴───────┘

You can also use the deferred API to get the same result

>>> from ibis import _
>>> t.order_by(_.b.desc())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     1c     5 │
│     3a     4 │
│     3D     7 │
│     2B     6 │
└───────┴────────┴───────┘

Sort by multiple columns/expressions

>>> t.order_by(["a", _.c.desc()])
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     1c     5 │
│     2B     6 │
│     3D     7 │
│     3a     4 │
└───────┴────────┴───────┘

You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b

>>> t.order_by(_.b.lower())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     3a     4 │
│     2B     6 │
│     1c     5 │
│     3D     7 │
└───────┴────────┴───────┘

This means that shuffling a Table is super simple

>>> t.order_by(ibis.random())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     2B     6 │
│     3D     7 │
│     1c     5 │
│     3a     4 │
└───────┴────────┴───────┘

pivot_longer

pivot_longer(col, *, names_to='name', names_pattern='(.+)', names_transform=None, values_to='value', values_transform=None)

Transform a table from wider to longer.

Parameters

Name Type Description Default
col str | s.Selector String column name or selector. required
names_to str | Iterable[str] A string or iterable of strings indicating how to name the new pivoted columns. 'name'
names_pattern str | re.Pattern Pattern to use to extract column names from the input. By default the entire column name is extracted. '(.+)'
names_transform Callable[[str], ir.Value] | Mapping[str, Callable[[str], ir.Value]] | None Function or mapping of a name in names_to to a function to transform a column name to a value. None
values_to str Name of the pivoted value column. 'value'
values_transform Callable[[ir.Value], ir.Value] | Deferred | None Apply a function to the value column. This can be a lambda or deferred expression. None

Returns

Type Description
Table Pivoted table

Examples

Basic usage

>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> relig_income = ibis.examples.relig_income_raw.fetch()
>>> relig_income
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ religion                 <$10k  $10-20k  $20-30k  $30-40k  $40-50k  $50-75k  $75-100k  $100-150k  >150k  Don't know/refused ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ stringint64int64int64int64int64int64int64int64int64int64              │
├─────────────────────────┼───────┼─────────┼─────────┼─────────┼─────────┼─────────┼──────────┼───────────┼───────┼────────────────────┤
│ Agnostic               27346081761371221098496 │
│ Atheist                12273752357073597476 │
│ Buddhist               27213034335862395354 │
│ Catholic               41861773267063811169497926331489 │
│ Don’t know/refused     151415111035211718116 │
│ Evangelical Prot       575869106498288114869497234141529 │
│ Hindu                  1979113447485437 │
│ Historically Black Prot2282442362381972231318178339 │
│ Jehovah's Witness      2027242421301511637 │
│ Jewish                 1919252530956987151162 │
│  │
└─────────────────────────┴───────┴─────────┴─────────┴─────────┴─────────┴─────────┴──────────┴───────────┴───────┴────────────────────┘

Here we convert column names not matching the selector for the religion column and convert those names into values

>>> relig_income.pivot_longer(~s.c("religion"), names_to="income", values_to="count")
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ religion  income              count ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ stringstringint64 │
├──────────┼────────────────────┼───────┤
│ Agnostic<$10k             27 │
│ Agnostic$10-20k           34 │
│ Agnostic$20-30k           60 │
│ Agnostic$30-40k           81 │
│ Agnostic$40-50k           76 │
│ Agnostic$50-75k           137 │
│ Agnostic$75-100k          122 │
│ Agnostic$100-150k         109 │
│ Agnostic>150k             84 │
│ AgnosticDon't know/refused96 │
│  │
└──────────┴────────────────────┴───────┘

Similarly for a different example dataset, we convert names to values but using a different selector and the default values_to value.

>>> world_bank_pop = ibis.examples.world_bank_pop_raw.fetch()
>>> world_bank_pop.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ country  indicator    2000          2001          2002          2003          2004          2005          2006          2007          2008          2009          2010          2011          2012          2013          2014          2015          2016          2017         ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ stringstringfloat64float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64float64      │
├─────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ ABW    SP.URB.TOTL4.162500e+044.202500e+044.219400e+044.227700e+044.231700e+044.239900e+044.255500e+044.272900e+044.290600e+044.307900e+044.320600e+044.349300e+044.386400e+044.422800e+044.458800e+044.494300e+044.529700e+044.564800e+04 │
│ ABW    SP.URB.GROW1.664222e+009.563731e-014.013352e-011.965172e-019.456936e-021.935880e-013.672580e-014.080490e-014.133830e-014.023963e-012.943735e-016.620631e-018.493932e-018.264135e-018.106692e-017.930256e-017.845785e-017.718989e-01 │
│ ABW    SP.POP.TOTL8.910100e+049.069100e+049.178100e+049.270100e+049.354000e+049.448300e+049.560600e+049.678700e+049.799600e+049.921200e+041.003410e+051.012880e+051.021120e+051.028800e+051.035940e+051.042570e+051.048740e+051.054390e+05 │
│ ABW    SP.POP.GROW2.539234e+001.768757e+001.194718e+009.973955e-019.009892e-011.003077e+001.181566e+001.227711e+001.241397e+001.233231e+001.131541e+009.393559e-018.102306e-017.493010e-016.916153e-016.379592e-015.900625e-015.372957e-01 │
│ AFE    SP.URB.TOTL1.155517e+081.197755e+081.242275e+081.288340e+081.336475e+081.387456e+081.440267e+081.492313e+081.553838e+081.617762e+081.684561e+081.754157e+081.825587e+081.901087e+081.980733e+082.065563e+082.150833e+082.237321e+08 │
└─────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
>>> world_bank_pop.pivot_longer(s.matches(r"\d{4}"), names_to="year").head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ country  indicator    year    value   ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ stringstringstringfloat64 │
├─────────┼─────────────┼────────┼─────────┤
│ ABW    SP.URB.TOTL2000  41625.0 │
│ ABW    SP.URB.TOTL2001  42025.0 │
│ ABW    SP.URB.TOTL2002  42194.0 │
│ ABW    SP.URB.TOTL2003  42277.0 │
│ ABW    SP.URB.TOTL2004  42317.0 │
└─────────┴─────────────┴────────┴─────────┘

pivot_longer has some preprocessing capabiltiies like stripping a prefix and applying a function to column names

>>> billboard = ibis.examples.billboard.fetch()
>>> billboard
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ artist          track                    date_entered  wk1    wk2    wk3    wk4    wk5    wk6    wk7    wk8    wk9    wk10   wk11   wk12   wk13   wk14   wk15   wk16   wk17   wk18   wk19   wk20   wk21   wk22   wk23   wk24   wk25   wk26   wk27   wk28   wk29   wk30   wk31   wk32   wk33   wk34   wk35   wk36   wk37   wk38   wk39   wk40   wk41   wk42   wk43   wk44   wk45   wk46   wk47   wk48   wk49   wk50   wk51   wk52   wk53   wk54   wk55   wk56   wk57   wk58   wk59   wk60   wk61   wk62   wk63   wk64   wk65   wk66    wk67    wk68    wk69    wk70    wk71    wk72    wk73    wk74    wk75    wk76   ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ stringstringdateint64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64stringstringstringstringstringstringstringstringstringstringstring │
├────────────────┼─────────────────────────┼──────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ 2 Pac         Baby Don't Cry (Keep...2000-02-2687827277879499NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ 2Ge+her       The Hardest Part Of ...2000-09-02918792NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ 3 Doors Down  Kryptonite             2000-04-088170686766575453515151514744382822181814127666554444333455991514131416172122242833424249NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ 3 Doors Down  Loser                  2000-10-217676726967655559626161596166727675677370NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ 504 Boyz      Wobble Wobble          2000-04-15573425171731364953576470757678859296NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ 98^0          Give Me Just One Nig...2000-08-1951393426261922367222936476766849394NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ A*Teens       Dancing Queen          2000-07-0897979695100NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ Aaliyah       I Don't Wanna          2000-01-298462514138353538383637373849616362678386NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ Aaliyah       Try Again              2000-03-1859533828211816141210986122223455691314162322333643NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│ Adams, YolandaOpen My Heart          2000-08-267676746968676158575966686167596367717989NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL   │
│       │
└────────────────┴─────────────────────────┴──────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘
>>> billboard.pivot_longer(
...     s.startswith("wk"),
...     names_to="week",
...     names_pattern=r"wk(.+)",
...     names_transform=int,
...     values_to="rank",
...     values_transform=_.cast("int"),
... ).dropna("rank")
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ artist   track                    date_entered  week  rank  ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ stringstringdateint8int64 │
├─────────┼─────────────────────────┼──────────────┼──────┼───────┤
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26187 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26282 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26372 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26477 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26587 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26694 │
│ 2 Pac  Baby Don't Cry (Keep...2000-02-26799 │
│ 2Ge+herThe Hardest Part Of ...2000-09-02191 │
│ 2Ge+herThe Hardest Part Of ...2000-09-02287 │
│ 2Ge+herThe Hardest Part Of ...2000-09-02392 │
│  │
└─────────┴─────────────────────────┴──────────────┴──────┴───────┘

You can use regular expression capture groups to extract multiple variables stored in column names

>>> who = ibis.examples.who.fetch()
>>> who
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ country      iso2    iso3    year   new_sp_m014  new_sp_m1524  new_sp_m2534  new_sp_m3544  new_sp_m4554  new_sp_m5564  new_sp_m65  new_sp_f014  new_sp_f1524  new_sp_f2534  new_sp_f3544  new_sp_f4554  new_sp_f5564  new_sp_f65  new_sn_m014  new_sn_m1524  new_sn_m2534  new_sn_m3544  new_sn_m4554  new_sn_m5564  new_sn_m65  new_sn_f014  new_sn_f1524  new_sn_f2534  new_sn_f3544  new_sn_f4554  new_sn_f5564  new_sn_f65  new_ep_m014  new_ep_m1524  new_ep_m2534  new_ep_m3544  new_ep_m4554  new_ep_m5564  new_ep_m65  new_ep_f014  new_ep_f1524  new_ep_f2534  new_ep_f3544  new_ep_f4554  new_ep_f5564  new_ep_f65  newrel_m014  newrel_m1524  newrel_m2534  newrel_m3544  newrel_m4554  newrel_m5564  newrel_m65  newrel_f014  newrel_f1524  newrel_f2534  newrel_f3544  newrel_f4554  newrel_f5564  newrel_f65 ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ stringstringstringint64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64      │
├─────────────┼────────┼────────┼───────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┤
│ AfghanistanAF    AFG   1980NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1981NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1982NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1983NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1984NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1985NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1986NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1987NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1988NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│ AfghanistanAF    AFG   1989NULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULLNULL │
│  │
└─────────────┴────────┴────────┴───────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┘
>>> len(who.columns)
60
>>> who.pivot_longer(
...     s.r["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_(.)(.*)",
...     values_to="count",
... )
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ country      iso2    iso3    year   diagnosis  gender  age     count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringstringint64stringstringstringint64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼────────┼───────┤
│ AfghanistanAF    AFG   1980sp       m     014   NULL │
│ AfghanistanAF    AFG   1980sp       m     1524  NULL │
│ AfghanistanAF    AFG   1980sp       m     2534  NULL │
│ AfghanistanAF    AFG   1980sp       m     3544  NULL │
│ AfghanistanAF    AFG   1980sp       m     4554  NULL │
│ AfghanistanAF    AFG   1980sp       m     5564  NULL │
│ AfghanistanAF    AFG   1980sp       m     65    NULL │
│ AfghanistanAF    AFG   1980sp       f     014   NULL │
│ AfghanistanAF    AFG   1980sp       f     1524  NULL │
│ AfghanistanAF    AFG   1980sp       f     2534  NULL │
│  │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴────────┴───────┘

names_transform is flexible, and can be:

1. A mapping of one or more names in `names_to` to callable
2. A callable that will be applied to every name

Let’s recode gender and age to numeric values using a mapping

>>> who.pivot_longer(
...     s.r["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_(.)(.*)",
...     names_transform=dict(
...         gender={"m": 1, "f": 2}.get,
...         age=dict(
...             zip(
...                 ["014", "1524", "2534", "3544", "4554", "5564", "65"],
...                 range(7),
...             )
...         ).get,
...     ),
...     values_to="count",
... )
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ country      iso2    iso3    year   diagnosis  gender  age   count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ stringstringstringint64stringint8int8int64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼──────┼───────┤
│ AfghanistanAF    AFG   1980sp       10NULL │
│ AfghanistanAF    AFG   1980sp       11NULL │
│ AfghanistanAF    AFG   1980sp       12NULL │
│ AfghanistanAF    AFG   1980sp       13NULL │
│ AfghanistanAF    AFG   1980sp       14NULL │
│ AfghanistanAF    AFG   1980sp       15NULL │
│ AfghanistanAF    AFG   1980sp       16NULL │
│ AfghanistanAF    AFG   1980sp       20NULL │
│ AfghanistanAF    AFG   1980sp       21NULL │
│ AfghanistanAF    AFG   1980sp       22NULL │
│  │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴──────┴───────┘

The number of match groups in names_pattern must match the length of names_to

>>> who.pivot_longer(  
...     s.r["new_sp_m014":"newrel_f65"],
...     names_to=["diagnosis", "gender", "age"],
...     names_pattern="new_?(.*)_.(.*)",
... )
IbisInputError: Number of match groups in `names_pattern`'new_?(.*)_.(.*)' (2 groups) doesn't match the length of `names_to` ['diagnosis', 'gender', 'age'] (length 3)

names_transform must be a mapping or callable

>>> who.pivot_longer(
...     s.r["new_sp_m014":"newrel_f65"], names_transform="upper"
... )  # quartodoc: +EXPECTED_FAILURE
IbisTypeError: `names_transform` must be a mapping or callable. Got <class 'str'>

pivot_wider

pivot_wider(id_cols=None, names_from='name', names_prefix='', names_sep='_', names_sort=False, names=None, values_from='value', values_fill=None, values_agg='arbitrary')

Pivot a table to a wider format.

Parameters

Name Type Description Default
id_cols s.Selector | None A set of columns that uniquely identify each observation. None
names_from str | Iterable[str] | s.Selector An argument describing which column or columns to use to get the name of the output columns. 'name'
names_prefix str String added to the start of every column name. ''
names_sep str If names_from or values_from contains multiple columns, this argument will be used to join their values together into a single string to use as a column name. '_'
names_sort bool If True columns are sorted. If False column names are ordered by appearance. False
names Iterable[str] | None An explicit sequence of values to look for in columns matching names_from. * When this value is None, the values will be computed from names_from. * When this value is not None, each element’s length must match the length of names_from. See examples below for more detail. None
values_from str | Iterable[str] | s.Selector An argument describing which column or columns to get the cell values from. 'value'
values_fill int | float | str | ir.Scalar | None A scalar value that specifies what each value should be filled with when missing. None
values_agg str | Callable[[ir.Value], ir.Scalar] | Deferred A function applied to the value in each cell in the output. 'arbitrary'

Returns

Type Description
Table Wider pivoted table

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True

Basic usage

>>> fish_encounters = ibis.examples.fish_encounters.fetch()
>>> fish_encounters
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ fish   station  seen  ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼─────────┼───────┤
│  4842Release1 │
│  4842I80_1  1 │
│  4842Lisbon 1 │
│  4842Rstr   1 │
│  4842Base_TD1 │
│  4842BCE    1 │
│  4842BCW    1 │
│  4842BCE2   1 │
│  4842BCW2   1 │
│  4842MAE    1 │
│      │
└───────┴─────────┴───────┘
>>> fish_encounters.pivot_wider(names_from="station", values_from="seen")
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ fish   Lisbon  Base_TD  MAE    MAW    Release  I80_1  BCE    BCE2   Rstr   BCW    BCW2  ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64int64int64int64int64int64int64int64int64int64int64int64 │
├───────┼────────┼─────────┼───────┼───────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┤
│  484311111111111 │
│  48481NULLNULLNULL11NULLNULL1NULLNULL │
│  48651NULLNULLNULL11NULLNULLNULLNULLNULL │
│  484411111111111 │
│  484511NULLNULL11NULLNULL1NULLNULL │
│  4849NULLNULLNULLNULL11NULLNULLNULLNULLNULL │
│  485911NULLNULL11NULLNULL1NULLNULL │
│  486111111111111 │
│  484211111111111 │
│  48471NULLNULLNULL11NULLNULLNULLNULLNULL │
│      │
└───────┴────────┴─────────┴───────┴───────┴─────────┴───────┴───────┴───────┴───────┴───────┴───────┘

Fill missing pivoted values using values_fill

>>> fish_encounters.pivot_wider(
...     names_from="station", values_from="seen", values_fill=0
... )
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ fish   Lisbon  Base_TD  MAE    MAW    Release  I80_1  BCE    BCE2   Rstr   BCW    BCW2  ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64int64int64int64int64int64int64int64int64int64int64int64 │
├───────┼────────┼─────────┼───────┼───────┼─────────┼───────┼───────┼───────┼───────┼───────┼───────┤
│  484311111111111 │
│  484810001100100 │
│  486510001100000 │
│  484411111111111 │
│  484511001100100 │
│  484900001100000 │
│  485911001100100 │
│  486111111111111 │
│  484211111111111 │
│  484710001100000 │
│      │
└───────┴────────┴─────────┴───────┴───────┴─────────┴───────┴───────┴───────┴───────┴───────┴───────┘

Compute multiple values columns

>>> us_rent_income = ibis.examples.us_rent_income.fetch()
>>> us_rent_income
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ geoid   name        variable  estimate  moe   ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ stringstringstringint64int64 │
├────────┼────────────┼──────────┼──────────┼───────┤
│ 01    Alabama   income  24476136 │
│ 01    Alabama   rent    7473 │
│ 02    Alaska    income  32940508 │
│ 02    Alaska    rent    120013 │
│ 04    Arizona   income  27517148 │
│ 04    Arizona   rent    9724 │
│ 05    Arkansas  income  23789165 │
│ 05    Arkansas  rent    7095 │
│ 06    Californiaincome  29454109 │
│ 06    Californiarent    13583 │
│  │
└────────┴────────────┴──────────┴──────────┴───────┘
>>> us_rent_income.pivot_wider(
...     names_from="variable", values_from=["estimate", "moe"]
... )
┏━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid   name          estimate_income  moe_income  estimate_rent  moe_rent ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringstringint64int64int64int64    │
├────────┼──────────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 05    Arkansas    237891657095 │
│ 06    California  2945410913583 │
│ 13    Georgia     270241069273 │
│ 15    Hawaii      32453218150718 │
│ 16    Idaho       252982087927 │
│ 30    Montana     262492067519 │
│ 38    North Dakota323362457759 │
│ 39    Ohio        27435947642 │
│ 40    Oklahoma    262071017663 │
│ 47    Tennessee   254531028084 │
│  │
└────────┴──────────────┴─────────────────┴────────────┴───────────────┴──────────┘

The column name separator can be changed using the names_sep parameter

>>> us_rent_income.pivot_wider(
...     names_from="variable",
...     names_sep=".",
...     values_from=("estimate", "moe"),
... )
┏━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid   name          estimate.income  moe.income  estimate.rent  moe.rent ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringstringint64int64int64int64    │
├────────┼──────────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 05    Arkansas    237891657095 │
│ 06    California  2945410913583 │
│ 13    Georgia     270241069273 │
│ 15    Hawaii      32453218150718 │
│ 16    Idaho       252982087927 │
│ 30    Montana     262492067519 │
│ 38    North Dakota323362457759 │
│ 39    Ohio        27435947642 │
│ 40    Oklahoma    262071017663 │
│ 47    Tennessee   254531028084 │
│  │
└────────┴──────────────┴─────────────────┴────────────┴───────────────┴──────────┘

Supply an alternative function to summarize values

>>> warpbreaks = ibis.examples.warpbreaks.fetch().select("wool", "tension", "breaks")
>>> warpbreaks
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ wool    tension  breaks ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ stringstringint64  │
├────────┼─────────┼────────┤
│ A     L      26 │
│ A     L      30 │
│ A     L      54 │
│ A     L      25 │
│ A     L      70 │
│ A     L      52 │
│ A     L      51 │
│ A     L      26 │
│ A     L      67 │
│ A     M      18 │
│  │
└────────┴─────────┴────────┘
>>> warpbreaks.pivot_wider(names_from="wool", values_from="breaks", values_agg="mean")
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ tension  A          B         ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ stringfloat64float64   │
├─────────┼───────────┼───────────┤
│ L      44.55555628.222222 │
│ M      24.00000028.777778 │
│ H      24.55555618.777778 │
└─────────┴───────────┴───────────┘

Passing Deferred objects to values_agg is supported

>>> warpbreaks.pivot_wider(
...     names_from="tension",
...     values_from="breaks",
...     values_agg=_.sum(),
... ).order_by("wool")
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ wool    L      M      H     ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringint64int64int64 │
├────────┼───────┼───────┼───────┤
│ A     401216221 │
│ B     254259169 │
└────────┴───────┴───────┴───────┘

Use a custom aggregate function

>>> warpbreaks.pivot_wider(
...     names_from="wool",
...     values_from="breaks",
...     values_agg=lambda col: col.std() / col.mean(),
... )
┏━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ tension  A         B        ┃
┡━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ stringfloat64float64  │
├─────────┼──────────┼──────────┤
│ L      0.4061830.349325 │
│ M      0.3608440.327719 │
│ H      0.4183440.260590 │
└─────────┴──────────┴──────────┘

Generate some random data, setting the random seed for reproducibility

>>> import random
>>> random.seed(0)
>>> raw = ibis.memtable(
...     [
...         dict(
...             product=product,
...             country=country,
...             year=year,
...             production=random.random(),
...         )
...         for product in "AB"
...         for country in ["AI", "EI"]
...         for year in range(2000, 2015)
...     ]
... )
>>> production = raw.filter(((_.product == "A") & (_.country == "AI")) | (_.product == "B"))
>>> production
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ product  country  year   production ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ stringstringint64float64    │
├─────────┼─────────┼───────┼────────────┤
│ B      AI     20000.477010 │
│ B      AI     20010.865310 │
│ B      AI     20020.260492 │
│ B      AI     20030.805028 │
│ B      AI     20040.548699 │
│ B      AI     20050.014042 │
│ B      AI     20060.719705 │
│ B      AI     20070.398824 │
│ B      AI     20080.824845 │
│ B      AI     20090.668153 │
│  │
└─────────┴─────────┴───────┴────────────┘

Pivoting with multiple name columns

>>> production.pivot_wider(
...     names_from=["product", "country"],
...     values_from="production",
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year   B_AI      B_EI      A_AI     ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64float64float64float64  │
├───────┼──────────┼──────────┼──────────┤
│  20010.8653100.1910670.757954 │
│  20030.8050280.2386160.258917 │
│  20090.6681530.5079410.583382 │
│  20120.8676030.5512670.281838 │
│  20040.5486990.9675400.511275 │
│  20060.7197050.4479700.783799 │
│  20070.3988240.0804460.303313 │
│  20080.8248450.3200550.476597 │
│  20110.4935780.1090580.504687 │
│  20000.4770100.8704710.844422 │
│      │
└───────┴──────────┴──────────┴──────────┘

Select a subset of names. This call incurs no computation when constructing the expression.

>>> production.pivot_wider(
...     names_from=["product", "country"],
...     names=[("A", "AI"), ("B", "AI")],
...     values_from="production",
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year   A_AI      B_AI     ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64float64float64  │
├───────┼──────────┼──────────┤
│  20010.7579540.865310 │
│  20030.2589170.805028 │
│  20090.5833820.668153 │
│  20120.2818380.867603 │
│  20040.5112750.548699 │
│  20060.7837990.719705 │
│  20070.3033130.398824 │
│  20080.4765970.824845 │
│  20110.5046870.493578 │
│  20000.8444220.477010 │
│      │
└───────┴──────────┴──────────┘

Sort the new columns’ names

>>> production.pivot_wider(
...     names_from=["product", "country"],
...     values_from="production",
...     names_sort=True,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year   A_AI      B_AI      B_EI     ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64float64float64float64  │
├───────┼──────────┼──────────┼──────────┤
│  20010.7579540.8653100.191067 │
│  20030.2589170.8050280.238616 │
│  20090.5833820.6681530.507941 │
│  20120.2818380.8676030.551267 │
│  20040.5112750.5486990.967540 │
│  20060.7837990.7197050.447970 │
│  20070.3033130.3988240.080446 │
│  20080.4765970.8248450.320055 │
│  20110.5046870.4935780.109058 │
│  20000.8444220.4770100.870471 │
│      │
└───────┴──────────┴──────────┴──────────┘

relabel

relabel(substitutions)

Deprecated in favor of Table.rename.

relocate

relocate(*columns, before=None, after=None, **kwargs)

Relocate columns before or after other specified columns.

Parameters

Name Type Description Default
columns str | s.Selector Columns to relocate. Selectors are accepted. ()
before str | s.Selector | None A column name or selector to insert the new columns before. None
after str | s.Selector | None A column name or selector. Columns in columns are relocated after the last column selected in after. None
kwargs str Additional column names to relocate, renaming argument values to keyword argument names. {}

Returns

Type Description
Table A table with the columns relocated.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> import ibis.selectors as s
>>> t = ibis.memtable(dict(a=[1], b=[1], c=[1], d=["a"], e=["a"], f=["a"]))
>>> t
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a      b      c      d       e       f      ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64int64int64stringstringstring │
├───────┼───────┼───────┼────────┼────────┼────────┤
│     111a     a     a      │
└───────┴───────┴───────┴────────┴────────┴────────┘
>>> t.relocate("f")
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ f       a      b      c      d       e      ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ stringint64int64int64stringstring │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a     111a     a      │
└────────┴───────┴───────┴───────┴────────┴────────┘
>>> t.relocate("a", after="c")
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ b      c      a      d       e       f      ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64int64int64stringstringstring │
├───────┼───────┼───────┼────────┼────────┼────────┤
│     111a     a     a      │
└───────┴───────┴───────┴────────┴────────┴────────┘
>>> t.relocate("f", before="b")
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a      f       b      c      d       e      ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64stringint64int64stringstring │
├───────┼────────┼───────┼───────┼────────┼────────┤
│     1a     11a     a      │
└───────┴────────┴───────┴───────┴────────┴────────┘
>>> t.relocate("a", after=s.last())
┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ b      c      d       e       f       a     ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64int64stringstringstringint64 │
├───────┼───────┼────────┼────────┼────────┼───────┤
│     11a     a     a     1 │
└───────┴───────┴────────┴────────┴────────┴───────┘

Relocate allows renaming

>>> t.relocate(ff="f")
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ ff      a      b      c      d       e      ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ stringint64int64int64stringstring │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a     111a     a      │
└────────┴───────┴───────┴───────┴────────┴────────┘

You can relocate based on any predicate selector, such as of_type

>>> t.relocate(s.of_type("string"))
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d       e       f       a      b      c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringstringstringint64int64int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a     a     a     111 │
└────────┴────────┴────────┴───────┴───────┴───────┘
>>> t.relocate(s.numeric(), after=s.last())
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d       e       f       a      b      c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringstringstringint64int64int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a     a     a     111 │
└────────┴────────┴────────┴───────┴───────┴───────┘
>>> t.relocate(s.any_of(s.c(*"ae")))
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a      e       b      c      d       f      ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64stringint64int64stringstring │
├───────┼────────┼───────┼───────┼────────┼────────┤
│     1a     11a     a      │
└───────┴────────┴───────┴───────┴────────┴────────┘

When multiple columns are selected with before or after, those selected columns are moved before and after the selectors input

>>> t = ibis.memtable(dict(a=[1], b=["a"], c=[1], d=["a"]))
>>> t.relocate(s.numeric(), after=s.of_type("string"))
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b       d       a      c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringstringint64int64 │
├────────┼────────┼───────┼───────┤
│ a     a     11 │
└────────┴────────┴───────┴───────┘
>>> t.relocate(s.numeric(), before=s.of_type("string"))
┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a      c      b       d      ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64int64stringstring │
├───────┼───────┼────────┼────────┤
│     11a     a      │
└───────┴───────┴────────┴────────┘

When there are duplicate renames in a call to relocate, the last one is preserved

>>> t.relocate(e="d", f="d")
┏━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ f       a      b       c     ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringint64stringint64 │
├────────┼───────┼────────┼───────┤
│ a     1a     1 │
└────────┴───────┴────────┴───────┘

However, if there are duplicates that are not part of a rename, the order specified in the relocate call is preserved

>>> t.relocate(
...     "b",
...     s.of_type("string"),  # "b" is a string column, so the selector matches
... )
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b       d       a      c     ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringstringint64int64 │
├────────┼────────┼───────┼───────┤
│ a     a     11 │
└────────┴────────┴───────┴───────┘

rename

rename(method=None, /, **substitutions)

Rename columns in the table.

Parameters

Name Type Description Default
method str | Callable[[str], str | None] | Literal[‘snake_case’, ‘ALL_CAPS’] | Mapping[str, str] | None An optional method for renaming columns. May be one of: - A format string to use to rename all columns, like "prefix_{name}". - A function from old name to new name. If the function returns None the old name is used. - The literal strings "snake_case" or "ALL_CAPS" to rename all columns using a snake_case or "ALL_CAPS" naming convention respectively. - A mapping from new name to old name. Existing columns not present in the mapping will passthrough with their original name. None
substitutions str Columns to be explicitly renamed, expressed as new_name=old_name keyword arguments. {}

Returns

Type Description
Table A renamed table expression

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> first3 = s.r[:3]  # first 3 columns
>>> t = ibis.examples.penguins_raw_raw.fetch().select(first3)
>>> t
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ studyName  Sample Number  Species                             ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├───────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708  1Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  2Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  3Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  4Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  5Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  6Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  7Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  8Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  9Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708  10Adelie Penguin (Pygoscelis adeliae) │
│                                    │
└───────────┴───────────────┴─────────────────────────────────────┘

Rename specific columns by passing keyword arguments like new_name="old_name"

>>> t.rename(study_name="studyName").head(1)
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ study_name  Sample Number  Species                             ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708   1Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘

Rename all columns using a format string

>>> t.rename("p_{name}").head(1)
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ p_studyName  p_Sample Number  p_Species                           ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├─────────────┼─────────────────┼─────────────────────────────────────┤
│ PAL0708    1Adelie Penguin (Pygoscelis adeliae) │
└─────────────┴─────────────────┴─────────────────────────────────────┘

Rename all columns using a snake_case convention

>>> t.rename("snake_case").head(1)
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ study_name  sample_number  species                             ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708   1Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘

Rename all columns using an ALL_CAPS convention

>>> t.rename("ALL_CAPS").head(1)
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STUDY_NAME  SAMPLE_NUMBER  SPECIES                             ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708   1Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘

Rename all columns using a callable

>>> t.rename(str.upper).head(1)
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STUDYNAME  SAMPLE NUMBER  SPECIES                             ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64string                              │
├───────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708  1Adelie Penguin (Pygoscelis adeliae) │
└───────────┴───────────────┴─────────────────────────────────────┘

rowid

rowid()

A unique integer per row.

This operation is only valid on physical tables

Any further meaning behind this expression is backend dependent. Generally this corresponds to some index into the database storage (for example, SQLite and DuckDB’s rowid).

For a monotonically increasing row number, see ibis.row_number.

Returns

Type Description
IntegerColumn An integer column

sample

sample(fraction, *, method='row', seed=None)

Sample a fraction of rows from a table.

Results may be non-repeatable

Sampling is by definition a random operation. Some backends support specifying a seed for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.

In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed set.

Parameters

Name Type Description Default
fraction float The percentage of rows to include in the sample, expressed as a float between 0 and 1. required
method Literal[‘row’, ‘block’] The sampling method to use. The default is “row”, which includes each row with a probability of fraction. If method is “block”, some backends may instead perform sampling a fraction of blocks of rows (where “block” is a backend dependent definition). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause. 'row'
seed int | None An optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least [0, 2**31 - 1]). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases. None

Returns

Type Description
Table The input table, with fraction of rows selected.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ x      y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1a      │
│     2b      │
│     3c      │
│     4d      │
└───────┴────────┘

Sample approximately half the rows, with a seed specified for reproducibility.

>>> t.sample(0.5, seed=1234)
┏━━━━━━━┳━━━━━━━━┓
┃ x      y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     2b      │
│     3c      │
└───────┴────────┘

schema

schema()

Return the Schema for this table.

Returns

Type Description
Schema The table’s schema.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.schema()
ibis.Schema {
  species            string
  island             string
  bill_length_mm     float64
  bill_depth_mm      float64
  flipper_length_mm  int64
  body_mass_g        int64
  sex                string
  year               int64
}

select

select(*exprs, **named_exprs)

Compute a new table expression using exprs and named_exprs.

Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.

For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.

Parameters

Name Type Description Default
exprs ir.Value | str | Iterable[ir.Value | str] Column expression, string, or list of column expressions and strings. ()
named_exprs ir.Value | str Column expressions {}

Returns

Type Description
Table Table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Simple projection

>>> t.select("island", "bill_length_mm").head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island     bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringfloat64        │
├───────────┼────────────────┤
│ Torgersen39.1 │
│ Torgersen39.5 │
│ Torgersen40.3 │
│ TorgersenNULL │
│ Torgersen36.7 │
└───────────┴────────────────┘

In that simple case, you could also just use python’s indexing syntax

>>> t[["island", "bill_length_mm"]].head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island     bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringfloat64        │
├───────────┼────────────────┤
│ Torgersen39.1 │
│ Torgersen39.5 │
│ Torgersen40.3 │
│ TorgersenNULL │
│ Torgersen36.7 │
└───────────┴────────────────┘

Projection by zero-indexed column position

>>> t.select(0, 4).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species  flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ stringint64             │
├─────────┼───────────────────┤
│ Adelie 181 │
│ Adelie 186 │
│ Adelie 195 │
│ Adelie NULL │
│ Adelie 193 │
└─────────┴───────────────────┘

Projection with renaming and compute in one call

>>> t.select(next_year=t.year + 1).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
│      2008 │
│      2008 │
│      2008 │
│      2008 │
│      2008 │
└───────────┘

You can do the same thing with a named expression, and using the deferred API

>>> from ibis import _
>>> t.select((_.year + 1).name("next_year")).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
│      2008 │
│      2008 │
│      2008 │
│      2008 │
│      2008 │
└───────────┘

Projection with aggregation expressions

>>> t.select("island", bill_mean=t.bill_length_mm.mean()).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ island     bill_mean ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ stringfloat64   │
├───────────┼───────────┤
│ Torgersen43.92193 │
│ Torgersen43.92193 │
│ Torgersen43.92193 │
│ Torgersen43.92193 │
│ Torgersen43.92193 │
└───────────┴───────────┘

Projection with a selector

>>> import ibis.selectors as s
>>> t.select(s.numeric() & ~s.c("year")).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64float64int64int64       │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│           39.118.71813750 │
│           39.517.41863800 │
│           40.318.01953250 │
│           NULLNULLNULLNULL │
│           36.719.31933450 │
└────────────────┴───────────────┴───────────────────┴─────────────┘

Projection + aggregation across multiple columns

>>> from ibis import _
>>> t.select(s.across(s.numeric() & ~s.c("year"), _.mean())).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64float64float64float64     │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
└────────────────┴───────────────┴───────────────────┴─────────────┘

sql

sql(query, dialect=None)

Run a SQL query against a table expression.

Parameters

Name Type Description Default
query str Query string required
dialect str | None Optional string indicating the dialect of query. Defaults to the backend’s native dialect. None

Returns

Type Description
Table An opaque table expression

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch(table_name="penguins")
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island     avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ stringfloat64         │
├───────────┼─────────────────┤
│ Biscoe   45.257485 │
│ Dream    44.167742 │
│ Torgersen38.950980 │
└───────────┴─────────────────┘

Mix and match ibis expressions with SQL queries

>>> t = ibis.examples.penguins.fetch(table_name="penguins")
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr = expr.mutate(
...     island=_.island.lower(),
...     avg_bill_length=_.avg_bill_length.round(1),
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island     avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ stringfloat64         │
├───────────┼─────────────────┤
│ biscoe   45.3 │
│ dream    44.2 │
│ torgersen39.0 │
└───────────┴─────────────────┘

Because ibis expressions aren’t named, they aren’t visible to subsequent .sql calls. Use the alias method to assign a name to an expression.

>>> expr.alias("b").sql("SELECT * FROM b WHERE avg_bill_length > 40")
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island  avg_bill_length ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ stringfloat64         │
├────────┼─────────────────┤
│ biscoe45.3 │
│ dream 44.2 │
└────────┴─────────────────┘

See Also

Table.alias

to_array

to_array()

View a single column table as an array.

Returns

Type Description
Value A single column view of a table

to_pandas

to_pandas(**kwargs)

Convert a table expression to a pandas DataFrame.

Parameters

Name Type Description Default
kwargs Same as keyword arguments to execute {}

try_cast

try_cast(schema)

Cast the columns of a table.

If the cast fails for a row, the value is returned as NULL or NaN depending on backend behavior.

Parameters

Name Type Description Default
schema SchemaLike Mapping, schema or iterable of pairs to use for casting required

Returns

Type Description
Table Casted table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": ["1", "2", "3"], "b": ["2.2", "3.3", "book"]})
>>> t.try_cast({"a": "int", "b": "float"})
┏━━━━━━━┳━━━━━━━━━┓
┃ a      b       ┃
┡━━━━━━━╇━━━━━━━━━┩
│ int64float64 │
├───────┼─────────┤
│     12.2 │
│     23.3 │
│     3NULL │
└───────┴─────────┘

union

union(table, *rest, distinct=False)

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table Table A table expression required
*rest Table Additional table expressions ()
distinct bool Only return distinct rows False

Returns

Type Description
Table A new table containing the union of all input tables.

See Also

ibis.union

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> t1.union(t2)  # union all by default
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     2 │
│     3 │
└───────┘
>>> t1.union(t2, distinct=True).order_by("a")
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
└───────┘

unpack

unpack(*columns)

Project the struct fields of each of columns into self.

Existing fields are retained in the projection.

Parameters

Name Type Description Default
columns str String column names to project into self. ()

Returns

Type Description
Table The child table with struct fields of each of columns projected.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> lines = '''
...     {"name": "a", "pos": {"lat": 10.1, "lon": 30.3}}
...     {"name": "b", "pos": {"lat": 10.2, "lon": 30.2}}
...     {"name": "c", "pos": {"lat": 10.3, "lon": 30.1}}
... '''
>>> with open("/tmp/lines.json", "w") as f:
...     nbytes = f.write(lines)  # nbytes is unused
>>> t = ibis.read_json("/tmp/lines.json")
>>> t
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ name    pos                                ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringstruct<lat: float64, lon: float64> │
├────────┼────────────────────────────────────┤
│ a     {'lat': 10.1, 'lon': 30.3}         │
│ b     {'lat': 10.2, 'lon': 30.2}         │
│ c     {'lat': 10.3, 'lon': 30.1}         │
└────────┴────────────────────────────────────┘
>>> t.unpack("pos")
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ name    lat      lon     ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ stringfloat64float64 │
├────────┼─────────┼─────────┤
│ a     10.130.3 │
│ b     10.230.2 │
│ c     10.330.1 │
└────────┴─────────┴─────────┘

See Also

StructValue.lift

view

view()

Create a new table expression distinct from the current one.

Use this API for any self-referencing operations like a self-join.

Returns

Type Description
Table Table expression

window_by

window_by(time_col)

Create a windowing table-valued function (TVF) expression.

Windowing table-valued functions (TVF) assign rows of a table to windows based on a time attribute column in the table.

Parameters

Name Type Description Default
time_col ir.Value Column of the table that will be mapped to windows. required

Returns

Type Description
WindowedTable WindowedTable expression.

GroupedTable

GroupedTable(self, groupings, **kwargs)

An intermediate table expression to hold grouping information.

Methods

Name Description
aggregate Compute aggregates over a group by.
count Computing the number of rows per group.
having Add a post-aggregation result filter expr.
mutate Return a table projection with window functions applied.
order_by Sort a grouped table expression by expr.
over Apply a window over the input expressions.
select Project new columns out of the grouped table.

aggregate

aggregate(metrics=(), **kwds)

Compute aggregates over a group by.

count

count()

Computing the number of rows per group.

Returns

Type Description
Table The aggregated table

having

having(expr)

Add a post-aggregation result filter expr.

Expressions like x is None return bool and will not generate a SQL comparison to NULL

Parameters

Name Type Description Default
expr ir.BooleanScalar An expression that filters based on an aggregate value. required

Returns

Type Description
GroupedTable A grouped table expression

mutate

mutate(*exprs, **kwexprs)

Return a table projection with window functions applied.

Any arguments can be functions.

Parameters

Name Type Description Default
exprs ir.Value | Sequence[ir.Value] List of expressions ()
kwexprs ir.Value Expressions {}

Examples

>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> (
...     t.select("species", "bill_length_mm")
...     .group_by("species")
...     .mutate(centered_bill_len=ibis._.bill_length_mm - ibis._.bill_length_mm.mean())
...     .order_by(s.all())
... )
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species  bill_length_mm  centered_bill_len ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ stringfloat64float64           │
├─────────┼────────────────┼───────────────────┤
│ Adelie 32.1-6.691391 │
│ Adelie 33.1-5.691391 │
│ Adelie 33.5-5.291391 │
│ Adelie 34.0-4.791391 │
│ Adelie 34.1-4.691391 │
│ Adelie 34.4-4.391391 │
│ Adelie 34.5-4.291391 │
│ Adelie 34.6-4.191391 │
│ Adelie 34.6-4.191391 │
│ Adelie 35.0-3.791391 │
│  │
└─────────┴────────────────┴───────────────────┘

Returns

Type Description
Table A table expression with window functions applied

order_by

order_by(expr)

Sort a grouped table expression by expr.

Notes

This API call is ignored in aggregations.

Parameters

Name Type Description Default
expr ir.Value | Iterable[ir.Value] Expressions to order the results by required

Returns

Type Description
GroupedTable A sorted grouped GroupedTable

over

over(window=None, *, rows=None, range=None, group_by=None, order_by=None)

Apply a window over the input expressions.

Parameters

Name Type Description Default
window Window to add to the input None
rows Whether to use the ROWS window clause None
range Whether to use the RANGE window clause None
group_by Grouping key None
order_by Ordering key None

Returns

Type Description
GroupedTable A new grouped table expression

select

select(*exprs, **kwexprs)

Project new columns out of the grouped table.

See Also

GroupedTable.mutate

read_csv

ibis.read_csv(sources, table_name=None, **kwargs)

Lazily load a CSV or set of CSVs.

This function delegates to the read_csv method on the current default backend (DuckDB or ibis.config.default_backend).

Parameters

Name Type Description Default
sources str | Path | Sequence[str | Path] A filesystem path or URL or list of same. Supports CSV and TSV files. required
table_name str | None A name to refer to the table. If not provided, a name will be generated. None
kwargs Any Backend-specific keyword arguments for the file type. For the DuckDB backend used by default, please refer to: * CSV/TSV: https://duckdb.org/docs/data/csv#parameters. {}

Returns

Type Description
ir.Table Table expression representing a file

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> lines = '''a,b
... 1,d
... 2,
... ,f
... '''
>>> with open("/tmp/lines.csv", mode="w") as f:
...     nbytes = f.write(lines)  # nbytes is unused
>>> t = ibis.read_csv("/tmp/lines.csv")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1d      │
│     2NULL   │
│  NULLf      │
└───────┴────────┘

read_delta

ibis.read_delta(source, table_name=None, **kwargs)

Lazily load a Delta Lake table.

Parameters

Name Type Description Default
source str | Path A filesystem path or URL. required
table_name str | None A name to refer to the table. If not provided, a name will be generated. None
kwargs Any Backend-specific keyword arguments for the file type. {}

Returns

Type Description
ir.Table Table expression representing a file

Examples

>>> import ibis
>>> import pandas as pd
>>> ibis.options.interactive = True
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": list("ghi")})
>>> df
a b
0 1 g
1 2 h
2 3 i
>>> import deltalake as dl
>>> dl.write_deltalake("/tmp/data.delta", df, mode="overwrite")
>>> t = ibis.read_delta("/tmp/data.delta")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1g      │
│     2h      │
│     3i      │
└───────┴────────┘

read_json

ibis.read_json(sources, table_name=None, **kwargs)

Lazily load newline-delimited JSON data.

This API is experimental and subject to change.

This function delegates to the read_json method on the current default backend (DuckDB or ibis.config.default_backend).

Parameters

Name Type Description Default
sources str | Path | Sequence[str | Path] A filesystem path or URL or list of same. required
table_name str | None A name to refer to the table. If not provided, a name will be generated. None
kwargs Any Backend-specific keyword arguments for the file type. See https://duckdb.org/docs/extensions/json.html for details. {}

Returns

Type Description
ir.Table Table expression representing a file

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> lines = '''
... {"a": 1, "b": "d"}
... {"a": 2, "b": null}
... {"a": null, "b": "f"}
... '''
>>> with open("/tmp/lines.json", mode="w") as f:
...     nbytes = f.write(lines)  # nbytes is unused
>>> t = ibis.read_json("/tmp/lines.json")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1d      │
│     2NULL   │
│  NULLf      │
└───────┴────────┘

read_parquet

ibis.read_parquet(sources, table_name=None, **kwargs)

Lazily load a parquet file or set of parquet files.

This function delegates to the read_parquet method on the current default backend (DuckDB or ibis.config.default_backend).

Parameters

Name Type Description Default
sources str | Path | Sequence[str | Path] A filesystem path or URL or list of same. required
table_name str | None A name to refer to the table. If not provided, a name will be generated. None
kwargs Any Backend-specific keyword arguments for the file type. For the DuckDB backend used by default, please refer to: * Parquet: https://duckdb.org/docs/data/parquet {}

Returns

Type Description
ir.Table Table expression representing a file

Examples

>>> import ibis
>>> import pandas as pd
>>> ibis.options.interactive = True
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": list("ghi")})
>>> df
a b
0 1 g
1 2 h
2 3 i
>>> df.to_parquet("/tmp/data.parquet")
>>> t = ibis.read_parquet("/tmp/data.parquet")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1g      │
│     2h      │
│     3i      │
└───────┴────────┘

memtable

ibis.memtable(data, *, columns=None, schema=None, name=None)

Construct an ibis table expression from in-memory data.

Parameters

Name Type Description Default
data A table-like object (pandas.DataFrame, pyarrow.Table, or polars.DataFrame), or any data accepted by the pandas.DataFrame constructor (e.g. a list of dicts). Note that ibis objects (e.g. MapValue) may not be passed in as part of data and will result in an error. Do not depend on the underlying storage type (e.g., pyarrow.Table), it’s subject to change across non-major releases. required
columns Iterable[str] | None Optional typing.Iterable of str column names. If provided, must match the number of columns in data. None
schema SchemaLike | None Optional Schema. The functions use data to infer a schema if not passed. None
name str | None Optional name of the table. None

Returns

Type Description
Table A table expression backed by in-memory data.

Examples

>>> import ibis
>>> t = ibis.memtable([{"a": 1}, {"a": 2}])
>>> t
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t = ibis.memtable([{"a": 1, "b": "foo"}, {"a": 2, "b": "baz"}])
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1foo    │
│     2baz    │
└───────┴────────┘

Create a table literal without column names embedded in the data and pass columns

>>> t = ibis.memtable([(1, "foo"), (2, "baz")], columns=["a", "b"])
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1foo    │
│     2baz    │
└───────┴────────┘

Create a table literal without column names embedded in the data. Ibis generates column names if none are provided.

>>> t = ibis.memtable([(1, "foo"), (2, "baz")])
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ col0   col1   ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1foo    │
│     2baz    │
└───────┴────────┘

table

ibis.table(schema=None, name=None)

Create a table literal or an abstract table without data.

Parameters

Name Type Description Default
schema SchemaLike | None A schema for the table None
name str | None Name for the table. One is generated if this value is None. None

Returns

Type Description
Table A table expression

Examples

Create a table with no data backing it

>>> import ibis
>>> ibis.options.interactive = False
>>> t = ibis.table(schema=dict(a="int", b="string"), name="t")
>>> t
UnboundTable: t
  a int64
  b string

difference

ibis.difference(table, *rest, distinct=True)

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table ir.Table A table expression required
*rest ir.Table Additional table expressions ()
distinct bool Only diff distinct rows not occurring in the calling table True

Returns

Type Description
Table The rows present in self that are not present in tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> ibis.difference(t1, t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
└───────┘

intersect

ibis.intersect(table, *rest, distinct=True)

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table ir.Table A table expression required
*rest ir.Table Additional table expressions ()
distinct bool Only return distinct rows True

Returns

Type Description
Table A new table containing the intersection of all input tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> ibis.intersect(t1, t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
└───────┘

union

ibis.union(table, *rest, distinct=False)

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table ir.Table A table expression required
*rest ir.Table Additional table expressions ()
distinct bool Only return distinct rows False

Returns

Type Description
Table A new table containing the union of all input tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> ibis.union(t1, t2)  # union all by default
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     2 │
│     3 │
└───────┘
>>> ibis.union(t1, t2, distinct=True).order_by("a")
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
└───────┘

join

ibis.join(left, right, predicates=(), how='inner', *, lname='', rname='{name}_right')

Perform a join between two tables.

Parameters

Name Type Description Default
left Table Left table to join required
right Table Right table to join required
predicates str | Sequence[str | ir.BooleanColumn | Literal[True] | Literal[False] | tuple[str | ir.Column | ir.Deferred, str | ir.Column | ir.Deferred]] Condition(s) to join on. See examples for details. ()
how JoinKind Join method, e.g. "inner" or "left". 'inner'
lname str A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"). ''
rname str A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"). '{name}_right'

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> movies = ibis.examples.ml_latest_small_movies.fetch()
>>> movies.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ movieId  title                               genres                                      ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64stringstring                                      │
├─────────┼────────────────────────────────────┼─────────────────────────────────────────────┤
│       1Toy Story (1995)                  Adventure|Animation|Children|Comedy|Fantasy │
│       2Jumanji (1995)                    Adventure|Children|Fantasy                  │
│       3Grumpier Old Men (1995)           Comedy|Romance                              │
│       4Waiting to Exhale (1995)          Comedy|Drama|Romance                        │
│       5Father of the Bride Part II (1995)Comedy                                      │
└─────────┴────────────────────────────────────┴─────────────────────────────────────────────┘
>>> ratings = ibis.examples.ml_latest_small_ratings.fetch().drop("timestamp")
>>> ratings.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ userId  movieId  rating  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64int64float64 │
├────────┼─────────┼─────────┤
│      114.0 │
│      134.0 │
│      164.0 │
│      1475.0 │
│      1505.0 │
└────────┴─────────┴─────────┘

Equality left join on the shared movieId column. Note the _right suffix added to all overlapping columns from the right table (in this case only the “movieId” column).

>>> ratings.join(movies, "movieId", how="left").head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId  movieId  rating   movieId_right  title                        genres                                      ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64int64float64int64stringstring                                      │
├────────┼─────────┼─────────┼───────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│      114.01Toy Story (1995)           Adventure|Animation|Children|Comedy|Fantasy │
│      134.03Grumpier Old Men (1995)    Comedy|Romance                              │
│      164.06Heat (1995)                Action|Crime|Thriller                       │
│      1475.047Seven (a.k.a. Se7en) (1995)Mystery|Thriller                            │
│      1505.050Usual Suspects, The (1995) Crime|Mystery|Thriller                      │
└────────┴─────────┴─────────┴───────────────┴─────────────────────────────┴─────────────────────────────────────────────┘

Explicit equality join using the default how value of "inner". Note how there is no _right suffix added to the movieId column since this is an inner join and the movieId column is part of the join condition.

>>> ratings.join(movies, ratings.movieId == movies.movieId).head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId  movieId  rating   title                        genres                                      ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64int64float64stringstring                                      │
├────────┼─────────┼─────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│      114.0Toy Story (1995)           Adventure|Animation|Children|Comedy|Fantasy │
│      134.0Grumpier Old Men (1995)    Comedy|Romance                              │
│      164.0Heat (1995)                Action|Crime|Thriller                       │
│      1475.0Seven (a.k.a. Se7en) (1995)Mystery|Thriller                            │
│      1505.0Usual Suspects, The (1995) Crime|Mystery|Thriller                      │
└────────┴─────────┴─────────┴─────────────────────────────┴─────────────────────────────────────────────┘
>>> tags = ibis.examples.ml_latest_small_tags.fetch()
>>> tags.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ userId  movieId  tag              timestamp  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64int64stringint64      │
├────────┼─────────┼─────────────────┼────────────┤
│      260756funny          1445714994 │
│      260756Highly quotable1445714996 │
│      260756will ferrell   1445714992 │
│      289774Boxing story   1445715207 │
│      289774MMA            1445715200 │
└────────┴─────────┴─────────────────┴────────────┘

You can join on multiple columns/conditions by passing in a sequence. Find all instances where a user both tagged and rated a movie:

>>> tags.join(ratings, ["userId", "movieId"]).head(5)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ userId  movieId  tag              timestamp   rating  ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ int64int64stringint64float64 │
├────────┼─────────┼─────────────────┼────────────┼─────────┤
│      260756will ferrell   14457149925.0 │
│      289774Tom Hardy      14457152055.0 │
│      2106782Martin Scorsese14457150565.0 │
│      748516way too long   11696873251.0 │
│     18431mafia          14621387554.0 │
└────────┴─────────┴─────────────────┴────────────┴─────────┘

To self-join a table with itself, you need to call .view() on one of the arguments so the two tables are distinct from each other.

For crafting more complex join conditions, a valid form of a join condition is a 2-tuple like ({left_key}, {right_key}), where each key can be

  • a Column
  • Deferred expression
  • lambda of the form (Table) -> Column

For example, to find all movies pairings that received the same (ignoring case) tags:

>>> movie_tags = tags["movieId", "tag"]
>>> view = movie_tags.view()
>>> movie_tags.join(
...     view,
...     [
...         movie_tags.movieId != view.movieId,
...         (_.tag.lower(), lambda t: t.tag.lower()),
...     ],
... ).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ movieId  tag                movieId_right  tag_right         ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64stringint64string            │
├─────────┼───────────────────┼───────────────┼───────────────────┤
│   60756funny            1732funny             │
│   60756Highly quotable  1732Highly quotable   │
│   89774Tom Hardy        139385tom hardy         │
│  106782drugs            1732drugs             │
│  106782Leonardo DiCaprio5989Leonardo DiCaprio │
└─────────┴───────────────────┴───────────────┴───────────────────┘

row_number

ibis.row_number()

Return an analytic function expression for the current row number.

Note

row_number is normalized across backends to start at 0

Returns

Type Description
IntegerColumn A column expression enumerating rows

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rownum=ibis.row_number())
┏━━━━━━━━┳━━━━━━━━┓
┃ values  rownum ┃
┡━━━━━━━━╇━━━━━━━━┩
│ int64int64  │
├────────┼────────┤
│      10 │
│      21 │
│      12 │
│      23 │
│      34 │
│      25 │
└────────┴────────┘

rank

ibis.rank()

Compute position of first element within each equal-value group in sorted order.

Equivalent to SQL’s RANK() window function.

Returns

Type Description
Int64Column The min rank

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=ibis.rank().over(order_by=t.values))
┏━━━━━━━━┳━━━━━━━┓
┃ values  rank  ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64int64 │
├────────┼───────┤
│      10 │
│      10 │
│      22 │
│      22 │
│      22 │
│      35 │
└────────┴───────┘

dense_rank

ibis.dense_rank()

Position of first element within each group of equal values.

Values are returned in sorted order and duplicate values are ignored.

Equivalent to SQL’s DENSE_RANK().

Returns

Type Description
IntegerColumn The rank

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(rank=ibis.dense_rank().over(order_by=t.values))
┏━━━━━━━━┳━━━━━━━┓
┃ values  rank  ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64int64 │
├────────┼───────┤
│      10 │
│      10 │
│      21 │
│      21 │
│      21 │
│      32 │
└────────┴───────┘

percent_rank

ibis.percent_rank()

Return the relative rank of the values in the column.

Returns

Type Description
FloatingColumn The percent rank

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(pct_rank=ibis.percent_rank().over(order_by=t.values))
┏━━━━━━━━┳━━━━━━━━━━┓
┃ values  pct_rank ┃
┡━━━━━━━━╇━━━━━━━━━━┩
│ int64float64  │
├────────┼──────────┤
│      10.0 │
│      10.0 │
│      20.4 │
│      20.4 │
│      20.4 │
│      31.0 │
└────────┴──────────┘

cume_dist

ibis.cume_dist()

Return the cumulative distribution over a window.

Returns

Type Description
FloatingColumn The cumulative distribution

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(dist=ibis.cume_dist().over(order_by=t.values))
┏━━━━━━━━┳━━━━━━━━━━┓
┃ values  dist     ┃
┡━━━━━━━━╇━━━━━━━━━━┩
│ int64float64  │
├────────┼──────────┤
│      10.333333 │
│      10.333333 │
│      20.833333 │
│      20.833333 │
│      20.833333 │
│      31.000000 │
└────────┴──────────┘

ntile

ibis.ntile(buckets)

Return the integer number of a partitioning of the column values.

Parameters

Name Type Description Default
buckets int | ir.IntegerValue Number of buckets to partition into required

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"values": [1, 2, 1, 2, 3, 2]})
>>> t.mutate(ntile=ibis.ntile(2).over(order_by=t.values))
┏━━━━━━━━┳━━━━━━━┓
┃ values  ntile ┃
┡━━━━━━━━╇━━━━━━━┩
│ int64int64 │
├────────┼───────┤
│      10 │
│      10 │
│      20 │
│      21 │
│      21 │
│      31 │
└────────┴───────┘

window

ibis.window(preceding=None, following=None, order_by=None, group_by=None, *, rows=None, range=None, between=None)

Create a window clause for use with window functions.

The ROWS window clause includes peer rows based on differences in row number whereas RANGE includes rows based on the differences in row value of a single order_by expression.

All window frame bounds are inclusive.

Parameters

Name Type Description Default
preceding Number of preceding rows in the window None
following Number of following rows in the window None
group_by Grouping key None
order_by Ordering key None
rows Whether to use the ROWS window clause None
range Whether to use the RANGE window clause None
between Automatically infer the window kind based on the boundaries None

Returns

Type Description
Window A window frame

cumulative_window

ibis.cumulative_window(group_by=None, order_by=None)

Create a cumulative window for use with window functions.

All window frames / ranges are inclusive.

Parameters

Name Type Description Default
group_by Grouping key None
order_by Ordering key None

Returns

Type Description
Window A window frame

range_window

ibis.range_window(preceding=None, following=None, group_by=None, order_by=None)

Create a range-based window clause for use with window functions.

This RANGE window clause aggregates rows based upon differences in the value of the order-by expression.

All window frames / ranges are inclusive.

Parameters

Name Type Description Default
preceding Number of preceding rows in the window None
following Number of following rows in the window None
group_by Grouping key None
order_by Ordering key None

Returns

Type Description
Window A window frame

trailing_range_window

ibis.trailing_range_window(preceding, order_by, group_by=None)

Create a trailing range window for use with window functions.

Parameters

Name Type Description Default
preceding A value expression required
order_by Ordering key required
group_by Grouping key None

Returns

Type Description
Window A window frame

trailing_window

ibis.trailing_window(preceding, group_by=None, order_by=None)

Create a trailing window for use with window functions.

Parameters

Name Type Description Default
preceding The number of preceding rows required
group_by Grouping key None
order_by Ordering key None

Returns

Type Description
Window A window frame
Back to top