An immutable and lazy dataframe.
You will not create Table objects directly. Instead, you will create one
Methods
aggregate
Aggregate a table with a given set of reductions grouping by by
.
alias
Create a table expression with a specific name alias
.
as_scalar
Inform ibis that the table expression should be treated as a scalar.
as_table
Promote the expression to a table.
asof_join
Perform an “as-of” join between left
and right
.
bind
Bind column values to a table expression.
cache
Cache the provided expression.
cast
Cast the columns of a table.
compile
Compile to an execution target.
count
Compute the number of rows in the table.
cross_join
Compute the cross join of a sequence of tables.
describe
Return summary information about a table.
difference
Compute the set difference of multiple table expressions.
distinct
Return a Table with duplicate rows removed.
drop
Remove fields from a table.
drop_null
Remove rows with null values from the table.
dropna
Deprecated - use drop_null
instead.
equals
Return whether this expression is structurally equivalent to other
.
execute
Execute an expression against its backend if one exists.
fill_null
Fill null values in a table expression.
fillna
Deprecated - use fill_null
instead.
filter
Select rows from table
based on predicates
.
get_backend
Get the current Ibis backend of the expression.
get_name
Return the fully qualified name of the table.
group_by
Create a grouped table expression.
head
Select the first n
rows of a table.
info
Return summary information about a table.
intersect
Compute the set intersection of multiple table expressions.
join
Perform a join between two tables.
limit
Select n
rows from self
starting at offset
.
mutate
Add columns to a table expression.
nunique
Compute the number of unique rows in the table.
order_by
Sort a table by one or more expressions.
pipe
Compose f
with self
.
pivot_longer
Transform a table from wider to longer.
pivot_wider
Pivot a table to a wider format.
preview
Return a subset as a Rich Table.
relabel
Deprecated in favor of Table.rename
.
relocate
Relocate columns
before or after other specified columns.
rename
Rename columns in the table.
rowid
A unique integer per row.
sample
Sample a fraction of rows from a table.
schema
Return the Schema for this table.
select
Compute a new table expression using exprs
and named_exprs
.
sql
Run a SQL query against a table expression.
to_array
Deprecated - use as_scalar
instead.
to_csv
Write the results of executing the given expression to a CSV file.
to_delta
Write the results of executing the given expression to a Delta Lake table.
to_pandas
Convert a table expression to a pandas DataFrame.
to_pandas_batches
Execute expression and return an iterator of pandas DataFrames.
to_parquet
Write the results of executing the given expression to a parquet file.
to_parquet_dir
Write the results of executing the given expression to a parquet file in a directory.
to_polars
Execute expression and return results as a polars dataframe.
to_pyarrow
Execute expression and return results in as a pyarrow table.
to_pyarrow_batches
Execute expression and return a RecordBatchReader.
to_torch
Execute an expression and return results as a dictionary of torch tensors.
try_cast
Cast the columns of a table.
unbind
Return an expression built on UnboundTable
instead of backend-specific objects.
union
Compute the set union of multiple table expressions.
unnest
Unnest an array column
from a table.
unpack
Project the struct fields of each of columns
into self
.
value_counts
Compute a frequency table of this table’s values.
view
Create a new table expression distinct from the current one.
visualize
Visualize an expression as a GraphViz graph in the browser.
aggregate
aggregate(metrics= (), by= (), having= (), ** kwargs)
Aggregate a table with a given set of reductions grouping by by
.
Parameters
metrics
Sequence [ir
.Scalar
] | None
Aggregate expressions. These can be any scalar-producing expression, including aggregation functions like sum
or literal values like ibis.literal(1)
.
()
by
Sequence [ir
.Value
] | None
Grouping expressions.
()
having
Sequence [ir
.BooleanValue
] | None
Post-aggregation filters. The shape requirements are the same metrics
, but the output type for having
is boolean
. ::: {.callout-warning} ## Expressions like x is None
return bool
and will not generate a SQL comparison to NULL
:::
()
kwargs
ir
.Value
Named aggregate expressions
{}
Returns
Table
An aggregate table expression
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "fruit" : ["apple" , "apple" , "banana" , "orange" ],
... "price" : [0.5 , 0.5 , 0.25 , 0.33 ],
... }
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit ┃ price ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────┤
│ apple │ 0.50 │
│ apple │ 0.50 │
│ banana │ 0.25 │
│ orange │ 0.33 │
└────────┴─────────┘
>>> t.aggregate(
... by= ["fruit" ],
... total_cost= _.price.sum (),
... avg_cost= _.price.mean(),
... having= _.price.sum () < 0.5 ,
... )
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit ┃ total_cost ┃ avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64 │ float64 │
├────────┼────────────┼──────────┤
│ banana │ 0.25 │ 0.25 │
│ orange │ 0.33 │ 0.33 │
└────────┴────────────┴──────────┘
alias
Create a table expression with a specific name alias
.
This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql
method.
.alias
creates a temporary view in the database.
This side effect will be removed in a future version of ibis and is not part of the public API .
Parameters
alias
str
Name of the child expression
required
Returns
Table
An table expression
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> expr = t.alias("pingüinos" ).sql('SELECT * FROM "pingüinos" LIMIT 5' )
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
as_scalar
Inform ibis that the table expression should be treated as a scalar.
Note that the table must have exactly one column and one row for this to work. If the table has more than one column an error will be raised in expression construction time. If the table has more than one row an error will be raised by the backend when the expression is executed.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_gentoo = t.filter (t.species == "Gentoo" , t.body_mass_g > 6200 )
>>> from_that_island = t.filter (t.island == heavy_gentoo.select("island" ).as_scalar())
>>> from_that_island.species.value_counts().order_by("species" )
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species ┃ species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string │ int64 │
├─────────┼───────────────┤
│ Adelie │ 44 │
│ Gentoo │ 124 │
└─────────┴───────────────┘
as_table
Promote the expression to a table.
This method is a no-op for table expressions.
Examples
>>> t = ibis.table(dict (a= "int" ), name= "t" )
>>> s = t.as_table()
>>> t is s
asof_join
asof_join(
left,
right,
on,
predicates= (),
tolerance= None ,
* ,
lname= '' ,
rname= ' {name} _right' ,
)
Perform an “as-of” join between left
and right
.
Similar to a left join except that the match is done on nearest key rather than equal keys.
Parameters
left
Table
Table expression
required
right
Table
Table expression
required
on
str | ir
.BooleanColumn
Closest match inequality condition
required
predicates
str | ir
.Column
| Sequence [str | ir
.Column
]
Additional join predicates
()
tolerance
str | ir
.IntervalScalar
| None
Amount of time to look behind when joining
None
lname
str
A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"
).
''
rname
str
A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"
).
'{name}_right'
Examples
>>> from datetime import datetime, timedelta
>>> import ibis
>>> ibis.options.interactive = True
>>> sensors = ibis.memtable(
... {
... "site" : ["a" , "b" , "a" , "b" , "a" ],
... "humidity" : [0.3 , 0.4 , 0.5 , 0.6 , 0.7 ],
... "event_time" : [
... datetime(2024 , 11 , 16 , 12 , 0 , 15 , 500000 ),
... datetime(2024 , 11 , 16 , 12 , 0 , 15 , 700000 ),
... datetime(2024 , 11 , 17 , 18 , 12 , 14 , 950000 ),
... datetime(2024 , 11 , 17 , 18 , 12 , 15 , 120000 ),
... datetime(2024 , 11 , 18 , 18 , 12 , 15 , 100000 ),
... ],
... }
... )
>>> events = ibis.memtable(
... {
... "site" : ["a" , "b" , "a" ],
... "event_type" : [
... "cloud coverage" ,
... "rain start" ,
... "rain stop" ,
... ],
... "event_time" : [
... datetime(2024 , 11 , 16 , 12 , 0 , 15 , 400000 ),
... datetime(2024 , 11 , 17 , 18 , 12 , 15 , 100000 ),
... datetime(2024 , 11 , 18 , 18 , 12 , 15 , 100000 ),
... ],
... }
... )
This setup simulates time-series data by pairing irregularly collected sensor readings with weather events, enabling analysis of environmental conditions before each event. We will use the asof_join
method to match each event with the most recent prior sensor reading from the sensors table at the same site.
┏━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ site ┃ humidity ┃ event_time ┃
┡━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ float64 │ timestamp │
├────────┼──────────┼─────────────────────────┤
│ a │ 0.3 │ 2024-11-16 12:00:15.500 │
│ b │ 0.4 │ 2024-11-16 12:00:15.700 │
│ a │ 0.5 │ 2024-11-17 18:12:14.950 │
│ b │ 0.6 │ 2024-11-17 18:12:15.120 │
│ a │ 0.7 │ 2024-11-18 18:12:15.100 │
└────────┴──────────┴─────────────────────────┘
┏━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ site ┃ event_type ┃ event_time ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ timestamp │
├────────┼────────────────┼─────────────────────────┤
│ a │ cloud coverage │ 2024-11-16 12:00:15.400 │
│ b │ rain start │ 2024-11-17 18:12:15.100 │
│ a │ rain stop │ 2024-11-18 18:12:15.100 │
└────────┴────────────────┴─────────────────────────┘
We can find the closest event to each sensor reading with a 1 second tolerance. Using the “site” column as a join predicate ensures we only match events that occurred at or near the same site as the sensor reading.
>>> tolerance = timedelta(seconds= 1 )
>>> sensors.asof_join(events, on= "event_time" , predicates= "site" , tolerance= tolerance).drop(
... "event_time_right"
... ).order_by("event_time" )
┏━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ site ┃ humidity ┃ event_time ┃ site_right ┃ event_type ┃
┡━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ float64 │ timestamp │ string │ string │
├────────┼──────────┼─────────────────────────┼────────────┼────────────────┤
│ a │ 0.3 │ 2024-11-16 12:00:15.500 │ a │ cloud coverage │
│ b │ 0.4 │ 2024-11-16 12:00:15.700 │ NULL │ NULL │
│ a │ 0.5 │ 2024-11-17 18:12:14.950 │ NULL │ NULL │
│ b │ 0.6 │ 2024-11-17 18:12:15.120 │ b │ rain start │
│ a │ 0.7 │ 2024-11-18 18:12:15.100 │ a │ rain stop │
└────────┴──────────┴─────────────────────────┴────────────┴────────────────┘
bind
Bind column values to a table expression.
This method handles the binding of every kind of column-like value that Ibis handles, including strings, integers, deferred expressions and selectors, to a table expression.
Parameters
args
Any
Column-like values to bind.
()
kwargs
Any
Column-like values to bind, with names.
{}
cache
Cache the provided expression.
All subsequent operations on the returned expression will be performed on the cached data. The lifetime of the cached table is tied to its python references (ie. it is released once the last reference to it is garbage collected). Alternatively, use the with
statement or call the .release()
method for more control.
This method is idempotent: calling it multiple times in succession will return the same value as the first call.
Subsequent evaluations will not recompute the expression so method chaining will not incur the overhead of caching more than once.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> heavy_computation = ibis.literal("Heavy Computation" )
>>> cached_penguins = t.mutate(computation= heavy_computation).cache()
>>> cached_penguins
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃ computation ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │ string │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼───────────────────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │ Heavy Computation │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │ Heavy Computation │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴───────────────────┘
Explicit cache cleanup
>>> with t.mutate(computation= heavy_computation).cache() as cached_penguins:
... cached_penguins
cast
Cast the columns of a table.
Similar to pandas.DataFrame.astype
.
Parameters
schema
SchemaLike
Mapping, schema or iterable of pairs to use for casting
required
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.schema()
ibis.Schema {
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm int64
body_mass_g int64
sex string
year int64
}
>>> cols = ["body_mass_g" , "bill_length_mm" ]
>>> t[cols].head()
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g ┃ bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ int64 │ float64 │
├─────────────┼────────────────┤
│ 3750 │ 39.1 │
│ 3800 │ 39.5 │
│ 3250 │ 40.3 │
│ NULL │ NULL │
│ 3450 │ 36.7 │
└─────────────┴────────────────┘
Columns not present in the input schema will be passed through unchanged
('species',
'island',
'bill_length_mm',
'bill_depth_mm',
'flipper_length_mm',
'body_mass_g',
'sex',
'year')
>>> expr = t.cast({"body_mass_g" : "float64" , "bill_length_mm" : "int" })
>>> expr.select(* cols).head()
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ body_mass_g ┃ bill_length_mm ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ float64 │ int64 │
├─────────────┼────────────────┤
│ 3750.0 │ 39 │
│ 3800.0 │ 40 │
│ 3250.0 │ 40 │
│ NULL │ NULL │
│ 3450.0 │ 37 │
└─────────────┴────────────────┘
Columns that are in the input schema
but not in the table raise an error
>>> t.cast({"foo" : "string" })
---------------------------------------------------------------------------
IbisError Traceback (most recent call last)
Cell In[47], line 1
----> 1 t . cast ( { " foo " : " string " } )
File ~/work/ibis/ibis/ibis/expr/types/relations.py:447 , in Table.cast (self, schema)
372 def cast (self , schema: SchemaLike) - > Table:
373 """Cast the columns of a table.
374
375 Similar to `pandas.DataFrame.astype`.
(...)
445 ibis.common.exceptions.IbisError: Cast schema has fields that are not in the table: ['foo']
446 """
--> 447 return self . _cast ( schema , cast_method = " cast " )
File ~/work/ibis/ibis/ibis/expr/types/relations.py:490 , in Table._cast (self, schema, cast_method)
488 columns = self . columns
489 if missing_fields := frozenset (schema. names). difference(columns):
--> 490 raise com. IbisError(
491 f " Cast schema has fields that are not in the table: { sorted (missing_fields)} "
492 )
494 for col in columns:
495 if (new_type := schema. get(col)) is not None :
IbisError : Cast schema has fields that are not in the table: ['foo']
compile
compile (limit= None , params= None , pretty= False )
Compile to an execution target.
Parameters
limit
int | None
An integer to effect a specific row limit. A value of None
means “no limit”. The default is in ibis/config.py
.
None
params
Mapping [ir
.Value
, Any ] | None
Mapping of scalar parameter expressions to value
None
pretty
bool
In case of SQL backends, return a pretty formatted SQL query.
False
Returns
str
SQL query representation of the expression
Examples
>>> import ibis
>>> d = {"a" : [1 , 2 , 3 ], "b" : [4 , 5 , 6 ]}
>>> con = ibis.duckdb.connect ()
>>> t = con.create_table("t" , d)
>>> expr = t.mutate(c= t.a + t.b)
>>> expr.compile ()
'SELECT "t0"."a", "t0"."b", "t0"."a" + "t0"."b" AS "c" FROM "memory"."main"."t" AS "t0"'
If you want to see the pretty formatted SQL query, set pretty
to True
.
>>> expr.compile (pretty= True )
'SELECT\n "t0"."a",\n "t0"."b",\n "t0"."a" + "t0"."b" AS "c"\nFROM "memory"."main"."t" AS "t0"'
If the expression does not have a backend, an error will be raised.
>>> t = ibis.memtable(d)
>>> expr = t.mutate(c= t.a + t.b)
>>> expr.compile () # quartodoc: +EXPECTED_FAILURE
---------------------------------------------------------------------------
IbisError Traceback (most recent call last)
Cell In[56], line 3
1 t = ibis. memtable(d)
2 expr = t. mutate(c= t. a + t. b)
----> 3 expr . compile ( ) # quartodoc: +EXPECTED_FAILURE
File ~/work/ibis/ibis/ibis/expr/types/core.py:506 , in Expr.compile (self, limit, params, pretty)
457 def compile (
458 self ,
459 limit: int | None = None ,
460 params: Mapping[ir. Value, Any] | None = None ,
461 pretty: bool = False ,
462 ) - > str :
463 r """Compile to an execution target.
464
465 Parameters
(...)
504 [`ibis.to_sql()`](./expression-generic.qmd#ibis.to_sql)
505 """
--> 506 return self . _find_backend ( ) . compile(
507 self , limit= limit, params= params, pretty= pretty
508 )
File ~/work/ibis/ibis/ibis/expr/types/core.py:368 , in Expr._find_backend (self, use_default)
366 default = _default_backend() if use_default else None
367 if default is None :
--> 368 raise IbisError(
369 " Expression depends on no backends, and found no default "
370 )
371 return default
373 if len (backends) > 1 :
IbisError : Expression depends on no backends, and found no default
count
Compute the number of rows in the table.
Parameters
where
ir
.BooleanValue
| None
Optional boolean expression to filter rows when counting.
None
Returns
IntegerScalar
Number of rows in the table
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : ["foo" , "bar" , "baz" ]})
>>> t
┏━━━━━━━━┓
┃ a ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo │
│ bar │
│ baz │
└────────┘
>>> t.count(t.a != "foo" )
ibis.expr.types.numeric.IntegerScalar
cross_join
cross_join(left, right, * rest, lname= '' , rname= ' {name} _right' )
Compute the cross join of a sequence of tables.
Parameters
left
Table
Left table
required
right
Table
Right table
required
rest
Table
Additional tables to cross join
()
lname
str
A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"
).
''
rname
str
A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"
).
'{name}_right'
Returns
Table
Cross join of left
, right
and rest
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.count()
>>> agg = t.drop("year" ).agg(s.across(s.numeric(), _.mean()))
>>> expr = t.cross_join(agg)
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃ bill_length_mm_right ┃ bill_depth_mm_right ┃ flipper_length_mm_right ┃ body_mass_g_right ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │ float64 │ float64 │ float64 │ float64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼──────────────────────┼─────────────────────┼─────────────────────────┼───────────────────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴──────────────────────┴─────────────────────┴─────────────────────────┴───────────────────┘
('species',
'island',
'bill_length_mm',
'bill_depth_mm',
'flipper_length_mm',
'body_mass_g',
'sex',
'year',
'bill_length_mm_right',
'bill_depth_mm_right',
'flipper_length_mm_right',
'body_mass_g_right')
describe
describe(quantile= (0.25 , 0.5 , 0.75 ))
Return summary information about a table.
Parameters
quantile
Sequence [ir
.NumericValue
| float ]
The quantiles to compute for numerical columns. Defaults to (0.25, 0.5, 0.75).
(0.25, 0.5, 0.75)
Returns
Table
A table containing summary information about the columns of self.
Notes
This function computes summary statistics for each column in the table. For numerical columns, it computes statistics such as minimum, maximum, mean, standard deviation, and quantiles. For string columns, it computes the mode and the number of unique values.
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> p = ibis.examples.penguins.fetch()
>>> p.describe()
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ pos ┃ type ┃ count ┃ nulls ┃ unique ┃ mode ┃ mean ┃ std ┃ min ┃ p25 ┃ p50 ┃ p75 ┃ max ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ string │ int16 │ string │ int64 │ int64 │ int64 │ string │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │
├───────────────────┼───────┼─────────┼───────┼───────┼────────┼────────┼─────────────┼────────────┼─────────┼──────────┼─────────┼─────────┼─────────┤
│ species │ 0 │ string │ 344 │ 0 │ 3 │ Adelie │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ island │ 1 │ string │ 344 │ 0 │ 3 │ Biscoe │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ bill_length_mm │ 2 │ float64 │ 344 │ 2 │ 164 │ NULL │ 43.921930 │ 5.459584 │ 32.1 │ 39.225 │ 44.45 │ 48.5 │ 59.6 │
│ bill_depth_mm │ 3 │ float64 │ 344 │ 2 │ 80 │ NULL │ 17.151170 │ 1.974793 │ 13.1 │ 15.600 │ 17.30 │ 18.7 │ 21.5 │
│ flipper_length_mm │ 4 │ int64 │ 344 │ 2 │ 55 │ NULL │ 200.915205 │ 14.061714 │ 172.0 │ 190.000 │ 197.00 │ 213.0 │ 231.0 │
│ body_mass_g │ 5 │ int64 │ 344 │ 2 │ 94 │ NULL │ 4201.754386 │ 801.954536 │ 2700.0 │ 3550.000 │ 4050.00 │ 4750.0 │ 6300.0 │
│ sex │ 6 │ string │ 344 │ 11 │ 2 │ male │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ year │ 7 │ int64 │ 344 │ 0 │ 3 │ NULL │ 2008.029070 │ 0.818356 │ 2007.0 │ 2007.000 │ 2008.00 │ 2009.0 │ 2009.0 │
└───────────────────┴───────┴─────────┴───────┴───────┴────────┴────────┴─────────────┴────────────┴─────────┴──────────┴─────────┴─────────┴─────────┘
>>> p.select(s.of_type("numeric" )).describe()
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ pos ┃ type ┃ count ┃ nulls ┃ unique ┃ mean ┃ std ┃ min ┃ p25 ┃ p50 ┃ p75 ┃ max ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ string │ int16 │ string │ int64 │ int64 │ int64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │
├───────────────────┼───────┼─────────┼───────┼───────┼────────┼─────────────┼────────────┼─────────┼──────────┼─────────┼─────────┼─────────┤
│ flipper_length_mm │ 2 │ int64 │ 344 │ 2 │ 55 │ 200.915205 │ 14.061714 │ 172.0 │ 190.000 │ 197.00 │ 213.0 │ 231.0 │
│ body_mass_g │ 3 │ int64 │ 344 │ 2 │ 94 │ 4201.754386 │ 801.954536 │ 2700.0 │ 3550.000 │ 4050.00 │ 4750.0 │ 6300.0 │
│ year │ 4 │ int64 │ 344 │ 0 │ 3 │ 2008.029070 │ 0.818356 │ 2007.0 │ 2007.000 │ 2008.00 │ 2009.0 │ 2009.0 │
│ bill_length_mm │ 0 │ float64 │ 344 │ 2 │ 164 │ 43.921930 │ 5.459584 │ 32.1 │ 39.225 │ 44.45 │ 48.5 │ 59.6 │
│ bill_depth_mm │ 1 │ float64 │ 344 │ 2 │ 80 │ 17.151170 │ 1.974793 │ 13.1 │ 15.600 │ 17.30 │ 18.7 │ 21.5 │
└───────────────────┴───────┴─────────┴───────┴───────┴────────┴─────────────┴────────────┴─────────┴──────────┴─────────┴─────────┴─────────┘
>>> p.select(s.of_type("string" )).describe()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ name ┃ pos ┃ type ┃ count ┃ nulls ┃ unique ┃ mode ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ int16 │ string │ int64 │ int64 │ int64 │ string │
├─────────┼───────┼────────┼───────┼───────┼────────┼────────┤
│ sex │ 2 │ string │ 344 │ 11 │ 2 │ male │
│ species │ 0 │ string │ 344 │ 0 │ 3 │ Adelie │
│ island │ 1 │ string │ 344 │ 0 │ 3 │ Biscoe │
└─────────┴───────┴────────┴───────┴───────┴────────┴────────┘
difference
difference(table, * rest, distinct= True )
Compute the set difference of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only diff distinct rows not occurring in the calling table
True
Returns
Table
The rows present in self
that are not present in tables
.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
└───────┘
distinct
distinct(on= None , keep= 'first' )
Return a Table with duplicate rows removed.
Similar to pandas.DataFrame.drop_duplicates()
.
Parameters
on
str | Iterable [str ] | s
.Selector
| None
Only consider certain columns for identifying duplicates. By default deduplicate all of the columns.
None
keep
Literal ['first', 'last'] | None
Determines which duplicates to keep. - "first"
: Drop duplicates except for the first occurrence. - "last"
: Drop duplicates except for the last occurrence. - None
: Drop all duplicates
'first'
Examples
>>> import ibis
>>> import ibis.examples as ex
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ex.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Compute the distinct rows of a subset of columns
>>> t[["species" , "island" ]].distinct().order_by(s.all ())
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species ┃ island ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ string │
├───────────┼───────────┤
│ Adelie │ Biscoe │
│ Adelie │ Dream │
│ Adelie │ Torgersen │
│ Chinstrap │ Dream │
│ Gentoo │ Biscoe │
└───────────┴───────────┘
Drop all duplicate rows except the first
>>> t.distinct(on= ["species" , "island" ], keep= "first" ).order_by(s.all ())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Biscoe │ 37.8 │ 18.3 │ 174 │ 3400 │ female │ 2007 │
│ Adelie │ Dream │ 39.5 │ 16.7 │ 178 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Chinstrap │ Dream │ 46.5 │ 17.9 │ 192 │ 3500 │ female │ 2007 │
│ Gentoo │ Biscoe │ 46.1 │ 13.2 │ 211 │ 4500 │ female │ 2007 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop all duplicate rows except the last
>>> t.distinct(on= ["species" , "island" ], keep= "last" ).order_by(s.all ())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Biscoe │ 42.7 │ 18.3 │ 196 │ 4075 │ male │ 2009 │
│ Adelie │ Dream │ 41.5 │ 18.5 │ 201 │ 4000 │ male │ 2009 │
│ Adelie │ Torgersen │ 43.1 │ 19.2 │ 197 │ 3500 │ male │ 2009 │
│ Chinstrap │ Dream │ 50.2 │ 18.7 │ 198 │ 3775 │ female │ 2009 │
│ Gentoo │ Biscoe │ 49.9 │ 16.1 │ 213 │ 5400 │ male │ 2009 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop all duplicated rows
>>> expr = t.distinct(on= ["species" , "island" , "year" , "bill_length_mm" ], keep= None )
>>> expr.count()
You can pass selectors
to on
>>> t.distinct(on=~ s.numeric())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Biscoe │ 37.7 │ 18.7 │ 180 │ 3600 │ male │ 2007 │
│ Gentoo │ Biscoe │ 50.0 │ 16.3 │ 230 │ 5700 │ male │ 2007 │
│ Gentoo │ Biscoe │ 44.5 │ 14.3 │ 216 │ 4100 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Dream │ 39.5 │ 16.7 │ 178 │ 3250 │ female │ 2007 │
│ Chinstrap │ Dream │ 50.0 │ 19.5 │ 196 │ 3900 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Dream │ 37.2 │ 18.1 │ 178 │ 3900 │ male │ 2007 │
│ Adelie │ Dream │ 37.5 │ 18.9 │ 179 │ 2975 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
The only valid values of keep
are "first"
, "last"
and None
.
>>> t.distinct(on= "species" , keep= "second" )
---------------------------------------------------------------------------
IbisError Traceback (most recent call last)
Cell In[122], line 1
----> 1 t . distinct ( on = " species " , keep = " second " )
File ~/work/ibis/ibis/ibis/expr/types/relations.py:1199 , in Table.distinct (self, on, keep)
1197 method = keep
1198 else :
-> 1199 raise com. IbisError(
1200 f " Invalid value for `keep`: { keep!r} , must be ' first ' , ' last ' or None "
1201 )
1203 aggs = {col. get_name(): getattr (col, method)() for col in (~ on). expand(self )}
1204 res = self . aggregate(aggs, by= on, having= having)
IbisError : Invalid value for `keep`: 'second', must be 'first', 'last' or None
drop
Remove fields from a table.
Parameters
fields
str | Selector
Fields to drop. Strings and selectors are accepted.
()
Returns
Table
A table with all columns matching fields
removed.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop one or more columns
>>> t.drop("species" ).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
└───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.drop("species" , "bill_length_mm" ).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ float64 │ int64 │ int64 │ string │ int64 │
├───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Torgersen │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Torgersen │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Torgersen │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Torgersen │ 19.3 │ 193 │ 3450 │ female │ 2007 │
└───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop with selectors, mix and match
>>> import ibis.selectors as s
>>> t.drop("species" , s.startswith("bill_" )).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ island ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ int64 │ int64 │ string │ int64 │
├───────────┼───────────────────┼─────────────┼────────┼───────┤
│ Torgersen │ 181 │ 3750 │ male │ 2007 │
│ Torgersen │ 186 │ 3800 │ female │ 2007 │
│ Torgersen │ 195 │ 3250 │ female │ 2007 │
│ Torgersen │ NULL │ NULL │ NULL │ 2007 │
│ Torgersen │ 193 │ 3450 │ female │ 2007 │
└───────────┴───────────────────┴─────────────┴────────┴───────┘
drop_null
drop_null(subset= None , how= 'any' )
Remove rows with null values from the table.
Parameters
subset
Sequence [str ] | str | None
Columns names to consider when dropping nulls. By default all columns are considered.
None
how
Literal ['any', 'all']
Determine whether a row is removed if there is at least one null value in the row ('any'
), or if all row values are null ('all'
).
'any'
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.drop_null(["bill_length_mm" , "body_mass_g" ]).count()
>>> t.drop_null(how= "all" ).count() # no rows where all columns are null
dropna
dropna(subset= None , how= 'any' )
Deprecated - use drop_null
instead.
equals
Return whether this expression is structurally equivalent to other
.
If you want to produce an equality expression, use ==
syntax.
Parameters
other
Another expression
required
Examples
>>> import ibis
>>> t1 = ibis.table(dict (a= "int" ), name= "t" )
>>> t2 = ibis.table(dict (a= "int" ), name= "t" )
>>> t1.equals(t2)
>>> v = ibis.table(dict (a= "string" ), name= "v" )
>>> t1.equals(v)
execute
execute(limit= 'default' , params= None , ** kwargs)
Execute an expression against its backend if one exists.
Parameters
limit
int | str | None
An integer to effect a specific row limit. A value of None
means “no limit”. The default is in ibis/config.py
.
'default'
params
Mapping [ir
.Value
, Any ] | None
Mapping of scalar parameter expressions to value
None
kwargs
Any
Keyword arguments
{}
Examples
>>> import ibis
>>> t = ibis.examples.penguins.fetch()
>>> t.execute()
0
Adelie
Torgersen
39.1
18.7
181.0
3750.0
male
2007
1
Adelie
Torgersen
39.5
17.4
186.0
3800.0
female
2007
2
Adelie
Torgersen
40.3
18.0
195.0
3250.0
female
2007
3
Adelie
Torgersen
NaN
NaN
NaN
NaN
None
2007
4
Adelie
Torgersen
36.7
19.3
193.0
3450.0
female
2007
...
...
...
...
...
...
...
...
...
339
Chinstrap
Dream
55.8
19.8
207.0
4000.0
male
2009
340
Chinstrap
Dream
43.5
18.1
202.0
3400.0
female
2009
341
Chinstrap
Dream
49.6
18.2
193.0
3775.0
male
2009
342
Chinstrap
Dream
50.8
19.0
210.0
4100.0
male
2009
343
Chinstrap
Dream
50.2
18.7
198.0
3775.0
female
2009
344 rows × 8 columns
Scalar parameters can be supplied dynamically during execution.
>>> species = ibis.param("string" )
>>> expr = t.filter (t.species == species).order_by(t.bill_length_mm)
>>> expr.execute(limit= 3 , params= {species: "Gentoo" })
0
Gentoo
Biscoe
40.9
13.7
214
4650
female
2007
1
Gentoo
Biscoe
41.7
14.7
210
4700
female
2009
2
Gentoo
Biscoe
42.0
13.5
210
4150
female
2007
fill_null
Fill null values in a table expression.
For example, different library versions may impact whether a given backend promotes integer replacement values to floats.
Parameters
replacements
ir
.Scalar
| Mapping [str , ir
.Scalar
]
Value with which to fill nulls. If replacements
is a mapping, the keys are column names that map to their replacement value. If passed as a scalar all columns are filled with that value.
required
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.sex
┏━━━━━━━━┓
┃ sex ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ male │
│ female │
│ female │
│ NULL │
│ female │
│ male │
│ female │
│ male │
│ NULL │
│ NULL │
│ … │
└────────┘
>>> t.fill_null({"sex" : "unrecorded" }).sex
┏━━━━━━━━━━━━┓
┃ sex ┃
┡━━━━━━━━━━━━┩
│ string │
├────────────┤
│ male │
│ female │
│ female │
│ unrecorded │
│ female │
│ male │
│ female │
│ male │
│ unrecorded │
│ unrecorded │
│ … │
└────────────┘
fillna
Deprecated - use fill_null
instead.
filter
Select rows from table
based on predicates
.
Parameters
predicates
ir
.BooleanValue
| Sequence [ir
.BooleanValue
] | IfAnyAll
Boolean value expressions used to select rows in table
.
()
Returns
Table
Filtered table expression
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.filter ([t.species == "Adelie" , t.body_mass_g > 3500 ]).sex.value_counts().drop_null(
... "sex"
... ).order_by("sex" )
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ sex ┃ sex_count ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ string │ int64 │
├────────┼───────────┤
│ female │ 22 │
│ male │ 68 │
└────────┴───────────┘
get_backend
Get the current Ibis backend of the expression.
Returns
BaseBackend
The Ibis backend.
Examples
>>> import ibis
>>> con = ibis.duckdb.connect ()
>>> t = con.create_table("t" , {"id" : [1 , 2 , 3 ]})
>>> t.get_backend()
<ibis.backends.duckdb.Backend at 0x7fff88314450>
get_name
Return the fully qualified name of the table.
Examples
>>> import ibis
>>> con = ibis.duckdb.connect ()
>>> t = con.create_table("t" , {"id" : [1 , 2 , 3 ]})
>>> t.get_name()
group_by
group_by(* by, ** key_exprs)
Create a grouped table expression.
Similar to SQL’s GROUP BY statement, or pandas .groupby() method.
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "fruit" : ["apple" , "apple" , "banana" , "orange" ],
... "price" : [0.5 , 0.5 , 0.25 , 0.33 ],
... }
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ fruit ┃ price ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────┤
│ apple │ 0.50 │
│ apple │ 0.50 │
│ banana │ 0.25 │
│ orange │ 0.33 │
└────────┴─────────┘
>>> t.group_by("fruit" ).agg(total_cost= _.price.sum (), avg_cost= _.price.mean()).order_by(
... "fruit"
... )
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ fruit ┃ total_cost ┃ avg_cost ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64 │ float64 │
├────────┼────────────┼──────────┤
│ apple │ 1.00 │ 0.50 │
│ banana │ 0.25 │ 0.25 │
│ orange │ 0.33 │ 0.33 │
└────────┴────────────┴──────────┘
head
Select the first n
rows of a table.
Parameters
n
int
Number of rows to include
5
Returns
Table
self
limited to n
rows
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : [1 , 1 , 2 ], "b" : ["c" , "a" , "a" ]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
└───────┴────────┘
info
Return summary information about a table.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.info()
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ name ┃ type ┃ nullable ┃ nulls ┃ non_nulls ┃ null_frac ┃ pos ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ boolean │ int64 │ int64 │ float64 │ int16 │
├───────────────────┼─────────┼──────────┼───────┼───────────┼───────────┼───────┤
│ species │ string │ True │ 0 │ 344 │ 0.000000 │ 0 │
│ island │ string │ True │ 0 │ 344 │ 0.000000 │ 1 │
│ bill_length_mm │ float64 │ True │ 2 │ 342 │ 0.005814 │ 2 │
│ bill_depth_mm │ float64 │ True │ 2 │ 342 │ 0.005814 │ 3 │
│ flipper_length_mm │ int64 │ True │ 2 │ 342 │ 0.005814 │ 4 │
│ body_mass_g │ int64 │ True │ 2 │ 342 │ 0.005814 │ 5 │
│ sex │ string │ True │ 11 │ 333 │ 0.031977 │ 6 │
│ year │ int64 │ True │ 0 │ 344 │ 0.000000 │ 7 │
└───────────────────┴─────────┴──────────┴───────┴───────────┴───────────┴───────┘
intersect
intersect(table, * rest, distinct= True )
Compute the set intersection of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only return distinct rows
True
Returns
Table
A new table containing the intersection of all input tables.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 2 │
│ 3 │
└───────┘
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
└───────┘
>>> t1.intersect(t2, distinct= False )
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 2 │
└───────┘
More than two table expressions can be intersected at once.
>>> t3 = ibis.memtable({"a" : [2 , 3 , 3 ]})
>>> t1.intersect(t2, t3)
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
└───────┘
join
join(left, right, predicates= (), how= 'inner' , * , lname= '' , rname= ' {name} _right' )
Perform a join between two tables.
Parameters
left
Table
Left table to join
required
right
Table
Right table to join
required
predicates
str | Sequence [str | ir
.BooleanColumn
| Literal [True] | Literal [False] | tuple [str | ir
.Column
| ir
.Deferred
, str | ir
.Column
| ir
.Deferred
]]
Condition(s) to join on. See examples for details.
()
how
JoinKind
Join method, e.g. "inner"
or "left"
.
'inner'
lname
str
A format string to use to rename overlapping columns in the left table (e.g. "left_{name}"
).
''
rname
str
A format string to use to rename overlapping columns in the right table (e.g. "right_{name}"
).
'{name}_right'
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> movies = ibis.examples.ml_latest_small_movies.fetch()
>>> movies.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ movieId ┃ title ┃ genres ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ string │
├─────────┼────────────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 2 │ Jumanji (1995) │ Adventure|Children|Fantasy │
│ 3 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 4 │ Waiting to Exhale (1995) │ Comedy|Drama|Romance │
│ 5 │ Father of the Bride Part II (1995) │ Comedy │
└─────────┴────────────────────────────────────┴─────────────────────────────────────────────┘
>>> ratings = ibis.examples.ml_latest_small_ratings.fetch().drop("timestamp" )
>>> ratings.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ int64 │ float64 │
├────────┼─────────┼─────────┤
│ 1 │ 1 │ 4.0 │
│ 1 │ 3 │ 4.0 │
│ 1 │ 6 │ 4.0 │
│ 1 │ 47 │ 5.0 │
│ 1 │ 50 │ 5.0 │
└────────┴─────────┴─────────┘
Equality left join on the shared movieId
column. Note the _right
suffix added to all overlapping columns from the right table (in this case only the “movieId” column).
>>> ratings.join(movies, "movieId" , how= "left" ).head(5 )
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃ movieId_right ┃ title ┃ genres ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ float64 │ int64 │ string │ string │
├────────┼─────────┼─────────┼───────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ 1 │ 4.0 │ 1 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 1 │ 3 │ 4.0 │ 3 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 1 │ 6 │ 4.0 │ 6 │ Heat (1995) │ Action|Crime|Thriller │
│ 1 │ 47 │ 5.0 │ 47 │ Seven (a.k.a. Se7en) (1995) │ Mystery|Thriller │
│ 1 │ 50 │ 5.0 │ 50 │ Usual Suspects, The (1995) │ Crime|Mystery|Thriller │
└────────┴─────────┴─────────┴───────────────┴─────────────────────────────┴─────────────────────────────────────────────┘
Explicit equality join using the default how
value of "inner"
. Note how there is no _right
suffix added to the movieId
column since this is an inner join and the movieId
column is part of the join condition.
>>> ratings.join(movies, ratings.movieId == movies.movieId).head(5 )
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ rating ┃ title ┃ genres ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64 │ int64 │ float64 │ string │ string │
├────────┼─────────┼─────────┼─────────────────────────────┼─────────────────────────────────────────────┤
│ 1 │ 1 │ 4.0 │ Toy Story (1995) │ Adventure|Animation|Children|Comedy|Fantasy │
│ 1 │ 3 │ 4.0 │ Grumpier Old Men (1995) │ Comedy|Romance │
│ 1 │ 6 │ 4.0 │ Heat (1995) │ Action|Crime|Thriller │
│ 1 │ 47 │ 5.0 │ Seven (a.k.a. Se7en) (1995) │ Mystery|Thriller │
│ 1 │ 50 │ 5.0 │ Usual Suspects, The (1995) │ Crime|Mystery|Thriller │
└────────┴─────────┴─────────┴─────────────────────────────┴─────────────────────────────────────────────┘
>>> tags = ibis.examples.ml_latest_small_tags.fetch()
>>> tags.head()
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ userId ┃ movieId ┃ tag ┃ timestamp ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ int64 │ string │ int64 │
├────────┼─────────┼─────────────────┼────────────┤
│ 2 │ 60756 │ funny │ 1445714994 │
│ 2 │ 60756 │ Highly quotable │ 1445714996 │
│ 2 │ 60756 │ will ferrell │ 1445714992 │
│ 2 │ 89774 │ Boxing story │ 1445715207 │
│ 2 │ 89774 │ MMA │ 1445715200 │
└────────┴─────────┴─────────────────┴────────────┘
You can join on multiple columns/conditions by passing in a sequence. Find all instances where a user both tagged and rated a movie:
>>> tags.join(ratings, ["userId" , "movieId" ]).head(5 ).order_by("userId" )
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ userId ┃ movieId ┃ tag ┃ timestamp ┃ rating ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ int64 │ string │ int64 │ float64 │
├────────┼─────────┼────────────────┼────────────┼─────────┤
│ 62 │ 2 │ Robin Williams │ 1528843907 │ 4.0 │
│ 62 │ 110 │ sword fight │ 1528152535 │ 4.5 │
│ 62 │ 410 │ gothic │ 1525636609 │ 4.5 │
│ 62 │ 2023 │ mafia │ 1525636733 │ 5.0 │
│ 62 │ 2124 │ quirky │ 1525636846 │ 5.0 │
└────────┴─────────┴────────────────┴────────────┴─────────┘
To self-join a table with itself, you need to call .view()
on one of the arguments so the two tables are distinct from each other.
For crafting more complex join conditions, a valid form of a join condition is a 2-tuple like ({left_key}, {right_key})
, where each key can be
a Column
Deferred expression
lambda of the form (Table) -> Column
For example, to find all movies pairings that received the same (ignoring case) tags:
>>> movie_tags = tags["movieId" , "tag" ]
>>> view = movie_tags.view()
>>> movie_tags.join(
... view,
... [
... movie_tags.movieId != view.movieId,
... (_.tag.lower(), lambda t: t.tag.lower()),
... ],
... ).head().order_by(("movieId" , "movieId_right" ))
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ movieId ┃ tag ┃ movieId_right ┃ tag_right ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │ string │
├─────────┼───────────────────┼───────────────┼───────────────────┤
│ 60756 │ funny │ 1732 │ funny │
│ 60756 │ Highly quotable │ 1732 │ Highly quotable │
│ 89774 │ Tom Hardy │ 139385 │ tom hardy │
│ 106782 │ drugs │ 1732 │ drugs │
│ 106782 │ Leonardo DiCaprio │ 5989 │ Leonardo DiCaprio │
└─────────┴───────────────────┴───────────────┴───────────────────┘
limit
Select n
rows from self
starting at offset
.
Parameters
n
int | None
Number of rows to include. If None
, the entire table is selected starting from offset
.
required
offset
int
Number of rows to skip first
0
Returns
Table
The first n
rows of self
starting at offset
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : [1 , 1 , 2 ], "b" : ["c" , "a" , "a" ]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
└───────┴────────┘
You can use None
with offset
to slice starting from a particular row
>>> t.limit(None , offset= 1 )
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
mutate
mutate(* exprs, ** mutations)
Add columns to a table expression.
Parameters
exprs
Sequence [ir
.Expr
] | None
List of named expressions to add as columns
()
mutations
ir
.Value
Named expressions using keyword arguments
{}
Returns
Table
Table expression with additional columns
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch().select("species" , "year" , "bill_length_mm" )
>>> t
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ year ┃ bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ int64 │ float64 │
├─────────┼───────┼────────────────┤
│ Adelie │ 2007 │ 39.1 │
│ Adelie │ 2007 │ 39.5 │
│ Adelie │ 2007 │ 40.3 │
│ Adelie │ 2007 │ NULL │
│ Adelie │ 2007 │ 36.7 │
│ Adelie │ 2007 │ 39.3 │
│ Adelie │ 2007 │ 38.9 │
│ Adelie │ 2007 │ 39.2 │
│ Adelie │ 2007 │ 34.1 │
│ Adelie │ 2007 │ 42.0 │
│ … │ … │ … │
└─────────┴───────┴────────────────┘
Add a new column from a per-element expression
>>> t.mutate(next_year= _.year + 1 ).head()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species ┃ year ┃ bill_length_mm ┃ next_year ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ int64 │ float64 │ int64 │
├─────────┼───────┼────────────────┼───────────┤
│ Adelie │ 2007 │ 39.1 │ 2008 │
│ Adelie │ 2007 │ 39.5 │ 2008 │
│ Adelie │ 2007 │ 40.3 │ 2008 │
│ Adelie │ 2007 │ NULL │ 2008 │
│ Adelie │ 2007 │ 36.7 │ 2008 │
└─────────┴───────┴────────────────┴───────────┘
Add a new column based on an aggregation. Note the automatic broadcasting.
>>> t.select("species" , bill_demean= _.bill_length_mm - _.bill_length_mm.mean()).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ species ┃ bill_demean ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━┩
│ string │ float64 │
├─────────┼─────────────┤
│ Adelie │ -4.82193 │
│ Adelie │ -4.42193 │
│ Adelie │ -3.62193 │
│ Adelie │ NULL │
│ Adelie │ -7.22193 │
└─────────┴─────────────┘
Mutate across multiple columns
>>> t.mutate(s.across(s.numeric() & ~ s.cols("year" ), _ - _.mean())).head()
┏━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ year ┃ bill_length_mm ┃
┡━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ int64 │ float64 │
├─────────┼───────┼────────────────┤
│ Adelie │ 2007 │ -4.82193 │
│ Adelie │ 2007 │ -4.42193 │
│ Adelie │ 2007 │ -3.62193 │
│ Adelie │ 2007 │ NULL │
│ Adelie │ 2007 │ -7.22193 │
└─────────┴───────┴────────────────┘
nunique
Compute the number of unique rows in the table.
Parameters
where
ir
.BooleanValue
| None
Optional boolean expression to filter rows when counting.
None
Returns
IntegerScalar
Number of unique rows in the table
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : ["foo" , "bar" , "bar" ]})
>>> t
┏━━━━━━━━┓
┃ a ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo │
│ bar │
│ bar │
└────────┘
>>> t.nunique(t.a != "foo" )
order_by
Sort a table by one or more expressions.
Similar to pandas.DataFrame.sort_values()
.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "a" : [3 , 2 , 1 , 3 ],
... "b" : ["a" , "B" , "c" , "D" ],
... "c" : [4 , 6 , 5 , 7 ],
... }
... )
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 3 │ a │ 4 │
│ 2 │ B │ 6 │
│ 1 │ c │ 5 │
│ 3 │ D │ 7 │
└───────┴────────┴───────┘
Sort by b. Default is ascending. Note how capital letters come before lowercase
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 2 │ B │ 6 │
│ 3 │ D │ 7 │
│ 3 │ a │ 4 │
│ 1 │ c │ 5 │
└───────┴────────┴───────┘
Sort in descending order
>>> t.order_by(ibis.desc("b" ))
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 3 │ a │ 4 │
│ 3 │ D │ 7 │
│ 2 │ B │ 6 │
└───────┴────────┴───────┘
You can also use the deferred API to get the same result
>>> from ibis import _
>>> t.order_by(_.b.desc())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 3 │ a │ 4 │
│ 3 │ D │ 7 │
│ 2 │ B │ 6 │
└───────┴────────┴───────┘
Sort by multiple columns/expressions
>>> t.order_by(["a" , _.c.desc()])
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 2 │ B │ 6 │
│ 3 │ D │ 7 │
│ 3 │ a │ 4 │
└───────┴────────┴───────┘
You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b
>>> t.order_by(_.b.lower())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 3 │ a │ 4 │
│ 2 │ B │ 6 │
│ 1 │ c │ 5 │
│ 3 │ D │ 7 │
└───────┴────────┴───────┘
This means that shuffling a Table is super simple
>>> t.order_by(ibis.random())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 2 │ B │ 6 │
│ 3 │ D │ 7 │
│ 1 │ c │ 5 │
│ 3 │ a │ 4 │
└───────┴────────┴───────┘
Selectors are allowed as sort keys and are a concise way to sort by multiple columns matching some criteria
>>> import ibis.selectors as s
>>> penguins = ibis.examples.penguins.fetch()
>>> penguins[["year" , "island" ]].value_counts().order_by(s.startswith("year" ))
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year ┃ island ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼───────────┼───────────────────┤
│ 2007 │ Torgersen │ 20 │
│ 2007 │ Biscoe │ 44 │
│ 2007 │ Dream │ 46 │
│ 2008 │ Torgersen │ 16 │
│ 2008 │ Dream │ 34 │
│ 2008 │ Biscoe │ 64 │
│ 2009 │ Torgersen │ 16 │
│ 2009 │ Dream │ 44 │
│ 2009 │ Biscoe │ 60 │
└───────┴───────────┴───────────────────┘
Use the across
selector to apply a specific order to multiple columns
>>> penguins[["year" , "island" ]].value_counts().order_by(
... s.across(s.startswith("year" ), _.desc())
... )
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year ┃ island ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼───────────┼───────────────────┤
│ 2009 │ Biscoe │ 60 │
│ 2009 │ Dream │ 44 │
│ 2009 │ Torgersen │ 16 │
│ 2008 │ Biscoe │ 64 │
│ 2008 │ Dream │ 34 │
│ 2008 │ Torgersen │ 16 │
│ 2007 │ Dream │ 46 │
│ 2007 │ Biscoe │ 44 │
│ 2007 │ Torgersen │ 20 │
└───────┴───────────┴───────────────────┘
pipe
Compose f
with self
.
Parameters
f
If the expression needs to be passed as anything other than the first argument to the function, pass a tuple with the argument name. For example, (f, ‘data’) if the function f expects a ‘data’ keyword
required
args
Any
Positional arguments to f
()
kwargs
Any
Keyword arguments to f
{}
Examples
>>> import ibis
>>> t = ibis.memtable(
... {
... "a" : [5 , 10 , 15 ],
... "b" : ["a" , "b" , "c" ],
... }
... )
>>> f = lambda a: (a + 1 ).name("a" )
>>> g = lambda a: (a * 2 ).name("a" )
>>> result1 = t.a.pipe(f).pipe(g)
>>> result1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 12 │
│ 22 │
│ 32 │
└───────┘
>>> result2 = g(f(t.a)) # equivalent to the above
>>> result1.equals(result2)
Returns
Expr
Result type of passed function
pivot_longer
pivot_longer(
col,
* ,
names_to= 'name' ,
names_pattern= '(.+)' ,
names_transform= None ,
values_to= 'value' ,
values_transform= None ,
)
Transform a table from wider to longer.
Parameters
col
str | s
.Selector
String column name or selector.
required
names_to
str | Iterable [str ]
A string or iterable of strings indicating how to name the new pivoted columns.
'name'
names_pattern
str | re .Pattern
Pattern to use to extract column names from the input. By default the entire column name is extracted.
'(.+)'
names_transform
Callable [[str ], ir
.Value
] | Mapping [str , Callable [[str ], ir
.Value
]] | None
Function or mapping of a name in names_to
to a function to transform a column name to a value.
None
values_to
str
Name of the pivoted value column.
'value'
values_transform
Callable [[ir
.Value
], ir
.Value
] | Deferred | None
Apply a function to the value column. This can be a lambda or deferred expression.
None
Examples
Basic usage
>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
>>> relig_income = ibis.examples.relig_income_raw.fetch()
>>> relig_income
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ religion ┃ <$10k ┃ $10-20k ┃ $20-30k ┃ $30-40k ┃ $40-50k ┃ $50-75k ┃ $75-100k ┃ $100-150k ┃ >150k ┃ Don't know/refused ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├─────────────────────────┼───────┼─────────┼─────────┼─────────┼─────────┼─────────┼──────────┼───────────┼───────┼────────────────────┤
│ Agnostic │ 27 │ 34 │ 60 │ 81 │ 76 │ 137 │ 122 │ 109 │ 84 │ 96 │
│ Atheist │ 12 │ 27 │ 37 │ 52 │ 35 │ 70 │ 73 │ 59 │ 74 │ 76 │
│ Buddhist │ 27 │ 21 │ 30 │ 34 │ 33 │ 58 │ 62 │ 39 │ 53 │ 54 │
│ Catholic │ 418 │ 617 │ 732 │ 670 │ 638 │ 1116 │ 949 │ 792 │ 633 │ 1489 │
│ Don’t know/refused │ 15 │ 14 │ 15 │ 11 │ 10 │ 35 │ 21 │ 17 │ 18 │ 116 │
│ Evangelical Prot │ 575 │ 869 │ 1064 │ 982 │ 881 │ 1486 │ 949 │ 723 │ 414 │ 1529 │
│ Hindu │ 1 │ 9 │ 7 │ 9 │ 11 │ 34 │ 47 │ 48 │ 54 │ 37 │
│ Historically Black Prot │ 228 │ 244 │ 236 │ 238 │ 197 │ 223 │ 131 │ 81 │ 78 │ 339 │
│ Jehovah's Witness │ 20 │ 27 │ 24 │ 24 │ 21 │ 30 │ 15 │ 11 │ 6 │ 37 │
│ Jewish │ 19 │ 19 │ 25 │ 25 │ 30 │ 95 │ 69 │ 87 │ 151 │ 162 │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────────────────┴───────┴─────────┴─────────┴─────────┴─────────┴─────────┴──────────┴───────────┴───────┴────────────────────┘
Here we convert column names not matching the selector for the religion
column and convert those names into values
>>> relig_income.pivot_longer(~ s.cols("religion" ), names_to= "income" , values_to= "count" )
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ religion ┃ income ┃ count ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │
├──────────┼────────────────────┼───────┤
│ Agnostic │ <$10k │ 27 │
│ Agnostic │ $10-20k │ 34 │
│ Agnostic │ $20-30k │ 60 │
│ Agnostic │ $30-40k │ 81 │
│ Agnostic │ $40-50k │ 76 │
│ Agnostic │ $50-75k │ 137 │
│ Agnostic │ $75-100k │ 122 │
│ Agnostic │ $100-150k │ 109 │
│ Agnostic │ >150k │ 84 │
│ Agnostic │ Don't know/refused │ 96 │
│ … │ … │ … │
└──────────┴────────────────────┴───────┘
Similarly for a different example dataset, we convert names to values but using a different selector and the default values_to
value.
>>> world_bank_pop = ibis.examples.world_bank_pop_raw.fetch()
>>> world_bank_pop.head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ country ┃ indicator ┃ 2000 ┃ 2001 ┃ 2002 ┃ 2003 ┃ 2004 ┃ 2005 ┃ 2006 ┃ 2007 ┃ 2008 ┃ 2009 ┃ 2010 ┃ 2011 ┃ 2012 ┃ 2013 ┃ 2014 ┃ 2015 ┃ 2016 ┃ 2017 ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │ float64 │
├─────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ ABW │ SP.URB.TOTL │ 4.162500e+04 │ 4.202500e+04 │ 4.219400e+04 │ 4.227700e+04 │ 4.231700e+04 │ 4.239900e+04 │ 4.255500e+04 │ 4.272900e+04 │ 4.290600e+04 │ 4.307900e+04 │ 4.320600e+04 │ 4.349300e+04 │ 4.386400e+04 │ 4.422800e+04 │ 4.458800e+04 │ 4.494300e+04 │ 4.529700e+04 │ 4.564800e+04 │
│ ABW │ SP.URB.GROW │ 1.664222e+00 │ 9.563731e-01 │ 4.013352e-01 │ 1.965172e-01 │ 9.456936e-02 │ 1.935880e-01 │ 3.672580e-01 │ 4.080490e-01 │ 4.133830e-01 │ 4.023963e-01 │ 2.943735e-01 │ 6.620631e-01 │ 8.493932e-01 │ 8.264135e-01 │ 8.106692e-01 │ 7.930256e-01 │ 7.845785e-01 │ 7.718989e-01 │
│ ABW │ SP.POP.TOTL │ 8.910100e+04 │ 9.069100e+04 │ 9.178100e+04 │ 9.270100e+04 │ 9.354000e+04 │ 9.448300e+04 │ 9.560600e+04 │ 9.678700e+04 │ 9.799600e+04 │ 9.921200e+04 │ 1.003410e+05 │ 1.012880e+05 │ 1.021120e+05 │ 1.028800e+05 │ 1.035940e+05 │ 1.042570e+05 │ 1.048740e+05 │ 1.054390e+05 │
│ ABW │ SP.POP.GROW │ 2.539234e+00 │ 1.768757e+00 │ 1.194718e+00 │ 9.973955e-01 │ 9.009892e-01 │ 1.003077e+00 │ 1.181566e+00 │ 1.227711e+00 │ 1.241397e+00 │ 1.233231e+00 │ 1.131541e+00 │ 9.393559e-01 │ 8.102306e-01 │ 7.493010e-01 │ 6.916153e-01 │ 6.379592e-01 │ 5.900625e-01 │ 5.372957e-01 │
│ AFE │ SP.URB.TOTL │ 1.155517e+08 │ 1.197755e+08 │ 1.242275e+08 │ 1.288340e+08 │ 1.336475e+08 │ 1.387456e+08 │ 1.440267e+08 │ 1.492313e+08 │ 1.553838e+08 │ 1.617762e+08 │ 1.684561e+08 │ 1.754157e+08 │ 1.825587e+08 │ 1.901087e+08 │ 1.980733e+08 │ 2.065563e+08 │ 2.150833e+08 │ 2.237321e+08 │
└─────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
>>> world_bank_pop.pivot_longer(s.matches(r"\d {4} " ), names_to= "year" ).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ country ┃ indicator ┃ year ┃ value ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ string │ string │ string │ float64 │
├─────────┼─────────────┼────────┼─────────┤
│ ABW │ SP.URB.TOTL │ 2000 │ 41625.0 │
│ ABW │ SP.URB.TOTL │ 2001 │ 42025.0 │
│ ABW │ SP.URB.TOTL │ 2002 │ 42194.0 │
│ ABW │ SP.URB.TOTL │ 2003 │ 42277.0 │
│ ABW │ SP.URB.TOTL │ 2004 │ 42317.0 │
└─────────┴─────────────┴────────┴─────────┘
pivot_longer
has some preprocessing capabilities like stripping a prefix and applying a function to column names
>>> billboard = ibis.examples.billboard.fetch()
>>> billboard
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ artist ┃ track ┃ date_entered ┃ wk1 ┃ wk2 ┃ wk3 ┃ wk4 ┃ wk5 ┃ wk6 ┃ wk7 ┃ wk8 ┃ wk9 ┃ wk10 ┃ wk11 ┃ wk12 ┃ wk13 ┃ wk14 ┃ wk15 ┃ wk16 ┃ wk17 ┃ wk18 ┃ wk19 ┃ wk20 ┃ wk21 ┃ wk22 ┃ wk23 ┃ wk24 ┃ wk25 ┃ wk26 ┃ wk27 ┃ wk28 ┃ wk29 ┃ wk30 ┃ wk31 ┃ wk32 ┃ wk33 ┃ wk34 ┃ wk35 ┃ wk36 ┃ wk37 ┃ wk38 ┃ wk39 ┃ wk40 ┃ wk41 ┃ wk42 ┃ wk43 ┃ wk44 ┃ wk45 ┃ wk46 ┃ wk47 ┃ wk48 ┃ wk49 ┃ wk50 ┃ wk51 ┃ wk52 ┃ wk53 ┃ wk54 ┃ wk55 ┃ wk56 ┃ wk57 ┃ wk58 ┃ wk59 ┃ wk60 ┃ wk61 ┃ wk62 ┃ wk63 ┃ wk64 ┃ wk65 ┃ wk66 ┃ wk67 ┃ wk68 ┃ wk69 ┃ wk70 ┃ wk71 ┃ wk72 ┃ wk73 ┃ wk74 ┃ wk75 ┃ wk76 ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ string │ date │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ string │ string │ string │ string │ string │ string │ string │ string │ string │ string │ string │
├────────────────┼─────────────────────────┼──────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 87 │ 82 │ 72 │ 77 │ 87 │ 94 │ 99 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 91 │ 87 │ 92 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 3 Doors Down │ Kryptonite │ 2000-04-08 │ 81 │ 70 │ 68 │ 67 │ 66 │ 57 │ 54 │ 53 │ 51 │ 51 │ 51 │ 51 │ 47 │ 44 │ 38 │ 28 │ 22 │ 18 │ 18 │ 14 │ 12 │ 7 │ 6 │ 6 │ 6 │ 5 │ 5 │ 4 │ 4 │ 4 │ 4 │ 3 │ 3 │ 3 │ 4 │ 5 │ 5 │ 9 │ 9 │ 15 │ 14 │ 13 │ 14 │ 16 │ 17 │ 21 │ 22 │ 24 │ 28 │ 33 │ 42 │ 42 │ 49 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 3 Doors Down │ Loser │ 2000-10-21 │ 76 │ 76 │ 72 │ 69 │ 67 │ 65 │ 55 │ 59 │ 62 │ 61 │ 61 │ 59 │ 61 │ 66 │ 72 │ 76 │ 75 │ 67 │ 73 │ 70 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 504 Boyz │ Wobble Wobble │ 2000-04-15 │ 57 │ 34 │ 25 │ 17 │ 17 │ 31 │ 36 │ 49 │ 53 │ 57 │ 64 │ 70 │ 75 │ 76 │ 78 │ 85 │ 92 │ 96 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ 98^0 │ Give Me Just One Nig... │ 2000-08-19 │ 51 │ 39 │ 34 │ 26 │ 26 │ 19 │ 2 │ 2 │ 3 │ 6 │ 7 │ 22 │ 29 │ 36 │ 47 │ 67 │ 66 │ 84 │ 93 │ 94 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ A*Teens │ Dancing Queen │ 2000-07-08 │ 97 │ 97 │ 96 │ 95 │ 100 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Aaliyah │ I Don't Wanna │ 2000-01-29 │ 84 │ 62 │ 51 │ 41 │ 38 │ 35 │ 35 │ 38 │ 38 │ 36 │ 37 │ 37 │ 38 │ 49 │ 61 │ 63 │ 62 │ 67 │ 83 │ 86 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Aaliyah │ Try Again │ 2000-03-18 │ 59 │ 53 │ 38 │ 28 │ 21 │ 18 │ 16 │ 14 │ 12 │ 10 │ 9 │ 8 │ 6 │ 1 │ 2 │ 2 │ 2 │ 2 │ 3 │ 4 │ 5 │ 5 │ 6 │ 9 │ 13 │ 14 │ 16 │ 23 │ 22 │ 33 │ 36 │ 43 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Adams, Yolanda │ Open My Heart │ 2000-08-26 │ 76 │ 76 │ 74 │ 69 │ 68 │ 67 │ 61 │ 58 │ 57 │ 59 │ 66 │ 68 │ 61 │ 67 │ 59 │ 63 │ 67 │ 71 │ 79 │ 89 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└────────────────┴─────────────────────────┴──────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘
>>> billboard.pivot_longer(
... s.startswith("wk" ),
... names_to= "week" ,
... names_pattern= r"wk(.+)" ,
... names_transform= int ,
... values_to= "rank" ,
... values_transform= _.cast("int" ),
... ).drop_null("rank" )
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ artist ┃ track ┃ date_entered ┃ week ┃ rank ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ string │ string │ date │ int8 │ int64 │
├─────────┼─────────────────────────┼──────────────┼──────┼───────┤
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 1 │ 87 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 2 │ 82 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 3 │ 72 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 4 │ 77 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 5 │ 87 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 6 │ 94 │
│ 2 Pac │ Baby Don't Cry (Keep... │ 2000-02-26 │ 7 │ 99 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 1 │ 91 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 2 │ 87 │
│ 2Ge+her │ The Hardest Part Of ... │ 2000-09-02 │ 3 │ 92 │
│ … │ … │ … │ … │ … │
└─────────┴─────────────────────────┴──────────────┴──────┴───────┘
You can use regular expression capture groups to extract multiple variables stored in column names
>>> who = ibis.examples.who.fetch()
>>> who
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ country ┃ iso2 ┃ iso3 ┃ year ┃ new_sp_m014 ┃ new_sp_m1524 ┃ new_sp_m2534 ┃ new_sp_m3544 ┃ new_sp_m4554 ┃ new_sp_m5564 ┃ new_sp_m65 ┃ new_sp_f014 ┃ new_sp_f1524 ┃ new_sp_f2534 ┃ new_sp_f3544 ┃ new_sp_f4554 ┃ new_sp_f5564 ┃ new_sp_f65 ┃ new_sn_m014 ┃ new_sn_m1524 ┃ new_sn_m2534 ┃ new_sn_m3544 ┃ new_sn_m4554 ┃ new_sn_m5564 ┃ new_sn_m65 ┃ new_sn_f014 ┃ new_sn_f1524 ┃ new_sn_f2534 ┃ new_sn_f3544 ┃ new_sn_f4554 ┃ new_sn_f5564 ┃ new_sn_f65 ┃ new_ep_m014 ┃ new_ep_m1524 ┃ new_ep_m2534 ┃ new_ep_m3544 ┃ new_ep_m4554 ┃ new_ep_m5564 ┃ new_ep_m65 ┃ new_ep_f014 ┃ new_ep_f1524 ┃ new_ep_f2534 ┃ new_ep_f3544 ┃ new_ep_f4554 ┃ new_ep_f5564 ┃ new_ep_f65 ┃ newrel_m014 ┃ newrel_m1524 ┃ newrel_m2534 ┃ newrel_m3544 ┃ newrel_m4554 ┃ newrel_m5564 ┃ newrel_m65 ┃ newrel_f014 ┃ newrel_f1524 ┃ newrel_f2534 ┃ newrel_f3544 ┃ newrel_f4554 ┃ newrel_f5564 ┃ newrel_f65 ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├─────────────┼────────┼────────┼───────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼────────────┤
│ Afghanistan │ AF │ AFG │ 1980 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1981 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1982 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1983 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1984 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1985 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1986 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1987 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1988 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ Afghanistan │ AF │ AFG │ 1989 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────┴────────┴────────┴───────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴────────────┘
>>> who.pivot_longer(
... s.index["new_sp_m014" :"newrel_f65" ],
... names_to= ["diagnosis" , "gender" , "age" ],
... names_pattern= "new_?(.*)_(.)(.*)" ,
... values_to= "count" ,
... )
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ country ┃ iso2 ┃ iso3 ┃ year ┃ diagnosis ┃ gender ┃ age ┃ count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ string │ string │ string │ int64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼────────┼───────┤
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 014 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 1524 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 2534 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 3544 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 4554 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 5564 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ m │ 65 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ f │ 014 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ f │ 1524 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ f │ 2534 │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴────────┴───────┘
names_transform
is flexible, and can be:
1. A mapping of one or more names in `names_to` to callable
2. A callable that will be applied to every name
Let’s recode gender and age to numeric values using a mapping
>>> who.pivot_longer(
... s.index["new_sp_m014" :"newrel_f65" ],
... names_to= ["diagnosis" , "gender" , "age" ],
... names_pattern= "new_?(.*)_(.)(.*)" ,
... names_transform= dict (
... gender= {"m" : 1 , "f" : 2 }.get,
... age= dict (
... zip (
... ["014" , "1524" , "2534" , "3544" , "4554" , "5564" , "65" ],
... range (7 ),
... )
... ).get,
... ),
... values_to= "count" ,
... )
┏━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━┳━━━━━━━┓
┃ country ┃ iso2 ┃ iso3 ┃ year ┃ diagnosis ┃ gender ┃ age ┃ count ┃
┡━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ string │ int8 │ int8 │ int64 │
├─────────────┼────────┼────────┼───────┼───────────┼────────┼──────┼───────┤
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 0 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 1 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 2 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 3 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 4 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 5 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 1 │ 6 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 2 │ 0 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 2 │ 1 │ NULL │
│ Afghanistan │ AF │ AFG │ 1980 │ sp │ 2 │ 2 │ NULL │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────────┴────────┴────────┴───────┴───────────┴────────┴──────┴───────┘
The number of match groups in names_pattern
must match the length of names_to
>>> who.pivot_longer(
... s.index["new_sp_m014" :"newrel_f65" ],
... names_to= ["diagnosis" , "gender" , "age" ],
... names_pattern= "new_?(.*)_.(.*)" ,
... )
---------------------------------------------------------------------------
IbisInputError Traceback (most recent call last)
Cell In[323], line 1
----> 1 who . pivot_longer (
2 s . index [ " new_sp_m014 " : " newrel_f65 " ] ,
3 names_to = [ " diagnosis " , " gender " , " age " ] ,
4 names_pattern = " new_?(.*)_.(.*) " ,
5 )
File ~/work/ibis/ibis/ibis/expr/types/relations.py:3939 , in Table.pivot_longer (self, col, names_to, names_pattern, names_transform, values_to, values_transform)
3937 names_pattern = re. compile(names_pattern)
3938 if (ngroups := names_pattern. groups) != (nnames := len (names_to)):
-> 3939 raise com. IbisInputError(
3940 f " Number of match groups in `names_pattern` "
3941 f " { names_pattern. pattern!r} ( { ngroups: d } groups) doesn ' t "
3942 f " match the length of `names_to` { names_to} (length { nnames: d } ) "
3943 )
3945 if names_transform is None :
3946 names_transform = dict . fromkeys(names_to, toolz. identity)
IbisInputError : Number of match groups in `names_pattern`'new_?(.*)_.(.*)' (2 groups) doesn't match the length of `names_to` ['diagnosis', 'gender', 'age'] (length 3)
names_transform
must be a mapping or callable
>>> who.pivot_longer(
... s.index["new_sp_m014" :"newrel_f65" ], names_transform= "upper"
... ) # quartodoc: +EXPECTED_FAILURE
---------------------------------------------------------------------------
IbisTypeError Traceback (most recent call last)
Cell In[326], line 1
----> 1 who . pivot_longer (
2 s . index [ " new_sp_m014 " : " newrel_f65 " ] , names_transform = " upper "
3 ) # quartodoc: +EXPECTED_FAILURE
File ~/work/ibis/ibis/ibis/expr/types/relations.py:3951 , in Table.pivot_longer (self, col, names_to, names_pattern, names_transform, values_to, values_transform)
3949 names_transform = dict . fromkeys(names_to, names_transform)
3950 else :
-> 3951 raise com. IbisTypeError(
3952 f " `names_transform` must be a mapping or callable. Got { type (names_transform)} "
3953 )
3955 for name in names_to:
3956 names_transform. setdefault(name, toolz. identity)
IbisTypeError : `names_transform` must be a mapping or callable. Got <class 'str'>
pivot_wider
pivot_wider(
id_cols= None ,
names_from= 'name' ,
names_prefix= '' ,
names_sep= '_' ,
names_sort= False ,
names= None ,
values_from= 'value' ,
values_fill= None ,
values_agg= 'arbitrary' ,
)
Pivot a table to a wider format.
Parameters
id_cols
s
.Selector
| None
A set of columns that uniquely identify each observation.
None
names_from
str | Iterable [str ] | s
.Selector
An argument describing which column or columns to use to get the name of the output columns.
'name'
names_prefix
str
String added to the start of every column name.
''
names_sep
str
If names_from
or values_from
contains multiple columns, this argument will be used to join their values together into a single string to use as a column name.
'_'
names_sort
bool
If True
columns are sorted. If False
column names are ordered by appearance.
False
names
Iterable [str ] | None
An explicit sequence of values to look for in columns matching names_from
. * When this value is None
, the values will be computed from names_from
. * When this value is not None
, each element’s length must match the length of names_from
. See examples below for more detail.
None
values_from
str | Iterable [str ] | s
.Selector
An argument describing which column or columns to get the cell values from.
'value'
values_fill
int | float | str | ir
.Scalar
| None
A scalar value that specifies what each value should be filled with when missing.
None
values_agg
str | Callable [[ir
.Value
], ir
.Scalar
] | Deferred
A function applied to the value in each cell in the output.
'arbitrary'
Returns
Table
Wider pivoted table
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> from ibis import _
>>> ibis.options.interactive = True
Basic usage
>>> fish_encounters = ibis.examples.fish_encounters.fetch()
>>> fish_encounters
┏━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ fish ┃ station ┃ seen ┃
┡━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼─────────┼───────┤
│ 4842 │ Release │ 1 │
│ 4842 │ I80_1 │ 1 │
│ 4842 │ Lisbon │ 1 │
│ 4842 │ Rstr │ 1 │
│ 4842 │ Base_TD │ 1 │
│ 4842 │ BCE │ 1 │
│ 4842 │ BCW │ 1 │
│ 4842 │ BCE2 │ 1 │
│ 4842 │ BCW2 │ 1 │
│ 4842 │ MAE │ 1 │
│ … │ … │ … │
└───────┴─────────┴───────┘
>>> fish_encounters.pivot_wider(names_from= "station" , values_from= "seen" )
┏━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ fish ┃ Lisbon ┃ Base_TD ┃ MAE ┃ Rstr ┃ BCW ┃ BCW2 ┃ Release ┃ I80_1 ┃ BCE ┃ BCE2 ┃ MAW ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├───────┼────────┼─────────┼───────┼───────┼───────┼───────┼─────────┼───────┼───────┼───────┼───────┤
│ 4843 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4848 │ 1 │ NULL │ NULL │ 1 │ NULL │ NULL │ 1 │ 1 │ NULL │ NULL │ NULL │
│ 4865 │ 1 │ NULL │ NULL │ NULL │ NULL │ NULL │ 1 │ 1 │ NULL │ NULL │ NULL │
│ 4842 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4847 │ 1 │ NULL │ NULL │ NULL │ NULL │ NULL │ 1 │ 1 │ NULL │ NULL │ NULL │
│ 4850 │ NULL │ 1 │ NULL │ 1 │ 1 │ NULL │ 1 │ 1 │ 1 │ NULL │ NULL │
│ 4854 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ 1 │ 1 │ NULL │ NULL │ NULL │
│ 4855 │ 1 │ 1 │ NULL │ 1 │ NULL │ NULL │ 1 │ 1 │ NULL │ NULL │ NULL │
│ 4864 │ NULL │ NULL │ NULL │ NULL │ NULL │ NULL │ 1 │ 1 │ NULL │ NULL │ NULL │
│ 4844 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└───────┴────────┴─────────┴───────┴───────┴───────┴───────┴─────────┴───────┴───────┴───────┴───────┘
You can do simple transpose-like operations using pivot_wider
>>> t = ibis.memtable(dict (outcome= ["yes" , "no" ], counted= [3 , 4 ]))
>>> t
┏━━━━━━━━━┳━━━━━━━━━┓
┃ outcome ┃ counted ┃
┡━━━━━━━━━╇━━━━━━━━━┩
│ string │ int64 │
├─────────┼─────────┤
│ yes │ 3 │
│ no │ 4 │
└─────────┴─────────┘
>>> t.pivot_wider(names_from= "outcome" , values_from= "counted" , names_sort= True )
┏━━━━━━━┳━━━━━━━┓
┃ no ┃ yes ┃
┡━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├───────┼───────┤
│ 4 │ 3 │
└───────┴───────┘
Fill missing pivoted values using values_fill
>>> fish_encounters.pivot_wider(
... names_from= "station" , values_from= "seen" , values_fill= 0
... )
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ fish ┃ Rstr ┃ BCW ┃ BCW2 ┃ MAW ┃ Release ┃ I80_1 ┃ BCE ┃ BCE2 ┃ Lisbon ┃ Base_TD ┃ MAE ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │
├───────┼───────┼───────┼───────┼───────┼─────────┼───────┼───────┼───────┼────────┼─────────┼───────┤
│ 4842 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4847 │ 0 │ 0 │ 0 │ 0 │ 1 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │
│ 4850 │ 1 │ 1 │ 0 │ 0 │ 1 │ 1 │ 1 │ 0 │ 0 │ 1 │ 0 │
│ 4854 │ 0 │ 0 │ 0 │ 0 │ 1 │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 4855 │ 1 │ 0 │ 0 │ 0 │ 1 │ 1 │ 0 │ 0 │ 1 │ 1 │ 0 │
│ 4864 │ 0 │ 0 │ 0 │ 0 │ 1 │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 4851 │ 0 │ 0 │ 0 │ 0 │ 1 │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │
│ 4857 │ 1 │ 1 │ 1 │ 0 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 0 │
│ 4858 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 4862 │ 1 │ 1 │ 1 │ 0 │ 1 │ 1 │ 1 │ 1 │ 1 │ 1 │ 0 │
│ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │ … │
└───────┴───────┴───────┴───────┴───────┴─────────┴───────┴───────┴───────┴────────┴─────────┴───────┘
Compute multiple values columns
>>> us_rent_income = ibis.examples.us_rent_income.fetch()
>>> us_rent_income
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ geoid ┃ name ┃ variable ┃ estimate ┃ moe ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │
├────────┼────────────┼──────────┼──────────┼───────┤
│ 01 │ Alabama │ income │ 24476 │ 136 │
│ 01 │ Alabama │ rent │ 747 │ 3 │
│ 02 │ Alaska │ income │ 32940 │ 508 │
│ 02 │ Alaska │ rent │ 1200 │ 13 │
│ 04 │ Arizona │ income │ 27517 │ 148 │
│ 04 │ Arizona │ rent │ 972 │ 4 │
│ 05 │ Arkansas │ income │ 23789 │ 165 │
│ 05 │ Arkansas │ rent │ 709 │ 5 │
│ 06 │ California │ income │ 29454 │ 109 │
│ 06 │ California │ rent │ 1358 │ 3 │
│ … │ … │ … │ … │ … │
└────────┴────────────┴──────────┴──────────┴───────┘
>>> us_rent_income.pivot_wider(
... names_from= "variable" , values_from= ["estimate" , "moe" ]
... )
┏━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid ┃ name ┃ estimate_income ┃ moe_income ┃ estimate_rent ┃ moe_rent ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ string │ int64 │ int64 │ int64 │ int64 │
├────────┼──────────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 05 │ Arkansas │ 23789 │ 165 │ 709 │ 5 │
│ 06 │ California │ 29454 │ 109 │ 1358 │ 3 │
│ 13 │ Georgia │ 27024 │ 106 │ 927 │ 3 │
│ 15 │ Hawaii │ 32453 │ 218 │ 1507 │ 18 │
│ 16 │ Idaho │ 25298 │ 208 │ 792 │ 7 │
│ 30 │ Montana │ 26249 │ 206 │ 751 │ 9 │
│ 38 │ North Dakota │ 32336 │ 245 │ 775 │ 9 │
│ 39 │ Ohio │ 27435 │ 94 │ 764 │ 2 │
│ 40 │ Oklahoma │ 26207 │ 101 │ 766 │ 3 │
│ 47 │ Tennessee │ 25453 │ 102 │ 808 │ 4 │
│ … │ … │ … │ … │ … │ … │
└────────┴──────────────┴─────────────────┴────────────┴───────────────┴──────────┘
The column name separator can be changed using the names_sep
parameter
>>> us_rent_income.pivot_wider(
... names_from= "variable" ,
... names_sep= "." ,
... values_from= ("estimate" , "moe" ),
... )
┏━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ geoid ┃ name ┃ estimate.income ┃ moe.income ┃ estimate.rent ┃ moe.rent ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ string │ int64 │ int64 │ int64 │ int64 │
├────────┼──────────────┼─────────────────┼────────────┼───────────────┼──────────┤
│ 05 │ Arkansas │ 23789 │ 165 │ 709 │ 5 │
│ 06 │ California │ 29454 │ 109 │ 1358 │ 3 │
│ 13 │ Georgia │ 27024 │ 106 │ 927 │ 3 │
│ 15 │ Hawaii │ 32453 │ 218 │ 1507 │ 18 │
│ 16 │ Idaho │ 25298 │ 208 │ 792 │ 7 │
│ 30 │ Montana │ 26249 │ 206 │ 751 │ 9 │
│ 38 │ North Dakota │ 32336 │ 245 │ 775 │ 9 │
│ 39 │ Ohio │ 27435 │ 94 │ 764 │ 2 │
│ 40 │ Oklahoma │ 26207 │ 101 │ 766 │ 3 │
│ 47 │ Tennessee │ 25453 │ 102 │ 808 │ 4 │
│ … │ … │ … │ … │ … │ … │
└────────┴──────────────┴─────────────────┴────────────┴───────────────┴──────────┘
Supply an alternative function to summarize values
>>> warpbreaks = ibis.examples.warpbreaks.fetch().select("wool" , "tension" , "breaks" )
>>> warpbreaks
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ wool ┃ tension ┃ breaks ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ string │ string │ int64 │
├────────┼─────────┼────────┤
│ A │ L │ 26 │
│ A │ L │ 30 │
│ A │ L │ 54 │
│ A │ L │ 25 │
│ A │ L │ 70 │
│ A │ L │ 52 │
│ A │ L │ 51 │
│ A │ L │ 26 │
│ A │ L │ 67 │
│ A │ M │ 18 │
│ … │ … │ … │
└────────┴─────────┴────────┘
>>> warpbreaks.pivot_wider(
... names_from= "wool" , values_from= "breaks" , values_agg= "mean"
... ).select("tension" , "A" , "B" ).order_by("tension" )
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ tension ┃ A ┃ B ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ float64 │ float64 │
├─────────┼───────────┼───────────┤
│ H │ 24.555556 │ 18.777778 │
│ L │ 44.555556 │ 28.222222 │
│ M │ 24.000000 │ 28.777778 │
└─────────┴───────────┴───────────┘
Passing Deferred
objects to values_agg
is supported
>>> warpbreaks.pivot_wider(
... names_from= "tension" ,
... values_from= "breaks" ,
... values_agg= _.sum (),
... ).select("wool" , "H" , "L" , "M" ).order_by(s.all ())
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ wool ┃ H ┃ L ┃ M ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ int64 │ int64 │ int64 │
├────────┼───────┼───────┼───────┤
│ A │ 221 │ 401 │ 216 │
│ B │ 169 │ 254 │ 259 │
└────────┴───────┴───────┴───────┘
Use a custom aggregate function
>>> warpbreaks.pivot_wider(
... names_from= "wool" ,
... values_from= "breaks" ,
... values_agg= lambda col: col.std() / col.mean(),
... ).select("tension" , "A" , "B" ).order_by("tension" )
┏━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ tension ┃ A ┃ B ┃
┡━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ string │ float64 │ float64 │
├─────────┼──────────┼──────────┤
│ H │ 0.418344 │ 0.260590 │
│ L │ 0.406183 │ 0.349325 │
│ M │ 0.360844 │ 0.327719 │
└─────────┴──────────┴──────────┘
Generate some random data, setting the random seed for reproducibility
>>> import random
>>> random.seed(0 )
>>> raw = ibis.memtable(
... [
... dict (
... product= product,
... country= country,
... year= year,
... production= random.random(),
... )
... for product in "AB"
... for country in ["AI" , "EI" ]
... for year in range (2000 , 2015 )
... ]
... )
>>> production = raw.filter (((_.product == "A" ) & (_.country == "AI" )) | (_.product == "B" ))
>>> production.order_by(s.all ())
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ product ┃ country ┃ year ┃ production ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ string │ string │ int64 │ float64 │
├─────────┼─────────┼───────┼────────────┤
│ A │ AI │ 2000 │ 0.844422 │
│ A │ AI │ 2001 │ 0.757954 │
│ A │ AI │ 2002 │ 0.420572 │
│ A │ AI │ 2003 │ 0.258917 │
│ A │ AI │ 2004 │ 0.511275 │
│ A │ AI │ 2005 │ 0.404934 │
│ A │ AI │ 2006 │ 0.783799 │
│ A │ AI │ 2007 │ 0.303313 │
│ A │ AI │ 2008 │ 0.476597 │
│ A │ AI │ 2009 │ 0.583382 │
│ … │ … │ … │ … │
└─────────┴─────────┴───────┴────────────┘
Pivoting with multiple name columns
>>> production.pivot_wider(
... names_from= ["product" , "country" ],
... values_from= "production" ,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year ┃ B_AI ┃ B_EI ┃ A_AI ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │ float64 │ float64 │
├───────┼──────────┼──────────┼──────────┤
│ 2002 │ 0.260492 │ 0.567511 │ 0.420572 │
│ 2005 │ 0.014042 │ 0.803179 │ 0.404934 │
│ 2013 │ 0.243911 │ 0.706561 │ 0.755804 │
│ 2000 │ 0.477010 │ 0.870471 │ 0.844422 │
│ 2010 │ 0.001143 │ 0.932834 │ 0.908113 │
│ 2014 │ 0.325204 │ 0.547441 │ 0.618369 │
│ 2004 │ 0.548699 │ 0.967540 │ 0.511275 │
│ 2001 │ 0.865310 │ 0.191067 │ 0.757954 │
│ 2003 │ 0.805028 │ 0.238616 │ 0.258917 │
│ 2009 │ 0.668153 │ 0.507941 │ 0.583382 │
│ … │ … │ … │ … │
└───────┴──────────┴──────────┴──────────┘
Select a subset of names. This call incurs no computation when constructing the expression.
>>> production.pivot_wider(
... names_from= ["product" , "country" ],
... names= [("A" , "AI" ), ("B" , "AI" )],
... values_from= "production" ,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year ┃ A_AI ┃ B_AI ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │ float64 │
├───────┼──────────┼──────────┤
│ 2004 │ 0.511275 │ 0.548699 │
│ 2006 │ 0.783799 │ 0.719705 │
│ 2007 │ 0.303313 │ 0.398824 │
│ 2008 │ 0.476597 │ 0.824845 │
│ 2011 │ 0.504687 │ 0.493578 │
│ 2002 │ 0.420572 │ 0.260492 │
│ 2005 │ 0.404934 │ 0.014042 │
│ 2001 │ 0.757954 │ 0.865310 │
│ 2003 │ 0.258917 │ 0.805028 │
│ 2009 │ 0.583382 │ 0.668153 │
│ … │ … │ … │
└───────┴──────────┴──────────┘
Sort the new columns’ names
>>> production.pivot_wider(
... names_from= ["product" , "country" ],
... values_from= "production" ,
... names_sort= True ,
... )
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ year ┃ A_AI ┃ B_AI ┃ B_EI ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │ float64 │ float64 │
├───────┼──────────┼──────────┼──────────┤
│ 2004 │ 0.511275 │ 0.548699 │ 0.967540 │
│ 2006 │ 0.783799 │ 0.719705 │ 0.447970 │
│ 2007 │ 0.303313 │ 0.398824 │ 0.080446 │
│ 2008 │ 0.476597 │ 0.824845 │ 0.320055 │
│ 2011 │ 0.504687 │ 0.493578 │ 0.109058 │
│ 2000 │ 0.844422 │ 0.477010 │ 0.870471 │
│ 2010 │ 0.908113 │ 0.001143 │ 0.932834 │
│ 2014 │ 0.618369 │ 0.325204 │ 0.547441 │
│ 2001 │ 0.757954 │ 0.865310 │ 0.191067 │
│ 2003 │ 0.258917 │ 0.805028 │ 0.238616 │
│ … │ … │ … │ … │
└───────┴──────────┴──────────┴──────────┘
preview
preview(
max_rows= None ,
max_columns= None ,
max_length= None ,
max_string= None ,
max_depth= None ,
console_width= None ,
)
Return a subset as a Rich Table.
This is an explicit version of what you get when you inspect this object in interactive mode, except with this version you can pass formatting options. The options are the same as those exposed in ibis.options.interactive
.
Parameters
max_rows
int | None
Maximum number of rows to display
None
max_columns
int | None
Maximum number of columns to display
None
max_length
int | None
Maximum length for pretty-printed arrays and maps
None
max_string
int | None
Maximum length for pretty-printed strings
None
max_depth
int | None
Maximum depth for nested data types
None
console_width
int | float | None
Width of the console in characters. If not specified, the width will be inferred from the console.
None
Examples
>>> import ibis
>>> t = ibis.examples.penguins.fetch()
Because the console_width is too small, only 2 columns are shown even though we specified up to 3.
>>> t.preview(
... max_rows= 3 ,
... max_columns= 3 ,
... max_string= 8 ,
... console_width= 30 ,
... )
┏━━━━━━━━━┳━━━━━━━━━━┳━━━┓
┃ species ┃ island ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━╇━━━┩
│ string │ string │ … │
├─────────┼──────────┼───┤
│ Adelie │ Torgers… │ … │
│ Adelie │ Torgers… │ … │
│ Adelie │ Torgers… │ … │
│ … │ … │ … │
└─────────┴──────────┴───┘
relabel
Deprecated in favor of Table.rename
.
relocate
relocate(* columns, before= None , after= None , ** kwargs)
Relocate columns
before or after other specified columns.
Parameters
columns
str | s
.Selector
Columns to relocate. Selectors are accepted.
()
before
str | s
.Selector
| None
A column name or selector to insert the new columns before.
None
after
str | s
.Selector
| None
A column name or selector. Columns in columns
are relocated after the last column selected in after
.
None
kwargs
str
Additional column names to relocate, renaming argument values to keyword argument names.
{}
Returns
Table
A table with the columns relocated.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> import ibis.selectors as s
>>> t = ibis.memtable(dict (a= [1 ], b= [1 ], c= [1 ], d= ["a" ], e= ["a" ], f= ["a" ]))
>>> t
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃ c ┃ d ┃ e ┃ f ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ int64 │ string │ string │ string │
├───────┼───────┼───────┼────────┼────────┼────────┤
│ 1 │ 1 │ 1 │ a │ a │ a │
└───────┴───────┴───────┴────────┴────────┴────────┘
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ f ┃ a ┃ b ┃ c ┃ d ┃ e ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ string │ string │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a │ 1 │ 1 │ 1 │ a │ a │
└────────┴───────┴───────┴───────┴────────┴────────┘
>>> t.relocate("a" , after= "c" )
┏━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ b ┃ c ┃ a ┃ d ┃ e ┃ f ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ int64 │ string │ string │ string │
├───────┼───────┼───────┼────────┼────────┼────────┤
│ 1 │ 1 │ 1 │ a │ a │ a │
└───────┴───────┴───────┴────────┴────────┴────────┘
>>> t.relocate("f" , before= "b" )
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a ┃ f ┃ b ┃ c ┃ d ┃ e ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ string │ int64 │ int64 │ string │ string │
├───────┼────────┼───────┼───────┼────────┼────────┤
│ 1 │ a │ 1 │ 1 │ a │ a │
└───────┴────────┴───────┴───────┴────────┴────────┘
>>> t.relocate("a" , after= s.last())
┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ b ┃ c ┃ d ┃ e ┃ f ┃ a ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ string │ string │ string │ int64 │
├───────┼───────┼────────┼────────┼────────┼───────┤
│ 1 │ 1 │ a │ a │ a │ 1 │
└───────┴───────┴────────┴────────┴────────┴───────┘
Relocate allows renaming
┏━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ ff ┃ a ┃ b ┃ c ┃ d ┃ e ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ string │ int64 │ int64 │ int64 │ string │ string │
├────────┼───────┼───────┼───────┼────────┼────────┤
│ a │ 1 │ 1 │ 1 │ a │ a │
└────────┴───────┴───────┴───────┴────────┴────────┘
You can relocate based on any predicate selector, such as of_type
>>> t.relocate(s.of_type("string" ))
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d ┃ e ┃ f ┃ a ┃ b ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a │ a │ a │ 1 │ 1 │ 1 │
└────────┴────────┴────────┴───────┴───────┴───────┘
>>> t.relocate(s.numeric(), after= s.last())
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ d ┃ e ┃ f ┃ a ┃ b ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ string │ int64 │ int64 │ int64 │
├────────┼────────┼────────┼───────┼───────┼───────┤
│ a │ a │ a │ 1 │ 1 │ 1 │
└────────┴────────┴────────┴───────┴───────┴───────┘
When multiple columns are selected with before
or after
, those selected columns are moved before and after the selectors
input
>>> t = ibis.memtable(dict (a= [1 ], b= ["a" ], c= [1 ], d= ["a" ]))
>>> t.relocate(s.numeric(), after= s.of_type("string" ))
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b ┃ d ┃ a ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │ int64 │
├────────┼────────┼───────┼───────┤
│ a │ a │ 1 │ 1 │
└────────┴────────┴───────┴───────┘
>>> t.relocate(s.numeric(), before= s.of_type("string" ))
┏━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ a ┃ c ┃ b ┃ d ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ int64 │ int64 │ string │ string │
├───────┼───────┼────────┼────────┤
│ 1 │ 1 │ a │ a │
└───────┴───────┴────────┴────────┘
When there are duplicate renames in a call to relocate, the last one is preserved
>>> t.relocate(e= "d" , f= "d" )
┏━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ f ┃ a ┃ b ┃ c ┃
┡━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ int64 │ string │ int64 │
├────────┼───────┼────────┼───────┤
│ a │ 1 │ a │ 1 │
└────────┴───────┴────────┴───────┘
However, if there are duplicates that are not part of a rename, the order specified in the relocate call is preserved
>>> t.relocate(
... "b" ,
... s.of_type("string" ), # "b" is a string column, so the selector matches
... )
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ b ┃ d ┃ a ┃ c ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ string │ string │ int64 │ int64 │
├────────┼────────┼───────┼───────┤
│ a │ a │ 1 │ 1 │
└────────┴────────┴───────┴───────┘
rename
rename(method= None , / , ** substitutions)
Rename columns in the table.
Parameters
method
str | Callable [[str ], str | None] | Literal ['snake_case', 'ALL_CAPS'] | Mapping [str , str ] | None
An optional method for renaming columns. May be one of: - A format string to use to rename all columns, like "prefix_{name}"
. - A function from old name to new name. If the function returns None
the old name is used. - The literal strings "snake_case"
or "ALL_CAPS"
to rename all columns using a snake_case
or "ALL_CAPS"`` naming convention respectively. - A mapping from new name to old name. Existing columns not present in the mapping will passthrough with their original name. |
None| | substitutions | [str](
str) | Columns to be explicitly renamed, expressed as
new_name=old_name`keyword arguments. |
`
Returns
Table
A renamed table expression
Examples
>>> import ibis
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> first3 = s.index[:3 ] # first 3 columns
>>> t = ibis.examples.penguins_raw_raw.fetch().select(first3)
>>> t
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ studyName ┃ Sample Number ┃ Species ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├───────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 2 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 3 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 4 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 5 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 6 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 7 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 8 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 9 │ Adelie Penguin (Pygoscelis adeliae) │
│ PAL0708 │ 10 │ Adelie Penguin (Pygoscelis adeliae) │
│ … │ … │ … │
└───────────┴───────────────┴─────────────────────────────────────┘
Rename specific columns by passing keyword arguments like `new_name=“old_name”``
>>> t.rename(study_name= "studyName" ).head(1 )
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ study_name ┃ Sample Number ┃ Species ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘
Rename all columns using a format string
>>> t.rename("p_ {name} " ).head(1 )
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ p_studyName ┃ p_Sample Number ┃ p_Species ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├─────────────┼─────────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
└─────────────┴─────────────────┴─────────────────────────────────────┘
Rename all columns using a snake_case convention
>>> t.rename("snake_case" ).head(1 )
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ study_name ┃ sample_number ┃ species ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘
Rename all columns using an ALL_CAPS convention
>>> t.rename("ALL_CAPS" ).head(1 )
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STUDY_NAME ┃ SAMPLE_NUMBER ┃ SPECIES ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├────────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
└────────────┴───────────────┴─────────────────────────────────────┘
Rename all columns using a callable
>>> t.rename(str .upper).head(1 )
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ STUDYNAME ┃ SAMPLE NUMBER ┃ SPECIES ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │ string │
├───────────┼───────────────┼─────────────────────────────────────┤
│ PAL0708 │ 1 │ Adelie Penguin (Pygoscelis adeliae) │
└───────────┴───────────────┴─────────────────────────────────────┘
rowid
A unique integer per row.
Any further meaning behind this expression is backend dependent. Generally this corresponds to some index into the database storage (for example, SQLite and DuckDB’s rowid
).
For a monotonically increasing row number, see ibis.row_number
.
sample
sample(fraction, * , method= 'row' , seed= None )
Sample a fraction of rows from a table.
Sampling is by definition a random operation. Some backends support specifying a seed
for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.
In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed
set.
Parameters
fraction
float
The percentage of rows to include in the sample, expressed as a float between 0 and 1.
required
method
Literal ['row', 'block']
The sampling method to use. The default is “row”, which includes each row with a probability of fraction
. If method is “block”, some backends may instead sample a fraction of blocks of rows (where “block” is a backend dependent definition), which may be significantly more efficient (at the cost of a less statistically random sample). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause.
'row'
seed
int | None
An optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least [0, 2**31 - 1]
). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases.
None
Returns
Table
The input table, with fraction
of rows selected.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"x" : [1 , 2 , 3 , 4 ], "y" : ["a" , "b" , "c" , "d" ]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ x ┃ y ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ a │
│ 2 │ b │
│ 3 │ c │
│ 4 │ d │
└───────┴────────┘
Sample approximately half the rows, with a seed specified for reproducibility.
>>> t.sample(0.5 , seed= 1234 )
┏━━━━━━━┳━━━━━━━━┓
┃ x ┃ y ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 2 │ b │
│ 3 │ c │
└───────┴────────┘
schema
Return the Schema for this table.
Returns
Schema
The table’s schema.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t.schema()
ibis.Schema {
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm int64
body_mass_g int64
sex string
year int64
}
select
select(* exprs, ** named_exprs)
Compute a new table expression using exprs
and named_exprs
.
Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.
For backwards compatibility the keyword argument exprs
is reserved and cannot be used to name an expression. This behavior will be removed in v4.
Parameters
exprs
ir
.Value
| str | Iterable [ir
.Value
| str ]
Column expression, string, or list of column expressions and strings.
()
named_exprs
ir
.Value
| str
Column expressions
{}
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190 │ 3650 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181 │ 3625 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195 │ 4675 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193 │ 3475 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190 │ 4250 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Simple projection
>>> t.select("island" , "bill_length_mm" ).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼────────────────┤
│ Torgersen │ 39.1 │
│ Torgersen │ 39.5 │
│ Torgersen │ 40.3 │
│ Torgersen │ NULL │
│ Torgersen │ 36.7 │
└───────────┴────────────────┘
In that simple case, you could also just use python’s indexing syntax
>>> t[["island" , "bill_length_mm" ]].head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼────────────────┤
│ Torgersen │ 39.1 │
│ Torgersen │ 39.5 │
│ Torgersen │ 40.3 │
│ Torgersen │ NULL │
│ Torgersen │ 36.7 │
└───────────┴────────────────┘
Projection by zero-indexed column position
>>> t.select(t[0 ], t[4 ]).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ int64 │
├─────────┼───────────────────┤
│ Adelie │ 181 │
│ Adelie │ 186 │
│ Adelie │ 195 │
│ Adelie │ NULL │
│ Adelie │ 193 │
└─────────┴───────────────────┘
Projection with renaming and compute in one call
>>> t.select(next_year= t.year + 1 ).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64 │
├───────────┤
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
└───────────┘
You can do the same thing with a named expression, and using the deferred API
>>> from ibis import _
>>> t.select((_.year + 1 ).name("next_year" )).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64 │
├───────────┤
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
└───────────┘
Projection with aggregation expressions
>>> t.select("island" , bill_mean= t.bill_length_mm.mean()).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ island ┃ bill_mean ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼───────────┤
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
└───────────┴───────────┘
Projection with a selector
>>> import ibis.selectors as s
>>> t.select(s.numeric() & ~ s.cols("year" )).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ int64 │ int64 │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│ 39.1 │ 18.7 │ 181 │ 3750 │
│ 39.5 │ 17.4 │ 186 │ 3800 │
│ 40.3 │ 18.0 │ 195 │ 3250 │
│ NULL │ NULL │ NULL │ NULL │
│ 36.7 │ 19.3 │ 193 │ 3450 │
└────────────────┴───────────────┴───────────────────┴─────────────┘
Projection + aggregation across multiple columns
>>> from ibis import _
>>> t.select(s.across(s.numeric() & ~ s.cols("year" ), _.mean())).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ float64 │ float64 │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
└────────────────┴───────────────┴───────────────────┴─────────────┘
sql
Run a SQL query against a table expression.
Parameters
query
str
Query string
required
dialect
str | None
Optional string indicating the dialect of query
. Defaults to the backend’s native dialect.
None
Returns
Table
An opaque table expression
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch(table_name= "penguins" )
>>> expr = t.sql(
... """
... SELECT island, mean(bill_length_mm) AS avg_bill_length
... FROM penguins
... GROUP BY 1
... ORDER BY 2 DESC
... """
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼─────────────────┤
│ Biscoe │ 45.257485 │
│ Dream │ 44.167742 │
│ Torgersen │ 38.950980 │
└───────────┴─────────────────┘
Mix and match ibis expressions with SQL queries
>>> t = ibis.examples.penguins.fetch(table_name= "penguins" )
>>> expr = t.sql(
... """
... SELECT island, mean(bill_length_mm) AS avg_bill_length
... FROM penguins
... GROUP BY 1
... ORDER BY 2 DESC
... """
... )
>>> expr = expr.mutate(
... island= _.island.lower(),
... avg_bill_length= _.avg_bill_length.round (1 ),
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼─────────────────┤
│ biscoe │ 45.3 │
│ dream │ 44.2 │
│ torgersen │ 39.0 │
└───────────┴─────────────────┘
Because ibis expressions aren’t named, they aren’t visible to subsequent .sql
calls. Use the alias
method to assign a name to an expression.
>>> expr.alias("b" ).sql("SELECT * FROM b WHERE avg_bill_length > 40" )
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────────────┤
│ biscoe │ 45.3 │
│ dream │ 44.2 │
└────────┴─────────────────┘
to_array
Deprecated - use as_scalar
instead.
to_csv
to_csv(path, * , params= None , ** kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
See https://arrow.apache.org/docs/python/generated/pyarrow.csv.CSVWriter.html for details.
Parameters
path
str | Path
The data source. A string or Path to the CSV file.
required
params
Mapping [ir
.Scalar
, Any ] | None
Mapping of scalar parameter expressions to value.
None
**kwargs
Any
Additional keyword arguments passed to pyarrow.csv.CSVWriter
{}
to_delta
to_delta(path, * , params= None , ** kwargs)
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
path
str | Path
The data source. A string or Path to the Delta Lake table directory.
required
params
Mapping [ir
.Scalar
, Any ] | None
Mapping of scalar parameter expressions to value.
None
**kwargs
Any
Additional keyword arguments passed to deltalake.writer.write_deltalake method
{}
to_pandas
Convert a table expression to a pandas DataFrame.
Parameters
kwargs
Same as keyword arguments to execute
{}
to_pandas_batches
to_pandas_batches(limit= None , params= None , chunk_size= 1000000 , ** kwargs)
Execute expression and return an iterator of pandas DataFrames.
This method is eager and will execute the associated expression immediately.
Parameters
limit
int | str | None
An integer to effect a specific row limit. A value of None
means “no limit”. The default is in ibis/config.py
.
None
params
Mapping [ir
.Value
, Any ] | None
Mapping of scalar parameter expressions to value.
None
chunk_size
int
Maximum number of rows in each returned DataFrame
.
1000000
kwargs
Any
Keyword arguments
{}
to_parquet
to_parquet(path, * , params= None , ** kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
See https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html for details.
Parameters
path
str | Path
The data source. A string or Path to the parquet file.
required
params
Mapping [ir
.Scalar
, Any ] | None
Mapping of scalar parameter expressions to value.
None
**kwargs
Any
Additional keyword arguments passed to pyarrow.parquet.ParquetWriter
{}
Examples
Write out an expression to a single parquet file.
>>> import ibis
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> penguins.to_parquet(tempfile.mktemp())
Partition on a single column.
>>> penguins.to_parquet(tempfile.mkdtemp(), partition_by= "year" )
Partition on multiple columns.
>>> penguins.to_parquet(tempfile.mkdtemp(), partition_by= ("year" , "island" ))
to_parquet_dir
to_parquet_dir(directory, * , params= None , ** kwargs)
Write the results of executing the given expression to a parquet file in a directory.
This method is eager and will execute the associated expression immediately.
See https://arrow.apache.org/docs/python/generated/pyarrow.dataset.write_dataset.html for details.
Parameters
directory
str | Path
The data target. A string or Path to the directory where the parquet file will be written.
required
params
Mapping [ir
.Scalar
, Any ] | None
Mapping of scalar parameter expressions to value.
None
**kwargs
Any
Additional keyword arguments passed to pyarrow.dataset.write_dataset
{}
to_polars
to_polars(params= None , limit= None , ** kwargs)
Execute expression and return results as a polars dataframe.
This method is eager and will execute the associated expression immediately.
Parameters
params
Mapping [ir
.Scalar
, Any ] | None
Mapping of scalar parameter expressions to value.
None
limit
int | str | None
An integer to effect a specific row limit. A value of None
means “no limit”. The default is in ibis/config.py
.
None
kwargs
Any
Keyword arguments
{}
Returns
DataFrame
A polars dataframe holding the results of the executed expression.
to_pyarrow
to_pyarrow(params= None , limit= None , ** kwargs)
Execute expression and return results in as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
params
Mapping [ir
.Scalar
, Any ] | None
Mapping of scalar parameter expressions to value.
None
limit
int | str | None
An integer to effect a specific row limit. A value of None
means “no limit”. The default is in ibis/config.py
.
None
kwargs
Any
Keyword arguments
{}
Returns
Table
A pyarrow table holding the results of the executed expression.
to_pyarrow_batches
to_pyarrow_batches(limit= None , params= None , chunk_size= 1000000 , ** kwargs)
Execute expression and return a RecordBatchReader.
This method is eager and will execute the associated expression immediately.
Parameters
limit
int | str | None
An integer to effect a specific row limit. A value of None
means “no limit”. The default is in ibis/config.py
.
None
params
Mapping [ir
.Value
, Any ] | None
Mapping of scalar parameter expressions to value.
None
chunk_size
int
Maximum number of rows in each returned record batch.
1000000
kwargs
Any
Keyword arguments
{}
Returns
results
RecordBatchReader
to_torch
to_torch(params= None , limit= None , ** kwargs)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
params
Mapping [ir
.Scalar
, Any ] | None
Parameters to substitute into the expression.
None
limit
int | str | None
An integer to effect a specific row limit. A value of None
means no limit.
None
kwargs
Any
Keyword arguments passed into the backend’s to_torch
implementation.
{}
Returns
dict [str , torch
.Tensor
]
A dictionary of torch tensors, keyed by column name.
try_cast
Cast the columns of a table.
If the cast fails for a row, the value is returned as NULL
or NaN
depending on backend behavior.
Parameters
schema
SchemaLike
Mapping, schema or iterable of pairs to use for casting
required
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : ["1" , "2" , "3" ], "b" : ["2.2" , "3.3" , "book" ]})
>>> t.try_cast({"a" : "int" , "b" : "float" })
┏━━━━━━━┳━━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━━┩
│ int64 │ float64 │
├───────┼─────────┤
│ 1 │ 2.2 │
│ 2 │ 3.3 │
│ 3 │ NULL │
└───────┴─────────┘
unbind
Return an expression built on UnboundTable
instead of backend-specific objects.
Examples
>>> import ibis
>>> import pandas as pd
>>> duckdb_con = ibis.duckdb.connect ()
>>> polars_con = ibis.polars.connect ()
>>> for backend in (duckdb_con, polars_con):
... t = backend.create_table("t" , pd.DataFrame({"a" : [1 , 2 , 3 ]}))
>>> bound_table = duckdb_con.table("t" )
>>> bound_table.get_backend().name
>>> unbound_table = bound_table.unbind()
>>> polars_con.execute(unbound_table)
union
union(table, * rest, distinct= False )
Compute the set union of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only return distinct rows
False
Returns
Table
A new table containing the union of all input tables.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
>>> t1.union(t2) # union all by default
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 2 │
│ 3 │
└───────┘
>>> t1.union(t2, distinct= True ).order_by("a" )
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 3 │
└───────┘
unnest
unnest(column, offset= None , keep_empty= False )
Unnest an array column
from a table.
When unnesting an existing column the newly unnested column replaces the existing column.
Parameters
column
Array column to unnest.
required
offset
str | None
Name of the resulting index column.
None
keep_empty
bool
Keep empty array values as NULL
in the output table, as well as existing NULL
values.
False
Returns
Table
Table with the array column column
unnested.
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
Construct a table expression with an array column.
>>> t = ibis.memtable({"x" : [[1 , 2 ], [], None , [3 , 4 , 5 ]], "y" : [1 , 2 , 3 , 4 ]})
>>> t
┏━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ x ┃ y ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ array<int64> │ int64 │
├────────────────┼───────┤
│ [ 1 , 2 ] │ 1 │
│ [] │ 2 │
│ NULL │ 3 │
│ [ 3 , 4 , ... +1 ] │ 4 │
└────────────────┴───────┘
Unnest the array column x
, replacing the existing x
column.
┏━━━━━━━┳━━━━━━━┓
┃ x ┃ y ┃
┡━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├───────┼───────┤
│ 1 │ 1 │
│ 2 │ 1 │
│ 3 │ 4 │
│ 4 │ 4 │
│ 5 │ 4 │
└───────┴───────┘
Unnest the array column x
with an offset. The offset
parameter is the name of the resulting index column.
>>> t.unnest(t.x, offset= "idx" )
┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ x ┃ y ┃ idx ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ int64 │
├───────┼───────┼───────┤
│ 1 │ 1 │ 0 │
│ 2 │ 1 │ 1 │
│ 3 │ 4 │ 0 │
│ 4 │ 4 │ 1 │
│ 5 │ 4 │ 2 │
└───────┴───────┴───────┘
Unnest the array column x
keep empty array values as NULL
in the output table.
>>> t.unnest(_.x, offset= "idx" , keep_empty= True )
┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ x ┃ y ┃ idx ┃
┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ int64 │
├───────┼───────┼───────┤
│ 1 │ 1 │ 0 │
│ 2 │ 1 │ 1 │
│ 3 │ 4 │ 0 │
│ 4 │ 4 │ 1 │
│ 5 │ 4 │ 2 │
│ NULL │ 2 │ NULL │
│ NULL │ 3 │ NULL │
└───────┴───────┴───────┘
If you need to preserve the row order of the preserved empty arrays or null values use row_number
to create an index column before calling unnest
.
>>> (
... t.mutate(original_row= ibis.row_number())
... .unnest("x" , offset= "idx" , keep_empty= True )
... .relocate("original_row" )
... .order_by("original_row" )
... )
┏━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ original_row ┃ x ┃ y ┃ idx ┃
┡━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │ int64 │ int64 │
├──────────────┼───────┼───────┼───────┤
│ 0 │ 1 │ 1 │ 0 │
│ 0 │ 2 │ 1 │ 1 │
│ 1 │ NULL │ 2 │ NULL │
│ 2 │ NULL │ 3 │ NULL │
│ 3 │ 3 │ 4 │ 0 │
│ 3 │ 4 │ 4 │ 1 │
│ 3 │ 5 │ 4 │ 2 │
└──────────────┴───────┴───────┴───────┘
You can also unnest more complex expressions, and the resulting column will be projected as the last expression in the result.
>>> t.unnest(_.x.map (lambda v: v + 1 ).name("plus_one" ))
┏━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃ x ┃ y ┃ plus_one ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│ array<int64> │ int64 │ int64 │
├────────────────┼───────┼──────────┤
│ [ 1 , 2 ] │ 1 │ 2 │
│ [ 1 , 2 ] │ 1 │ 3 │
│ [ 3 , 4 , ... +1 ] │ 4 │ 4 │
│ [ 3 , 4 , ... +1 ] │ 4 │ 5 │
│ [ 3 , 4 , ... +1 ] │ 4 │ 6 │
└────────────────┴───────┴──────────┘
unpack
Project the struct fields of each of columns
into self
.
Existing fields are retained in the projection.
Parameters
columns
str
String column names to project into self
.
()
Returns
Table
The child table with struct fields of each of columns
projected.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> lines = '''
... {"name": "a", "pos": {"lat": 10.1, "lon": 30.3 }}
... {"name": "b", "pos": {"lat": 10.2, "lon": 30.2 }}
... {"name": "c", "pos": {"lat": 10.3, "lon": 30.1 }}
... '''
>>> with open ("/tmp/lines.json" , "w" ) as f:
... nbytes = f.write(lines) # nbytes is unused
>>> t = ibis.read_json("/tmp/lines.json" )
>>> t
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ name ┃ pos ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ struct<lat: float64, lon: float64> │
├────────┼────────────────────────────────────┤
│ a │ { 'lat' : 10.1 , 'lon' : 30.3 } │
│ b │ { 'lat' : 10.2 , 'lon' : 30.2 } │
│ c │ { 'lat' : 10.3 , 'lon' : 30.1 } │
└────────┴────────────────────────────────────┘
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ lat ┃ lon ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ string │ float64 │ float64 │
├────────┼─────────┼─────────┤
│ a │ 10.1 │ 30.3 │
│ b │ 10.2 │ 30.2 │
│ c │ 10.3 │ 30.1 │
└────────┴─────────┴─────────┘
value_counts
Compute a frequency table of this table’s values.
Parameters
name
str | None
The name to use for the frequency column. A suitable name will be automatically generated if not provided.
None
Returns
Table
Frequency table of this table’s values.
Examples
>>> from ibis import examples
>>> ibis.options.interactive = True
>>> t = examples.penguins.fetch()
>>> t.head()
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ int64 │ int64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ 3750 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ 3800 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ 3250 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ 3450 │ female │ 2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.year.value_counts(name= "n" ).order_by("year" )
┏━━━━━━━┳━━━━━━━┓
┃ year ┃ n ┃
┡━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├───────┼───────┤
│ 2007 │ 110 │
│ 2008 │ 114 │
│ 2009 │ 120 │
└───────┴───────┘
>>> t[["year" , "island" ]].value_counts().order_by("year" , "island" )
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year ┃ island ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼───────────┼───────────────────┤
│ 2007 │ Biscoe │ 44 │
│ 2007 │ Dream │ 46 │
│ 2007 │ Torgersen │ 20 │
│ 2008 │ Biscoe │ 64 │
│ 2008 │ Dream │ 34 │
│ 2008 │ Torgersen │ 16 │
│ 2009 │ Biscoe │ 60 │
│ 2009 │ Dream │ 44 │
│ 2009 │ Torgersen │ 16 │
└───────┴───────────┴───────────────────┘
view
Create a new table expression distinct from the current one.
Use this API for any self-referencing operations like a self-join.
visualize
visualize(
format = 'svg' ,
* ,
label_edges= False ,
verbose= False ,
node_attr= None ,
node_attr_getter= None ,
edge_attr= None ,
edge_attr_getter= None ,
)
Visualize an expression as a GraphViz graph in the browser.
Parameters
format
str
Image output format. These are specified by the graphviz
Python library.
'svg'
label_edges
bool
Show operation input names as edge labels
False
verbose
bool
Print the graphviz DOT code to stderr if True
False
node_attr
Mapping [str , str ] | None
Mapping of (attribute, value)
pairs set for all nodes. Options are specified by the graphviz
Python library.
None
node_attr_getter
NodeAttributeGetter
| None
Callback taking a node and returning a mapping of (attribute, value)
pairs for that node. Options are specified by the graphviz
Python library.
None
edge_attr
Mapping [str , str ] | None
Mapping of (attribute, value)
pairs set for all edges. Options are specified by the graphviz
Python library.
None
edge_attr_getter
EdgeAttributeGetter
| None
Callback taking two adjacent nodes and returning a mapping of (attribute, value)
pairs for the edge between those nodes. Options are specified by the graphviz
Python library.
None
Examples
Open the visualization of an expression in default browser:
>>> import ibis
>>> import ibis.expr.operations as ops
>>> left = ibis.table(dict (a= "int64" , b= "string" ), name= "left" )
>>> right = ibis.table(dict (b= "string" , c= "int64" , d= "string" ), name= "right" )
>>> expr = left.inner_join(right, "b" ).select(left.a, b= right.c, c= right.d)
>>> expr.visualize(
... format = "svg" ,
... label_edges= True ,
... node_attr= {"fontname" : "Roboto Mono" , "fontsize" : "10" },
... node_attr_getter= lambda node: isinstance (node, ops.Field) and {"shape" : "oval" },
... edge_attr= {"fontsize" : "8" },
... edge_attr_getter= lambda u, v: isinstance (u, ops.Field) and {"color" : "red" },
... ) # quartodoc: +SKIP