pandas
Install
Install Ibis with the `pandas` extra:

```sh
pip install 'ibis-framework[pandas]'
```

And connect:

```python
import ibis

con = ibis.pandas.connect()
```

Adjust connection parameters as needed.
Install for pandas:

```sh
conda install -c conda-forge ibis-pandas
```

And connect:

```python
import ibis

con = ibis.pandas.connect()
```

Adjust connection parameters as needed.
Install for pandas:

```sh
mamba install -c conda-forge ibis-pandas
```

And connect:

```python
import ibis

con = ibis.pandas.connect()
```

Adjust connection parameters as needed.
User-defined functions (UDF)
Ibis supports defining three kinds of user-defined functions for operations on expressions targeting the pandas backend: element-wise, reduction, and analytic.
Elementwise Functions
An element-wise function is a function that takes N rows as input and produces N rows of output. `log`, `exp`, and `floor` are examples of element-wise functions.

Here's how to define an element-wise function:

```python
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
def add_one(x):
    return x + 1.0
```
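Because the decorated body is plain Python over pandas objects, the element-wise contract (N rows in, N rows out) can be sanity-checked with a bare `pandas.Series` before wiring the UDF into Ibis. A minimal sketch using pandas only, without the decorator:

```python
import pandas as pd

# same body as the add_one UDF above, without the decorator
def add_one(x):
    return x + 1.0

s = pd.Series([1, 2, 3])
out = add_one(s)
assert len(out) == len(s)               # element-wise: N rows in, N rows out
assert out.tolist() == [2.0, 3.0, 4.0]  # each row incremented
assert add_one(1) == 2.0                # also works on a bare scalar
```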
Reduction Functions
A reduction is a function that takes N rows as input and produces 1 row of output. `sum`, `mean`, and `count` are examples of reductions. In the context of a `GROUP BY`, reductions produce 1 row of output per group.

Here's how to define a reduction function:

```python
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.reduction(input_type=[dt.double], output_type=dt.double)
def double_mean(series):
    return 2 * series.mean()
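The grouped-reduction semantics (one output row per group under a `GROUP BY`) can be previewed with a plain pandas `groupby`, outside Ibis. A sketch with made-up data:

```python
import pandas as pd

# same body as the double_mean UDF above, without the decorator
def double_mean(series):
    return 2 * series.mean()

df = pd.DataFrame({'key': ['a', 'a', 'b'], 'value': [1.0, 3.0, 5.0]})
# one output row per group, as in a SQL GROUP BY
per_group = df.groupby('key')['value'].apply(double_mean)
print(per_group.to_dict())  # {'a': 4.0, 'b': 10.0}
```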
Analytic Functions
An analytic function is like an element-wise function in that it takes N rows as input and produces N rows of output. The key difference is that analytic functions can be applied per group using window functions. Z-score is an example of an analytic function.
Here’s how to define an analytic function:
```python
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.analytic(input_type=[dt.double], output_type=dt.double)
def zscore(series):
    return (series - series.mean()) / series.std()
```
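The grouped behavior corresponds to pandas `groupby(...).transform`: N rows in, N rows out, with the statistics computed per group. A sketch with made-up data, pandas only and without the decorator:

```python
import pandas as pd

# same body as the zscore UDF above, without the decorator
def zscore(series):
    return (series - series.mean()) / series.std()

df = pd.DataFrame({'key': ['a', 'a', 'b', 'b'],
                   'value': [1.0, 3.0, 2.0, 4.0]})
# per-group z-scores: the output has the same length as the input
out = df.groupby('key')['value'].transform(zscore)
assert len(out) == len(df)
```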
Details of pandas UDFs
- Element-wise functions support applying your UDF to any combination of scalar values and columns.
- Reductions support whole-column aggregations, grouped aggregations, and application of your function over a window.
- Analytic functions work in both grouped and non-grouped settings.
- The objects you receive as input arguments are either `pandas.Series` or Python/NumPy scalars.
!!! warning "Keyword arguments must be given a default"

    Any keyword arguments must be given a default value or the function **will not work**.

    A common Python convention is to set the default value to `None` and handle setting it to something not `None` in the body of the function.
Using `add_one` from above as an example, the following call will receive a `pandas.Series` for the `x` argument:

```python
import ibis
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
con = ibis.pandas.connect({'df': df})
t = con.table('df')
expr = add_one(t.a)
expr
```
And this will receive the `int` 1:

```python
expr = add_one(1)
expr
```
Since the pandas backend passes around `**kwargs`, you can accept `**kwargs` in your function:

```python
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise([dt.int64], dt.double)
def add_two(x, **kwargs):
    # do stuff with kwargs
    return x + 2.0
```
Or you can leave them out as we did in the example above. You can also optionally accept specific keyword arguments.
For example:
```python
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise([dt.int64], dt.double)
def add_two_with_none(x, y=None):
    if y is None:
        y = 2.0
    return x + y
```
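Stripped of the decorator, the `None`-default convention behaves as ordinary Python: when the caller omits `y`, the body substitutes 2.0. A plain-Python sketch of the function above:

```python
# same body as add_two_with_none above, without the decorator
def add_two_with_none(x, y=None):
    if y is None:
        y = 2.0  # resolve the None default inside the body
    return x + y

print(add_two_with_none(1))        # 3.0
print(add_two_with_none(1, 10.0))  # 11.0
```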
pandas.Backend
execute
```python
execute(self, query, params=None, limit='default', **kwargs)
```
Execute an expression.
to_pyarrow
```python
to_pyarrow(self, expr, params=None, limit=None, **kwargs)
```

Execute an expression and return the results as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to export to pyarrow | required |
| params | Mapping[ir.Scalar, Any] \| None | Mapping of scalar parameter expressions to value. | None |
| limit | int \| str \| None | An integer to effect a specific row limit. A value of None means "no limit". The default is in ibis/config.py. | None |
| kwargs | Any | Keyword arguments | {} |
Returns

| Type | Description |
|---|---|
| Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches
```python
to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)
```
Execute expression and return a RecordBatchReader.
This method is eager and will execute the associated expression immediately.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to export to pyarrow | required |
| limit | int \| str \| None | An integer to effect a specific row limit. A value of None means "no limit". The default is in ibis/config.py. | None |
| params | Mapping[ir.Scalar, Any] \| None | Mapping of scalar parameter expressions to value. | None |
| chunk_size | int | Maximum number of rows in each returned record batch. | 1000000 |
| kwargs | Any | Keyword arguments | {} |
Returns

| Type | Description |
|---|---|
| RecordBatchReader | The results as a pyarrow record batch reader. |