pandas

https://pandas.pydata.org/

Install

Install Ibis and dependencies for the pandas backend:

Install with the pandas extra:

pip install 'ibis-framework[pandas]'

And connect:

import ibis

con = ibis.pandas.connect()
1
Adjust connection parameters as needed.

Install for pandas:

conda install -c conda-forge ibis-pandas

And connect:

import ibis

con = ibis.pandas.connect()
1
Adjust connection parameters as needed.

Install for pandas:

mamba install -c conda-forge ibis-pandas

And connect:

import ibis

con = ibis.pandas.connect()
1
Adjust connection parameters as needed.

User Defined functions (UDF)

Ibis supports defining three kinds of user-defined functions for operations on expressions targeting the pandas backend: element-wise, reduction, and analytic.

Elementwise Functions

An element-wise function is a function that takes N rows as input and produces N rows of output. log, exp, and floor are examples of element-wise functions.

Here’s how to define an element-wise function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
def add_one(x):
    return x + 1.0

Reduction Functions

A reduction is a function that takes N rows as input and produces 1 row as output. sum, mean and count are examples of reductions. In the context of a GROUP BY, reductions produce 1 row of output per group.

Here’s how to define a reduction function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.reduction(input_type=[dt.double], output_type=dt.double)
def double_mean(series):
    return 2 * series.mean()

Analytic Functions

An analytic function is like an element-wise function in that it takes N rows as input and produces N rows of output. The key difference is that analytic functions can be applied per group using window functions. Z-score is an example of an analytic function.

Here’s how to define an analytic function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.analytic(input_type=[dt.double], output_type=dt.double)
def zscore(series):
    return (series - series.mean()) / series.std()

Details of pandas UDFs

  • Element-wise provide support for applying your UDF to any combination of scalar values and columns.
  • Reductions provide support for whole column aggregations, grouped aggregations, and application of your function over a window.
  • Analytic functions work in both grouped and non-grouped settings
  • The objects you receive as input arguments are either pandas.Series or Python/NumPy scalars.
Keyword arguments must be given a default

Any keyword arguments must be given a default value or the function will not work.

A common Python convention is to set the default value to None and handle setting it to something not None in the body of the function.

Using add_one from above as an example, the following call will receive a pandas.Series for the x argument:

import ibis
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
con = ibis.pandas.connect({'df': df})
t = con.table('df')
expr = add_one(t.a)
expr

And this will receive the int 1:

expr = add_one(1)
expr

Since the pandas backend passes around **kwargs you can accept **kwargs in your function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise([dt.int64], dt.double)
def add_two(x, **kwargs): # do stuff with kwargs
    return x + 2.0

Or you can leave them out as we did in the example above. You can also optionally accept specific keyword arguments.

For example:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise([dt.int64], dt.double)
def add_two_with_none(x, y=None):
    if y is None:
    y = 2.0
    return x + y

pandas.Backend

add_operation

add_operation(self, operation)

Add a translation function to the backend for a specific operation.

Operations are defined in ibis.expr.operations, and a translation function receives the translator object and an expression as parameters, and returns a value depending on the backend.

compile

compile(self, expr, *args, **kwargs)

Compile an expression.

connect

connect(self, *args, **kwargs)

Connect to the database.

Parameters

Name Type Description Default
*args Mandatory connection parameters, see the docstring of do_connect for details. ()
**kwargs Extra connection parameters, see the docstring of do_connect for details. {}

Notes

This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.

Returns

Type Description
ibis.backends.BaseBackend An instance of the backend

create_table

create_table(self, name, obj=None, *, schema=None, database=None, temp=None, overwrite=False)

Create a table.

create_view

create_view(self, name, obj, *, database=None, overwrite=False)

Create a new view from an expression.

Parameters

Name Type Description Default
name str Name of the new view. required
obj ibis.ibis.Table An Ibis table expression that will be used to create the view. required
database str | None Name of the database where the view will be created, if not provided the database’s default is used. None
overwrite bool Whether to clobber an existing view with the same name False

Returns

Type Description
Table The view that was created.

disconnect

disconnect(self)

Close the connection to the backend.

drop_table

drop_table(self, name, *, force=False)

Drop a table.

Parameters

Name Type Description Default
name str Name of the table to drop. required
database str | None Name of the database where the table exists, if not the default. None
force bool If False, an exception is raised if the table does not exist. False

drop_view

drop_view(self, name, *, force=False)

Drop a view.

Parameters

Name Type Description Default
name str Name of the view to drop. required
database str | None Name of the database where the view exists, if not the default. None
force bool If False, an exception is raised if the view does not exist. False

execute

execute(self, expr)

Execute an expression.

from_dataframe

from_dataframe(self, df, name='df', client=None)

Construct an ibis table from a pandas DataFrame.

Parameters

Name Type Description Default
df pandas.pandas.DataFrame A pandas DataFrame required
name str The name of the pandas DataFrame 'df'
client ibis.backends.pandas.BasePandasBackend | None Client dictionary will be mutated with the name of the DataFrame, if not provided a new client is created None

Returns

Type Description
Table A table expression

get_schema

get_schema(self, table_name, *, database=None)

has_operation

has_operation(cls, operation)

Return whether the backend implements support for operation.

Parameters

Name Type Description Default
operation type[ibis.ibis.Value] A class corresponding to an operation. required

Returns

Type Description
bool Whether the backend implements the operation.

Examples

>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True

list_tables

list_tables(self, like=None, database=None)

Return the list of table names in the current database.

Parameters

Name Type Description Default
like A pattern in Python’s regex format. None
database Unused in the pandas backend. None

Returns

Type Description
list[str] The list of the table names that match the pattern like.

read_csv

read_csv(self, source, table_name=None, **kwargs)

Register a CSV file as a table in the current session.

Parameters

Name Type Description Default
source str | pathlib.pathlib.Path The data source. Can be a local or remote file, pathlike objects also accepted. required
table_name str | None An optional name to use for the created table. This defaults to a generated name. None
**kwargs typing.Any Additional keyword arguments passed to Pandas loading function. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html for more information. {}

Returns

Type Description
ibis.ibis.Table The just-registered table

read_delta

read_delta(self, source, table_name=None, **kwargs)

Register a Delta Lake table in the current database.

Parameters

Name Type Description Default
source str | pathlib.Path The data source. Must be a directory containing a Delta Lake table. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs typing.Any Additional keyword arguments passed to the underlying backend or library. {}

Returns

Type Description
ibis.ibis.Table The just-registered table.

read_json

read_json(self, path, table_name=None, **kwargs)

Register a JSON file as a table in the current backend.

Parameters

Name Type Description Default
path str | pathlib.Path The data source. A string or Path to the JSON file. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs typing.Any Additional keyword arguments passed to the backend loading function. {}

Returns

Type Description
ibis.ibis.Table The just-registered table

read_parquet

read_parquet(self, source, table_name=None, **kwargs)

Register a parquet file as a table in the current session.

Parameters

Name Type Description Default
source str | pathlib.pathlib.Path The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. required
table_name str | None An optional name to use for the created table. This defaults to a generated name. None
**kwargs typing.Any Additional keyword arguments passed to Pandas loading function. See https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html for more information. {}

Returns

Type Description
ibis.ibis.Table The just-registered table

reconnect

reconnect(self)

Reconnect to the database already configured with connect.

register_options

register_options(cls)

Register custom backend options.

rename_table

rename_table(self, old_name, new_name)

Rename an existing table.

Parameters

Name Type Description Default
old_name str The old name of the table. required
new_name str The new name of the table. required

table

table(self, name, schema=None)

Construct a table expression.

Parameters

Name Type Description Default
name str Table name required
database tuple[str, str] | str | None Database name If not provided, the current database is used. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). ::: {.callout-note} ## Ibis does not use the word schema to refer to database hierarchy. A collection of tables is referred to as a database. A collection of database is referred to as a catalog. These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology. ::: None

Returns

Type Description
Table Table expression

to_csv

to_csv(self, expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a CSV file.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ibis.ibis.Table The ibis expression to execute and persist to CSV. required
path str | pathlib.Path The data source. A string or Path to the CSV file. required
params collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None Mapping of scalar parameter expressions to value. None
kwargs typing.Any Additional keyword arguments passed to pyarrow.csv.CSVWriter {}
https required

to_delta

to_delta(self, expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a Delta Lake table.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ibis.ibis.Table The ibis expression to execute and persist to Delta Lake table. required
path str | pathlib.Path The data source. A string or Path to the Delta Lake table. required
params collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None Mapping of scalar parameter expressions to value. None
kwargs typing.Any Additional keyword arguments passed to deltalake.writer.write_deltalake method {}

to_pandas

to_pandas(self, expr, *, params=None, limit=None, **kwargs)

Execute an Ibis expression and return a pandas DataFrame, Series, or scalar.

Note

This method is a wrapper around execute.

Parameters

Name Type Description Default
expr ibis.ibis.Expr Ibis expression to execute. required
params collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
kwargs typing.Any Keyword arguments {}

to_pandas_batches

to_pandas_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)

Execute an Ibis expression and return an iterator of pandas DataFrames.

Parameters

Name Type Description Default
expr ibis.ibis.Expr Ibis expression to execute. required
params collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
chunk_size int Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. 1000000
kwargs typing.Any Keyword arguments {}

Returns

Type Description
collections.abc.Iterator[pandas.pandas.DataFrame] An iterator of pandas DataFrames.

to_parquet

to_parquet(self, expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a parquet file.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ibis.ibis.Table The ibis expression to execute and persist to parquet. required
path str | pathlib.Path The data source. A string or Path to the parquet file. required
params collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None Mapping of scalar parameter expressions to value. None
**kwargs typing.Any Additional keyword arguments passed to pyarrow.parquet.ParquetWriter {}
https required

to_polars

to_polars(self, expr, *, params=None, limit=None, **kwargs)

Execute expression and return results in as a polars DataFrame.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ibis.ibis.Expr Ibis expression to export to polars. required
params collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
kwargs typing.Any Keyword arguments {}

Returns

Type Description
dataframe A polars DataFrame holding the results of the executed expression.

to_pyarrow

to_pyarrow(self, expr, params=None, limit=None, **kwargs)

Execute expression and return results in as a pyarrow table.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ibis.ibis.Expr Ibis expression to export to pyarrow required
params collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
kwargs typing.Any Keyword arguments {}

Returns

Type Description
Table A pyarrow table holding the results of the executed expression.

to_pyarrow_batches

to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)

Execute expression and return a RecordBatchReader.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ibis.ibis.Expr Ibis expression to export to pyarrow required
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
params collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None Mapping of scalar parameter expressions to value. None
chunk_size int Maximum number of rows in each returned record batch. 1000000
kwargs typing.Any Keyword arguments {}

Returns

Type Description
results RecordBatchReader

to_torch

to_torch(self, expr, *, params=None, limit=None, **kwargs)

Execute an expression and return results as a dictionary of torch tensors.

Parameters

Name Type Description Default
expr ibis.ibis.Expr Ibis expression to execute. required
params collections.abc.Mapping[ibis.ibis.Scalar, typing.Any] | None Parameters to substitute into the expression. None
limit int | str | None An integer to effect a specific row limit. A value of None means no limit. None
kwargs typing.Any Keyword arguments passed into the backend’s to_torch implementation. {}

Returns

Type Description
dict[str, torch.torch.Tensor] A dictionary of torch tensors, keyed by column name.
Back to top