Pandas

Ibis's pandas backend is available in core Ibis.

Install

Install ibis and dependencies for the Pandas backend:

pip install 'ibis-framework'
conda install -c conda-forge ibis-framework
mamba install -c conda-forge ibis-framework

Connect

API

Create a client by passing a dictionary that maps table names to pandas DataFrames to ibis.pandas.connect.

See ibis.backends.pandas.Backend.do_connect for connection parameter information.

ibis.pandas.connect is a thin wrapper around ibis.backends.pandas.Backend.do_connect.

Connection Parameters

do_connect(self, dictionary)

Construct a client from a dictionary of pandas DataFrames.

Parameters:

Name Type Description Default
dictionary MutableMapping[str, pd.DataFrame]

Mutable mapping of string table names to pandas DataFrames.

required

Examples:

>>> import ibis
>>> import pandas as pd
>>> ibis.pandas.connect({"t": pd.DataFrame({"a": [1, 2, 3]})})
<ibis.backends.pandas.Backend at 0x...>

Backend API

Backend (BasePandasBackend)

Attributes

current_database inherited property readonly

Return the name of the current database.

Backends that don't support different databases will return None.

Returns:

Type Description
str | None

Name of the current database.

db_identity: str cached inherited property writable

Return the identity of the database.

Multiple connections to the same database will return the same value for db_identity.

The default implementation assumes connection parameters uniquely specify the database.

Returns:

Type Description
str

Database identity

tables cached inherited property writable

An accessor for tables in the database.

Tables may be accessed by name using either index or attribute access:

Examples:

>>> con = ibis.sqlite.connect("example.db")
>>> people = con.tables['people']  # access via index
>>> people = con.tables.people  # access via attribute
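
The dual access style can be pictured with a small stand-in class. This is a hypothetical sketch, not Ibis's implementation: TableAccessor is an invented name, and plain strings stand in for table expressions.

```python
class TableAccessor:
    """Hypothetical accessor: look tables up by index or by attribute."""

    def __init__(self, tables):
        self._tables = dict(tables)

    def __getitem__(self, name):
        # Index access: accessor["people"]
        return self._tables[name]

    def __getattr__(self, name):
        # Attribute access: accessor.people
        # Python calls this only when normal attribute lookup fails.
        try:
            return self._tables[name]
        except KeyError:
            raise AttributeError(name) from None


tables = TableAccessor({"people": "<table people>"})
by_index = tables["people"]
by_attribute = tables.people
```

Both lookups return the same object. Index access is the safer choice when a table name is not a valid Python identifier.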

version: str inherited property readonly

Return the version of the backend engine.

For database servers, return the server version.

For others, such as SQLite and pandas, return the version of the underlying library or application.

Returns:

Type Description
str

The backend version

Classes

Options (BaseModel) inherited pydantic-model
Attributes
enable_trace: bool pydantic-field

Enable tracing for execution.

Methods

add_operation(self, operation) inherited

Add a translation function to the backend for a specific operation.

Operations are defined in ibis.expr.operations, and a translation function receives the translator object and an expression as parameters, and returns a value depending on the backend. For example, in SQL backends, a NullLiteral operation could be translated to the string "NULL".

Examples:

>>> @ibis.sqlite.add_operation(ibis.expr.operations.NullLiteral)
... def _null_literal(translator, expression):
...     return 'NULL'

compile(self, expr, *args, **kwargs) inherited

Compile an expression.

connect(self, *args, **kwargs) inherited

Connect to the database.

Parameters:

Name Type Description Default
args None

Connection parameters

()
kwargs None

Additional connection parameters

{}

Returns:

Type Description
BaseBackend

An instance of the backend

create_database(self, name, force=False) inherited

Create a new database.

Not all backends implement this method.

Parameters:

Name Type Description Default
name str

Name of the new database.

required
force bool

If False, an exception is raised if the database already exists.

False

create_table(self, table_name, obj=None, schema=None) inherited

Create a table.

create_view(self, name, expr, database=None) inherited

Create a view.

Parameters:

Name Type Description Default
name str

Name for the new view.

required
expr ir.Table

An Ibis table expression that will be used to extract the query of the view.

required
database str | None

Name of the database where the view will be created, if not the default.

None

database(self, name=None) inherited

Return a Database object for the database called name.

DEPRECATED: database is deprecated; use equivalent methods in the backend

Parameters:

Name Type Description Default
name None

Name of the database to return the object for.

None

Returns:

Type Description
Database

A database object for the specified database.

drop_table(self, name, database=None, force=False) inherited

Drop a table.

Parameters:

Name Type Description Default
name str

Name of the table to drop.

required
database str | None

Name of the database where the table exists, if not the default.

None
force bool

If False, an exception is raised if the table does not exist.

False

drop_view(self, name, database=None, force=False) inherited

Drop a view.

Parameters:

Name Type Description Default
name str

Name of the view to drop.

required
database str | None

Name of the database where the view exists, if not the default.

None
force bool

If False, an exception is raised if the view does not exist.

False

execute(self, query, params=None, limit='default', **kwargs)

Execute an expression.

exists_database(self, name) inherited

Return whether a database name exists in the current connection.

DEPRECATED: exists_database is deprecated as of v2.0; use name in client.list_databases()

Parameters:

Name Type Description Default
name str

Database to check for existence

required

Returns:

Type Description
bool

Whether name exists

exists_table(self, name, database=None) inherited

Return whether a table name exists in the database.

DEPRECATED: exists_table is deprecated as of v2.0; use name in client.list_tables()

Parameters:

Name Type Description Default
name str

Table name

required
database str | None

Database to check if given

None

Returns:

Type Description
bool

Whether name is a table

from_dataframe(self, df, name='df', client=None) inherited

Construct an ibis table from a pandas DataFrame.

Parameters:

Name Type Description Default
df pd.DataFrame

A pandas DataFrame

required
name str

The name of the pandas DataFrame

'df'
client BasePandasBackend | None

The client whose dictionary will be updated with the DataFrame under the given name. If no client is provided, a new one is created.

None

Returns:

Type Description
ir.Table

A table expression

list_databases(self, like=None) inherited

List existing databases in the current connection.

Parameters:

Name Type Description Default
like None

A pattern in Python's regex format to filter returned database names.

None

Returns:

Type Description
list[str]

The names of the databases in the current connection, filtered by the like pattern if one is provided.

list_tables(self, like=None, database=None) inherited

Return the list of table names in the current database.

For some backends, the tables may be files in a directory, or other equivalent entities in a SQL database.

Parameters:

Name Type Description Default
like str

A pattern in Python's regex format.

None
database str

The database to list tables of, if not the current one.

None

Returns:

Type Description
list[str]

The list of the table names that match the pattern like.
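
As a rough illustration of how a regex like pattern narrows a list of names, here is a minimal sketch using Python's re module. The helper filter_names is hypothetical, and the sketch assumes the pattern may match anywhere in the name (re.search semantics), which the text above does not specify.

```python
import re


def filter_names(names, like=None):
    """Return the names matching the regex `like`, or all names if None."""
    if like is None:
        return sorted(names)
    pattern = re.compile(like)
    # Assumption: a match anywhere in the name counts; anchor with ^ or $
    # in the pattern itself to restrict it.
    return sorted(name for name in names if pattern.search(name))


names = ["orders", "order_items", "customers"]
matching = filter_names(names, like="^order")
everything = filter_names(names)
```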

table(self, name, schema=None) inherited

Return a table expression from the database.

DEPRECATED: table is deprecated as of v2.0; change the current database before calling .table()

verify(self, expr, params=None) inherited

Verify expr is an expression that can be compiled.

DEPRECATED: verify is deprecated as of v2.0; compile and capture TranslationError instead

User-Defined Functions (UDFs)

Ibis supports defining three kinds of user-defined functions for operations on expressions targeting the pandas backend: element-wise, reduction, and analytic.

Element-wise Functions

An element-wise function is a function that takes N rows as input and produces N rows of output. log, exp, and floor are examples of element-wise functions.

Here's how to define an element-wise function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas import udf

@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
def add_one(x):
    return x + 1.0
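
To see the N-rows-in, N-rows-out behaviour without Ibis, the same function body can be applied directly to a pandas Series. This is only a plain-pandas sketch of what the function computes; it leaves out the decorator and does not show how Ibis invokes the UDF.

```python
import pandas as pd


def add_one(x):
    # Same body as the UDF above, without the Ibis decorator.
    return x + 1.0


column = pd.Series([1, 2, 3])
result = add_one(column)

# One output row per input row; adding 1.0 turns int64 into float64.
print(len(column), len(result), result.dtype)
```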

Reduction Functions

A reduction is a function that takes N rows as input and produces 1 row as output. sum, mean and count are examples of reductions. In the context of a GROUP BY, reductions produce 1 row of output per group.

Here's how to define a reduction function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas import udf

@udf.reduction(input_type=[dt.double], output_type=dt.double)
def double_mean(series):
    return 2 * series.mean()
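
The difference between a whole-column reduction and a grouped one can be sketched in plain pandas. The DataFrame and its column names below are made up for illustration, and the sketch calls the undecorated function directly rather than going through Ibis.

```python
import pandas as pd


def double_mean(series):
    # Same body as the UDF above, without the Ibis decorator.
    return 2 * series.mean()


df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]})

# Whole column: three rows collapse to a single value.
whole_column = double_mean(df["value"])

# Grouped: one value per distinct key, as with GROUP BY.
per_group = df.groupby("key")["value"].agg(double_mean)
```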

Analytic Functions

An analytic function is like an element-wise function in that it takes N rows as input and produces N rows of output. The key difference is that analytic functions can be applied per group using window functions. Z-score is an example of an analytic function.

Here's how to define an analytic function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas import udf

@udf.analytic(input_type=[dt.double], output_type=dt.double)
def zscore(series):
    return (series - series.mean()) / series.std()
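
A per-group application of zscore can be approximated in plain pandas with groupby and transform, which keeps one output row per input row. This is a sketch of the semantics only, with made-up data; it does not use Ibis window functions.

```python
import pandas as pd


def zscore(series):
    # Same body as the UDF above, without the Ibis decorator.
    return (series - series.mean()) / series.std()


df = pd.DataFrame(
    {"key": ["a", "a", "b", "b"], "value": [1.0, 3.0, 10.0, 20.0]}
)

# Each group is standardized against its own mean and standard deviation,
# and the result has the same number of rows as the input.
df["z"] = df.groupby("key")["value"].transform(zscore)
```

Note that pandas' Series.std uses the sample standard deviation (ddof=1) by default.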

Details of Pandas UDFs

  • Element-wise functions support applying your UDF to any combination of scalar values and columns.
  • Reductions support whole-column aggregations, grouped aggregations, and application of your function over a window.
  • Analytic functions work in both grouped and non-grouped settings.
  • The objects you receive as input arguments are either pandas.Series or Python/NumPy scalars.
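
Because the inputs may be pandas.Series or scalars, a function body written with ordinary arithmetic tends to work for every combination. The sketch below shows this in plain pandas and NumPy; the function add is a hypothetical example and is not registered as an Ibis UDF here.

```python
import numpy as np
import pandas as pd


def add(x, y):
    # Plain arithmetic works for Series and scalars alike.
    return x + y


col = pd.Series([1, 2, 3])

both_columns = add(col, col)        # Series + Series -> Series
column_and_scalar = add(col, 10)    # Series + Python scalar -> Series
both_scalars = add(np.int64(1), 2)  # NumPy scalar + Python scalar -> scalar
```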

Keyword arguments must be given a default

Any keyword arguments must be given a default value or the function will not work.

A common Python convention is to set the default value to None and handle setting it to something not None in the body of the function.

Using add_one from above as an example, in the following call the function receives a pandas.Series for the x argument:

import ibis
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
con = ibis.pandas.connect({'df': df})
t = con.table('df')
expr = add_one(t.a)
expr

In this call, add_one receives the int 1:

expr = add_one(1)
expr

Since the pandas backend passes around **kwargs, you can accept **kwargs in your function:

import ibis.expr.datatypes as dt
from ibis.backends.pandas import udf

@udf.elementwise([dt.int64], dt.double)
def add_two(x, **kwargs):
    # do stuff with kwargs
    return x + 2.0

Or you can leave them out as we did in the example above. You can also optionally accept specific keyword arguments.

For example:

import ibis.expr.datatypes as dt
from ibis.backends.pandas import udf

@udf.elementwise([dt.int64], dt.double)
def add_two_with_none(x, y=None):
    # Fall back to 2.0 when no value is passed for y.
    if y is None:
        y = 2.0
    return x + y

Last update: March 1, 2022