
Datafusion

New in v2.1

The Datafusion backend is experimental and subject to backwards-incompatible changes.

Install

Install Ibis and the dependencies for the Datafusion backend using one of the following commands:

pip install 'ibis-framework[datafusion]'
conda install -c conda-forge ibis-datafusion
mamba install -c conda-forge ibis-datafusion

Connect

API

Create a client by passing ibis.datafusion.connect a dictionary that maps table names to file paths.

See ibis.backends.datafusion.Backend.do_connect for connection parameter information.

ibis.datafusion.connect is a thin wrapper around ibis.backends.datafusion.Backend.do_connect.

Connection Parameters

do_connect(self, config)

Create a Datafusion backend for use with Ibis.

Parameters:

config (Mapping[str, str | Path] | df.ExecutionContext, required): Mapping of table names to file paths, or an existing DataFusion ExecutionContext.

Examples:

>>> import ibis
>>> config = {"t": "path/to/file.parquet", "s": "path/to/file.csv"}
>>> ibis.datafusion.connect(config)
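When the data lives in a directory of files, the config mapping can be built programmatically. The sketch below is a hypothetical helper (build_config is not part of Ibis) that maps each file's stem to its path; it assumes that only .parquet and .csv files should be registered and that each stem must be a unique table name.

```python
from pathlib import Path

# Suffixes this sketch assumes should be registered as tables.
SUPPORTED_SUFFIXES = {".parquet", ".csv"}


def build_config(directory):
    """Hypothetical helper: map each supported file's stem to its path.

    The result has the shape of the config parameter, Mapping[str, str | Path].
    Raises ValueError if two supported files share a stem, because each
    table name must be unique.
    """
    config = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # skip subdirectories and unsupported file types
        if path.stem in config:
            raise ValueError(f"duplicate table name {path.stem!r}")
        config[path.stem] = path
    return config
```

With Ibis installed, the result could then be passed straight through, e.g. ibis.datafusion.connect(build_config("data")), so that each file becomes a table named after it.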

Backend API

Backend (BaseBackend)

Attributes

version property readonly

Return the version of the backend engine.

For database servers, return the server version.

For others, such as SQLite and pandas, return the version of the underlying library or application.

Returns:

str: The backend version.

Methods

add_operation(self, operation) inherited

Add a translation function to the backend for a specific operation.

Operations are defined in ibis.expr.operations, and a translation function receives the translator object and an expression as parameters, and returns a value depending on the backend. For example, in SQL backends, a NullLiteral operation could be translated to the string "NULL".
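The mechanism described above amounts to a registry of translation functions keyed by operation class. The following is a simplified, hypothetical model of that idea, not Ibis's actual translator: the Operation, NullLiteral and Literal classes and the Translator class are stand-ins chosen for illustration. It also shows how a has_operation-style check can be answered by looking the class up in the same registry.

```python
class Operation:
    """Stand-in base class for expression operations."""


class NullLiteral(Operation):
    pass


class Literal(Operation):
    def __init__(self, value):
        self.value = value


class Translator:
    """Hypothetical translator holding a registry of translation functions."""

    _registry = {}

    @classmethod
    def add_operation(cls, operation):
        """Return a decorator that registers a translation function for `operation`."""
        def decorator(func):
            cls._registry[operation] = func
            return func
        return decorator

    @classmethod
    def has_operation(cls, operation):
        """Report whether a translation function is registered for `operation`."""
        return operation in cls._registry

    def translate(self, op):
        # Dispatch on the exact class of the operation node.
        try:
            func = self._registry[type(op)]
        except KeyError:
            raise NotImplementedError(type(op).__name__) from None
        return func(self, op)


@Translator.add_operation(NullLiteral)
def _null_literal(translator, op):
    return "NULL"


@Translator.add_operation(Literal)
def _literal(translator, op):
    return repr(op.value)
```

Keying the registry on the class keeps lookup a single dictionary access and lets new operations be supported without modifying the translator itself.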

Examples:

>>> import ibis
>>> @ibis.sqlite.add_operation(ibis.expr.operations.NullLiteral)
... def _null_literal(translator, expression):
...     return 'NULL'

compile(self, expr, params=None, **kwargs)

Compile an expression.

connect(self, *args, **kwargs) inherited

Connect to the database.

Parameters:

*args (default ()): Connection parameters.
**kwargs (default {}): Additional connection parameters.

Returns:

BaseBackend: An instance of the backend.

create_database(self, name, force=False) inherited

Create a new database.

Not all backends implement this method.

Parameters:

name (str, required): Name of the new database.
force (bool, default False): If False, an exception is raised if the database already exists.

create_table(self, name, obj=None, schema=None, database=None) inherited

Create a new table.

Not all backends implement this method.

Parameters:

name (str, required): Name of the new table.
obj (pd.DataFrame | ir.TableExpr | None, default None): An Ibis table expression or pandas DataFrame from which the schema and data of the new table are taken. If not provided, schema must be given.
schema (ibis.Schema | None, default None): The schema for the new table. Only one of schema or obj can be provided.
database (str | None, default None): Name of the database in which to create the table, if not the default.
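The rules for obj and schema above (one of them is required, and they are mutually exclusive) can be expressed as a small check. This is a hypothetical sketch of that validation, not the code Ibis runs; check_table_source is an illustrative name, and plain Python values stand in for DataFrames, table expressions and schemas.

```python
def check_table_source(obj=None, schema=None):
    """Hypothetical check mirroring the obj/schema rules of create_table.

    Returns "obj" or "schema" to say where the new table's schema comes from.
    """
    if obj is None and schema is None:
        raise ValueError("either obj or schema must be provided")
    if obj is not None and schema is not None:
        raise ValueError("only one of obj or schema can be provided")
    return "obj" if obj is not None else "schema"
```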
create_view(self, name, expr, database=None) inherited

Create a view.

Parameters:

name (str, required): Name of the new view.
expr (ir.TableExpr, required): An Ibis table expression whose query defines the view.
database (str | None, default None): Name of the database in which to create the view, if not the default.

current_database(self)

Return the name of the current database.

Backends that don't support different databases will return None.

Returns:

str: Name of the current database.

database(self, name=None) inherited

Return a Database object for the database called name.

DEPRECATED: database is deprecated; use equivalent methods in the backend

Parameters:

name (str | None, default None): Name of the database to return the object for.

Returns:

Database: A database object for the specified database.

drop_table(self, name, database=None, force=False) inherited

Drop a table.

Parameters:

name (str, required): Name of the table to drop.
database (str | None, default None): Name of the database containing the table, if not the default.
force (bool, default False): If False, an exception is raised if the table does not exist.
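The force flag of drop_table (and likewise of drop_view) decides whether a missing object is an error. The sketch below models those semantics with a hypothetical in-memory catalog; the Catalog class and its choice of LookupError are illustrative only and are not part of Ibis.

```python
class Catalog:
    """Hypothetical in-memory catalog illustrating drop semantics."""

    def __init__(self, tables=()):
        self.tables = set(tables)

    def drop_table(self, name, force=False):
        if name not in self.tables:
            if force:
                # force=True: silently ignore a table that does not exist.
                return
            # force=False: a missing table is an error.
            raise LookupError(f"table {name!r} does not exist")
        self.tables.remove(name)
```

Passing force=True makes the call idempotent, which is convenient in setup and teardown scripts that may run more than once.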
drop_view(self, name, database=None, force=False) inherited

Drop a view.

Parameters:

name (str, required): Name of the view to drop.
database (str | None, default None): Name of the database containing the view, if not the default.
force (bool, default False): If False, an exception is raised if the view does not exist.

execute(self, expr, params=None, limit='default', **kwargs)

Execute an expression.

exists_database(self, name) inherited

Return whether a database name exists in the current connection.

DEPRECATED: exists_database is deprecated as of v2.0; use name in client.list_databases()

Parameters:

name (str, required): Database to check for existence.

Returns:

bool: Whether name exists.

exists_table(self, name, database=None) inherited

Return whether a table name exists in the database.

DEPRECATED: exists_table is deprecated as of v2.0; use name in client.list_tables()

Parameters:

name (str, required): Table name.
database (str | None, default None): Database to check, if given.

Returns:

bool: Whether name is a table.
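Because exists_table is deprecated in favor of name in client.list_tables(), existing calls can be migrated mechanically. The sketch below shows a common pattern for such a deprecation: the old method emits a DeprecationWarning and forwards to the recommended idiom. The Client class and its fixed table list are hypothetical stand-ins for a connected backend.

```python
import warnings


class Client:
    """Hypothetical stand-in for a backend exposing list_tables."""

    def __init__(self, tables):
        self._tables = list(tables)

    def list_tables(self, like=None, database=None):
        # A real backend would query its catalog; this stub ignores filters.
        return list(self._tables)

    def exists_table(self, name, database=None):
        # Deprecated spelling: warn, then defer to the recommended idiom.
        warnings.warn(
            "exists_table is deprecated; use name in client.list_tables()",
            DeprecationWarning,
            stacklevel=2,
        )
        return name in self.list_tables(database=database)
```

New code can skip the shim and write "t" in client.list_tables() directly.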

has_operation(operation) classmethod

Return whether the backend implements support for operation.

Parameters:

operation (type[ops.ValueOp], required): A class corresponding to an operation.

Examples:

>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True

Returns:

bool: Whether the backend implements the operation.

list_databases(self, like=None)

List existing databases in the current connection.

Parameters:

like (str | None, default None): A pattern in Python's regex format used to filter the returned database names.

Returns:

list[str]: The names of the databases in the current connection, filtered by the like pattern if provided.
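The like parameter takes a Python regular expression rather than a SQL LIKE pattern. The helper below is a hypothetical sketch of that filtering (filter_like is not an Ibis function); it assumes the pattern may match anywhere in the name (re.search semantics), which may differ from the backend's exact matching rule.

```python
import re


def filter_like(names, like=None):
    """Return the names matching the regex `like`, or all names if it is None."""
    if like is None:
        return list(names)
    pattern = re.compile(like)
    # re.search: the pattern may match anywhere unless anchored with ^ or $.
    return [name for name in names if pattern.search(name)]
```

A SQL-style pattern such as "sales%" would not behave as intended here, since % is an ordinary character in a regex; the regex equivalent is "^sales".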

list_tables(self, like=None, database=None)

List the available tables.

register_csv(self, name, path, schema=None)

Register the CSV file located at path as a table called name.

Parameters:

name (str, required): The name of the table.
path (str | Path, required): The path to the CSV file.
schema (sch.Schema | None, default None): An optional schema.

register_parquet(self, name, path)

Register the Parquet file located at path as a table called name.

Parameters:

name (str, required): The name of the table.
path (str | Path, required): The path to the Parquet file.
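register_csv and register_parquet can be combined to register files whose format is known only from the file extension. register_file below is a hypothetical helper, not an Ibis API; it only calls the two methods with the signatures documented above, and the test drives it with a fake connection object rather than a real Datafusion backend.

```python
from pathlib import Path


def register_file(con, name, path, schema=None):
    """Hypothetical helper: register `path` as table `name` based on its suffix.

    `con` is expected to provide register_csv(name, path, schema=None) and
    register_parquet(name, path), as documented for the Datafusion backend.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        con.register_csv(name, path, schema=schema)
    elif suffix == ".parquet":
        if schema is not None:
            # register_parquet takes no schema argument.
            raise ValueError("a schema can only be supplied for CSV files")
        con.register_parquet(name, path)
    else:
        raise ValueError(f"unsupported file type: {suffix or path!r}")
```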
table(self, name, schema=None)

Get an ibis expression representing a DataFusion table.

Parameters:

name (str, required): The name of the table to retrieve.
schema (sch.Schema | None, default None): An optional schema for the table.

Returns:

ir.TableExpr: A table expression.

verify(self, expr, params=None) inherited

Verify that expr is an expression that can be compiled.

DEPRECATED: verify is deprecated as of v2.0; compile and capture TranslationError instead
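The recommended replacement for verify, compiling the expression and catching TranslationError, can be wrapped in a small helper. This sketch is hypothetical: can_compile is not an Ibis function, and it takes the backend and the exception class as arguments so that it runs without Ibis installed. With Ibis, the exception class is assumed to be TranslationError from ibis.common.exceptions.

```python
def can_compile(backend, expr, error_type, params=None):
    """Hypothetical replacement for the deprecated verify method.

    Returns True if backend.compile(expr, params=params) succeeds and False
    if it raises `error_type`; any other exception propagates unchanged.
    """
    try:
        backend.compile(expr, params=params)
    except error_type:
        return False
    return True
```

Letting unrelated exceptions propagate avoids masking genuine bugs as "cannot compile".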


Last update: March 1, 2022