
Datafusion

New in v2.1

The Datafusion backend is experimental and subject to backwards-incompatible changes.

Install

Install ibis and the dependencies for the Datafusion backend with one of the following commands:

pip install 'ibis-framework[datafusion]'
conda install -c conda-forge ibis-datafusion
mamba install -c conda-forge ibis-datafusion

Connect

API

Create a client by passing a dictionary that maps table names to file paths to ibis.datafusion.connect.

See ibis.backends.datafusion.Backend.do_connect for connection parameter information.

ibis.datafusion.connect is a thin wrapper around ibis.backends.datafusion.Backend.do_connect.

Connection Parameters

do_connect(self, config)

Create a Datafusion backend for use with Ibis.

Parameters

config Mapping of table names to files.

Examples

>>> import ibis
>>> config = {"t": "path/to/file.parquet", "s": "path/to/file.csv"}
>>> ibis.datafusion.connect(config)
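When a directory holds many files, the config mapping can be built programmatically instead of typed by hand. The following is a minimal sketch, not part of ibis: `build_config` and `connect_directory` are hypothetical helper names, and the choice of the file stem as the table name is an assumption made for illustration.

```python
from pathlib import Path
from typing import Dict

# File types that appear in the connection example; anything else is skipped.
SUPPORTED_SUFFIXES = {".parquet", ".csv"}


def build_config(data_dir: str) -> Dict[str, str]:
    """Map each supported file's stem (used as the table name) to its path."""
    config = {}
    for path in sorted(Path(data_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            config[path.stem] = str(path)
    return config


def connect_directory(data_dir: str):
    """Connect to Datafusion with one table per supported file in data_dir."""
    import ibis  # imported lazily so build_config works without ibis installed

    return ibis.datafusion.connect(build_config(data_dir))
```

Note that two files sharing a stem (for example `t.csv` and `t.parquet`) would collide in this sketch, with the later path replacing the earlier one.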

Backend API

Backend (BaseBackend)

Attributes

version property readonly

Return the version of the backend engine.

For database servers, return the server version.

For others, such as SQLite and pandas, return the version of the underlying library or application.

Returns

str The backend version

Methods

compile(self, expr, params=None, **kwargs)

Compile an expression.

current_database(self)

Return the name of the current database.

Backends that don't support different databases will return None.

Returns

str | None Name of the current database.

execute(self, expr, params=None, limit='default', **kwargs)

Execute an expression.

has_operation(operation) classmethod

Return whether the backend implements support for operation.

Parameters

operation A class corresponding to an operation.

Returns

bool Whether the backend implements the operation.

Examples

>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True
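Because the backend is experimental, it can be useful to check several operations up front before building an expression. The sketch below relies only on the `has_operation` classmethod documented above; `unsupported_operations` is a hypothetical helper, and the commented usage assumes `ibis.datafusion` can be queried the same way as `ibis.sqlite` and `ibis.postgres` in the example.

```python
from typing import Iterable, List


def unsupported_operations(backend, operations: Iterable[type]) -> List[str]:
    """Return the names of the operation classes that `backend` does not implement."""
    return [op.__name__ for op in operations if not backend.has_operation(op)]


# Hypothetical usage, by analogy with the sqlite/postgres example:
#   import ibis
#   import ibis.expr.operations as ops
#   missing = unsupported_operations(ibis.datafusion, [ops.ArrayIndex])
#   if missing:
#       raise NotImplementedError(f"Datafusion backend lacks: {missing}")
```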

list_databases(self, like=None)

List existing databases in the current connection.

Parameters

like A pattern in Python's regex format to filter returned database names.

Returns

list[str] The database names that exist in the current connection, that match the like pattern if provided.

list_tables(self, like=None, database=None)

List the available tables.
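The `like` parameter accepted by list_databases and list_tables is described as a pattern in Python's regex format. The following standalone sketch shows how such a filter behaves on a list of names. It assumes the pattern may match anywhere in the name (`re.search` semantics) and that results are sorted; this page does not state either detail, and `filter_like` is a hypothetical function, not the ibis implementation.

```python
import re
from typing import Iterable, List, Optional


def filter_like(names: Iterable[str], like: Optional[str] = None) -> List[str]:
    """Keep the names matching the regex `like`; keep all names if it is None."""
    if like is None:
        return sorted(names)
    pattern = re.compile(like)
    # Assumption: a match anywhere in the name counts; anchor with ^ or $ if needed.
    return sorted(name for name in names if pattern.search(name))
```

Under these assumptions, a pattern such as `^sales_` selects only names that start with `sales_`, while an unanchored pattern such as `20` selects any name containing those characters.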

register_csv(self, name, path, schema=None)

Register a CSV file located at path under the table name name.

Parameters

name The name of the table
path The path to the CSV file
schema An optional schema

register_parquet(self, name, path)

Register a Parquet file located at path under the table name name.

Parameters

name The name of the table
path The path to the Parquet file
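Tables can also be added after the connection is created. The sketch below picks between register_csv and register_parquet based on the file extension, using only the signatures documented above; `register_file` is a hypothetical helper, and dispatching on the suffix is an assumption about how files are named.

```python
from pathlib import Path


def register_file(backend, name: str, path: str) -> None:
    """Register `path` under `name`, choosing the method from the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        backend.register_csv(name, path)
    elif suffix == ".parquet":
        backend.register_parquet(name, path)
    else:
        raise ValueError(f"unsupported file type: {suffix!r}")


# Hypothetical usage with an existing connection `con`:
#   register_file(con, "events", "path/to/events.parquet")
```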

table(self, name, schema=None)

Get an ibis expression representing a DataFusion table.

Parameters

name The name of the table to retrieve
schema An optional schema for the table

Returns

Table A table expression
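Putting the pieces together, the sketch below connects with a config mapping, looks up each table with table, and runs it with execute. It uses only methods documented on this page. `load_all` is a hypothetical helper, and the `connect` parameter is an illustrative hook so the function can be exercised without ibis installed; by default it falls back to ibis.datafusion.connect.

```python
from typing import Any, Callable, Dict, Mapping, Optional


def load_all(
    config: Mapping[str, str],
    connect: Optional[Callable[[Mapping[str, str]], Any]] = None,
) -> Dict[str, Any]:
    """Execute every table named in `config` and return the results by table name."""
    if connect is None:
        import ibis  # imported lazily; only needed for the default connection

        connect = ibis.datafusion.connect
    con = connect(config)
    results = {}
    for name in config:
        expr = con.table(name)  # ibis expression representing the table
        results[name] = con.execute(expr)  # run the expression on the backend
    return results


# Hypothetical usage:
#   results = load_all({"t": "path/to/file.parquet", "s": "path/to/file.csv"})
```

Executing whole tables reads every row, so for large files it is usually better to build a narrower expression from the table before calling execute.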


Last update: March 1, 2022