
DuckDB

Introduced in v3.0

duckdb >= 0.5.0 requires duckdb-engine >= 0.6.2

If you encounter problems when using duckdb >= 0.5.0, you may need to upgrade duckdb-engine to at least version 0.6.2.

See this issue for more details.

Install

Install ibis and dependencies for the DuckDB backend with any one of the following:

pip install 'ibis-framework[duckdb]'
conda install -c conda-forge ibis-duckdb
mamba install -c conda-forge ibis-duckdb

Connect

API

Create a client by passing in a path to a DuckDB database to ibis.duckdb.connect.

See ibis.backends.duckdb.Backend.do_connect for connection parameter information.

ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.

Connection Parameters

do_connect(database=':memory:', path=None, read_only=False, temp_directory=None, **config)

Create an Ibis client connected to a DuckDB database.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `database` | `str \| Path` | Path to a DuckDB database. | `':memory:'` |
| `path` | `str \| Path` | Deprecated; use `database` instead. | `None` |
| `read_only` | `bool` | Whether the database is read-only. | `False` |
| `temp_directory` | `Path \| str \| None` | Directory to use for spilling to disk. Set by default only for in-memory connections. | `None` |
| `config` | `Any` | DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. | `{}` |

Examples:

>>> import ibis
>>> ibis.duckdb.connect("database.ddb", threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at ...>

Backend API

Backend

Bases: BaseAlchemyBackend

Functions

read_csv(source_list, table_name=None, **kwargs)

Register a CSV file as a table in the current database.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `source_list` | `str \| list[str] \| tuple[str]` | The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. | required |
| `table_name` | `str \| None` | An optional name to use for the created table. Defaults to a sequentially generated name. | `None` |
| `kwargs` | `Any` | Additional keyword arguments passed to DuckDB's CSV loading function. See https://duckdb.org/docs/data/csv for more information. | `{}` |

Returns:

| Type | Description |
|------|-------------|
| `ir.Table` | The just-registered table. |

read_in_memory(dataframe, table_name=None)

Register a Pandas DataFrame or pyarrow Table as a table in the current database.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `dataframe` | `pd.DataFrame \| pa.Table` | The data source. | required |
| `table_name` | `str \| None` | An optional name to use for the created table. Defaults to a sequentially generated name. | `None` |

Returns:

| Type | Description |
|------|-------------|
| `ir.Table` | The just-registered table. |

read_json(source_list, table_name=None, **kwargs)

Read newline-delimited JSON into an ibis table.

This feature requires duckdb>=0.7.0

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `source_list` | `str \| list[str] \| tuple[str]` | File or list of files. | required |
| `table_name` | `str \| None` | Optional table name. | `None` |
| `kwargs` | `Any` | Additional keyword arguments passed to DuckDB's `read_json_auto` function. | `{}` |

Returns:

| Type | Description |
|------|-------------|
| `Table` | An ibis table expression. |

read_parquet(source_list, table_name=None, **kwargs)

Register a parquet file as a table in the current database.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `source_list` | `str \| Iterable[str]` | The data source(s). May be a path to a file, a directory of parquet files, or an iterable of files. | required |
| `table_name` | `str \| None` | An optional name to use for the created table. Defaults to a sequentially generated name. | `None` |
| `kwargs` | `Any` | Additional keyword arguments passed to DuckDB's parquet loading function. See https://duckdb.org/docs/data/parquet for more information. | `{}` |

Returns:

| Type | Description |
|------|-------------|
| `ir.Table` | The just-registered table. |

read_postgres(uri, table_name=None, schema='public')

Register a table from a postgres instance into a DuckDB table.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `uri` | `str` | The postgres URI in the form 'postgres://user:password@host:port'. | required |
| `table_name` | `str \| None` | The table to read. | `None` |
| `schema` | `str` | The PostgreSQL schema where `table_name` resides. | `'public'` |

Returns:

| Type | Description |
|------|-------------|
| `ir.Table` | The just-registered table. |

read_sqlite(path, table_name=None)

Register a table from a SQLite database into a DuckDB table.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `path` | `str \| Path` | The path to the SQLite database. | required |
| `table_name` | `str \| None` | The table to read. | `None` |

Returns:

| Type | Description |
|------|-------------|
| `ir.Table` | The just-registered table. |

Examples:

>>> import ibis
>>> con = ibis.connect("duckdb://")
>>> t = con.read_sqlite("ci/ibis-testing-data/ibis_testing.db", table_name="diamonds")
>>> t.head().execute()
   carat      cut color clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75

register(source, table_name=None, **kwargs)

Register a data source as a table in the current database.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `source` | `str \| Path \| Any` | The data source(s). May be a path to a file or directory of parquet/CSV files, an iterable of parquet or CSV files, a pandas dataframe, a pyarrow table or dataset, or a postgres URI. | required |
| `table_name` | `str \| None` | An optional name to use for the created table. Defaults to a sequentially generated name. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments passed to DuckDB loading functions for CSV or parquet. See https://duckdb.org/docs/data/csv and https://duckdb.org/docs/data/parquet for more information. | `{}` |

Returns:

| Type | Description |
|------|-------------|
| `ir.Table` | The just-registered table. |

to_csv(expr, path, *, params=None, header=True, **kwargs)

Write the results of executing the given expression to a CSV file.

This method is eager and will execute the associated expression immediately.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `expr` | `ir.Table` | The ibis expression to execute and persist to CSV. | required |
| `path` | `str \| Path` | The destination path for the CSV file. | required |
| `params` | `Mapping[ir.Scalar, Any] \| None` | Mapping of scalar parameter expressions to value. | `None` |
| `header` | `bool` | Whether to write the column names as the first line of the CSV file. | `True` |
| `kwargs` | `Any` | DuckDB CSV writer arguments. See https://duckdb.org/docs/data/csv.html#parameters for details. | `{}` |

to_parquet(expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a parquet file.

This method is eager and will execute the associated expression immediately.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `expr` | `ir.Table` | The ibis expression to execute and persist to parquet. | required |
| `path` | `str \| Path` | The destination path for the parquet file. | required |
| `params` | `Mapping[ir.Scalar, Any] \| None` | Mapping of scalar parameter expressions to value. | `None` |
| `kwargs` | `Any` | DuckDB parquet writer arguments. See https://duckdb.org/docs/data/parquet#writing-to-parquet-files for details. | `{}` |

Examples:

Write out an expression to a single parquet file.

>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "penguins.parquet")

Write out an expression to a hive-partitioned directory of parquet files.

>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "penguins_hive_dir", partition_by="year")
>>> # partition on multiple columns
>>> con.to_parquet(penguins, "penguins_hive_dir", partition_by=("year", "island"))

to_pyarrow_batches(expr, *, params=None, limit=None, chunk_size=1000000, **_)

Return a stream of record batches.

The returned RecordBatchReader contains a cursor with an unbounded lifetime.

For analytics use cases this is usually nothing to fret about; in some cases, however, you may need to explicitly release the cursor.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `expr` | `ir.Expr` | Ibis expression. | required |
| `params` | `Mapping[ir.Scalar, Any] \| None` | Bound parameters. | `None` |
| `limit` | `int \| str \| None` | Limit the result to this number of rows. | `None` |
| `chunk_size` | `int` | The requested number of rows per batch. Note that DuckDB returns batches of 1024 rows regardless of the value passed. | `1000000` |

Last update: January 4, 2023