Skip to content

DuckDB

filebadge

exportbadge

duckdb >= 0.5.0 requires duckdb-engine >= 0.6.2

If you encounter problems when using duckdb >= 0.5.0 you may need to upgrade duckdb-engine to at least version 0.6.2.

See this issue for more details.

Install

Install ibis and dependencies for the DuckDB backend:

pip install 'ibis-framework[duckdb]'
conda install -c conda-forge ibis-duckdb
mamba install -c conda-forge ibis-duckdb

Connect

ibis.duckdb.connect

con = ibis.duckdb.connect()  # (1)
  1. Use an ephemeral, in-memory database
con = ibis.duckdb.connect("mydb.duckdb")  # (1)
  1. Connect to, or create, a local DuckDB file

ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.

Connection Parameters

do_connect(database=':memory:', read_only=False, temp_directory=None, **config)

Create an Ibis client connected to a DuckDB database.

Parameters:

Name Type Description Default
database str | Path

Path to a duckdb database.

':memory:'
read_only bool

Whether the database is read-only.

False
temp_directory Path | str | None

Directory to use for spilling to disk. Only set by default for in-memory connections.

None
config Any

DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values.

{}

Examples:

>>> import ibis
>>> ibis.duckdb.connect("database.ddb", threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at ...>

ibis.connect URL format

In addition to ibis.duckdb.connect, you can also connect to DuckDB by passing a properly formatted DuckDB connection URL to ibis.connect

con = ibis.connect("duckdb:///path/to/local/file")
con = ibis.connect("duckdb://") # (1)
  1. ephemeral, in-memory database

File Support

read_csv(source_list, table_name=None, **kwargs)

Register a CSV file as a table in the current database.

Parameters:

Name Type Description Default
source_list str | list[str] | tuple[str]

The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files.

required
table_name str | None

An optional name to use for the created table. This defaults to a sequentially generated name.

None
kwargs Any

Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/csv for more information.

{}

read_parquet(source_list, table_name=None, **kwargs)

Register a parquet file as a table in the current database.

Parameters:

Name Type Description Default
source_list str | Iterable[str]

The data source(s). May be a path to a file, an iterable of files, or directory of parquet files.

required
table_name str | None

An optional name to use for the created table. This defaults to a sequentially generated name.

None
kwargs Any

Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information.

{}

read_json(source_list, table_name=None, **kwargs)

Read newline-delimited JSON into an ibis table.

This feature requires duckdb>=0.7.0

Parameters:

Name Type Description Default
source_list str | list[str] | tuple[str]

File or list of files

required
table_name str | None

Optional table name

None
kwargs

Additional keyword arguments passed to DuckDB's read_json_auto function

{}

read_in_memory(source, table_name=None)

Register a Pandas DataFrame or pyarrow object as a table in the current database.

Parameters:

Name Type Description Default
source pd.DataFrame | pa.Table | pa.RecordBatchReader

The data source.

required
table_name str | None

An optional name to use for the created table. This defaults to a sequentially generated name.

None

read_sqlite(path, table_name=None)

Register a table from a SQLite database into a DuckDB table.

Parameters:

Name Type Description Default
path str | Path

The path to the SQLite database

required
table_name str | None

The table to read

None

read_postgres(uri, table_name=None, schema='public')

Register a table from a postgres instance into a DuckDB table.

Parameters:

Name Type Description Default
uri

The postgres URI in form 'postgres://user:password@host:port'

required
table_name str | None

The table to read

None
schema str

PostgreSQL schema where table_name resides

'public'

Last update: June 2, 2023