DuckDB

Connect to DuckDB with Ibis, the portable dataframe library

https://duckdb.org

Install

Install Ibis and dependencies for the DuckDB backend:

Install with the duckdb extra:

pip install 'ibis-framework[duckdb]'

And connect:

import ibis

con = ibis.duckdb.connect()
1
Adjust connection parameters as needed.

Install for DuckDB:

conda install -c conda-forge ibis-duckdb

And connect:

import ibis

con = ibis.duckdb.connect()
1
Adjust connection parameters as needed.

Install for DuckDB:

mamba install -c conda-forge ibis-duckdb

And connect:

import ibis

con = ibis.duckdb.connect()
1
Adjust connection parameters as needed.

Connect

ibis.duckdb.connect

Connect to an in-memory database:

con = ibis.duckdb.connect()

Connect to, or create, a local DuckDB file

con = ibis.duckdb.connect("mydb.duckdb")
Note

ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.

Connection Parameters

do_connect

do_connect(self, database=':memory:', read_only=False, temp_directory=None, extensions=None, **config)

Create an Ibis client connected to a DuckDB database.

Parameters
Name Type Description Default
database str | Path Path to a duckdb database. ':memory:'
read_only bool Whether the database is read-only. False
temp_directory str | Path | None Directory to use for spilling to disk. Only set by default for in-memory connections. None
extensions Sequence[str] | None A list of duckdb extensions to install/load upon connection. None
config Any DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. {}
Examples
>>> import ibis
>>> ibis.duckdb.connect("database.ddb", threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at ...>

ibis.connect URL format

In addition to ibis.duckdb.connect, you can also connect to DuckDB by passing a properly formatted DuckDB connection URL to ibis.connect

import ibis

con = ibis.connect("duckdb://local.ddb")

Without an empty path, ibis.connect will connect to an ephemeral, in-memory database.

con = ibis.connect("duckdb://")

MotherDuck

The DuckDB backend supports MotherDuck. If you have an account, you can connect to MotherDuck by passing in the string md: or motherduck:. ibis will trigger the authentication prompt in-browser.

import ibis

con = ibis.duckdb.connect("md:")
Note

Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case list_tables)

con.list_tables()
Attempting to automatically open the SSO authorization page in your default browser.
1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS


Token successfully retrieved ✅
You can store it as an environment variable to avoid having to log in again:
  $ export motherduck_token='****************'

['penguins']

duckdb.Backend

attach_sqlite

attach_sqlite(self, path, overwrite=False, all_varchar=False)

Attach a SQLite database to the current DuckDB session.

Parameters

Name Type Description Default
path str | Path The path to the SQLite database. required
overwrite bool Allow overwriting any tables or views that already exist in your current session with the contents of the SQLite database. False
all_varchar bool Set all SQLite columns to type VARCHAR to avoid type errors on ingestion. False

Examples

>>> import ibis
>>> import sqlite3
>>> with sqlite3.connect("/tmp/attach_sqlite.db") as con:
...     con.execute("DROP TABLE IF EXISTS t")
...     con.execute("CREATE TABLE t (a INT, b TEXT)")
...     con.execute(
...         "INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
...     )
...
<...>
>>> con = ibis.connect("duckdb://")
>>> con.list_tables()
[]
>>> con.attach_sqlite("/tmp/attach_sqlite.db")
>>> con.list_tables()
['t']

create_schema

create_schema(self, name, database=None, force=False)

Create a schema named name in database.

Parameters

Name Type Description Default
name str Name of the schema to create. required
database str | None Name of the database in which to create the schema. If None, the current database is used. None
force bool If False, an exception is raised if the schema exists. False

drop_schema

drop_schema(self, name, database=None, force=False)

Drop the schema with name in database.

Parameters

Name Type Description Default
name str Name of the schema to drop. required
database str | None Name of the database to drop the schema from. If None, the current database is used. None
force bool If False, an exception is raised if the schema does not exist. False

drop_table

drop_table(self, name, database=None, force=False)

Drop a table.

Parameters

Name Type Description Default
name str Table to drop required
database str | None Database to drop table from None
force bool Check for existence before dropping False

fetch_from_cursor

fetch_from_cursor(self, cursor, schema)

list_databases

list_databases(self, like=None)

list_schemas

list_schemas(self, like=None, database=None)

List existing schemas in the current connection.

Parameters

Name Type Description Default
like str | None A pattern in Python’s regex format to filter returned schema names. None
database str | None The database to list schemas from. If None, the current database is searched. None

Returns

Type Description
list[str] The schema names that exist in the current connection, that match the like pattern if provided.

list_tables

list_tables(self, like=None, database=None, schema=None)

List tables and views.

Parameters

Name Type Description Default
like str | None Regex to filter by table/view name. None
database str | None Database name. If not passed, uses the current database. Only supported with MotherDuck. None
schema str | None Schema name. If not passed, uses the current schema. None

Returns

Type Description
list[str] List of table and view names.

Examples

>>> import ibis
>>> con = ibis.duckdb.connect()
>>> foo = con.create_table("foo", schema=ibis.schema(dict(a="int")))
>>> con.list_tables()
['foo']
>>> bar = con.create_view("bar", foo)
>>> con.list_tables()
['bar', 'foo']
>>> con.create_schema("my_schema")
>>> con.list_tables(schema="my_schema")
[]
>>> with con.begin() as c:
...     c.exec_driver_sql(
...         "CREATE TABLE my_schema.baz (a INTEGER)"
...     )
...
<...>
>>> con.list_tables(schema="my_schema")
['baz']

load_extension

load_extension(self, extension)

Install and load a duckdb extension by name or path.

Parameters

Name Type Description Default
extension str The extension name or path. required

read_csv

read_csv(self, source_list, table_name=None, **kwargs)

Register a CSV file as a table in the current database.

Parameters

Name Type Description Default
source_list str | list[str] | tuple[str] The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs Any Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/csv for more information. {}

Returns

Type Description
ir.Table The just-registered table

read_delta

read_delta(self, source_table, table_name=None, **kwargs)

Register a Delta Lake table as a table in the current database.

Parameters

Name Type Description Default
source_table str The data source. Must be a directory containing a Delta Lake table. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs Any Additional keyword arguments passed to deltalake.DeltaTable. {}

Returns

Type Description
ir.Table The just-registered table.

read_in_memory

read_in_memory(self, source, table_name=None)

Register a Pandas DataFrame or pyarrow object as a table in the current database.

Parameters

Name Type Description Default
source pd.DataFrame | pa.Table | pa.RecordBatchReader The data source. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None

Returns

Type Description
ir.Table The just-registered table

read_json

read_json(self, source_list, table_name=None, **kwargs)

Read newline-delimited JSON into an ibis table.

This feature requires duckdb>=0.7.0

Parameters

Name Type Description Default
source_list str | list[str] | tuple[str] File or list of files required
table_name str | None Optional table name None
**kwargs Additional keyword arguments passed to DuckDB’s read_json_auto function {}

Returns

Type Description
Table An ibis table expression

read_parquet

read_parquet(self, source_list, table_name=None, **kwargs)

Register a parquet file as a table in the current database.

Parameters

Name Type Description Default
source_list str | Iterable[str] The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs Any Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information. {}

Returns

Type Description
ir.Table The just-registered table

read_postgres

read_postgres(self, uri, table_name=None, schema='public')

Register a table from a postgres instance into a DuckDB table.

Parameters

Name Type Description Default
uri str A postgres URI of the form postgres://user:password@host:port required
table_name str | None The table to read None
schema str PostgreSQL schema where table_name resides 'public'

Returns

Type Description
ir.Table The just-registered table.

read_sqlite

read_sqlite(self, path, table_name=None)

Register a table from a SQLite database into a DuckDB table.

Parameters

Name Type Description Default
path str | Path The path to the SQLite database required
table_name str | None The table to read None

Returns

Type Description
ir.Table The just-registered table.

Examples

>>> import ibis
>>> import sqlite3
>>> ibis.options.interactive = True
>>> with sqlite3.connect("/tmp/sqlite.db") as con:
...     con.execute("DROP TABLE IF EXISTS t")
...     con.execute("CREATE TABLE t (a INT, b TEXT)")
...     con.execute(
...         "INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
...     )
...
<...>
>>> con = ibis.connect("duckdb://")
>>> t = con.read_sqlite("/tmp/sqlite.db", table_name="t")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
1 │ a      │
2 │ b      │
3 │ c      │
└───────┴────────┘

register

register(self, source, table_name=None, **kwargs)

Register a data source as a table in the current database.

Parameters

Name Type Description Default
source str | Path | Any The data source(s). May be a path to a file or directory of parquet/csv files, an iterable of parquet or CSV files, a pandas dataframe, a pyarrow table or dataset, or a postgres URI. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs Any Additional keyword arguments passed to DuckDB loading functions for CSV or parquet. See https://duckdb.org/docs/data/csv and https://duckdb.org/docs/data/parquet for more information. {}

Returns

Type Description
ir.Table The just-registered table

to_csv

to_csv(self, expr, path, *, params=None, header=True, **kwargs)

Write the results of executing the given expression to a CSV file.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ir.Table The ibis expression to execute and persist to CSV. required
path str | Path The data source. A string or Path to the CSV file. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
header bool Whether to write the column names as the first line of the CSV file. True
**kwargs Any DuckDB CSV writer arguments. https://duckdb.org/docs/data/csv.html#parameters {}

to_parquet

to_parquet(self, expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a parquet file.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ir.Table The ibis expression to execute and persist to parquet. required
path str | Path The data source. A string or Path to the parquet file. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
**kwargs Any DuckDB Parquet writer arguments. See https://duckdb.org/docs/data/parquet#writing-to-parquet-files for details {}

Examples

Write out an expression to a single parquet file.

>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "/tmp/penguins.parquet")

Write out an expression to a hive-partitioned parquet file.

>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)

Partition on a single column.

>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by="year")

Partition on multiple columns.

>>> con.to_parquet(
...     penguins, tempfile.mkdtemp(), partition_by=("year", "island")
... )

to_pyarrow

to_pyarrow(self, expr, *, params=None, limit=None, **_)

Execute expression and return results in as a pyarrow table.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ir.Expr Ibis expression to export to pyarrow required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
kwargs Any Keyword arguments {}

Returns

Type Description
Table A pyarrow table holding the results of the executed expression.

to_pyarrow_batches

to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **_)

Return a stream of record batches.

The returned RecordBatchReader contains a cursor with an unbounded lifetime.

For analytics use cases this is usually nothing to fret about. In some cases you may need to explicit release the cursor.

Parameters

Name Type Description Default
expr ir.Expr Ibis expression required
params Mapping[ir.Scalar, Any] | None Bound parameters None
limit int | str | None Limit the result to this number of rows None
chunk_size int ::: {.callout-warning} ## DuckDB returns 1024 size batches regardless of what argument is passed. ::: 1000000

to_torch

to_torch(self, expr, *, params=None, limit=None, **kwargs)

Execute an expression and return results as a dictionary of torch tensors.

Parameters

Name Type Description Default
expr ir.Expr Ibis expression to execute. required
params Mapping[ir.Scalar, Any] | None Parameters to substitute into the expression. None
limit int | str | None An integer to effect a specific row limit. A value of None means no limit. None
kwargs Any Keyword arguments passed into the backend’s to_torch implementation. {}

Returns

Type Description
dict[str, torch.Tensor] A dictionary of torch tensors, keyed by column name.
Back to top