DuckDB

Connect to DuckDB with Ibis, the portable dataframe library

https://duckdb.org

Install

Install Ibis and dependencies for the DuckDB backend:

Install with the duckdb extra:

pip install 'ibis-framework[duckdb]'

And connect:

import ibis

con = ibis.duckdb.connect()
1
Adjust connection parameters as needed.

Install for DuckDB:

conda install -c conda-forge ibis-duckdb

And connect:

import ibis

con = ibis.duckdb.connect()
1
Adjust connection parameters as needed.

Install for DuckDB:

mamba install -c conda-forge ibis-duckdb

And connect:

import ibis

con = ibis.duckdb.connect()
1
Adjust connection parameters as needed.

Connect

ibis.duckdb.connect

Connect to an in-memory database:

con = ibis.duckdb.connect()

Connect to, or create, a local DuckDB file

con = ibis.duckdb.connect("mydb.duckdb")
Note

ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.

Connection Parameters

do_connect

do_connect(self, database=':memory:', read_only=False, temp_directory=None, extensions=None, **config)

Create an Ibis client connected to a DuckDB database.

Parameters
Name Type Description Default
database str | Path Path to a duckdb database. ':memory:'
read_only bool Whether the database is read-only. False
temp_directory str | Path | None Directory to use for spilling to disk. Only set by default for in-memory connections. None
extensions Sequence[str] | None A list of duckdb extensions to install/load upon connection. None
config Any DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. {}
Examples
>>> import ibis
>>> ibis.duckdb.connect("database.ddb", threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at ...>

ibis.connect URL format

In addition to ibis.duckdb.connect, you can also connect to DuckDB by passing a properly-formatted DuckDB connection URL to ibis.connect:

import ibis

con = ibis.connect("duckdb://local.ddb")

Given an empty path, ibis.connect will connect to an ephemeral, in-memory database.

con = ibis.connect("duckdb://")

MotherDuck

The DuckDB backend supports MotherDuck. If you have an account, you can connect to MotherDuck by passing in the string md: or motherduck:. ibis will trigger the authentication prompt in-browser.

import ibis

con = ibis.duckdb.connect("md:")
Note

Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case list_tables)

con.list_tables()
Attempting to automatically open the SSO authorization page in your default browser.
1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS


Token successfully retrieved ✅
You can store it as an environment variable to avoid having to log in again:
  $ export motherduck_token='****************'

['penguins']

Geospatial

The DuckDB backend has experimental support for geospatial operations.

To enable geospatial operations, install the geospatial extra or otherwise install the additional required dependencies:

See read_geo for some tips on reading in geospatial data.

pip install 'ibis-framework[geospatial]'
conda install -c conda-forge geopandas 'shapely>=2,<3'
mamba install -c conda-forge geopandas 'shapely>=2,<3'

duckdb.Backend

attach

attach(self, path, name=None, read_only=False)

Attach another DuckDB database to the current DuckDB session.

Parameters

Name Type Description Default
path str | Path Path to the database to attach. required
name str | None Name to attach the database as. Defaults to the basename of path. None
read_only bool Whether to attach the database as read-only. False

attach_sqlite

attach_sqlite(self, path, overwrite=False, all_varchar=False)

Attach a SQLite database to the current DuckDB session.

Parameters

Name Type Description Default
path str | Path The path to the SQLite database. required
overwrite bool Allow overwriting any tables or views that already exist in your current session with the contents of the SQLite database. False
all_varchar bool Set all SQLite columns to type VARCHAR to avoid type errors on ingestion. False

Examples

>>> import ibis
>>> import sqlite3
>>> with sqlite3.connect("/tmp/attach_sqlite.db") as con:
...     con.execute("DROP TABLE IF EXISTS t")
...     con.execute("CREATE TABLE t (a INT, b TEXT)")
...     con.execute(
...         "INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
...     )
<...>
>>> con = ibis.connect("duckdb://")
>>> con.list_tables()
[]
>>> con.attach_sqlite("/tmp/attach_sqlite.db")
>>> con.list_tables()
['t']

compile

compile(self, expr, limit=None, params=None, pretty=False, **kwargs)

Compile an Ibis expression to a SQL string.

connect

connect(self, *args, **kwargs)

Connect to the database.

Parameters

Name Type Description Default
*args Mandatory connection parameters, see the docstring of do_connect for details. ()
**kwargs Extra connection parameters, see the docstring of do_connect for details. {}

Notes

This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.

Returns

Type Description
BaseBackend An instance of the backend

create_database

create_database(self, name, catalog=None, force=False)

Create a database named name in catalog.

Parameters

Name Type Description Default
name str Name of the database to create. required
catalog str | None Name of the catalog in which to create the database. If None, the current catalog is used. None
force bool If False, an exception is raised if the database exists. False

create_schema

create_schema(self, name, database=None, force=False)

create_table

create_table(self, name, obj=None, *, schema=None, database=None, temp=False, overwrite=False)

Create a table in DuckDB.

Parameters

Name Type Description Default
name str Name of the table to create required
obj ir.Table | pd.DataFrame | pa.Table | pl.DataFrame | pl.LazyFrame | None The data with which to populate the table; optional, but at least one of obj or schema must be specified None
schema ibis.Schema | None The schema of the table to create; optional, but at least one of obj or schema must be specified None
database str | None The name of the database in which to create the table; if not passed, the current database is used. For multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). None
temp bool Create a temporary table False
overwrite bool If True, replace the table if it already exists, otherwise fail if the table exists False

create_view

create_view(self, name, obj, *, database=None, schema=None, overwrite=False)

detach

detach(self, name)

Detach a database from the current DuckDB session.

Parameters

Name Type Description Default
name str The name of the database to detach. required

disconnect

disconnect(self)

drop_database

drop_database(self, name, catalog=None, force=False)

Drop the database with name in catalog.

Parameters

Name Type Description Default
name str Name of the schema to drop. required
catalog str | None Name of the catalog to drop the database from. If None, the current catalog is used. None
force bool If False, an exception is raised if the database does not exist. False

drop_schema

drop_schema(self, name, database=None, force=False)

drop_table

drop_table(self, name, database=None, force=False)

drop_view

drop_view(self, name, *, database=None, schema=None, force=False)

execute

execute(self, expr, params=None, limit='default', **_)

Execute an expression.

get_schema

get_schema(self, table_name, *, catalog=None, database=None)

Compute the schema of a table.

Parameters

Name Type Description Default
table_name str May not be fully qualified. Use database if you want to qualify the identifier. required
catalog str | None Catalog name None
database str | None Database name None

Returns

Type Description
sch.Schema Ibis schema

has_operation

has_operation(cls, operation)

insert

insert(self, table_name, obj, schema=None, database=None, overwrite=False)

Insert data into a table.

Ibis does not use the word schema to refer to database hierarchy.

A collection of table is referred to as a database. A collection of database is referred to as a catalog.

These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.

Parameters

Name Type Description Default
table_name str The name of the table to which data needs will be inserted required
obj pd.DataFrame | ir.Table | list | dict The source data or expression to insert required
schema str | None [deprecated] The name of the schema that the table is located in None
database str | None Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). None
overwrite bool If True then replace existing contents of table False

list_catalogs

list_catalogs(self, like=None)

list_databases

list_databases(self, like=None, catalog=None)

List existing databases in the current connection.

Ibis does not use the word schema to refer to database hierarchy.

A collection of table is referred to as a database. A collection of database is referred to as a catalog.

These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.

Parameters

Name Type Description Default
like str | None A pattern in Python’s regex format to filter returned database names. None
catalog str | None The catalog to list databases from. If None, the current catalog is searched. None

Returns

Type Description
list[str] The database names that exist in the current connection, that match the like pattern if provided.

list_schemas

list_schemas(self, like=None, database=None)

list_tables

list_tables(self, like=None, database=None, schema=None)

List tables and views.

Ibis does not use the word schema to refer to database hierarchy.

A collection of tables is referred to as a database. A collection of database is referred to as a catalog.

These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.

Parameters

Name Type Description Default
like str | None Regex to filter by table/view name. None
database tuple[str, str] | str | None Database location. If not passed, uses the current database. By default uses the current database (self.current_database) and catalog (self.current_catalog). To specify a table in a separate catalog, you can pass in the catalog and database as a string "catalog.database", or as a tuple of strings ("catalog", "database"). None
schema str | None [deprecated] Schema name. If not passed, uses the current schema. None

Returns

Type Description
list[str] List of table and view names.

Examples

>>> import ibis
>>> con = ibis.duckdb.connect()
>>> foo = con.create_table("foo", schema=ibis.schema(dict(a="int")))
>>> con.list_tables()
['foo']
>>> bar = con.create_view("bar", foo)
>>> con.list_tables()
['bar', 'foo']
>>> con.create_database("my_database")
>>> con.list_tables(database="my_database")
[]
>>> with con.begin() as c:
...     c.exec_driver_sql("CREATE TABLE my_database.baz (a INTEGER)")
<...>
>>> con.list_tables(database="my_database")
['baz']

load_extension

load_extension(self, extension, force_install=False)

Install and load a duckdb extension by name or path.

Parameters

Name Type Description Default
extension str The extension name or path. required
force_install bool Force reinstallation of the extension. False

raw_sql

raw_sql(self, query, **kwargs)

read_csv

read_csv(self, source_list, table_name=None, **kwargs)

Register a CSV file as a table in the current database.

Parameters

Name Type Description Default
source_list str | list[str] | tuple[str] The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs Any Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/csv for more information. {}

Returns

Type Description
ir.Table The just-registered table

read_delta

read_delta(self, source_table, table_name=None, **kwargs)

Register a Delta Lake table as a table in the current database.

Parameters

Name Type Description Default
source_table str The data source. Must be a directory containing a Delta Lake table. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs Any Additional keyword arguments passed to deltalake.DeltaTable. {}

Returns

Type Description
ir.Table The just-registered table.

read_geo

read_geo(self, source, table_name=None, **kwargs)

Register a GEO file as a table in the current database.

Parameters

Name Type Description Default
source str The data source(s). Path to a file of geospatial files supported by duckdb. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs Any Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files for more information. {}

Returns

Type Description
ir.Table The just-registered table

read_in_memory

read_in_memory(self, source, table_name=None)

Register an in-memory table object in the current database.

Supported objects include pandas DataFrame, a Polars DataFrame/LazyFrame, or a PyArrow Table or RecordBatchReader.

Parameters

Name Type Description Default
source pd.DataFrame | pa.Table | pa.RecordBatchReader | pl.DataFrame | pl.LazyFrame The data source. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None

Returns

Type Description
ir.Table The just-registered table

read_json

read_json(self, source_list, table_name=None, **kwargs)

Read newline-delimited JSON into an ibis table.

This feature requires duckdb>=0.7.0

Parameters

Name Type Description Default
source_list str | list[str] | tuple[str] File or list of files required
table_name str | None Optional table name None
**kwargs Additional keyword arguments passed to DuckDB’s read_json_auto function {}

Returns

Type Description
Table An ibis table expression

read_mysql

read_mysql(self, uri, *, catalog, table_name=None)

Register a table from a MySQL instance into a DuckDB table.

Parameters

Name Type Description Default
uri str A mysql URI of the form mysql://user:password@host:port/database required
catalog str User-defined alias given to the MySQL database that is being attached to DuckDB required
table_name str | None The table to read None

Returns

Type Description
ir.Table The just-registered table.

read_parquet

read_parquet(self, source_list, table_name=None, **kwargs)

Register a parquet file as a table in the current database.

Parameters

Name Type Description Default
source_list str | Iterable[str] The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs Any Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information. {}

Returns

Type Description
ir.Table The just-registered table

read_postgres

read_postgres(self, uri, *, table_name=None, database='public')

Register a table from a postgres instance into a DuckDB table.

Ibis does not use the word schema to refer to database hierarchy.

A collection of table is referred to as a database. A collection of database is referred to as a catalog.

These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.

Parameters

Name Type Description Default
uri str A postgres URI of the form postgres://user:password@host:port required
table_name str | None The table to read None
database str PostgreSQL database (schema) where table_name resides 'public'

Returns

Type Description
ir.Table The just-registered table.

read_sqlite

read_sqlite(self, path, *, table_name=None)

Register a table from a SQLite database into a DuckDB table.

Parameters

Name Type Description Default
path str | Path The path to the SQLite database required
table_name str | None The table to read None

Returns

Type Description
ir.Table The just-registered table.

Examples

>>> import ibis
>>> import sqlite3
>>> ibis.options.interactive = True
>>> with sqlite3.connect("/tmp/sqlite.db") as con:
...     con.execute("DROP TABLE IF EXISTS t")
...     con.execute("CREATE TABLE t (a INT, b TEXT)")
...     con.execute(
...         "INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
...     )
<...>
>>> con = ibis.connect("duckdb://")
>>> t = con.read_sqlite(path="/tmp/sqlite.db", table_name="t")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
1 │ a      │
2 │ b      │
3 │ c      │
└───────┴────────┘

reconnect

reconnect(self)

Reconnect to the database already configured with connect.

register

register(self, source, table_name=None, **kwargs)

Register a data source as a table in the current database.

Parameters

Name Type Description Default
source str | Path | Any The data source(s). May be a path to a file or directory of parquet/csv files, an iterable of parquet or CSV files, a pandas dataframe, a pyarrow table or dataset, or a postgres URI. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs Any Additional keyword arguments passed to DuckDB loading functions for CSV or parquet. See https://duckdb.org/docs/data/csv and https://duckdb.org/docs/data/parquet for more information. {}

Returns

Type Description
ir.Table The just-registered table

register_filesystem

register_filesystem(self, filesystem)

Register an fsspec filesystem object with DuckDB.

This allow a user to read from any fsspec compatible filesystem using read_csv, read_parquet, read_json, etc.

Note

Creating an fsspec filesystem requires that the corresponding backend-specific fsspec helper library is installed.

e.g. to connect to Google Cloud Storage, gcsfs must be installed.

Parameters

Name Type Description Default
filesystem AbstractFileSystem The fsspec filesystem object to register with DuckDB. See https://duckdb.org/docs/guides/python/filesystems for details. required

Examples

>>> import ibis
>>> import fsspec
>>> gcs = fsspec.filesystem("gcs")
>>> con = ibis.duckdb.connect()
>>> con.register_filesystem(gcs)
>>> t = con.read_csv(
...     "gcs://ibis-examples/data/band_members.csv.gz",
...     table_name="band_members",
... )
DatabaseTable: band_members
  name string
  band string

register_options

register_options(cls)

Register custom backend options.

rename_table

rename_table(self, old_name, new_name)

Rename an existing table.

Parameters

Name Type Description Default
old_name str The old name of the table. required
new_name str The new name of the table. required

sql

sql(self, query, schema=None, dialect=None)

table

table(self, name, schema=None, database=None)

Construct a table expression.

Parameters

Name Type Description Default
name str Table name required
schema str | None [deprecated] Schema name None
database str | None Database name None

Returns

Type Description
Table Table expression

to_csv

to_csv(self, expr, path, *, params=None, header=True, **kwargs)

Write the results of executing the given expression to a CSV file.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ir.Table The ibis expression to execute and persist to CSV. required
path str | Path The data source. A string or Path to the CSV file. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
header bool Whether to write the column names as the first line of the CSV file. True
**kwargs Any DuckDB CSV writer arguments. https://duckdb.org/docs/data/csv/overview.html#parameters {}

to_delta

to_delta(self, expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a Delta Lake table.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ir.Table The ibis expression to execute and persist to Delta Lake table. required
path str | Path The data source. A string or Path to the Delta Lake table. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
kwargs Any Additional keyword arguments passed to deltalake.writer.write_deltalake method {}

to_pandas

to_pandas(self, expr, *, params=None, limit=None, **kwargs)

Execute an Ibis expression and return a pandas DataFrame, Series, or scalar.

Note

This method is a wrapper around execute.

Parameters

Name Type Description Default
expr ir.Expr Ibis expression to execute. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
kwargs Any Keyword arguments {}

to_pandas_batches

to_pandas_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)

Execute an Ibis expression and return an iterator of pandas DataFrames.

Parameters

Name Type Description Default
expr ir.Expr Ibis expression to execute. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
chunk_size int Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. 1000000
kwargs Any Keyword arguments {}

Returns

Type Description
Iterator[pd.DataFrame] An iterator of pandas DataFrames.

to_parquet

to_parquet(self, expr, path, *, params=None, **kwargs)

Write the results of executing the given expression to a parquet file.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ir.Table The ibis expression to execute and persist to parquet. required
path str | Path The data source. A string or Path to the parquet file. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
**kwargs Any DuckDB Parquet writer arguments. See https://duckdb.org/docs/data/parquet#writing-to-parquet-files for details {}

Examples

Write out an expression to a single parquet file.

>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "/tmp/penguins.parquet")

Write out an expression to a hive-partitioned parquet file.

>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)

Partition on a single column.

>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by="year")

Partition on multiple columns.

>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by=("year", "island"))

to_polars

to_polars(self, expr, *, params=None, limit=None, **kwargs)

Execute expression and return results in as a polars DataFrame.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ir.Expr Ibis expression to export to polars. required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
kwargs Any Keyword arguments {}

Returns

Type Description
dataframe A polars DataFrame holding the results of the executed expression.

to_pyarrow

to_pyarrow(self, expr, *, params=None, limit=None, **_)

Execute expression and return results in as a pyarrow table.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ir.Expr Ibis expression to export to pyarrow required
params Mapping[ir.Scalar, Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
kwargs Any Keyword arguments {}

Returns

Type Description
Table A pyarrow table holding the results of the executed expression.

to_pyarrow_batches

to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **_)

Return a stream of record batches.

The returned RecordBatchReader contains a cursor with an unbounded lifetime.

For analytics use cases this is usually nothing to fret about. In some cases you may need to explicit release the cursor.

Parameters

Name Type Description Default
expr ir.Expr Ibis expression required
params Mapping[ir.Scalar, Any] | None Bound parameters None
limit int | str | None Limit the result to this number of rows None
chunk_size int ::: {.callout-warning} ## DuckDB returns 1024 size batches regardless of what argument is passed. ::: 1000000

to_torch

to_torch(self, expr, *, params=None, limit=None, **kwargs)

Execute an expression and return results as a dictionary of torch tensors.

Parameters

Name Type Description Default
expr ir.Expr Ibis expression to execute. required
params Mapping[ir.Scalar, Any] | None Parameters to substitute into the expression. None
limit int | str | None An integer to effect a specific row limit. A value of None means no limit. None
kwargs Any Keyword arguments passed into the backend’s to_torch implementation. {}

Returns

Type Description
dict[str, torch.Tensor] A dictionary of torch tensors, keyed by column name.

truncate_table

truncate_table(self, name, database=None, schema=None)

Delete all rows from a table.

Ibis does not use the word schema to refer to database hierarchy.

A collection of tables is referred to as a database. A collection of database is referred to as a catalog. These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.

Parameters

Name Type Description Default
name str Table name required
database str | None Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). None
schema str | None [deprecated] Schema name None
Back to top