import ibis
con = ibis.connect("duckdb://local.ddb")
DuckDB
Install
Install Ibis and dependencies for the DuckDB backend:
Install with the duckdb extra:
pip install 'ibis-framework[duckdb]'
And connect:
import ibis
con = ibis.duckdb.connect()
Adjust connection parameters as needed.
Install for DuckDB:
conda install -c conda-forge ibis-duckdb
And connect:
import ibis
con = ibis.duckdb.connect()
Adjust connection parameters as needed.
Install for DuckDB:
mamba install -c conda-forge ibis-duckdb
And connect:
import ibis
con = ibis.duckdb.connect()
Adjust connection parameters as needed.
Connect
ibis.duckdb.connect
Connect to an in-memory database:
con = ibis.duckdb.connect()
Connect to, or create, a local DuckDB file:
con = ibis.duckdb.connect("mydb.duckdb")
ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.
Connection Parameters
do_connect
do_connect(self, database=':memory:', read_only=False, temp_directory=None, extensions=None, **config)
Create an Ibis client connected to a DuckDB database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
database | str | Path | Path to a duckdb database. | ':memory:' |
read_only | bool | Whether the database is read-only. | False |
temp_directory | str | Path | None | Directory to use for spilling to disk. Only set by default for in-memory connections. | None |
extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
config | Any | DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. | {} |
Examples
>>> import ibis
>>> ibis.duckdb.connect("database.ddb", threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at ...>
ibis.connect
URL format
In addition to ibis.duckdb.connect, you can also connect to DuckDB by passing a properly formatted DuckDB connection URL to ibis.connect. Given an empty path, ibis.connect will connect to an ephemeral, in-memory database.
con = ibis.connect("duckdb://")
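The path component of the URL maps to a database file, so the same form also opens (or creates) a local file; a minimal sketch, using a hypothetical filename:
# "mydb.ddb" is illustrative; the file is created if it does not exist.
con = ibis.connect("duckdb://mydb.ddb")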
MotherDuck
The DuckDB backend supports MotherDuck. If you have an account, you can connect to MotherDuck by passing in the string md: or motherduck:. ibis will trigger the authentication prompt in-browser.
import ibis
con = ibis.duckdb.connect("md:")
Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case list_tables).
con.list_tables()
Attempting to automatically open the SSO authorization page in your default browser.
1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS
Token successfully retrieved ✅
['penguins']
You can store it as an environment variable to avoid having to log in again:
$ export motherduck_token='****************'
duckdb.Backend
attach_sqlite
attach_sqlite(self, path, overwrite=False, all_varchar=False)
Attach a SQLite database to the current DuckDB session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The path to the SQLite database. | required |
overwrite | bool | Allow overwriting any tables or views that already exist in your current session with the contents of the SQLite database. | False |
all_varchar | bool | Set all SQLite columns to type VARCHAR to avoid type errors on ingestion. | False |
Examples
>>> import ibis
>>> import sqlite3
>>> with sqlite3.connect("/tmp/attach_sqlite.db") as con:
"DROP TABLE IF EXISTS t")
... con.execute("CREATE TABLE t (a INT, b TEXT)")
... con.execute(
... con.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
...
... )
...<...>
>>> con = ibis.connect("duckdb://")
>>> con.list_tables()
[]>>> con.attach_sqlite("/tmp/attach_sqlite.db")
>>> con.list_tables()
't'] [
create_schema
create_schema(self, name, database=None, force=False)
Create a schema named name in database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the schema to create. | required |
database | str | None | Name of the database in which to create the schema. If None, the current database is used. | None |
force | bool | If False, an exception is raised if the schema exists. | False |
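A minimal usage sketch; the schema name is illustrative:
con = ibis.duckdb.connect()
# With force=True, no exception is raised if the schema already exists.
con.create_schema("analytics", force=True)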
drop_schema
drop_schema(self, name, database=None, force=False)
Drop the schema with name in database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the schema to drop. | required |
database | str | None | Name of the database to drop the schema from. If None, the current database is used. | None |
force | bool | If False, an exception is raised if the schema does not exist. | False |
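A minimal usage sketch, mirroring create_schema above; the schema name is illustrative:
# With force=True, no exception is raised if the schema does not exist.
con.drop_schema("analytics", force=True)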
drop_table
drop_table(self, name, database=None, force=False)
Drop a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Table to drop | required |
database | str | None | Database to drop table from | None |
force | bool | Check for existence before dropping | False |
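A minimal usage sketch; the table name is illustrative:
# force=True checks for existence before dropping.
con.drop_table("foo", force=True)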
fetch_from_cursor
fetch_from_cursor(self, cursor, schema)
list_databases
list_databases(self, like=None)
list_schemas
list_schemas(self, like=None, database=None)
List existing schemas in the current connection.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | A pattern in Python’s regex format to filter returned schema names. | None |
database | str | None | The database to list schemas from. If None, the current database is searched. | None |
Returns
Type | Description |
---|---|
list[str] | The schema names that exist in the current connection, that match the like pattern if provided. |
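A minimal usage sketch; the regex is illustrative:
con = ibis.duckdb.connect()
# List every schema, then only those whose names start with "m".
con.list_schemas()
con.list_schemas(like="^m")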
list_tables
list_tables(self, like=None, database=None, schema=None)
List tables and views.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | Regex to filter by table/view name. | None |
database | str | None | Database name. If not passed, uses the current database. Only supported with MotherDuck. | None |
schema | str | None | Schema name. If not passed, uses the current schema. | None |
Returns
Type | Description |
---|---|
list[str] | List of table and view names. |
Examples
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> foo = con.create_table("foo", schema=ibis.schema(dict(a="int")))
>>> con.list_tables()
['foo']
>>> bar = con.create_view("bar", foo)
>>> con.list_tables()
['bar', 'foo']
>>> con.create_schema("my_schema")
>>> con.list_tables(schema="my_schema")
[]
>>> with con.begin() as c:
...     c.exec_driver_sql("CREATE TABLE my_schema.baz (a INTEGER)")
<...>
>>> con.list_tables(schema="my_schema")
['baz']
load_extension
load_extension(self, extension)
Install and load a duckdb extension by name or path.
Parameters
Name | Type | Description | Default |
---|---|---|---|
extension | str | The extension name or path. | required |
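For example, installing and loading DuckDB's httpfs extension, which enables reading over HTTP and S3; a minimal sketch:
con = ibis.duckdb.connect()
# Installs the extension if it is missing, then loads it into the session.
con.load_extension("httpfs")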
read_csv
read_csv(self, source_list, table_name=None, **kwargs)
Register a CSV file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_list | str | list[str] | tuple[str] | The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to the DuckDB loading function. See https://duckdb.org/docs/data/csv for more information. | {} |
Returns
Type | Description |
---|---|
ir.Table | The just-registered table |
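A minimal sketch; the file path and table name are illustrative:
con = ibis.duckdb.connect()
# Extra keyword arguments are forwarded to DuckDB's CSV reader.
t = con.read_csv("data/penguins.csv", table_name="penguins")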
read_delta
read_delta(self, source_table, table_name=None, **kwargs)
Register a Delta Lake table as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_table | str | The data source. Must be a directory containing a Delta Lake table. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to deltalake.DeltaTable. | {} |
Returns
Type | Description |
---|---|
ir.Table | The just-registered table. |
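A minimal sketch, assuming the deltalake package is installed; the path and table name are illustrative:
con = ibis.duckdb.connect()
# The path must be a directory containing a Delta Lake table.
t = con.read_delta("/data/events_delta", table_name="events")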
read_in_memory
read_in_memory(self, source, table_name=None)
Register a Pandas DataFrame or pyarrow object as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source | pd.DataFrame | pa.Table | pa.RecordBatchReader | The data source. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
Returns
Type | Description |
---|---|
ir.Table | The just-registered table |
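A minimal sketch using a pandas DataFrame; the names are illustrative:
import pandas as pd

con = ibis.duckdb.connect()
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
# The DataFrame becomes queryable as a table in the current session.
t = con.read_in_memory(df, table_name="df_table")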
read_json
read_json(self, source_list, table_name=None, **kwargs)
Read newline-delimited JSON into an ibis table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_list | str | list[str] | tuple[str] | File or list of files | required |
table_name | str | None | Optional table name | None |
**kwargs | Additional keyword arguments passed to DuckDB’s read_json_auto function | {} |
Returns
Type | Description |
---|---|
Table | An ibis table expression |
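A minimal sketch; the file path and table name are illustrative:
con = ibis.duckdb.connect()
# The file must contain newline-delimited JSON records.
t = con.read_json("events.ndjson", table_name="events")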
read_parquet
read_parquet(self, source_list, table_name=None, **kwargs)
Register a parquet file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_list | str | Iterable[str] | The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to the DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information. | {} |
Returns
Type | Description |
---|---|
ir.Table | The just-registered table |
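A minimal sketch; the path and table name are illustrative:
con = ibis.duckdb.connect()
t = con.read_parquet("data/trips.parquet", table_name="trips")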
read_postgres
read_postgres(self, uri, table_name=None, schema='public')
Register a table from a postgres instance into a DuckDB table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | A postgres URI of the form postgres://user:password@host:port | required |
table_name | str | None | The table to read | None |
schema | str | PostgreSQL schema where table_name resides | 'public' |
Returns
Type | Description |
---|---|
ir.Table | The just-registered table. |
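A minimal sketch, assuming a reachable PostgreSQL server; the URI, table name, and schema are illustrative:
con = ibis.duckdb.connect()
t = con.read_postgres(
    "postgres://user:password@localhost:5432",
    table_name="orders",
    schema="public",
)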
read_sqlite
read_sqlite(self, path, table_name=None)
Register a table from a SQLite database into a DuckDB table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The path to the SQLite database | required |
table_name | str | None | The table to read | None |
Returns
Type | Description |
---|---|
ir.Table | The just-registered table. |
Examples
>>> import ibis
>>> import sqlite3
>>> ibis.options.interactive = True
>>> with sqlite3.connect("/tmp/sqlite.db") as con:
"DROP TABLE IF EXISTS t")
... con.execute("CREATE TABLE t (a INT, b TEXT)")
... con.execute(
... con.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
...
... )
...<...>
>>> con = ibis.connect("duckdb://")
>>> t = con.read_sqlite("/tmp/sqlite.db", table_name="t")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ a      │
│     2 │ b      │
│     3 │ c      │
└───────┴────────┘
register
register(self, source, table_name=None, **kwargs)
Register a data source as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source | str | Path | Any | The data source(s). May be a path to a file or directory of parquet/csv files, an iterable of parquet or CSV files, a pandas dataframe, a pyarrow table or dataset, or a postgres URI. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading functions for CSV or parquet. See https://duckdb.org/docs/data/csv and https://duckdb.org/docs/data/parquet for more information. | {} |
Returns
Type | Description |
---|---|
ir.Table | The just-registered table |
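A minimal sketch; register dispatches on the type of source, so the parquet path shown here (illustrative) could equally be a CSV path, a pandas dataframe, or a pyarrow table:
con = ibis.duckdb.connect()
t = con.register("data/trips.parquet", table_name="trips")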
to_csv
to_csv(self, expr, path, *, params=None, header=True, **kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
path | str | Path | The data source. A string or Path to the CSV file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
header | bool | Whether to write the column names as the first line of the CSV file. | True |
**kwargs | Any | DuckDB CSV writer arguments. https://duckdb.org/docs/data/csv.html#parameters | {} |
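A minimal sketch; t stands for any table expression and the output path is illustrative:
# Executes the expression and writes the result to disk.
con.to_csv(t, "/tmp/out.csv", header=True)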
to_parquet
to_parquet(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
path | str | Path | The data source. A string or Path to the parquet file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | DuckDB Parquet writer arguments. See https://duckdb.org/docs/data/parquet#writing-to-parquet-files for details | {} |
Examples
Write out an expression to a single parquet file.
>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "/tmp/penguins.parquet")
Write out an expression to a hive-partitioned parquet file.
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
Partition on a single column.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by="year")
Partition on multiple columns.
>>> con.to_parquet(
...     penguins, tempfile.mkdtemp(), partition_by=("year", "island")
... )
to_pyarrow
to_pyarrow(self, expr, *, params=None, limit=None, **_)
Execute an expression and return the results as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to pyarrow | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
kwargs | Any | Keyword arguments | {} |
Returns
Type | Description |
---|---|
Table | A pyarrow table holding the results of the executed expression. |
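A minimal sketch; t stands for any table expression:
# Materialize at most 10 rows of the expression as a pyarrow.Table.
tbl = con.to_pyarrow(t, limit=10)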
to_pyarrow_batches
to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **_)
Return a stream of record batches.
The returned RecordBatchReader contains a cursor with an unbounded lifetime. For analytics use cases this is usually nothing to fret about. In some cases you may need to explicitly release the cursor.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression | required |
params | Mapping[ir.Scalar, Any] | None | Bound parameters | None |
limit | int | str | None | Limit the result to this number of rows | None |
chunk_size | int | Warning: DuckDB returns 1024 size batches regardless of what argument is passed. | 1000000 |
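A minimal sketch of consuming the stream; t stands for any table expression:
reader = con.to_pyarrow_batches(t)
# Each element yielded by the reader is a pyarrow.RecordBatch.
for batch in reader:
    print(batch.num_rows)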
to_torch
to_torch(self, expr, *, params=None, limit=None, **kwargs)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Parameters to substitute into the expression. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | Any | Keyword arguments passed into the backend’s to_torch implementation. | {} |
Returns
Type | Description |
---|---|
dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |
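A minimal sketch, assuming torch is installed and the expression's columns have tensor-compatible (e.g. numeric) types; t stands for any table expression:
# Returns a dict mapping column names to torch.Tensor values.
tensors = con.to_torch(t)
print(tensors.keys())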