DuckDB¶
Introduced in v3.0
duckdb >= 0.5.0 requires duckdb-engine >= 0.6.2
If you encounter problems when using duckdb >= 0.5.0, you may need to upgrade duckdb-engine to at least version 0.6.2. See this issue for more details.
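For a pip-based install, an upgrade along these lines should resolve it (the version pin is taken from the note above):
pip install -U 'duckdb-engine>=0.6.2'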
Install¶
Install ibis and dependencies for the DuckDB backend:
pip install 'ibis-framework[duckdb]'
conda install -c conda-forge ibis-duckdb
mamba install -c conda-forge ibis-duckdb
Connect¶
API¶
Create a client by passing in a path to a DuckDB database to ibis.duckdb.connect.
See ibis.backends.duckdb.Backend.do_connect for connection parameter information.
ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.
Connection Parameters¶
do_connect(database=':memory:', path=None, read_only=False, temp_directory=None, **config)¶
Create an Ibis client connected to a DuckDB database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
database | str \| Path | Path to a DuckDB database. | ':memory:' |
path | str \| Path | Deprecated, use database instead. | None |
read_only | bool | Whether the database is read-only. | False |
temp_directory | Path \| str \| None | Directory to use for spilling to disk. Only set by default for in-memory connections. | None |
config | Any | DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. | {} |
Examples:
>>> import ibis
>>> ibis.duckdb.connect("database.ddb", threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at ...>
Backend API¶
Backend¶
Bases: BaseAlchemyBackend
Functions¶
read_csv(source_list, table_name=None, **kwargs)¶
Register a CSV file as a table in the current database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_list | str \| list[str] \| tuple[str] | The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. | required |
table_name | str \| None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
kwargs | Any | Additional keyword arguments passed to the DuckDB loading function. See https://duckdb.org/docs/data/csv for more information. | {} |
Returns:
Type | Description |
---|---|
ir.Table | The just-registered table |
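A minimal sketch of registering a CSV file; the file path, table name, and the delim option are illustrative, not from the source:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.read_csv("data/events.csv", table_name="events", delim=",")  # delim is passed through to DuckDB's CSV reader
>>> t.head()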
read_in_memory(dataframe, table_name=None)¶
Register a Pandas DataFrame or pyarrow Table as a table in the current database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | pd.DataFrame \| pa.Table | The data source. | required |
table_name | str \| None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
Returns:
Type | Description |
---|---|
ir.Table | The just-registered table |
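A minimal sketch of registering an in-memory pandas DataFrame; the column and table names are illustrative:
>>> import ibis
>>> import pandas as pd
>>> con = ibis.duckdb.connect()
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
>>> t = con.read_in_memory(df, table_name="my_df")  # queryable without copying to disk
>>> t.head()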
read_json(source_list, table_name=None, **kwargs)¶
Read newline-delimited JSON into an ibis table.
This feature requires duckdb>=0.7.0
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_list | str \| list[str] \| tuple[str] | File or list of files | required |
table_name | str \| None | Optional table name | None |
kwargs |  | Additional keyword arguments passed to DuckDB's read_json_auto function | {} |
Returns:
Type | Description |
---|---|
Table | An ibis table expression |
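A minimal sketch of reading newline-delimited JSON; the file path and table name are illustrative, and recall this requires duckdb>=0.7.0:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.read_json("data/records.ndjson", table_name="records")
>>> t.head()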
read_parquet(source_list, table_name=None, **kwargs)¶
Register a parquet file as a table in the current database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_list | str \| Iterable[str] | The data source(s). May be a path to a file, an iterable of files, or a directory of parquet files. | required |
table_name | str \| None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
kwargs | Any | Additional keyword arguments passed to the DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information. | {} |
Returns:
Type | Description |
---|---|
ir.Table | The just-registered table |
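A minimal sketch of registering a directory of parquet files via a glob (the path and table name are illustrative; DuckDB accepts glob patterns here):
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.read_parquet("data/*.parquet", table_name="events")
>>> t.head()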
read_postgres(uri, table_name=None, schema='public')¶
Register a table from a postgres instance into a DuckDB table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uri |  | The postgres URI in form 'postgres://user:password@host:port' | required |
table_name | str \| None | The table to read | None |
schema | str | PostgreSQL schema where table_name resides | 'public' |
Returns:
Type | Description |
---|---|
ir.Table | The just-registered table. |
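A minimal sketch of pulling a table out of a running Postgres instance; the URI, credentials, and table name are all illustrative:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.read_postgres(
...     "postgres://user:password@localhost:5432",
...     table_name="orders",
... )
>>> t.head()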
read_sqlite(path, table_name=None)¶
Register a table from a SQLite database into a DuckDB table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path | The path to the SQLite database | required |
table_name | str \| None | The table to read | None |
Returns:
Type | Description |
---|---|
ir.Table | The just-registered table. |
Examples:
>>> import ibis
>>> con = ibis.connect("duckdb://")
>>> t = con.read_sqlite("ci/ibis-testing-data/ibis_testing.db", table_name="diamonds")
>>> t.head().execute()
carat cut color clarity depth table price x y z
0 0.23 Ideal E SI2 61.5 55.0 326 3.95 3.98 2.43
1 0.21 Premium E SI1 59.8 61.0 326 3.89 3.84 2.31
2 0.23 Good E VS1 56.9 65.0 327 4.05 4.07 2.31
3 0.29 Premium I VS2 62.4 58.0 334 4.20 4.23 2.63
4 0.31 Good J SI2 63.3 58.0 335 4.34 4.35 2.75
register(source, table_name=None, **kwargs)¶
Register a data source as a table in the current database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | str \| Path \| Any | The data source(s). May be a path to a file or directory of parquet/csv files, an iterable of parquet or CSV files, a pandas dataframe, a pyarrow table or dataset, or a postgres URI. | required |
table_name | str \| None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading functions for CSV or parquet. See https://duckdb.org/docs/data/csv and https://duckdb.org/docs/data/parquet for more information. | {} |
Returns:
Type | Description |
---|---|
ir.Table | The just-registered table |
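A minimal sketch of how register dispatches on the source type; the paths are illustrative:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> csv_t = con.register("data/events.csv")  # inferred as CSV from the extension
>>> pq_t = con.register("data/events.parquet", table_name="events")  # inferred as parquet
>>> pq_t.head()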
to_csv(expr, path, *, params=None, header=True, **kwargs)¶
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
path | str \| Path | A string or Path to the output CSV file. | required |
params | Mapping[ir.Scalar, Any] \| None | Mapping of scalar parameter expressions to value. | None |
header | bool | Whether to write the column names as the first line of the CSV file. | True |
kwargs | Any | DuckDB CSV writer arguments. See https://duckdb.org/docs/data/csv.html#parameters | {} |
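A minimal sketch of writing an expression out to CSV; the source data and output path are illustrative:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.read_parquet("data/events.parquet")
>>> con.to_csv(t, "events.csv", header=True)  # executes the expression immediately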
to_parquet(expr, path, *, params=None, **kwargs)¶
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
path | str \| Path | A string or Path to the output parquet file. | required |
params | Mapping[ir.Scalar, Any] \| None | Mapping of scalar parameter expressions to value. | None |
kwargs | Any | DuckDB Parquet writer arguments. See https://duckdb.org/docs/data/parquet#writing-to-parquet-files for details | {} |
Examples:
Write out an expression to a single parquet file.
>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "penguins.parquet")
Write out an expression to a hive-partitioned directory of parquet files.
>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "penguins_hive_dir", partition_by="year")
>>> # partition on multiple columns
>>> con.to_parquet(penguins, "penguins_hive_dir", partition_by=("year", "island"))
to_pyarrow_batches(expr, *, params=None, limit=None, chunk_size=1000000, **_)¶
Return a stream of record batches.
The returned RecordBatchReader contains a cursor with an unbounded lifetime. For analytics use cases this is usually nothing to fret about. In some cases you may need to explicitly release the cursor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression | required |
params | Mapping[ir.Scalar, Any] \| None | Bound parameters | None |
limit | int \| str \| None | Limit the result to this number of rows | None |
chunk_size | int | DuckDB returns batches of 1024 rows regardless of what argument is passed. | 1000000 |
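A minimal sketch of consuming the returned stream of record batches; the source table and limit are illustrative:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.read_parquet("data/events.parquet")
>>> reader = con.to_pyarrow_batches(t, limit=10000)
>>> for batch in reader:  # each batch is a pyarrow.RecordBatch
...     print(batch.num_rows)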