DuckDB

Install
Install Ibis and dependencies for the DuckDB backend:
Install with the duckdb extra:
pip install 'ibis-framework[duckdb]'
And connect:
import ibis
con = ibis.duckdb.connect()
Adjust connection parameters as needed.
Install for DuckDB:
conda install -c conda-forge ibis-duckdb
And connect:
import ibis
con = ibis.duckdb.connect()
Adjust connection parameters as needed.
Install for DuckDB:
mamba install -c conda-forge ibis-duckdb
And connect:
import ibis
con = ibis.duckdb.connect()
Adjust connection parameters as needed.
Connect
ibis.duckdb.connect
Connect to an in-memory database:
con = ibis.duckdb.connect()
Connect to, or create, a local DuckDB file:
con = ibis.duckdb.connect("mydb.duckdb")
ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.
Connection Parameters
do_connect
do_connect(self, database=':memory:', read_only=False, extensions=None, **config)
Create an Ibis client connected to a DuckDB database.
Previously, Ibis gave the temp_directory parameter special handling: it set a custom default and created intermediate directories if necessary. Now the value is passed directly to DuckDB. You may need to add Path(your_temp_dir).mkdir(exist_ok=True, parents=True) to your code to maintain the old behavior.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| database | str | Path | Path to a duckdb database. | ':memory:' |
| read_only | bool | Whether the database is read-only. | False |
| extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
| config | Any | DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. | {} |
Examples
>>> import ibis
>>> ibis.duckdb.connect(threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at 0x...>
ibis.connect URL format
In addition to ibis.duckdb.connect, you can also connect to DuckDB by passing a properly-formatted DuckDB connection URL to ibis.connect:
import ibis
con = ibis.connect("duckdb://local.ddb")
Given an empty path, ibis.connect will connect to an ephemeral, in-memory database:
con = ibis.connect("duckdb://")
Cloud bucket reads
DuckDB has a secret management system that is designed to support reading data from both public and private cloud blob storage systems like Amazon’s S3.
To make your life easier, you should probably start with the CREDENTIAL_CHAIN provider, which mimics the default behavior of the aws CLI and SDKs.
Ibis doesn’t have a native API for managing DuckDB’s secrets, but you can of course run raw_sql to set them up:
con.raw_sql("CREATE SECRET s3 (TYPE S3, PROVIDER CREDENTIAL_CHAIN)")
Assuming you’ve got the appropriate authorizations in AWS, this should allow DuckDB to read from any bucket you’re authorized to access.
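For example, with a secret in place you can read data directly from a bucket using read_parquet; the bucket and object path below are placeholders:
t = con.read_parquet("s3://your-bucket/path/to/data.parquet")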
MotherDuck
The DuckDB backend supports MotherDuck. If you have an account, you can connect to MotherDuck by passing in the string md: or motherduck:. Ibis will trigger the authentication prompt in the browser.
import ibis
con = ibis.duckdb.connect("md:")Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case list_tables)
con.list_tables()
Attempting to automatically open the SSO authorization page in your default browser.
1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS
Token successfully retrieved ✅
You can store it as an environment variable to avoid having to log in again:
$ export motherduck_token='****************'
['penguins']
Geospatial
The DuckDB backend has experimental support for geospatial operations.
To enable geospatial operations, install the geospatial extra or otherwise install the additional required dependencies:
See read_geo for some tips on reading in geospatial data.
pip install 'ibis-framework[geospatial]'
conda install -c conda-forge ibis-framework geopandas 'shapely>=2,<3'
mamba install -c conda-forge ibis-framework geopandas 'shapely>=2,<3'
duckdb.Backend
attach
attach(self, path, name=None, read_only=False)
Attach another DuckDB database to the current DuckDB session.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | Path to the database to attach. | required |
| name | str | None | Name to attach the database as. Defaults to the basename of path. | None |
| read_only | bool | Whether to attach the database as read-only. | False |
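A minimal sketch; the file path is hypothetical, and DuckDB will create the file if it does not already exist. The attached database then shows up as an additional catalog:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> con.attach("/tmp/other.duckdb", name="other")
>>> "other" in con.list_catalogs()
True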
attach_sqlite
attach_sqlite(self, path, overwrite=False, all_varchar=False)
Attach a SQLite database to the current DuckDB session.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The path to the SQLite database. | required |
| overwrite | bool | Allow overwriting any tables or views that already exist in your current session with the contents of the SQLite database. | False |
| all_varchar | bool | Set all SQLite columns to type VARCHAR to avoid type errors on ingestion. | False |
Examples
>>> import ibis
>>> import sqlite3
>>> con = sqlite3.connect("/tmp/attach_sqlite.db")
>>> with con:
... con.execute("DROP TABLE IF EXISTS t")
... con.execute("CREATE TABLE t (a INT, b TEXT)")
... con.execute(
... "INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
... )
<...>
>>> con.close()
>>> con = ibis.connect("duckdb://")
>>> con.list_tables()
[]
>>> con.attach_sqlite("/tmp/attach_sqlite.db")
>>> con.list_tables()
['t']
compile
compile(self, expr, /, *, limit=None, params=None, pretty=False)
Compile an expression to a SQL string.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | An ibis expression to compile. | required |
| limit | str | int | None | An integer to effect a specific row limit. A value of None means no limit. | None |
| params | Mapping[ir.Expr, Any] | None | Mapping of scalar parameter expressions to value. | None |
| pretty | bool | Pretty print the SQL query during compilation. | False |
Returns
| Name | Type | Description |
|---|---|---|
| str | Compiled expression |
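For example, a short sketch of compiling a simple filter; the unbound table here exists only for illustration, and nothing is executed:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = ibis.table(dict(a="int64"), name="t")
>>> sql = con.compile(t.filter(t.a > 0), pretty=True)
The result is a string containing a SELECT statement with a WHERE clause.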
connect
connect(self, *args, **kwargs)
Connect to the database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| *args | Mandatory connection parameters, see the docstring of do_connect for details. | () |
| **kwargs | Extra connection parameters, see the docstring of do_connect for details. | {} |
Notes
This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.
Returns
| Name | Type | Description |
|---|---|---|
| BaseBackend | An instance of the backend |
create_database
create_database(self, name, /, *, catalog=None, force=False)
Create a database named name in catalog.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of the terminology the backend uses.
See the Table Hierarchy Concepts Guide for more info.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the database to create. | required |
| catalog | str | None | Name of the catalog in which to create the database. If None, the current catalog is used. | None |
| force | bool | If False, an exception is raised if the database exists. | False |
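A short sketch: create a database, then create a table inside it.
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> con.create_database("analytics")
>>> t = con.create_table("events", schema=ibis.schema(dict(id="int64")), database="analytics")
>>> con.list_tables(database="analytics")
['events']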
create_table
create_table(self, name, /, obj=None, *, schema=None, database=None, temp=False, overwrite=False)
Create a table in DuckDB.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the table to create | required |
| obj | ir.Table | pd.DataFrame | pa.Table | pl.DataFrame | pl.LazyFrame | None | The data with which to populate the table; optional, but at least one of obj or schema must be specified | None |
| schema | SchemaLike | None | The schema of the table to create; optional, but at least one of obj or schema must be specified | None |
| database | str | None | The name of the database in which to create the table; if not passed, the current database is used. For multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). | None |
| temp | bool | Create a temporary table | False |
| overwrite | bool | If True, replace the table if it already exists, otherwise fail if the table exists | False |
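A short sketch showing both ways to create a table: from existing data, or from an explicit schema only.
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t1 = con.create_table("from_data", ibis.memtable({"a": [1, 2, 3]}))
>>> t2 = con.create_table("from_schema", schema=ibis.schema(dict(a="int64", b="string")))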
create_view
create_view(self, name, /, obj, *, database=None, overwrite=False)
Create a view from an Ibis expression.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the view to create. | required |
| obj | ir.Table | The Ibis expression to create the view from. | required |
| database | str | None | The database that the view should be created in. | None |
| overwrite | bool | If True, replace an existing view with the same name. | False |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | A table expression representing the view. |
detach
detach(self, name)
Detach a database from the current DuckDB session.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the database to detach. | required |
disconnect
disconnect(self)
Disconnect from the backend.
drop_database
drop_database(self, name, /, *, catalog=None, force=False)
Drop the database with name in catalog.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of the terminology the backend uses.
See the Table Hierarchy Concepts Guide for more info.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the database to drop. | required |
| catalog | str | None | Name of the catalog to drop the database from. If None, the current catalog is used. | None |
| force | bool | If False, an exception is raised if the database does not exist. | False |
drop_table
drop_table(self, name, /, *, database=None, force=False)
Drop a table from the backend.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the table to drop | required |
| database | tuple[str, str] | str | None | The database that the table is located in. | None |
| force | bool | If True, do not raise an error if the table does not exist. | False |
drop_view
drop_view(self, name, /, *, database=None, force=False)
Drop a view from the backend.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the view to drop. | required |
| database | str | None | The database that the view is located in. | None |
| force | bool | If True, do not raise an error if the view does not exist. | False |
execute
execute(self, expr, /, *, params=None, limit=None, **kwargs)
Execute an expression.
from_connection
from_connection(cls, con, /, *, extensions=None)
Create an Ibis client from an existing connection to a DuckDB database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| con | duckdb.DuckDBPyConnection | An existing connection to a DuckDB database. | required |
| extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
get_schema
get_schema(self, table_name, *, catalog=None, database=None)
Compute the schema of a table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | May not be fully qualified. Use database if you want to qualify the identifier. | required |
| catalog | str | None | Catalog name | None |
| database | str | None | Database name | None |
Returns
| Name | Type | Description |
|---|---|---|
| sch.Schema | Ibis schema |
has_operation
has_operation(cls, operation, /)
Return whether the backend supports the given operation.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| operation | type[ops.Value] | Operation type, a Python class object. | required |
insert
insert(self, name, /, obj, *, database=None, overwrite=False)
Insert data into a table.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the table into which data will be inserted | required |
| obj | pd.DataFrame | ir.Table | list | dict | The source data or expression to insert | required |
| database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). | None |
| overwrite | bool | If True then replace existing contents of table | False |
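A short sketch, inserting rows given as a list of dictionaries:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.create_table("events", schema=ibis.schema(dict(id="int64", name="string")))
>>> con.insert("events", [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
>>> int(t.count().execute())
2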
list_catalogs
list_catalogs(self, *, like=None)
List existing catalogs in the current connection.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of the terminology the backend uses.
See the Table Hierarchy Concepts Guide for more info.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| like | str | None | A pattern in Python’s regex format to filter returned catalog names. | None |
Returns
| Name | Type | Description |
|---|---|---|
| list[str] | The catalog names that exist in the current connection and that match the like pattern, if provided. |
list_databases
list_databases(self, *, like=None, catalog=None)
List existing databases in the current connection.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of the terminology the backend uses.
See the Table Hierarchy Concepts Guide for more info.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| like | str | None | A pattern in Python’s regex format to filter returned database names. | None |
| catalog | str | None | The catalog to list databases from. If None, the current catalog is searched. | None |
Returns
| Name | Type | Description |
|---|---|---|
| list[str] | The database names that exist in the current connection and that match the like pattern, if provided. |
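A short sketch; in DuckDB the default database (schema) is main:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> "main" in con.list_databases()
True
>>> con.create_database("staging")
>>> "staging" in con.list_databases()
True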
list_tables
list_tables(self, *, like=None, database=None)
The table names that match like in the given database.
For some backends, the tables may be files in a directory, or other equivalent entities in a SQL database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| like | str | None | A pattern in Python’s regex format. | None |
| database | tuple[str, str] | str | None | The database, or (catalog, database) from which to list tables. For backends that support a single-level table hierarchy, you can pass in a string like "bar". For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). If not provided, the current database (and catalog, if applicable for this backend) is used. See the Table Hierarchy Concepts Guide for more info. | None |
Returns
| Name | Type | Description |
|---|---|---|
| list[str] | The list of the table names that match the pattern like. |
Examples
This example uses the DuckDB backend, but the list_tables API works the same for other backends.
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> foo = con.create_table("foo", schema=ibis.schema(dict(a="int")))
>>> con.list_tables()
['foo']
>>> bar = con.create_view("bar", foo)
>>> con.list_tables()
['bar', 'foo']
>>> con.create_database("my_database")
>>> con.list_tables(database="my_database")
[]
>>> con.raw_sql("CREATE TABLE my_database.baz (a INTEGER)")
<duckdb.duckdb.DuckDBPyConnection object at 0x...>
>>> con.list_tables(database="my_database")
['baz']
load_extension
load_extension(self, extension, force_install=False)
Install and load a duckdb extension by name or path.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| extension | str | The extension name or path. | required |
| force_install | bool | Force reinstallation of the extension. | False |
raw_sql
raw_sql(self, query, **kwargs)
read_csv
read_csv(self, paths, /, *, table_name=None, columns=None, types=None, **kwargs)
Register a CSV file as a table in the current database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| paths | str | list[str] | tuple[str] | The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| columns | Mapping[str, str | dt.DataType] | None | An optional mapping of all column names to their types. | None |
| types | Mapping[str, str | dt.DataType] | None | An optional mapping of a subset of column names to their types. | None |
| **kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/csv for more information. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
Examples
Generate some data
>>> import tempfile
>>> data = b'''
... lat,lon,geom
... 1.0,2.0,POINT (1 2)
... 2.0,3.0,POINT (2 3)
... '''
>>> with tempfile.NamedTemporaryFile(delete=False) as f:
... nbytes = f.write(data)
Import Ibis
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> con = ibis.duckdb.connect()
Read the raw CSV file
>>> t = con.read_csv(f.name)
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ lat ┃ lon ┃ geom ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ string │
├─────────┼─────────┼─────────────┤
│ 1.0 │ 2.0 │ POINT (1 2) │
│ 2.0 │ 3.0 │ POINT (2 3) │
└─────────┴─────────┴─────────────┘
Load the spatial extension and read the CSV file again, using specific column types
>>> con.load_extension("spatial")
>>> t = con.read_csv(f.name, types={"geom": "geometry"})
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ lat ┃ lon ┃ geom ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ float64 │ float64 │ geospatial:geometry │
├─────────┼─────────┼──────────────────────┤
│ 1.0 │ 2.0 │ <POINT (1 2)> │
│ 2.0 │ 3.0 │ <POINT (2 3)> │
└─────────┴─────────┴──────────────────────┘
read_delta
read_delta(self, path, /, *, table_name=None, **kwargs)
Register a Delta Lake table as a table in the current database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The data source. Must be a directory containing a Delta Lake table. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| kwargs | Any | Additional keyword arguments passed to deltalake.DeltaTable. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table. |
read_geo
read_geo(self, path, /, *, table_name=None, **kwargs)
Register a geospatial data file as a table in the current database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | The data source. Path to a geospatial file in a format supported by DuckDB. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files for more information. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
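A minimal sketch, assuming the geospatial dependencies are installed and a GeoJSON file exists at the (hypothetical) path below:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> zones = con.read_geo("zones.geojson", table_name="zones")
The resulting table expression exposes the file's geometry columns with geospatial types.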
read_json
read_json(self, paths, /, *, table_name=None, columns=None, **kwargs)
Read newline-delimited JSON into an ibis table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| paths | str | list[str] | tuple[str] | File or list of files | required |
| table_name | str | None | Optional table name | None |
| columns | Mapping[str, str] | None | Optional mapping from string column name to duckdb type string. | None |
| **kwargs | Additional keyword arguments passed to DuckDB’s read_json_auto function. See https://duckdb.org/docs/data/json/overview.html#json-loading for parameters and more information about reading JSON. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| Table | An ibis table expression |
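A short sketch: write a small newline-delimited JSON file, then read it back.
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> with open("/tmp/rows.json", "w") as f:
...     nchars = f.write('{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n')
>>> t = con.read_json("/tmp/rows.json")
>>> int(t.count().execute())
2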
read_mysql
read_mysql(self, uri, /, *, catalog, table_name=None)
Register a table from a MySQL instance into a DuckDB table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| uri | str | A mysql URI of the form mysql://user:password@host:port/database | required |
| catalog | str | User-defined alias given to the MySQL database that is being attached to DuckDB | required |
| table_name | str | None | The table to read | None |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table. |
read_parquet
read_parquet(self, paths, /, *, table_name=None, **kwargs)
Register a parquet file as a table in the current database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| paths | str | Path | Iterable[str | Path] | The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| **kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
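A short sketch: write a Parquet file with to_parquet, then register it with read_parquet.
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> con.to_parquet(ibis.memtable({"a": [1, 2, 3]}), "/tmp/example.parquet")
>>> t = con.read_parquet("/tmp/example.parquet")
>>> int(t.count().execute())
3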
read_postgres
read_postgres(self, uri, /, *, table_name=None, database='public')
Register a table from a postgres instance into a DuckDB table.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| uri | str | A postgres URI of the form postgres://user:password@host:port | required |
| table_name | str | None | The table to read | None |
| database | str | PostgreSQL database (schema) where table_name resides | 'public' |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table. |
read_sqlite
read_sqlite(self, path, /, *, table_name=None)
Register a table from a SQLite database into a DuckDB table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The path to the SQLite database | required |
| table_name | str | None | The table to read | None |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table. |
Examples
>>> import ibis
>>> import sqlite3
>>> ibis.options.interactive = True
>>> con = sqlite3.connect("/tmp/sqlite.db")
>>> with con:
... con.execute("DROP TABLE IF EXISTS t")
... con.execute("CREATE TABLE t (a INT, b TEXT)")
... con.execute(
... "INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
... )
<...>
>>> con.close()
>>> con = ibis.connect("duckdb://")
>>> t = con.read_sqlite("/tmp/sqlite.db", table_name="t")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ a │
│ 2 │ b │
│ 3 │ c │
└───────┴────────┘
read_xlsx
read_xlsx(self, path, /, *, sheet=None, range=None, **kwargs)
Read an Excel file into a DuckDB table. This requires duckdb>=1.2.0.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The path to the Excel file. | required |
| sheet | str | None | The name of the sheet to read, eg ‘Sheet3’. | None |
| range | str | None | The range of cells to read, eg ‘A5:Z’. | None |
| kwargs | Additional args passed to the backend’s read function. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table. |
See Also
Examples
>>> import os
>>> import ibis
>>> t = ibis.memtable({"a": [1, 2, 3], "b": ["a", "b", "c"]})
>>> con = ibis.duckdb.connect()
>>> con.to_xlsx(t, "/tmp/test.xlsx", header=True)
>>> assert os.path.exists("/tmp/test.xlsx")
>>> t = con.read_xlsx("/tmp/test.xlsx")
>>> t.columns
('a', 'b')
reconnect
reconnect(self)
Reconnect to the database already configured with connect.
register_filesystem
register_filesystem(self, filesystem)
Register an fsspec filesystem object with DuckDB.
This allows a user to read from any fsspec-compatible filesystem using read_csv, read_parquet, read_json, etc.
Creating an fsspec filesystem requires that the corresponding backend-specific fsspec helper library is installed, e.g. to connect to Google Cloud Storage, gcsfs must be installed.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| filesystem | AbstractFileSystem | The fsspec filesystem object to register with DuckDB. See https://duckdb.org/docs/guides/python/filesystems for details. | required |
Examples
>>> import ibis
>>> import fsspec
>>> ibis.options.interactive = True
>>> gcs = fsspec.filesystem("gcs")
>>> con = ibis.duckdb.connect()
>>> con.register_filesystem(gcs)
>>> t = con.read_csv(
... "gcs://ibis-examples/data/band_members.csv.gz",
... table_name="band_members",
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ band ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ string │
├────────┼─────────┤
│ Mick │ Stones │
│ John │ Beatles │
│ Paul │ Beatles │
└────────┴─────────┘
register_options
register_options(cls)
Register custom backend options.
rename_table
rename_table(self, old_name, new_name)
Rename an existing table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| old_name | str | The old name of the table. | required |
| new_name | str | The new name of the table. | required |
sql
sql(self, query, /, *, schema=None, dialect=None)
Create an Ibis table expression from a SQL query.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | A SQL query string | required |
| schema | SchemaLike | None | The schema of the query. If not provided, Ibis will try to infer the schema of the query. | None |
| dialect | str | None | The SQL dialect of the query. If not provided, the backend’s dialect is assumed. This argument can be useful when the query is written in a different dialect from the backend. | None |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The table expression representing the query |
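A short sketch mixing raw SQL with Ibis expressions; the table is created only for illustration:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.create_table("t", ibis.memtable({"a": [1, 2, 3]}))
>>> expr = con.sql("SELECT a * 2 AS doubled FROM t")
>>> int(expr.doubled.sum().execute())
12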
table
table(self, name, /, *, database=None)
to_csv
to_csv(self, expr, /, path, *, params=None, header=True, **kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
| path | str | Path | The data source. A string or Path to the CSV file. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| header | bool | Whether to write the column names as the first line of the CSV file. | True |
| kwargs | Any | DuckDB CSV writer arguments. https://duckdb.org/docs/data/csv/overview.html#parameters | {} |
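A short sketch:
>>> import os
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = ibis.memtable({"a": [1, 2, 3], "b": ["x", "y", "z"]})
>>> con.to_csv(t, "/tmp/example.csv")
>>> os.path.exists("/tmp/example.csv")
True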
to_delta
to_delta(self, expr, /, path, *, params=None, **kwargs)
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to Delta Lake table. | required |
| path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| kwargs | Any | Additional keyword arguments passed to deltalake.writer.write_deltalake method | {} |
to_geo
to_geo(self, expr, /, path, *, format, layer_creation_options=None, params=None, limit=None, **kwargs)
Write the results of executing expr to a geospatial output.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | Ibis expression to execute and persist to geospatial output. | required |
| path | str | Path | A string or Path to the desired output file location. | required |
| format | str | The format of the geospatial output. One of GDAL’s supported vector formats. The list of vector formats is located here: https://gdal.org/en/latest/drivers/vector/index.html | required |
| layer_creation_options | Mapping[str, Any] | None | A mapping of layer creation options. | None |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
| kwargs | Any | Additional keyword arguments passed to the DuckDB COPY command. | {} |
Examples
>>> import os
>>> import tempfile
>>> import ibis
>>> ibis.options.interactive = True
>>> from ibis import _
Load some geospatial data
>>> con = ibis.duckdb.connect()
>>> zones = ibis.examples.zones.fetch(backend=con)
>>> zones[["zone", "geom"]].head()
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ zone ┃ geom ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ geospatial:geometry │
├───────────────────────────────────────┼──────────────────────────────────────┤
│ │ <POLYGON ((933100.918 192536.086, │
│ Newark Airport │ 933091.011 192572.175, 933088.585 │
│ │ 192604.9...> │
│ │ <MULTIPOLYGON (((1033269.244 │
│ Jamaica Bay │ 172126.008, 1033439.643 170883.946, │
│ │ 1033473.265...> │
│ │ <POLYGON ((1026308.77 256767.698, │
│ Allerton/Pelham Gardens │ 1026495.593 256638.616, 1026567.23 │
│ │ 256589....> │
│ │ <POLYGON ((992073.467 203714.076, │
│ Alphabet City │ 992068.667 203711.502, 992061.716 │
│ │ 203711.7...> │
│ │ <POLYGON ((935843.31 144283.336, │
│ Arden Heights │ 936046.565 144173.418, 936387.922 │
│ │ 143967.75...> │
└───────────────────────────────────────┴──────────────────────────────────────┘
Write to a GeoJSON file
>>> with tempfile.TemporaryDirectory() as tmpdir:
... con.to_geo(
... zones,
... path=os.path.join(tmpdir, "zones.geojson"),
... format="geojson",
... )
Write to a Shapefile
>>> with tempfile.TemporaryDirectory() as tmpdir:
... con.to_geo(
... zones,
... path=os.path.join(tmpdir, "zones.shp"),
... format="ESRI Shapefile",
... )
to_json
to_json(self, expr, /, path, *, compression='auto', dateformat=None, timestampformat=None)
Write the results of expr to a json file of [{column -> value}, …] objects.
This method is eager and will execute the associated expression immediately. See https://duckdb.org/docs/sql/statements/copy.html#json-options for more info.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to JSON. | required |
| path | str | Path | URLs such as S3 buckets are supported. | required |
| compression | Literal['auto', 'none', 'gzip', 'zstd'] | Compression codec to use. One of “auto”, “none”, “gzip”, “zstd”. | 'auto' |
| dateformat | str | None | Date format string. | None |
| timestampformat | str | None | Timestamp format string. | None |
to_pandas
to_pandas(self, expr, /, *, params=None, limit=None, **kwargs)
Execute an Ibis expression and return a pandas DataFrame, Series, or scalar.
This method is a wrapper around execute.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to execute. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. | None |
| kwargs | Any | Keyword arguments | {} |
to_pandas_batches
to_pandas_batches(self, expr, /, *, params=None, limit=None, chunk_size=1000000, **kwargs)
Execute an Ibis expression and return an iterator of pandas DataFrames.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to execute. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. | None |
| chunk_size | int | Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. | 1000000 |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| Iterator[pd.DataFrame] | An iterator of pandas DataFrames. |
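A short sketch: the expression is executed and the rows are yielded in DataFrame chunks.
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = ibis.memtable({"a": list(range(10))})
>>> sum(len(df) for df in con.to_pandas_batches(t, chunk_size=4))
10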
to_parquet
to_parquet(self, expr, /, path, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
| path | str | Path | The data source. A string or Path to the parquet file. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| **kwargs | Any | DuckDB Parquet writer arguments. See https://duckdb.org/docs/data/parquet/overview.html#writing-to-parquet-files for details. | {} |
Examples
Write out an expression to a single parquet file.
>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "/tmp/penguins.parquet")
Write out an expression to a hive-partitioned parquet file.
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
Partition on a single column.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by="year")
Partition on multiple columns.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by=("year", "island"))
to_parquet_dir
to_parquet_dir(self, expr, /, directory, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file in a directory.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
| directory | str | Path | The data source. A string or Path to the directory where the parquet file will be written. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| **kwargs | Any | Additional keyword arguments passed to pyarrow.dataset.write_dataset | {} |
to_polars
to_polars(self, expr, /, *, params=None, limit=None, **kwargs)
Execute an expression and return the results as a polars DataFrame.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to export to polars. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. | None |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| dataframe | A polars DataFrame holding the results of the executed expression. |
to_pyarrow
to_pyarrow(self, expr, /, *, params=None, limit=None, **kwargs)
Execute expression to a pyarrow object.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to export to pyarrow | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. | None |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| result | If the passed expression is a Table, a pyarrow table is returned. If the passed expression is a Column, a pyarrow array is returned. If the passed expression is a Scalar, a pyarrow scalar is returned. |
to_pyarrow_batches
to_pyarrow_batches(self, expr, /, *, params=None, limit=None, chunk_size=1000000, **_)
Return a stream of record batches.
The returned RecordBatchReader contains a cursor with an unbounded lifetime.
For analytics use cases this is usually nothing to fret about. In some cases you may need to explicitly release the cursor.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression | required |
| params | Mapping[ir.Scalar, Any] | None | Bound parameters | None |
| limit | int | str | None | Limit the result to this number of rows | None |
| chunk_size | int | The number of rows to fetch per batch | 1000000 |
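A short sketch: iterate over the returned reader to consume the batches.
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = ibis.memtable({"a": [1, 2, 3]})
>>> reader = con.to_pyarrow_batches(t)
>>> sum(batch.num_rows for batch in reader)
3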
to_torch
to_torch(self, expr, /, *, params=None, limit=None, **kwargs)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to execute. | required |
| params | Mapping[ir.Scalar, Any] | None | Parameters to substitute into the expression. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
| kwargs | Any | Keyword arguments passed into the backend’s to_torch implementation. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |
to_xlsx
to_xlsx(self, expr, /, path, *, sheet='Sheet1', header=False, params=None, **kwargs)
Write a table to an Excel file.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | Ibis table expression to write to an excel file. | required |
| path | str | Path | Excel output path. | required |
| sheet | str | The name of the sheet to write to, eg ‘Sheet3’. | 'Sheet1' |
| header | bool | Whether to include the column names as the first row. | False |
| params | Mapping[ir.Scalar, Any] | None | Additional Ibis expression parameters to pass to the backend’s write function. | None |
| kwargs | Any | Additional arguments passed to the backend’s write function. | {} |
Notes
Requires DuckDB >= 1.2.0.
See Also
Examples
>>> import os
>>> import ibis
>>> t = ibis.memtable({"a": [1, 2, 3], "b": ["a", "b", "c"]})
>>> con = ibis.duckdb.connect()
>>> con.to_xlsx(t, "/tmp/test.xlsx")
>>> os.path.exists("/tmp/test.xlsx")
True
truncate_table
truncate_table(self, name, /, *, database=None)
Delete all rows from a table.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog. These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Table name | required |
| database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). | None |