import ibis
con = ibis.connect("duckdb://local.ddb")ibis.connect URL format
DuckDB

Install
Install Ibis and dependencies for the DuckDB backend:
Install with the duckdb extra:
pip install 'ibis-framework[duckdb]'And connect:
import ibis
con = ibis.duckdb.connect()- 1
- Adjust connection parameters as needed.
Install for DuckDB:
conda install -c conda-forge ibis-duckdbAnd connect:
import ibis
con = ibis.duckdb.connect()- 1
- Adjust connection parameters as needed.
Install for DuckDB:
mamba install -c conda-forge ibis-duckdbAnd connect:
import ibis
con = ibis.duckdb.connect()- 1
- Adjust connection parameters as needed.
Connect
ibis.duckdb.connect
Connect to an in-memory database:
con = ibis.duckdb.connect()Connect to, or create, a local DuckDB file
con = ibis.duckdb.connect("mydb.duckdb")ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.
Connection Parameters
do_connect
do_connect(['self', "database=':memory:'", 'read_only=False', 'temp_directory=None', 'extensions=None', '**config'])
Create an Ibis client connected to a DuckDB database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| database | str | Path | Path to a duckdb database. | ':memory:' |
| read_only | bool | Whether the database is read-only. | False |
| temp_directory | str | Path | None | Directory to use for spilling to disk. Only set by default for in-memory connections. | None |
| extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
| config | Any | DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. | {} |
Examples
>>> import ibis
>>> ibis.duckdb.connect(threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at 0x...>In addition to ibis.duckdb.connect, you can also connect to DuckDB by passing a properly-formatted DuckDB connection URL to ibis.connect:
Given an empty path, ibis.connect will connect to an ephemeral, in-memory database.
con = ibis.connect("duckdb://")MotherDuck
The DuckDB backend supports MotherDuck. If you have an account, you can connect to MotherDuck by passing in the string md: or motherduck:. ibis will trigger the authentication prompt in-browser.
import ibis
con = ibis.duckdb.connect("md:")Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case list_tables)
con.list_tables()
Attempting to automatically open the SSO authorization page in your default browser.
1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS
Token successfully retrieved ✅
You can store it as an environment variable to avoid having to log in again:
$ export motherduck_token='****************'
['penguins']Geospatial
The DuckDB backend has experimental support for geospatial operations.
To enable geospatial operations, install the geospatial extra or otherwise install the additional required dependencies:
See read_geo for some tips on reading in geospatial data.
pip install 'ibis-framework[geospatial]'conda install -c conda-forge ibis-framework geopandas 'shapely>=2,<3'mamba install -c conda-forge ibis-framework geopandas 'shapely>=2,<3'duckdb.Backend
attach
attach(['self', 'path', 'name=None', 'read_only=False'])
Attach another DuckDB database to the current DuckDB session.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | Path to the database to attach. | required |
| name | str | None | Name to attach the database as. Defaults to the basename of path. |
None |
| read_only | bool | Whether to attach the database as read-only. | False |
attach_sqlite
attach_sqlite(['self', 'path', 'overwrite=False', 'all_varchar=False'])
Attach a SQLite database to the current DuckDB session.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The path to the SQLite database. | required |
| overwrite | bool | Allow overwriting any tables or views that already exist in your current session with the contents of the SQLite database. | False |
| all_varchar | bool | Set all SQLite columns to type VARCHAR to avoid type errors on ingestion. |
False |
Examples
>>> import ibis
>>> import sqlite3
>>> with sqlite3.connect("/tmp/attach_sqlite.db") as con:
... con.execute("DROP TABLE IF EXISTS t")
... con.execute("CREATE TABLE t (a INT, b TEXT)")
... con.execute(
... "INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
... )
<...>
>>> con = ibis.connect("duckdb://")
>>> con.list_tables()
[]
>>> con.attach_sqlite("/tmp/attach_sqlite.db")
>>> con.list_tables()
['t']compile
compile(['self', 'expr', 'limit=None', 'params=None', 'pretty=False'])
Compile an Ibis expression to a SQL string.
connect
connect(['self', '*args', '**kwargs'])
Connect to the database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| *args | Mandatory connection parameters, see the docstring of do_connect for details. |
() |
|
| **kwargs | Extra connection parameters, see the docstring of do_connect for details. |
{} |
Notes
This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.
Returns
| Name | Type | Description |
|---|---|---|
| BaseBackend | An instance of the backend |
create_database
create_database(['self', 'name', 'catalog=None', 'force=False'])
Create a database named name in catalog.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the database to create. | required |
| catalog | str | None | Name of the catalog in which to create the database. If None, the current catalog is used. |
None |
| force | bool | If False, an exception is raised if the database exists. |
False |
create_table
create_table(['self', 'name', 'obj=None', '*', 'schema=None', 'database=None', 'temp=False', 'overwrite=False'])
Create a table in DuckDB.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the table to create | required |
| obj | ir.Table | pd.DataFrame | pa.Table | pl.DataFrame | pl.LazyFrame | None | The data with which to populate the table; optional, but at least one of obj or schema must be specified |
None |
| schema | SchemaLike | None | The schema of the table to create; optional, but at least one of obj or schema must be specified |
None |
| database | str | None | The name of the database in which to create the table; if not passed, the current database is used. For multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). |
None |
| temp | bool | Create a temporary table | False |
| overwrite | bool | If True, replace the table if it already exists, otherwise fail if the table exists |
False |
create_view
create_view(['self', 'name', 'obj', '*', 'database=None', 'overwrite=False'])
Create a new view from an expression.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the new view. | required |
| obj | ir.Table | An Ibis table expression that will be used to create the view. | required |
| database | str | None | Name of the database where the view will be created, if not provided the database’s default is used. | None |
| overwrite | bool | Whether to clobber an existing view with the same name | False |
Returns
| Name | Type | Description |
|---|---|---|
| Table | The view that was created. |
detach
detach(['self', 'name'])
Detach a database from the current DuckDB session.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the database to detach. | required |
disconnect
disconnect(['self'])
Close the connection to the backend.
drop_database
drop_database(['self', 'name', 'catalog=None', 'force=False'])
Drop the database with name in catalog.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the schema to drop. | required |
| catalog | str | None | Name of the catalog to drop the database from. If None, the current catalog is used. |
None |
| force | bool | If False, an exception is raised if the database does not exist. |
False |
drop_table
drop_table(['self', 'name', 'database=None', 'force=False'])
Drop a table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the table to drop. | required |
| database | str | None | Name of the database where the table exists, if not the default. | None |
| force | bool | If False, an exception is raised if the table does not exist. |
False |
drop_view
drop_view(['self', 'name', '*', 'database=None', 'force=False'])
Drop a view.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the view to drop. | required |
| database | str | None | Name of the database where the view exists, if not the default. | None |
| force | bool | If False, an exception is raised if the view does not exist. |
False |
execute
execute(['self', 'expr', 'params=None', "limit='default'", '**_'])
Execute an expression.
from_connection
from_connection(['cls', 'con', 'extensions=None'])
Create an Ibis client from an existing connection to a DuckDB database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| con | duckdb.DuckDBPyConnection | An existing connection to a DuckDB database. | required |
| extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
get_schema
get_schema(['self', 'table_name', '*', 'catalog=None', 'database=None'])
Compute the schema of a table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | May not be fully qualified. Use database if you want to qualify the identifier. |
required |
| catalog | str | None | Catalog name | None |
| database | str | None | Database name | None |
Returns
| Name | Type | Description |
|---|---|---|
| sch.Schema | Ibis schema |
has_operation
has_operation(['cls', 'operation'])
Return whether the backend implements support for operation.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| operation | type[ops.Value] | A class corresponding to an operation. | required |
Returns
| Name | Type | Description |
|---|---|---|
| bool | Whether the backend implements the operation. |
Examples
>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
Trueinsert
insert(['self', 'table_name', 'obj', 'database=None', 'overwrite=False'])
Insert data into a table.
schema to refer to database hierarchy.
A collection of table is referred to as a database. A collection of database is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | The name of the table to which data needs will be inserted | required |
| obj | pd.DataFrame | ir.Table | list | dict | The source data or expression to insert | required |
| database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). |
None |
| overwrite | bool | If True then replace existing contents of table |
False |
list_catalogs
list_catalogs(['self', 'like=None'])
list_databases
list_databases(['self', 'like=None', 'catalog=None'])
List existing databases in the current connection.
schema to refer to database hierarchy.
A collection of table is referred to as a database. A collection of database is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| like | str | None | A pattern in Python’s regex format to filter returned database names. | None |
| catalog | str | None | The catalog to list databases from. If None, the current catalog is searched. |
None |
Returns
| Name | Type | Description |
|---|---|---|
| list[str] | The database names that exist in the current connection, that match the like pattern if provided. |
list_tables
list_tables(['self', 'like=None', 'database=None'])
List tables and views.
schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of database is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| like | str | None | Regex to filter by table/view name. | None |
| database | tuple[str, str] | str | None | Database location. If not passed, uses the current database. By default uses the current database (self.current_database) and catalog (self.current_catalog). To specify a table in a separate catalog, you can pass in the catalog and database as a string "catalog.database", or as a tuple of strings ("catalog", "database"). |
None |
Returns
| Name | Type | Description |
|---|---|---|
| list[str] | List of table and view names. |
Examples
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> foo = con.create_table("foo", schema=ibis.schema(dict(a="int")))
>>> con.list_tables()
['foo']
>>> bar = con.create_view("bar", foo)
>>> con.list_tables()
['bar', 'foo']
>>> con.create_database("my_database")
>>> con.list_tables(database="my_database")
[]
>>> con.raw_sql("CREATE TABLE my_database.baz (a INTEGER)")
<duckdb.duckdb.DuckDBPyConnection object at 0x...>
>>> con.list_tables(database="my_database")
['baz']load_extension
load_extension(['self', 'extension', 'force_install=False'])
Install and load a duckdb extension by name or path.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| extension | str | The extension name or path. | required |
| force_install | bool | Force reinstallation of the extension. | False |
raw_sql
raw_sql(['self', 'query', '**kwargs'])
read_csv
read_csv(['self', 'source_list', 'table_name=None', 'columns=None', 'types=None', '**kwargs'])
Register a CSV file as a table in the current database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| source_list | str | list[str] | tuple[str] | The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| columns | Mapping[str, str | dt.DataType] | None | An optional mapping of all column names to their types. | None |
| types | Mapping[str, str | dt.DataType] | None | An optional mapping of a subset of column names to their types. | None |
| **kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/csv for more information. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
Examples
Generate some data
>>> import tempfile
>>> data = b'''
... lat,lon,geom
... 1.0,2.0,POINT (1 2)
... 2.0,3.0,POINT (2 3)
... '''
>>> with tempfile.NamedTemporaryFile(delete=False) as f:
... nbytes = f.write(data)Import Ibis
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> con = ibis.duckdb.connect()Read the raw CSV file
>>> t = con.read_csv(f.name)
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ lat ┃ lon ┃ geom ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ string │
├─────────┼─────────┼─────────────┤
│ 1.0 │ 2.0 │ POINT (1 2) │
│ 2.0 │ 3.0 │ POINT (2 3) │
└─────────┴─────────┴─────────────┘Load the spatial extension and read the CSV file again, using specific column types
>>> con.load_extension("spatial")
>>> t = con.read_csv(f.name, types={"geom": "geometry"})
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ lat ┃ lon ┃ geom ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ float64 │ float64 │ geospatial:geometry │
├─────────┼─────────┼──────────────────────┤
│ 1.0 │ 2.0 │ <POINT (1 2)> │
│ 2.0 │ 3.0 │ <POINT (2 3)> │
└─────────┴─────────┴──────────────────────┘read_delta
read_delta(['self', 'source_table', 'table_name=None', '**kwargs'])
Register a Delta Lake table as a table in the current database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| source_table | str | The data source. Must be a directory containing a Delta Lake table. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| **kwargs | Any | Additional keyword arguments passed to deltalake.DeltaTable. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table. |
read_geo
read_geo(['self', 'source', 'table_name=None', '**kwargs'])
Register a GEO file as a table in the current database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | The data source(s). Path to a file of geospatial files supported by duckdb. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| **kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files for more information. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
read_in_memory
read_in_memory(['self', 'source', 'table_name=None'])
Register an in-memory table object in the current database.
Supported objects include pandas DataFrame, a Polars DataFrame/LazyFrame, or a PyArrow Table or RecordBatchReader.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| source | pd.DataFrame | pa.Table | pa.RecordBatchReader | pl.DataFrame | pl.LazyFrame | The data source. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
read_json
read_json(['self', 'source_list', 'table_name=None', 'columns=None', '**kwargs'])
Read newline-delimited JSON into an ibis table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| source_list | str | list[str] | tuple[str] | File or list of files | required |
| table_name | str | None | Optional table name | None |
| columns | Mapping[str, str] | None | Optional mapping from string column name to duckdb type string. | None |
| **kwargs | Additional keyword arguments passed to DuckDB’s read_json_auto function. See https://duckdb.org/docs/data/json/overview.html#json-loading for parameters and more information about reading JSON. |
{} |
Returns
| Name | Type | Description |
|---|---|---|
| Table | An ibis table expression |
read_mysql
read_mysql(['self', 'uri', '*', 'catalog', 'table_name=None'])
Register a table from a MySQL instance into a DuckDB table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| uri | str | A mysql URI of the form mysql://user:password@host:port/database |
required |
| catalog | str | User-defined alias given to the MySQL database that is being attached to DuckDB | required |
| table_name | str | None | The table to read | None |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table. |
read_parquet
read_parquet(['self', 'source_list', 'table_name=None', '**kwargs'])
Register a parquet file as a table in the current database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| source_list | str | Iterable[str] | The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| **kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
read_postgres
read_postgres(['self', 'uri', '*', 'table_name=None', "database='public'"])
Register a table from a postgres instance into a DuckDB table.
schema to refer to database hierarchy.
A collection of table is referred to as a database. A collection of database is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| uri | str | A postgres URI of the form postgres://user:password@host:port |
required |
| table_name | str | None | The table to read | None |
| database | str | PostgreSQL database (schema) where table_name resides |
'public' |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table. |
read_sqlite
read_sqlite(['self', 'path', '*', 'table_name=None'])
Register a table from a SQLite database into a DuckDB table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The path to the SQLite database | required |
| table_name | str | None | The table to read | None |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table. |
Examples
>>> import ibis
>>> import sqlite3
>>> ibis.options.interactive = True
>>> with sqlite3.connect("/tmp/sqlite.db") as con:
... con.execute("DROP TABLE IF EXISTS t")
... con.execute("CREATE TABLE t (a INT, b TEXT)")
... con.execute(
... "INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
... )
<...>
>>> con = ibis.connect("duckdb://")
>>> t = con.read_sqlite(path="/tmp/sqlite.db", table_name="t")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ a │
│ 2 │ b │
│ 3 │ c │
└───────┴────────┘reconnect
reconnect(['self'])
Reconnect to the database already configured with connect.
register
register(['self', 'source', 'table_name=None', '**kwargs'])
Register a data source as a table in the current database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | Path | Any | The data source(s). May be a path to a file or directory of parquet/csv files, an iterable of parquet or CSV files, a pandas dataframe, a pyarrow table or dataset, or a postgres URI. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| **kwargs | Any | Additional keyword arguments passed to DuckDB loading functions for CSV or parquet. See https://duckdb.org/docs/data/csv and https://duckdb.org/docs/data/parquet for more information. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
register_filesystem
register_filesystem(['self', 'filesystem'])
Register an fsspec filesystem object with DuckDB.
This allow a user to read from any fsspec compatible filesystem using read_csv, read_parquet, read_json, etc.
Creating an fsspec filesystem requires that the corresponding backend-specific fsspec helper library is installed.
e.g. to connect to Google Cloud Storage, gcsfs must be installed.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| filesystem | AbstractFileSystem | The fsspec filesystem object to register with DuckDB. See https://duckdb.org/docs/guides/python/filesystems for details. | required |
Examples
>>> import ibis
>>> import fsspec
>>> ibis.options.interactive = True
>>> gcs = fsspec.filesystem("gcs")
>>> con = ibis.duckdb.connect()
>>> con.register_filesystem(gcs)
>>> t = con.read_csv(
... "gcs://ibis-examples/data/band_members.csv.gz",
... table_name="band_members",
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ band ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ string │
├────────┼─────────┤
│ Mick │ Stones │
│ John │ Beatles │
│ Paul │ Beatles │
└────────┴─────────┘register_options
register_options(['cls'])
Register custom backend options.
rename_table
rename_table(['self', 'old_name', 'new_name'])
Rename an existing table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| old_name | str | The old name of the table. | required |
| new_name | str | The new name of the table. | required |
sql
sql(['self', 'query', 'schema=None', 'dialect=None'])
table
table(['self', 'name', 'database=None'])
Construct a table expression.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Table name | required |
| database | str | None | Database name | None |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Table expression |
to_csv
to_csv(['self', 'expr', 'path', '*', 'params=None', 'header=True', '**kwargs'])
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
| path | str | Path | The data source. A string or Path to the CSV file. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| header | bool | Whether to write the column names as the first line of the CSV file. | True |
| **kwargs | Any | DuckDB CSV writer arguments. https://duckdb.org/docs/data/csv/overview.html#parameters | {} |
to_delta
to_delta(['self', 'expr', 'path', '*', 'params=None', '**kwargs'])
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to Delta Lake table. | required |
| path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| kwargs | Any | Additional keyword arguments passed to deltalake.writer.write_deltalake method | {} |
to_pandas
to_pandas(['self', 'expr', '*', 'params=None', 'limit=None', '**kwargs'])
Execute an Ibis expression and return a pandas DataFrame, Series, or scalar.
This method is a wrapper around execute.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to execute. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. |
None |
| kwargs | Any | Keyword arguments | {} |
to_pandas_batches
to_pandas_batches(['self', 'expr', '*', 'params=None', 'limit=None', 'chunk_size=1000000', '**kwargs'])
Execute an Ibis expression and return an iterator of pandas DataFrames.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to execute. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. |
None |
| chunk_size | int | Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. |
1000000 |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| Iterator[pd.DataFrame] | An iterator of pandas DataFrames. |
to_parquet
to_parquet(['self', 'expr', 'path', '*', 'params=None', '**kwargs'])
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
| path | str | Path | The data source. A string or Path to the parquet file. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| **kwargs | Any | DuckDB Parquet writer arguments. See https://duckdb.org/docs/data/parquet/overview.html#writing-to-parquet-files for details. | {} |
Examples
Write out an expression to a single parquet file.
>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "/tmp/penguins.parquet")Write out an expression to a hive-partitioned parquet file.
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)Partition on a single column.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by="year")Partition on multiple columns.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by=("year", "island"))to_parquet_dir
to_parquet_dir(['self', 'expr', 'directory', '*', 'params=None', '**kwargs'])
Write the results of executing the given expression to a parquet file in a directory.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
| directory | str | Path | The data source. A string or Path to the directory where the parquet file will be written. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| **kwargs | Any | Additional keyword arguments passed to pyarrow.dataset.write_dataset | {} |
| https | required |
to_polars
to_polars(['self', 'expr', '*', 'params=None', 'limit=None', '**kwargs'])
Execute expression and return results in as a polars DataFrame.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to export to polars. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. |
None |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| dataframe | A polars DataFrame holding the results of the executed expression. |
to_pyarrow
to_pyarrow(['self', 'expr', '*', 'params=None', 'limit=None', '**_'])
Execute expression and return results in as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to export to pyarrow | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. |
None |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches
to_pyarrow_batches(['self', 'expr', '*', 'params=None', 'limit=None', 'chunk_size=1000000', '**_'])
Return a stream of record batches.
The returned RecordBatchReader contains a cursor with an unbounded lifetime.
For analytics use cases this is usually nothing to fret about. In some cases you may need to explicit release the cursor.
chunk_size argument is passed.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression | required |
| params | Mapping[ir.Scalar, Any] | None | Bound parameters | None |
| limit | int | str | None | Limit the result to this number of rows | None |
| chunk_size | int | The number of rows to fetch per batch | 1000000 |
to_torch
to_torch(['self', 'expr', '*', 'params=None', 'limit=None', '**kwargs'])
Execute an expression and return results as a dictionary of torch tensors.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to execute. | required |
| params | Mapping[ir.Scalar, Any] | None | Parameters to substitute into the expression. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. |
None |
| kwargs | Any | Keyword arguments passed into the backend’s to_torch implementation. |
{} |
Returns
| Name | Type | Description |
|---|---|---|
| dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |
truncate_table
truncate_table(['self', 'name', 'database=None'])
Delete all rows from a table.
schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of database is referred to as a catalog. These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Table name | required |
| database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). |
None |