DuckDB
Install
Install Ibis and dependencies for the DuckDB backend:
Install with the duckdb extra:
pip install 'ibis-framework[duckdb]'
And connect:
import ibis
con = ibis.duckdb.connect()
1. Adjust connection parameters as needed.
Install for DuckDB:
conda install -c conda-forge ibis-duckdb
And connect:
import ibis
con = ibis.duckdb.connect()
1. Adjust connection parameters as needed.
Install for DuckDB:
mamba install -c conda-forge ibis-duckdb
And connect:
import ibis
con = ibis.duckdb.connect()
1. Adjust connection parameters as needed.
Connect
ibis.duckdb.connect
Connect to an in-memory database:
con = ibis.duckdb.connect()
Connect to, or create, a local DuckDB file:
con = ibis.duckdb.connect("mydb.duckdb")
ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.
Connection Parameters
do_connect
do_connect(self, database=':memory:', read_only=False, extensions=None, **config)
Create an Ibis client connected to a DuckDB database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
database | str | Path | Path to a duckdb database. | ':memory:' |
read_only | bool | Whether the database is read-only. | False |
extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
config | Any | DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. | {} |
Examples
>>> import ibis
>>> ibis.duckdb.connect(threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at 0x...>
ibis.connect URL format
In addition to ibis.duckdb.connect, you can also connect to DuckDB by passing a properly-formatted DuckDB connection URL to ibis.connect:
import ibis
con = ibis.connect("duckdb://local.ddb")
Given an empty path, ibis.connect will connect to an ephemeral, in-memory database.
con = ibis.connect("duckdb://")
MotherDuck
The DuckDB backend supports MotherDuck. If you have an account, you can connect to MotherDuck by passing in the string md: or motherduck:. ibis will trigger the authentication prompt in-browser.
import ibis
con = ibis.duckdb.connect("md:")
Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case list_tables).
con.list_tables()
Attempting to automatically open the SSO authorization page in your default browser.
1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS
Token successfully retrieved ✅
You can store it as an environment variable to avoid having to log in again:
$ export motherduck_token='****************'
['penguins']
Geospatial
The DuckDB backend has experimental support for geospatial operations.
To enable geospatial operations, install the geospatial extra or otherwise install the additional required dependencies:
See read_geo for some tips on reading in geospatial data.
pip install 'ibis-framework[geospatial]'
conda install -c conda-forge ibis-framework geopandas 'shapely>=2,<3'
mamba install -c conda-forge ibis-framework geopandas 'shapely>=2,<3'
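As a rough sketch of how this fits together (the file name below is hypothetical, not from the upstream docs), once the extra dependencies are installed you can read a vector file through read_geo:
import ibis

con = ibis.duckdb.connect()
# "zones.geojson" is a placeholder path; any vector format readable by the
# DuckDB spatial extension works here
zones = con.read_geo("zones.geojson")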
duckdb.Backend
attach
attach(self, path, name=None, read_only=False)
Attach another DuckDB database to the current DuckDB session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | Path to the database to attach. | required |
name | str | None | Name to attach the database as. Defaults to the basename of path . | None |
read_only | bool | Whether to attach the database as read-only. | False |
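A minimal sketch of attaching a second database file (both file names are hypothetical):
import ibis

con = ibis.duckdb.connect("primary.duckdb")
# attach a second, read-only database under the alias "analytics"
con.attach("analytics.duckdb", name="analytics", read_only=True)
# the attached database now shows up as a catalog on this connection
con.list_catalogs()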
attach_sqlite
attach_sqlite(self, path, overwrite=False, all_varchar=False)
Attach a SQLite database to the current DuckDB session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The path to the SQLite database. | required |
overwrite | bool | Allow overwriting any tables or views that already exist in your current session with the contents of the SQLite database. | False |
all_varchar | bool | Set all SQLite columns to type VARCHAR to avoid type errors on ingestion. | False |
Examples
>>> import ibis
>>> import sqlite3
>>> with sqlite3.connect("/tmp/attach_sqlite.db") as con:
...     con.execute("DROP TABLE IF EXISTS t")
...     con.execute("CREATE TABLE t (a INT, b TEXT)")
...     con.execute(
...         "INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
...     )
<...>
>>> con = ibis.connect("duckdb://")
>>> con.list_tables()
[]
>>> con.attach_sqlite("/tmp/attach_sqlite.db")
>>> con.list_tables()
['t']
compile
compile(self, expr, limit=None, params=None, pretty=False)
Compile an Ibis expression to a SQL string.
connect
connect(self, *args, **kwargs)
Connect to the database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
*args |  | Mandatory connection parameters, see the docstring of do_connect for details. | () |
**kwargs |  | Extra connection parameters, see the docstring of do_connect for details. | {} |
Notes
This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.
Returns
Name | Type | Description |
---|---|---|
BaseBackend | An instance of the backend |
create_database
create_database(self, name, catalog=None, force=False)
Create a database named name in catalog.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the database to create. | required |
catalog | str | None | Name of the catalog in which to create the database. If None , the current catalog is used. | None |
force | bool | If False , an exception is raised if the database exists. | False |
create_table
create_table(self, name, obj=None, *, schema=None, database=None, temp=False, overwrite=False)
Create a table in DuckDB.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to create | required |
obj | ir.Table | pd.DataFrame | pa.Table | pl.DataFrame | pl.LazyFrame | None | The data with which to populate the table; optional, but at least one of obj or schema must be specified | None |
schema | SchemaLike | None | The schema of the table to create; optional, but at least one of obj or schema must be specified | None |
database | str | None | The name of the database in which to create the table; if not passed, the current database is used. For multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database") . | None |
temp | bool | Create a temporary table | False |
overwrite | bool | If True , replace the table if it already exists, otherwise fail if the table exists | False |
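The following sketch (not part of the upstream reference) shows the two common calling patterns: populating a table from a pandas DataFrame, or creating an empty table from an explicit schema:
import ibis
import pandas as pd

con = ibis.duckdb.connect()

# populate a table from an in-memory DataFrame, replacing it if it exists
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
t = con.create_table("my_table", df, overwrite=True)

# create an empty table from an explicit schema
empty = con.create_table("empty_table", schema=ibis.schema({"a": "int64", "b": "string"}))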
create_view
create_view(self, name, obj, *, database=None, overwrite=False)
Create a new view from an expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the new view. | required |
obj | ir.Table | An Ibis table expression that will be used to create the view. | required |
database | str | None | Name of the database where the view will be created, if not provided the database’s default is used. | None |
overwrite | bool | Whether to clobber an existing view with the same name | False |
Returns
Name | Type | Description |
---|---|---|
Table | The view that was created. |
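For example, a minimal sketch (the table and view names are illustrative):
import ibis

con = ibis.duckdb.connect()
t = con.create_table("raw", ibis.memtable({"a": [1, 2, 3]}))
# expose a filtered view over the raw table, replacing any existing view
v = con.create_view("big_a", t.filter(t.a > 1), overwrite=True)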
detach
detach(self, name)
Detach a database from the current DuckDB session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the database to detach. | required |
disconnect
disconnect(self)
Close the connection to the backend.
drop_database
drop_database(self, name, catalog=None, force=False)
Drop the database with name in catalog.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the database to drop. | required |
catalog | str | None | Name of the catalog to drop the database from. If None , the current catalog is used. | None |
force | bool | If False , an exception is raised if the database does not exist. | False |
drop_table
drop_table(self, name, database=None, force=False)
Drop a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to drop. | required |
database | str | None | Name of the database where the table exists, if not the default. | None |
force | bool | If False , an exception is raised if the table does not exist. | False |
drop_view
drop_view(self, name, *, database=None, force=False)
Drop a view.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the view to drop. | required |
database | str | None | Name of the database where the view exists, if not the default. | None |
force | bool | If False , an exception is raised if the view does not exist. | False |
execute
execute(self, expr, params=None, limit='default', **_)
Execute an expression.
from_connection
from_connection(cls, con, extensions=None)
Create an Ibis client from an existing connection to a DuckDB database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
con | duckdb.DuckDBPyConnection | An existing connection to a DuckDB database. | required |
extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
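A short sketch of wrapping a connection created directly with the duckdb driver (a minimal example, not from the upstream docs):
import duckdb
import ibis

# reuse an existing DuckDB connection rather than opening a new one
raw = duckdb.connect()
con = ibis.duckdb.from_connection(raw)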
get_schema
get_schema(self, table_name, *, catalog=None, database=None)
Compute the schema of a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
table_name | str | May not be fully qualified. Use database if you want to qualify the identifier. | required |
catalog | str | None | Catalog name | None |
database | str | None | Database name | None |
Returns
Name | Type | Description |
---|---|---|
sch.Schema | Ibis schema |
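For instance, a minimal sketch:
import ibis

con = ibis.duckdb.connect()
con.create_table("t", schema=ibis.schema({"a": "int64", "b": "string"}))
# returns an ibis Schema mapping column names to ibis datatypes
con.get_schema("t")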
has_operation
has_operation(cls, operation)
Return whether the backend implements support for operation.
Parameters
Name | Type | Description | Default |
---|---|---|---|
operation | type[ops.Value] | A class corresponding to an operation. | required |
Returns
Name | Type | Description |
---|---|---|
bool | Whether the backend implements the operation. |
Examples
>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True
insert
insert(self, table_name, obj, database=None, overwrite=False)
Insert data into a table.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
table_name | str | The name of the table into which data will be inserted | required |
obj | pd.DataFrame | ir.Table | list | dict | The source data or expression to insert | required |
database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database") . | None |
overwrite | bool | If True then replace existing contents of table | False |
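A minimal sketch of appending and then replacing rows (table and column names are illustrative):
import ibis
import pandas as pd

con = ibis.duckdb.connect()
con.create_table("events", schema=ibis.schema({"id": "int64", "name": "string"}))

# append rows from a pandas DataFrame
con.insert("events", pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}))

# replace the existing contents instead of appending
con.insert("events", pd.DataFrame({"id": [3], "name": ["c"]}), overwrite=True)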
list_catalogs
list_catalogs(self, like=None)
list_databases
list_databases(self, like=None, catalog=None)
List existing databases in the current connection.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | A pattern in Python’s regex format to filter returned database names. | None |
catalog | str | None | The catalog to list databases from. If None , the current catalog is searched. | None |
Returns
Name | Type | Description |
---|---|---|
list[str] | The database names that exist in the current connection, that match the like pattern if provided. |
list_tables
list_tables(self, like=None, database=None)
List tables and views.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | Regex to filter by table/view name. | None |
database | tuple[str, str] | str | None | Database location. If not passed, uses the current database. By default uses the current database (self.current_database ) and catalog (self.current_catalog ). To specify a table in a separate catalog, you can pass in the catalog and database as a string "catalog.database" , or as a tuple of strings ("catalog", "database") . | None |
Returns
Name | Type | Description |
---|---|---|
list[str] | List of table and view names. |
Examples
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> foo = con.create_table("foo", schema=ibis.schema(dict(a="int")))
>>> con.list_tables()
['foo']
>>> bar = con.create_view("bar", foo)
>>> con.list_tables()
['bar', 'foo']
>>> con.create_database("my_database")
>>> con.list_tables(database="my_database")
[]
>>> con.raw_sql("CREATE TABLE my_database.baz (a INTEGER)")
<duckdb.duckdb.DuckDBPyConnection object at 0x...>
>>> con.list_tables(database="my_database")
['baz']
load_extension
load_extension(self, extension, force_install=False)
Install and load a duckdb extension by name or path.
Parameters
Name | Type | Description | Default |
---|---|---|---|
extension | str | The extension name or path. | required |
force_install | bool | Force reinstallation of the extension. | False |
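For example, a minimal sketch (httpfs and spatial are standard DuckDB extensions):
import ibis

con = ibis.duckdb.connect()
con.load_extension("httpfs")  # install if needed, then load
con.load_extension("spatial", force_install=True)  # force reinstallation before loading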
raw_sql
raw_sql(self, query, **kwargs)
read_csv
read_csv(self, source_list, table_name=None, columns=None, types=None, **kwargs)
Register a CSV file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_list | str | list[str] | tuple[str] | The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
columns | Mapping[str, str | dt.DataType] | None | An optional mapping of all column names to their types. | None |
types | Mapping[str, str | dt.DataType] | None | An optional mapping of a subset of column names to their types. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/csv for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
Examples
Generate some data
>>> import tempfile
>>> data = b'''
... lat,lon,geom
... 1.0,2.0,POINT (1 2)
... 2.0,3.0,POINT (2 3)
... '''
>>> with tempfile.NamedTemporaryFile(delete=False) as f:
...     nbytes = f.write(data)
Import Ibis
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> con = ibis.duckdb.connect()
Read the raw CSV file
>>> t = con.read_csv(f.name)
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ lat ┃ lon ┃ geom ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ string │
├─────────┼─────────┼─────────────┤
│     1.0 │     2.0 │ POINT (1 2) │
│     2.0 │     3.0 │ POINT (2 3) │
└─────────┴─────────┴─────────────┘
Load the spatial extension and read the CSV file again, using specific column types
>>> con.load_extension("spatial")
>>> t = con.read_csv(f.name, types={"geom": "geometry"})
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ lat ┃ lon ┃ geom ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ float64 │ float64 │ geospatial:geometry │
├─────────┼─────────┼──────────────────────┤
│     1.0 │     2.0 │ <POINT (1 2)>        │
│     2.0 │     3.0 │ <POINT (2 3)>        │
└─────────┴─────────┴──────────────────────┘
read_delta
read_delta(self, source_table, table_name=None, **kwargs)
Register a Delta Lake table as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_table | str | The data source. Must be a directory containing a Delta Lake table. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to deltalake.DeltaTable. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
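A sketch of typical usage, assuming the deltalake package is installed and "/data/events_delta" is a hypothetical directory containing a Delta Lake table:
import ibis

con = ibis.duckdb.connect()
# register the Delta Lake table under an explicit name
events = con.read_delta("/data/events_delta", table_name="events")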
read_geo
read_geo(self, source, table_name=None, **kwargs)
Register a GEO file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source | str | The data source(s). Path to a file of geospatial files supported by duckdb. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
read_in_memory
read_in_memory(self, source, table_name=None)
Register an in-memory table object in the current database.
Supported objects include pandas DataFrame, a Polars DataFrame/LazyFrame, or a PyArrow Table or RecordBatchReader.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source | pd.DataFrame | pa.Table | pa.RecordBatchReader | pl.DataFrame | pl.LazyFrame | The data source. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
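A minimal sketch registering a pandas DataFrame and a PyArrow table (the table names are illustrative):
import ibis
import pandas as pd
import pyarrow as pa

con = ibis.duckdb.connect()

df = pd.DataFrame({"a": [1, 2, 3]})
t1 = con.read_in_memory(df, table_name="pandas_data")

arrow_table = pa.table({"b": ["x", "y"]})
t2 = con.read_in_memory(arrow_table, table_name="arrow_data")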
read_json
read_json(self, source_list, table_name=None, columns=None, **kwargs)
Read newline-delimited JSON into an ibis table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_list | str | list[str] | tuple[str] | File or list of files | required |
table_name | str | None | Optional table name | None |
columns | Mapping[str, str] | None | Optional mapping from string column name to duckdb type string. | None |
**kwargs | Additional keyword arguments passed to DuckDB’s read_json_auto function. See https://duckdb.org/docs/data/json/overview.html#json-loading for parameters and more information about reading JSON. | {} |
Returns
Name | Type | Description |
---|---|---|
Table | An ibis table expression |
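A sketch of typical usage ("events.jsonl" is a hypothetical newline-delimited JSON file):
import ibis

con = ibis.duckdb.connect()
t = con.read_json("events.jsonl", table_name="events")
# specify column types explicitly using DuckDB type strings
t = con.read_json("events.jsonl", columns={"id": "BIGINT", "payload": "VARCHAR"})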
read_mysql
read_mysql(self, uri, *, catalog, table_name=None)
Register a table from a MySQL instance into a DuckDB table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | A mysql URI of the form mysql://user:password@host:port/database | required |
catalog | str | User-defined alias given to the MySQL database that is being attached to DuckDB | required |
table_name | str | None | The table to read | None |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
read_parquet
read_parquet(self, source_list, table_name=None, **kwargs)
Register a parquet file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_list | str | Iterable[str] | The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
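A sketch of typical usage (the file paths are hypothetical):
import ibis

con = ibis.duckdb.connect()
# a single file ...
events = con.read_parquet("data/events.parquet", table_name="events")
# ... or several files registered together as one table
history = con.read_parquet(["data/2024.parquet", "data/2025.parquet"])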
read_postgres
read_postgres(self, uri, *, table_name=None, database='public')
Register a table from a postgres instance into a DuckDB table.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | A postgres URI of the form postgres://user:password@host:port | required |
table_name | str | None | The table to read | None |
database | str | PostgreSQL database (schema) where table_name resides | 'public' |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
read_sqlite
read_sqlite(self, path, *, table_name=None)
Register a table from a SQLite database into a DuckDB table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The path to the SQLite database | required |
table_name | str | None | The table to read | None |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
Examples
>>> import ibis
>>> import sqlite3
>>> ibis.options.interactive = True
>>> with sqlite3.connect("/tmp/sqlite.db") as con:
...     con.execute("DROP TABLE IF EXISTS t")
...     con.execute("CREATE TABLE t (a INT, b TEXT)")
...     con.execute(
...         "INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
...     )
<...>
>>> con = ibis.connect("duckdb://")
>>> t = con.read_sqlite(path="/tmp/sqlite.db", table_name="t")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ a      │
│     2 │ b      │
│     3 │ c      │
└───────┴────────┘
reconnect
reconnect(self)
Reconnect to the database already configured with connect.
register
register(self, source, table_name=None, **kwargs)
Register a data source as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source | str | Path | Any | The data source(s). May be a path to a file or directory of parquet/csv files, an iterable of parquet or CSV files, a pandas dataframe, a pyarrow table or dataset, or a postgres URI. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading functions for CSV or parquet. See https://duckdb.org/docs/data/csv and https://duckdb.org/docs/data/parquet for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
register_filesystem
register_filesystem(self, filesystem)
Register an fsspec filesystem object with DuckDB.
This allows a user to read from any fsspec-compatible filesystem using read_csv, read_parquet, read_json, etc.
Creating an fsspec filesystem requires that the corresponding backend-specific fsspec helper library is installed. For example, to connect to Google Cloud Storage, gcsfs must be installed.
Parameters
Name | Type | Description | Default |
---|---|---|---|
filesystem | AbstractFileSystem | The fsspec filesystem object to register with DuckDB. See https://duckdb.org/docs/guides/python/filesystems for details. | required |
Examples
>>> import ibis
>>> import fsspec
>>> ibis.options.interactive = True
>>> gcs = fsspec.filesystem("gcs")
>>> con = ibis.duckdb.connect()
>>> con.register_filesystem(gcs)
>>> t = con.read_csv(
...     "gcs://ibis-examples/data/band_members.csv.gz",
...     table_name="band_members",
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ band ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ string │
├────────┼─────────┤
│ Mick │ Stones │
│ John │ Beatles │
│ Paul   │ Beatles │
└────────┴─────────┘
register_options
register_options(cls)
Register custom backend options.
rename_table
rename_table(self, old_name, new_name)
Rename an existing table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
old_name | str | The old name of the table. | required |
new_name | str | The new name of the table. | required |
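For example, a minimal sketch (the table names are illustrative):
import ibis

con = ibis.duckdb.connect()
con.create_table("staging_orders", schema=ibis.schema({"id": "int64"}))
con.rename_table("staging_orders", "orders")
con.list_tables()  # ['orders']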
sql
sql(self, query, schema=None, dialect=None)
table
table(self, name, database=None)
Construct a table expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Table name | required |
database | str | None | Database name | None |
Returns
Name | Type | Description |
---|---|---|
Table | Table expression |
to_csv
to_csv(self, expr, path, *, params=None, header=True, **kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
path | str | Path | The data source. A string or Path to the CSV file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
header | bool | Whether to write the column names as the first line of the CSV file. | True |
**kwargs | Any | DuckDB CSV writer arguments. https://duckdb.org/docs/data/csv/overview.html#parameters | {} |
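A minimal sketch (the output paths are illustrative):
import ibis

con = ibis.duckdb.connect()
t = con.create_table("t", ibis.memtable({"a": [1, 2, 3]}))
con.to_csv(t, "/tmp/t.csv")  # header row written by default
con.to_csv(t.filter(t.a > 1), "/tmp/t_filtered.csv", header=False)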
to_delta
to_delta(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to Delta Lake table. | required |
path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | Any | Additional keyword arguments passed to deltalake.writer.write_deltalake method | {} |
to_geo
to_geo(self, expr, path, *, format, layer_creation_options=None, params=None, limit=None, **kwargs)
Write the results of executing expr to a geospatial output.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | Ibis expression to execute and persist to geospatial output. | required |
path | str | Path | A string or Path to the desired output file location. | required |
format | str | The format of the geospatial output. One of GDAL’s supported vector formats. The list of vector formats is located here: https://gdal.org/en/latest/drivers/vector/index.html | required |
layer_creation_options | Mapping[str, Any] | None | A mapping of layer creation options. | None |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | Any | Additional keyword arguments passed to the DuckDB COPY command. | {} |
Examples
>>> import os
>>> import tempfile
>>> import ibis
>>> ibis.options.interactive = True
>>> from ibis import _
Load some geospatial data
>>> con = ibis.duckdb.connect()
>>> zones = ibis.examples.zones.fetch(backend=con)
>>> zones[["zone", "geom"]].head()
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ zone                    ┃ geom                                                                            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string                  │ geospatial:geometry                                                             │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────────┤
│ Newark Airport          │ <POLYGON ((933100.918 192536.086, 933091.011 192572.175, 933088.585 192604.9...> │
│ Jamaica Bay             │ <MULTIPOLYGON (((1033269.244 172126.008, 1033439.643 170883.946, 1033473.265...> │
│ Allerton/Pelham Gardens │ <POLYGON ((1026308.77 256767.698, 1026495.593 256638.616, 1026567.23 256589....> │
│ Alphabet City           │ <POLYGON ((992073.467 203714.076, 992068.667 203711.502, 992061.716 203711.7...> │
│ Arden Heights           │ <POLYGON ((935843.31 144283.336, 936046.565 144173.418, 936387.922 143967.75...> │
└─────────────────────────┴─────────────────────────────────────────────────────────────────────────────────┘
Write to a GeoJSON file
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     con.to_geo(
...         zones,
...         path=os.path.join(tmpdir, "zones.geojson"),
...         format="geojson",
...     )
Write to a Shapefile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     con.to_geo(
...         zones,
...         path=os.path.join(tmpdir, "zones.shp"),
...         format="ESRI Shapefile",
...     )
to_pandas
to_pandas(self, expr, *, params=None, limit=None, **kwargs)
Execute an Ibis expression and return a pandas DataFrame, Series, or scalar.
This method is a wrapper around execute.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . | None |
kwargs | Any | Keyword arguments | {} |
to_pandas_batches
to_pandas_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)
Execute an Ibis expression and return an iterator of pandas DataFrames.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . | None |
chunk_size | int | Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. | 1000000 |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
Iterator[pd.DataFrame] | An iterator of pandas DataFrame s. |
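A minimal sketch of streaming results in fixed-size chunks:
import ibis

con = ibis.duckdb.connect()
t = con.create_table("t", ibis.memtable({"a": list(range(10))}))
for df in con.to_pandas_batches(t, chunk_size=4):
    # each df is a pandas DataFrame with at most 4 rows
    print(len(df))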
to_parquet
to_parquet(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
path | str | Path | The data source. A string or Path to the parquet file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | DuckDB Parquet writer arguments. See https://duckdb.org/docs/data/parquet/overview.html#writing-to-parquet-files for details. | {} |
Examples
Write out an expression to a single parquet file.
>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "/tmp/penguins.parquet")
Write out an expression to a hive-partitioned parquet file.
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
Partition on a single column.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by="year")
Partition on multiple columns.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by=("year", "island"))
to_parquet_dir
to_parquet_dir(self, expr, directory, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file in a directory.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
directory | str | Path | The data source. A string or Path to the directory where the parquet file will be written. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | Additional keyword arguments passed to pyarrow.dataset.write_dataset | {} |
to_polars
to_polars(self, expr, *, params=None, limit=None, **kwargs)
Execute an expression and return the results as a polars DataFrame.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to polars. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . | None |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
dataframe | A polars DataFrame holding the results of the executed expression. |
to_pyarrow
to_pyarrow(self, expr, *, params=None, limit=None, **_)
Execute an expression and return the results as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to pyarrow | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . | None |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches
to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **_)
Return a stream of record batches.
The returned RecordBatchReader contains a cursor with an unbounded lifetime.
For analytics use cases this is usually nothing to fret about. In some cases you may need to explicitly release the cursor.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression | required |
params | Mapping[ir.Scalar, Any] | None | Bound parameters | None |
limit | int | str | None | Limit the result to this number of rows | None |
chunk_size | int | The number of rows to fetch per batch | 1000000 |
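A minimal sketch:
import ibis

con = ibis.duckdb.connect()
t = con.create_table("t", ibis.memtable({"a": list(range(10))}))
reader = con.to_pyarrow_batches(t, chunk_size=4)  # a pyarrow.RecordBatchReader
for batch in reader:
    print(batch.num_rows)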
to_torch
to_torch(self, expr, *, params=None, limit=None, **kwargs)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Parameters to substitute into the expression. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | Any | Keyword arguments passed into the backend’s to_torch implementation. | {} |
Returns
Name | Type | Description |
---|---|---|
dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |
truncate_table
truncate_table(self, name, database=None)
Delete all rows from a table.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog. These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Table name | required |
database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database") . | None |