DuckDB
Install
Install Ibis and dependencies for the DuckDB backend:
Install with the duckdb extra:
pip install 'ibis-framework[duckdb]'
And connect:
import ibis
con = ibis.duckdb.connect()
Adjust connection parameters as needed.
Install for DuckDB:
conda install -c conda-forge ibis-duckdb
And connect:
import ibis
con = ibis.duckdb.connect()
Adjust connection parameters as needed.
Install for DuckDB:
mamba install -c conda-forge ibis-duckdb
And connect:
import ibis
con = ibis.duckdb.connect()
Adjust connection parameters as needed.
Connect
ibis.duckdb.connect
Connect to an in-memory database:
con = ibis.duckdb.connect()
Connect to, or create, a local DuckDB file:
con = ibis.duckdb.connect("mydb.duckdb")
ibis.duckdb.connect is a thin wrapper around ibis.backends.duckdb.Backend.do_connect.
Connection Parameters
do_connect
do_connect(self, database=':memory:', read_only=False, extensions=None, **config)
Create an Ibis client connected to a DuckDB database.
Previously, Ibis had special handling when the user passed the temp_directory parameter: it set a custom default and created intermediate directories if necessary. Now the value is passed directly to DuckDB. You may need to add Path(your_temp_dir).mkdir(parents=True, exist_ok=True) to your code to maintain the old behavior.
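If you depended on the old auto-creation behavior, a minimal sketch of the workaround (the directory path is illustrative; temp_directory is passed through to DuckDB as an ordinary config value):
import ibis
from pathlib import Path

tmp = Path("/tmp/duckdb-spill")  # hypothetical spill directory
tmp.mkdir(parents=True, exist_ok=True)  # recreate the intermediate directories ourselves
con = ibis.duckdb.connect(temp_directory=str(tmp))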
Parameters
Name | Type | Description | Default |
---|---|---|---|
database | str | Path | Path to a duckdb database. | ':memory:' |
read_only | bool | Whether the database is read-only. | False |
extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
config | Any | DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. | {} |
Examples
>>> import ibis
>>> ibis.duckdb.connect(threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at 0x...>
ibis.connect URL format
In addition to ibis.duckdb.connect, you can also connect to DuckDB by passing a properly-formatted DuckDB connection URL to ibis.connect:
con = ibis.connect("duckdb://local.ddb")
Given an empty path, ibis.connect will connect to an ephemeral, in-memory database:
con = ibis.connect("duckdb://")
Cloud bucket reads
DuckDB has a secret management system that is designed to support reading data from both public and private cloud blob storage systems like Amazon’s S3.
To make your life easier, you should probably start with the CREDENTIAL_CHAIN
provider, which mimics the default behavior of the aws
CLI and SDKs.
Ibis doesn’t have an Ibis-native API for dealing with DuckDB’s secrets, but you can of course run raw_sql
to set them up:
"CREATE SECRET s3 (TYPE S3, PROVIDER CREDENTIAL_CHAIN)") con.raw_sql(
<duckdb.duckdb.DuckDBPyConnection at 0x7ffff0ed5a70>
Assuming you’ve got the appropriate authorizations in AWS, this should allow DuckDB to read from any bucket you’re authorized to access.
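With the secret in place, remote reads look like local ones. A sketch, assuming a private bucket and object you are authorized to read (both names here are hypothetical):
t = con.read_parquet("s3://my-private-bucket/data/events.parquet")  # hypothetical path
t.head()  # executes against the remote file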
MotherDuck
The DuckDB backend supports MotherDuck. If you have an account, you can connect to MotherDuck by passing in the string md: or motherduck:. ibis will trigger the authentication prompt in-browser.
import ibis
= ibis.duckdb.connect("md:") con
Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case list_tables).
con.list_tables()
Attempting to automatically open the SSO authorization page in your default browser.
1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS

Token successfully retrieved ✅
You can store it as an environment variable to avoid having to log in again:

$ export motherduck_token='****************'

['penguins']
Geospatial
The DuckDB backend has experimental support for geospatial operations.
To enable geospatial operations, install the geospatial
extra or otherwise install the additional required dependencies:
See read_geo
for some tips on reading in geospatial data.
pip install 'ibis-framework[geospatial]'
conda install -c conda-forge ibis-framework geopandas 'shapely>=2,<3'
mamba install -c conda-forge ibis-framework geopandas 'shapely>=2,<3'
duckdb.Backend
attach
attach(self, path, name=None, read_only=False)
Attach another DuckDB database to the current DuckDB session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | Path to the database to attach. | required |
name | str | None | Name to attach the database as. Defaults to the basename of path. | None |
read_only | bool | Whether to attach the database as read-only. | False |
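For example, a sketch of attaching a second database file and listing its tables (the file name is hypothetical; the attached database appears as a catalog whose default database is main):
>>> con = ibis.duckdb.connect()
>>> con.attach("other.duckdb", name="other", read_only=True)  # hypothetical file
>>> con.list_tables(database="other.main")  # tables under the attached catalog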
attach_sqlite
attach_sqlite(self, path, overwrite=False, all_varchar=False)
Attach a SQLite database to the current DuckDB session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The path to the SQLite database. | required |
overwrite | bool | Allow overwriting any tables or views that already exist in your current session with the contents of the SQLite database. | False |
all_varchar | bool | Set all SQLite columns to type VARCHAR to avoid type errors on ingestion. | False |
Examples
>>> import ibis
>>> import sqlite3
>>> con = sqlite3.connect("/tmp/attach_sqlite.db")
>>> with con:
...     con.execute("DROP TABLE IF EXISTS t")
...     con.execute("CREATE TABLE t (a INT, b TEXT)")
...     con.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')")
<...>
>>> con.close()
>>> con = ibis.connect("duckdb://")
>>> con.list_tables()
[]
>>> con.attach_sqlite("/tmp/attach_sqlite.db")
>>> con.list_tables()
['t']
compile
compile(self, expr, /, *, limit=None, params=None, pretty=False)
Compile an expression to a SQL string.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | An ibis expression to compile. | required |
limit | str | int | None | An integer to effect a specific row limit. A value of None means no limit. | None |
params | Mapping[ir.Expr, Any] | None | Mapping of scalar parameter expressions to value. | None |
pretty | bool | Pretty print the SQL query during compilation. | False |
Returns
Name | Type | Description |
---|---|---|
str | Compiled expression |
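For instance, a minimal sketch of inspecting the SQL that Ibis generates:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.create_table("t", schema=ibis.schema(dict(a="int")))
>>> print(con.compile(t.a.sum(), pretty=True))  # the SELECT SUM(a) ... query as a string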
connect
connect(self, *args, **kwargs)
Connect to the database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
*args | | Mandatory connection parameters, see the docstring of do_connect for details. | () |
**kwargs | | Extra connection parameters, see the docstring of do_connect for details. | {} |
Notes
This creates a new backend instance with saved args
and kwargs
, then calls reconnect
and finally returns the newly created and connected backend instance.
Returns
Name | Type | Description |
---|---|---|
BaseBackend | An instance of the backend |
create_database
create_database(self, name, /, *, catalog=None, force=False)
Create a database named name
in catalog
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the database to create. | required |
catalog | str | None | Name of the catalog in which to create the database. If None, the current catalog is used. | None |
force | bool | If False, an exception is raised if the database exists. | False |
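A quick sketch of typical usage:
>>> con = ibis.duckdb.connect()
>>> con.create_database("analytics", force=True)  # no error if it already exists
>>> con.list_databases()  # 'analytics' now appears alongside the default database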
create_table
create_table(self, name, /, obj=None, *, schema=None, database=None, temp=False, overwrite=False)
Create a table in DuckDB.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to create | required |
obj | ir.Table | pd.DataFrame | pa.Table | pl.DataFrame | pl.LazyFrame | None | The data with which to populate the table; optional, but at least one of obj or schema must be specified | None |
schema | SchemaLike | None | The schema of the table to create; optional, but at least one of obj or schema must be specified | None |
database | str | None | The name of the database in which to create the table; if not passed, the current database is used. For multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). | None |
temp | bool | Create a temporary table | False |
overwrite | bool | If True, replace the table if it already exists, otherwise fail if the table exists | False |
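A sketch of the two common paths, populating from in-memory data or declaring a schema up front:
>>> import ibis
>>> import pandas as pd
>>> con = ibis.duckdb.connect()
>>> df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
>>> t = con.create_table("my_table", df, overwrite=True)  # populate from a DataFrame
>>> s = con.create_table("staging", schema=ibis.schema(dict(a="int")), temp=True)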
create_view
create_view(self, name, /, obj, *, database=None, overwrite=False)
Create a view from an Ibis expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the view to create. | required |
obj | ir.Table | The Ibis expression to create the view from. | required |
database | str | None | The database that the view should be created in. | None |
overwrite | bool | If True, replace an existing view with the same name. | False |
Returns
Name | Type | Description |
---|---|---|
ir.Table | A table expression representing the view. |
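For example, a short sketch of building a view over an existing table (names here are illustrative):
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.create_table("my_table", schema=ibis.schema(dict(a="int")))
>>> v = con.create_view("big_a", t.filter(t.a > 1), overwrite=True)  # v is a table expression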
detach
detach(self, name)
Detach a database from the current DuckDB session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the database to detach. | required |
disconnect
disconnect(self)
Disconnect from the backend.
drop_database
drop_database(self, name, /, *, catalog=None, force=False)
Drop the database with name
in catalog
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the database to drop. | required |
catalog | str | None | Name of the catalog to drop the database from. If None, the current catalog is used. | None |
force | bool | If False, an exception is raised if the database does not exist. | False |
drop_table
drop_table(self, name, /, *, database=None, force=False)
Drop a table from the backend.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the table to drop | required |
database | tuple[str, str] | str | None | The database that the table is located in. | None |
force | bool | If True, do not raise an error if the table does not exist. | False |
drop_view
drop_view(self, name, /, *, database=None, force=False)
Drop a view from the backend.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the view to drop. | required |
database | str | None | The database that the view is located in. | None |
force | bool | If True, do not raise an error if the view does not exist. | False |
execute
execute(self, expr, /, *, params=None, limit=None, **kwargs)
Execute an expression.
from_connection
from_connection(cls, con, /, *, extensions=None)
Create an Ibis client from an existing connection to a DuckDB database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
con | duckdb.DuckDBPyConnection | An existing connection to a DuckDB database. | required |
extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
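For example, a sketch of wrapping a DuckDB connection you already manage (assuming ibis.duckdb.from_connection as the public entry point to this classmethod):
>>> import duckdb
>>> import ibis
>>> raw = duckdb.connect()  # an existing DuckDBPyConnection
>>> con = ibis.duckdb.from_connection(raw)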
get_schema
get_schema(self, table_name, *, catalog=None, database=None)
Compute the schema of a table
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
table_name | str | May not be fully qualified. Use database if you want to qualify the identifier. | required |
catalog | str | None | Catalog name | None |
database | str | None | Database name | None |
Returns
Name | Type | Description |
---|---|---|
sch.Schema | Ibis schema |
has_operation
has_operation(cls, operation, /)
Return whether the backend supports the given operation.
Parameters
Name | Type | Description | Default |
---|---|---|---|
operation | type[ops.Value] | Operation type, a Python class object. | required |
insert
insert(self, name, /, obj, *, database=None, overwrite=False)
Insert data into a table.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the table into which data will be inserted | required |
obj | pd.DataFrame | ir.Table | list | dict | The source data or expression to insert | required |
database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). | None |
overwrite | bool | If True then replace existing contents of table | False |
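A minimal sketch of inserting rows into an existing table:
>>> import ibis
>>> import pandas as pd
>>> con = ibis.duckdb.connect()
>>> t = con.create_table("events", schema=ibis.schema(dict(id="int", name="string")))
>>> con.insert("events", pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}))
>>> con.insert("events", pd.DataFrame({"id": [3], "name": ["c"]}), overwrite=True)  # replace contents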
list_catalogs
list_catalogs(self, *, like=None)
list_databases
list_databases(self, *, like=None, catalog=None)
List existing databases in the current connection.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | A pattern in Python’s regex format to filter returned database names. | None |
catalog | str | None | The catalog to list databases from. If None, the current catalog is searched. | None |
Returns
Name | Type | Description |
---|---|---|
list[str] | The database names that exist in the current connection, that match the like pattern if provided. |
list_tables
list_tables(self, *, like=None, database=None)
List tables and views.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | Regex to filter by table/view name. | None |
database | tuple[str, str] | str | None | Database location. If not passed, uses the current database. By default uses the current database (self.current_database) and catalog (self.current_catalog). To specify a table in a separate catalog, you can pass in the catalog and database as a string "catalog.database", or as a tuple of strings ("catalog", "database"). | None |
Returns
Name | Type | Description |
---|---|---|
list[str] | List of table and view names. |
Examples
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> foo = con.create_table("foo", schema=ibis.schema(dict(a="int")))
>>> con.list_tables()
['foo']
>>> bar = con.create_view("bar", foo)
>>> con.list_tables()
['bar', 'foo']
>>> con.create_database("my_database")
>>> con.list_tables(database="my_database")
[]
>>> con.raw_sql("CREATE TABLE my_database.baz (a INTEGER)")
<duckdb.duckdb.DuckDBPyConnection object at 0x...>
>>> con.list_tables(database="my_database")
['baz']
load_extension
load_extension(self, extension, force_install=False)
Install and load a duckdb extension by name or path.
Parameters
Name | Type | Description | Default |
---|---|---|---|
extension | str | The extension name or path. | required |
force_install | bool | Force reinstallation of the extension. | False |
raw_sql
raw_sql(self, query, **kwargs)
read_csv
read_csv(self, paths, /, *, table_name=None, columns=None, types=None, **kwargs)
Register a CSV file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
paths | str | list[str] | tuple[str] | The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
columns | Mapping[str, str | dt.DataType] | None | An optional mapping of all column names to their types. | None |
types | Mapping[str, str | dt.DataType] | None | An optional mapping of a subset of column names to their types. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/csv for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
Examples
Generate some data
>>> import tempfile
>>> data = b'''
... lat,lon,geom
... 1.0,2.0,POINT (1 2)
... 2.0,3.0,POINT (2 3)
... '''
>>> with tempfile.NamedTemporaryFile(delete=False) as f:
...     nbytes = f.write(data)
Import Ibis
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> con = ibis.duckdb.connect()
Read the raw CSV file
>>> t = con.read_csv(f.name)
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ lat     ┃ lon     ┃ geom        ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ string      │
├─────────┼─────────┼─────────────┤
│     1.0 │     2.0 │ POINT (1 2) │
│     2.0 │     3.0 │ POINT (2 3) │
└─────────┴─────────┴─────────────┘
Load the spatial
extension and read the CSV file again, using specific column types
>>> con.load_extension("spatial")
>>> t = con.read_csv(f.name, types={"geom": "geometry"})
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ lat     ┃ lon     ┃ geom                ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ float64 │ float64 │ geospatial:geometry │
├─────────┼─────────┼─────────────────────┤
│     1.0 │     2.0 │ <POINT (1 2)>       │
│     2.0 │     3.0 │ <POINT (2 3)>       │
└─────────┴─────────┴─────────────────────┘
read_delta
read_delta(self, path, /, *, table_name=None, **kwargs)
Register a Delta Lake table as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The data source. Must be a directory containing a Delta Lake table. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
kwargs | Any | Additional keyword arguments passed to deltalake.DeltaTable. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
read_geo
read_geo(self, path, /, *, table_name=None, **kwargs)
Register a geospatial data file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | The data source(s). Path to a file of geospatial files supported by duckdb. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
read_json
read_json(self, paths, /, *, table_name=None, columns=None, **kwargs)
Read newline-delimited JSON into an ibis table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
paths | str | list[str] | tuple[str] | File or list of files | required |
table_name | str | None | Optional table name | None |
columns | Mapping[str, str] | None | Optional mapping from string column name to duckdb type string. | None |
**kwargs | Additional keyword arguments passed to DuckDB’s read_json_auto function. See https://duckdb.org/docs/data/json/overview.html#json-loading for parameters and more information about reading JSON. | {} |
Returns
Name | Type | Description |
---|---|---|
Table | An ibis table expression |
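A short sketch of reading newline-delimited JSON, with an explicit type for one column (the file path is illustrative):
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> with open("/tmp/rows.json", "w") as f:  # hypothetical file
...     nbytes = f.write('{"a": 1}\n{"a": 2}\n')
>>> t = con.read_json("/tmp/rows.json", columns={"a": "bigint"})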
read_mysql
read_mysql(self, uri, /, *, catalog, table_name=None)
Register a table from a MySQL instance into a DuckDB table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | A mysql URI of the form mysql://user:password@host:port/database | required |
catalog | str | User-defined alias given to the MySQL database that is being attached to DuckDB | required |
table_name | str | None | The table to read | None |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
read_parquet
read_parquet(self, paths, /, *, table_name=None, **kwargs)
Register a parquet file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
paths | str | Path | Iterable[str | Path] | The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
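For example, a sketch of registering parquet files under an explicit name (the glob is hypothetical; DuckDB also accepts remote URLs here):
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.read_parquet("data/*.parquet", table_name="events")  # hypothetical glob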
read_postgres
read_postgres(self, uri, /, *, table_name=None, database='public')
Register a table from a postgres instance into a DuckDB table.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | A postgres URI of the form postgres://user:password@host:port | required |
table_name | str | None | The table to read | None |
database | str | PostgreSQL database (schema) where table_name resides | 'public' |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
read_sqlite
read_sqlite(self, path, /, *, table_name=None)
Register a table from a SQLite database into a DuckDB table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The path to the SQLite database | required |
table_name | str | None | The table to read | None |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
Examples
>>> import ibis
>>> import sqlite3
>>> ibis.options.interactive = True
>>> con = sqlite3.connect("/tmp/sqlite.db")
>>> with con:
...     con.execute("DROP TABLE IF EXISTS t")
...     con.execute("CREATE TABLE t (a INT, b TEXT)")
...     con.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')")
<...>
>>> con.close()
>>> con = ibis.connect("duckdb://")
>>> t = con.read_sqlite("/tmp/sqlite.db", table_name="t")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ a      │
│     2 │ b      │
│     3 │ c      │
└───────┴────────┘
reconnect
reconnect(self)
Reconnect to the database already configured with connect.
register_filesystem
register_filesystem(self, filesystem)
Register an fsspec
filesystem object with DuckDB.
This allow a user to read from any fsspec
compatible filesystem using read_csv
, read_parquet
, read_json
, etc.
Creating an fsspec
filesystem requires that the corresponding backend-specific fsspec
helper library is installed.
e.g. to connect to Google Cloud Storage, gcsfs
must be installed.
Parameters
Name | Type | Description | Default |
---|---|---|---|
filesystem | AbstractFileSystem | The fsspec filesystem object to register with DuckDB. See https://duckdb.org/docs/guides/python/filesystems for details. | required |
Examples
>>> import ibis
>>> import fsspec
>>> ibis.options.interactive = True
>>> gcs = fsspec.filesystem("gcs")
>>> con = ibis.duckdb.connect()
>>> con.register_filesystem(gcs)
>>> t = con.read_csv(
...     "gcs://ibis-examples/data/band_members.csv.gz",
...     table_name="band_members",
... )
>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ band ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ string │
├────────┼─────────┤
│ Mick │ Stones │
│ John │ Beatles │
│ Paul   │ Beatles │
└────────┴─────────┘
register_options
register_options(cls)
Register custom backend options.
rename_table
rename_table(self, old_name, new_name)
Rename an existing table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
old_name | str | The old name of the table. | required |
new_name | str | The new name of the table. | required |
sql
sql(self, query, /, *, schema=None, dialect=None)
Create an Ibis table expression from a SQL query.
Parameters
Name | Type | Description | Default |
---|---|---|---|
query | str | A SQL query string | required |
schema | SchemaLike | None | The schema of the query. If not provided, Ibis will try to infer the schema of the query. | None |
dialect | str | None | The SQL dialect of the query. If not provided, the backend’s dialect is assumed. This argument can be useful when the query is written in a different dialect from the backend. | None |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The table expression representing the query |
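A minimal sketch of composing on top of a raw query:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.sql("SELECT 1 AS a, 'x' AS b")
>>> expr = t.filter(t.a > 0)  # compose further like any other table expression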
table
table(self, name, /, *, database=None)
Construct a table expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Table name | required |
database | str | None | Database name | None |
Returns
Name | Type | Description |
---|---|---|
Table | Table expression |
to_csv
to_csv(self, expr, /, path, *, params=None, header=True, **kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
path | str | Path | The data source. A string or Path to the CSV file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
header | bool | Whether to write the column names as the first line of the CSV file. | True |
kwargs | Any | DuckDB CSV writer arguments. https://duckdb.org/docs/data/csv/overview.html#parameters | {} |
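A quick sketch of writing an expression out, passing a writer option through to DuckDB (assuming sep is among the CSV writer options DuckDB accepts):
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.sql("SELECT 1 AS a, 'x' AS b")
>>> con.to_csv(t, "/tmp/out.csv", header=True, sep="|")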
to_delta
to_delta(self, expr, /, path, *, params=None, **kwargs)
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to Delta Lake table. | required |
path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | Any | Additional keyword arguments passed to deltalake.writer.write_deltalake method | {} |
to_geo
to_geo(self, expr, /, path, *, format, layer_creation_options=None, params=None, limit=None, **kwargs)
Write the results of executing expr
to a geospatial output.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | Ibis expression to execute and persist to geospatial output. | required |
path | str | Path | A string or Path to the desired output file location. | required |
format | str | The format of the geospatial output. One of GDAL’s supported vector formats. The list of vector formats is located here: https://gdal.org/en/latest/drivers/vector/index.html | required |
layer_creation_options | Mapping[str, Any] | None | A mapping of layer creation options. | None |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | Any | Additional keyword arguments passed to the DuckDB COPY command. | {} |
Examples
>>> import os
>>> import tempfile
>>> import ibis
>>> ibis.options.interactive = True
>>> from ibis import _
Load some geospatial data
>>> con = ibis.duckdb.connect()
>>> zones = ibis.examples.zones.fetch(backend=con)
>>> zones[["zone", "geom"]].head()
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ zone                    ┃ geom                                ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string                  │ geospatial:geometry                 │
├─────────────────────────┼─────────────────────────────────────┤
│ Newark Airport          │ <POLYGON ((933100.918 192536.086,   │
│                         │ 933091.011 192572.175, 933088.585   │
│                         │ 192604.9...>                        │
│ Jamaica Bay             │ <MULTIPOLYGON (((1033269.244        │
│                         │ 172126.008, 1033439.643 170883.946, │
│                         │ 1033473.265...>                     │
│ Allerton/Pelham Gardens │ <POLYGON ((1026308.77 256767.698,   │
│                         │ 1026495.593 256638.616, 1026567.23  │
│                         │ 256589....>                         │
│ Alphabet City           │ <POLYGON ((992073.467 203714.076,   │
│                         │ 992068.667 203711.502, 992061.716   │
│                         │ 203711.7...>                        │
│ Arden Heights           │ <POLYGON ((935843.31 144283.336,    │
│                         │ 936046.565 144173.418, 936387.922   │
│                         │ 143967.75...>                       │
└─────────────────────────┴─────────────────────────────────────┘
Write to a GeoJSON file
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     con.to_geo(
...         zones,
...         path=os.path.join(tmpdir, "zones.geojson"),
...         format="geojson",
...     )
Write to a Shapefile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     con.to_geo(
...         zones,
...         path=os.path.join(tmpdir, "zones.shp"),
...         format="ESRI Shapefile",
...     )
to_json
to_json(self, expr, /, path, *, compression='auto', dateformat=None, timestampformat=None)
Write the results of expr
to a json file of [{column -> value}, …] objects.
This method is eager and will execute the associated expression immediately. See https://duckdb.org/docs/sql/statements/copy.html#json-options for more info.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to JSON. | required |
path | str | Path | URLs such as S3 buckets are supported. | required |
compression | Literal['auto', 'none', 'gzip', 'zstd'] | Compression codec to use. One of “auto”, “none”, “gzip”, “zstd”. | 'auto' |
dateformat | str | None | Date format string. | None |
timestampformat | str | None | Timestamp format string. | None |
to_pandas
to_pandas(self, expr, /, *, params=None, limit=None, **kwargs)
Execute an Ibis expression and return a pandas DataFrame
, Series
, or scalar.
This method is a wrapper around execute
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. | None |
kwargs | Any | Keyword arguments | {} |
to_pandas_batches
to_pandas_batches(self, expr, /, *, params=None, limit=None, chunk_size=1000000, **kwargs)
Execute an Ibis expression and return an iterator of pandas DataFrame
s.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. | None |
chunk_size | int | Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. | 1000000 |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
Iterator[pd.DataFrame] | An iterator of pandas DataFrame s. |
to_parquet
to_parquet(self, expr, /, path, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
path | str | Path | The data source. A string or Path to the parquet file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | DuckDB Parquet writer arguments. See https://duckdb.org/docs/data/parquet/overview.html#writing-to-parquet-files for details. | {} |
Examples
Write out an expression to a single parquet file.
>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "/tmp/penguins.parquet")
Write out an expression to a hive-partitioned parquet file.
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
Partition on a single column.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by="year")
Partition on multiple columns.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by=("year", "island"))
to_parquet_dir
to_parquet_dir(self, expr, /, directory, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file in a directory.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
directory | str | Path | The data source. A string or Path to the directory where the parquet file will be written. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | Additional keyword arguments passed to pyarrow.dataset.write_dataset | {} |
to_polars
to_polars(self, expr, /, *, params=None, limit=None, **kwargs)
Execute expression and return results as a polars DataFrame.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to polars. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. | None |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
dataframe | A polars DataFrame holding the results of the executed expression. |
to_pyarrow
to_pyarrow(self, expr, /, *, params=None, limit=None, **kwargs)
Execute expression and return results as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to pyarrow | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. | None |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches
to_pyarrow_batches(self, expr, /, *, params=None, limit=None, chunk_size=1000000, **_)
Return a stream of record batches.
The returned RecordBatchReader
contains a cursor with an unbounded lifetime.
For analytics use cases this is usually nothing to fret about. In some cases you may need to explicitly release the cursor.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression | required |
params | Mapping[ir.Scalar, Any] | None | Bound parameters | None |
limit | int | str | None | Limit the result to this number of rows | None |
chunk_size | int | The number of rows to fetch per batch | 1000000 |
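A minimal sketch of streaming results batch by batch:
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> t = con.sql("SELECT 1 AS a")
>>> reader = con.to_pyarrow_batches(t, chunk_size=10_000)
>>> for batch in reader:  # each element is a pyarrow.RecordBatch
...     pass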
to_torch
to_torch(self, expr, /, *, params=None, limit=None, **kwargs)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Parameters to substitute into the expression. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | Any | Keyword arguments passed into the backend’s to_torch implementation. | {} |
Returns
Name | Type | Description |
---|---|---|
dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |
truncate_table
truncate_table(self, name, /, *, database=None)
Delete all rows from a table.
Ibis does not use the word schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of databases is referred to as a catalog. These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Table name | required |
database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). | None |