import ibis
= ibis.connect("duckdb://local.ddb") con
ibis.connect
URL format
DuckDB
Install
Install Ibis and dependencies for the DuckDB backend:
Install with the duckdb
extra:
pip install 'ibis-framework[duckdb]'
And connect:
import ibis
= ibis.duckdb.connect() con
- 1
- Adjust connection parameters as needed.
Install for DuckDB:
conda install -c conda-forge ibis-duckdb
And connect:
import ibis
= ibis.duckdb.connect() con
- 1
- Adjust connection parameters as needed.
Install for DuckDB:
mamba install -c conda-forge ibis-duckdb
And connect:
import ibis
= ibis.duckdb.connect() con
- 1
- Adjust connection parameters as needed.
Connect
ibis.duckdb.connect
Connect to an in-memory database:
= ibis.duckdb.connect() con
Connect to, or create, a local DuckDB file
= ibis.duckdb.connect("mydb.duckdb") con
ibis.duckdb.connect
is a thin wrapper around ibis.backends.duckdb.Backend.do_connect
.
Connection Parameters
do_connect
do_connect(self, database=':memory:', read_only=False, extensions=None, **config)
Create an Ibis client connected to a DuckDB database.
Before, we had special handling if the user passed the temp_directory
parameter, setting a custom default, and creating intermediate directories if necessary. Now, we do nothing, and just pass the value directly to DuckDB. You may need to add Path(your_temp_dir).mkdir(exists_ok=True, parents=True)
to your code to maintain the old behavior.
Parameters
Name | Type | Description | Default |
---|---|---|---|
database | str | Path | Path to a duckdb database. | ':memory:' |
read_only | bool | Whether the database is read-only. | False |
extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
config | Any | DuckDB configuration parameters. See the DuckDB configuration documentation for possible configuration values. | {} |
Examples
>>> import ibis
>>> ibis.duckdb.connect(threads=4, memory_limit="1GB")
<ibis.backends.duckdb.Backend object at 0x...>
In addition to ibis.duckdb.connect
, you can also connect to DuckDB by passing a properly-formatted DuckDB connection URL to ibis.connect
:
Given an empty path, ibis.connect
will connect to an ephemeral, in-memory database.
= ibis.connect("duckdb://") con
MotherDuck
The DuckDB backend supports MotherDuck. If you have an account, you can connect to MotherDuck by passing in the string md:
or motherduck:
. ibis
will trigger the authentication prompt in-browser.
import ibis
= ibis.duckdb.connect("md:") con
Authentication to MotherDuck will trigger on the first call that requires retrieving information (in this case list_tables
)
con.list_tables()open the SSO authorization page in your default browser.
Attempting to automatically 1. Please open this link to login into your account: https://auth.motherduck.com/activate
2. Enter the following code: ZSRQ-GJQS
Token successfully retrieved ✅as an environment variable to avoid having to log in again:
You can store it ='****************'
$ export motherduck_token
'penguins'] [
Geospatial
The DuckDB backend has experimental support for geospatial operations.
To enable geospatial operations, install the geospatial
extra or otherwise install the additional required dependencies:
See read_geo
for some tips on reading in geospatial data.
pip install 'ibis-framework[geospatial]'
conda install -c conda-forge ibis-framework geopandas 'shapely>=2,<3'
mamba install -c conda-forge ibis-framework geopandas 'shapely>=2,<3'
duckdb.Backend
attach
attach(self, path, name=None, read_only=False)
Attach another DuckDB database to the current DuckDB session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | Path to the database to attach. | required |
name | str | None | Name to attach the database as. Defaults to the basename of path . |
None |
read_only | bool | Whether to attach the database as read-only. | False |
attach_sqlite
attach_sqlite(self, path, overwrite=False, all_varchar=False)
Attach a SQLite database to the current DuckDB session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The path to the SQLite database. | required |
overwrite | bool | Allow overwriting any tables or views that already exist in your current session with the contents of the SQLite database. | False |
all_varchar | bool | Set all SQLite columns to type VARCHAR to avoid type errors on ingestion. |
False |
Examples
>>> import ibis
>>> import sqlite3
>>> con = sqlite3.connect("/tmp/attach_sqlite.db")
>>> with con:
"DROP TABLE IF EXISTS t")
... con.execute("CREATE TABLE t (a INT, b TEXT)")
... con.execute(
... con.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
...
... )<...>
>>> con.close()
>>> con = ibis.connect("duckdb://")
>>> con.list_tables()
[]>>> con.attach_sqlite("/tmp/attach_sqlite.db")
>>> con.list_tables()
't'] [
compile
compile(self, expr, limit=None, params=None, pretty=False)
Compile an Ibis expression to a SQL string.
connect
connect(self, *args, **kwargs)
Connect to the database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
*args | Mandatory connection parameters, see the docstring of do_connect for details. |
() |
|
**kwargs | Extra connection parameters, see the docstring of do_connect for details. |
{} |
Notes
This creates a new backend instance with saved args
and kwargs
, then calls reconnect
and finally returns the newly created and connected backend instance.
Returns
Name | Type | Description |
---|---|---|
BaseBackend | An instance of the backend |
create_database
create_database(self, name, catalog=None, force=False)
Create a database named name
in catalog
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the database to create. | required |
catalog | str | None | Name of the catalog in which to create the database. If None , the current catalog is used. |
None |
force | bool | If False , an exception is raised if the database exists. |
False |
create_table
create_table(self, name, obj=None, *, schema=None, database=None, temp=False, overwrite=False)
Create a table in DuckDB.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to create | required |
obj | ir.Table | pd.DataFrame | pa.Table | pl.DataFrame | pl.LazyFrame | None | The data with which to populate the table; optional, but at least one of obj or schema must be specified |
None |
schema | SchemaLike | None | The schema of the table to create; optional, but at least one of obj or schema must be specified |
None |
database | str | None | The name of the database in which to create the table; if not passed, the current database is used. For multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database") . |
None |
temp | bool | Create a temporary table | False |
overwrite | bool | If True , replace the table if it already exists, otherwise fail if the table exists |
False |
create_view
create_view(self, name, obj, *, database=None, overwrite=False)
Create a new view from an expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the new view. | required |
obj | ir.Table | An Ibis table expression that will be used to create the view. | required |
database | str | None | Name of the database where the view will be created, if not provided the database’s default is used. | None |
overwrite | bool | Whether to clobber an existing view with the same name | False |
Returns
Name | Type | Description |
---|---|---|
Table | The view that was created. |
detach
detach(self, name)
Detach a database from the current DuckDB session.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the database to detach. | required |
disconnect
disconnect(self)
Close the connection to the backend.
drop_database
drop_database(self, name, catalog=None, force=False)
Drop the database with name
in catalog
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the schema to drop. | required |
catalog | str | None | Name of the catalog to drop the database from. If None , the current catalog is used. |
None |
force | bool | If False , an exception is raised if the database does not exist. |
False |
drop_table
drop_table(self, name, database=None, force=False)
Drop a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to drop. | required |
database | str | None | Name of the database where the table exists, if not the default. | None |
force | bool | If False , an exception is raised if the table does not exist. |
False |
drop_view
drop_view(self, name, *, database=None, force=False)
Drop a view.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the view to drop. | required |
database | str | None | Name of the database where the view exists, if not the default. | None |
force | bool | If False , an exception is raised if the view does not exist. |
False |
execute
execute(self, expr, params=None, limit='default', **_)
Execute an expression.
from_connection
from_connection(cls, con, extensions=None)
Create an Ibis client from an existing connection to a DuckDB database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
con | duckdb.DuckDBPyConnection | An existing connection to a DuckDB database. | required |
extensions | Sequence[str] | None | A list of duckdb extensions to install/load upon connection. | None |
get_schema
get_schema(self, table_name, *, catalog=None, database=None)
Compute the schema of a table
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
table_name | str | May not be fully qualified. Use database if you want to qualify the identifier. |
required |
catalog | str | None | Catalog name | None |
database | str | None | Database name | None |
Returns
Name | Type | Description |
---|---|---|
sch.Schema | Ibis schema |
has_operation
has_operation(cls, operation)
Return whether the backend implements support for operation
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
operation | type[ops.Value] | A class corresponding to an operation. | required |
Returns
Name | Type | Description |
---|---|---|
bool | Whether the backend implements the operation. |
Examples
>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True
insert
insert(self, table_name, obj, database=None, overwrite=False)
Insert data into a table.
schema
to refer to database hierarchy.
A collection of table
is referred to as a database
. A collection of database
is referred to as a catalog
.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
table_name | str | The name of the table to which data needs will be inserted | required |
obj | pd.DataFrame | ir.Table | list | dict | The source data or expression to insert | required |
database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database") . |
None |
overwrite | bool | If True then replace existing contents of table |
False |
list_catalogs
list_catalogs(self, like=None)
list_databases
list_databases(self, like=None, catalog=None)
List existing databases in the current connection.
schema
to refer to database hierarchy.
A collection of table
is referred to as a database
. A collection of database
is referred to as a catalog
.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | A pattern in Python’s regex format to filter returned database names. | None |
catalog | str | None | The catalog to list databases from. If None , the current catalog is searched. |
None |
Returns
Name | Type | Description |
---|---|---|
list[str] | The database names that exist in the current connection, that match the like pattern if provided. |
list_tables
list_tables(self, like=None, database=None)
List tables and views.
schema
to refer to database hierarchy.
A collection of tables is referred to as a database
. A collection of database
is referred to as a catalog
.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | Regex to filter by table/view name. | None |
database | tuple[str, str] | str | None | Database location. If not passed, uses the current database. By default uses the current database (self.current_database ) and catalog (self.current_catalog ). To specify a table in a separate catalog, you can pass in the catalog and database as a string "catalog.database" , or as a tuple of strings ("catalog", "database") . |
None |
Returns
Name | Type | Description |
---|---|---|
list[str] | List of table and view names. |
Examples
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> foo = con.create_table("foo", schema=ibis.schema(dict(a="int")))
>>> con.list_tables()
'foo']
[>>> bar = con.create_view("bar", foo)
>>> con.list_tables()
'bar', 'foo']
[>>> con.create_database("my_database")
>>> con.list_tables(database="my_database")
[]>>> con.raw_sql("CREATE TABLE my_database.baz (a INTEGER)")
<duckdb.duckdb.DuckDBPyConnection object at 0x...>
>>> con.list_tables(database="my_database")
'baz'] [
load_extension
load_extension(self, extension, force_install=False)
Install and load a duckdb extension by name or path.
Parameters
Name | Type | Description | Default |
---|---|---|---|
extension | str | The extension name or path. | required |
force_install | bool | Force reinstallation of the extension. | False |
raw_sql
raw_sql(self, query, **kwargs)
read_csv
read_csv(self, source_list, table_name=None, columns=None, types=None, **kwargs)
Register a CSV file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_list | str | list[str] | tuple[str] | The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
columns | Mapping[str, str | dt.DataType] | None | An optional mapping of all column names to their types. | None |
types | Mapping[str, str | dt.DataType] | None | An optional mapping of a subset of column names to their types. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/csv for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
Examples
Generate some data
>>> import tempfile
>>> data = b'''
... lat,lon,geom
... 1.0,2.0,POINT (1 2)
... 2.0,3.0,POINT (2 3)
... '''
>>> with tempfile.NamedTemporaryFile(delete=False) as f:
= f.write(data) ... nbytes
Import Ibis
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> con = ibis.duckdb.connect()
Read the raw CSV file
>>> t = con.read_csv(f.name)
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ lat ┃ lon ┃ geom ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ string │
├─────────┼─────────┼─────────────┤1.0 │ 2.0 │ POINT (1 2) │
│ 2.0 │ 3.0 │ POINT (2 3) │
│ └─────────┴─────────┴─────────────┘
Load the spatial
extension and read the CSV file again, using specific column types
>>> con.load_extension("spatial")
>>> t = con.read_csv(f.name, types={"geom": "geometry"})
>>> t
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ lat ┃ lon ┃ geom ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ float64 │ float64 │ geospatial:geometry │
├─────────┼─────────┼──────────────────────┤1.0 │ 2.0 │ <POINT (1 2)> │
│ 2.0 │ 3.0 │ <POINT (2 3)> │
│ └─────────┴─────────┴──────────────────────┘
read_delta
read_delta(self, source_table, table_name=None, **kwargs)
Register a Delta Lake table as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_table | str | The data source. Must be a directory containing a Delta Lake table. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to deltalake.DeltaTable. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
read_geo
read_geo(self, source, table_name=None, **kwargs)
Register a GEO file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source | str | The data source(s). Path to a file of geospatial files supported by duckdb. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/extensions/spatial.html#st_read—read-spatial-data-from-files for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
read_json
read_json(self, source_list, table_name=None, columns=None, **kwargs)
Read newline-delimited JSON into an ibis table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_list | str | list[str] | tuple[str] | File or list of files | required |
table_name | str | None | Optional table name | None |
columns | Mapping[str, str] | None | Optional mapping from string column name to duckdb type string. | None |
**kwargs | Additional keyword arguments passed to DuckDB’s read_json_auto function. See https://duckdb.org/docs/data/json/overview.html#json-loading for parameters and more information about reading JSON. |
{} |
Returns
Name | Type | Description |
---|---|---|
Table | An ibis table expression |
read_mysql
read_mysql(self, uri, *, catalog, table_name=None)
Register a table from a MySQL instance into a DuckDB table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | A mysql URI of the form mysql://user:password@host:port/database |
required |
catalog | str | User-defined alias given to the MySQL database that is being attached to DuckDB | required |
table_name | str | None | The table to read | None |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
read_parquet
read_parquet(self, source_list, table_name=None, **kwargs)
Register a parquet file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source_list | str | Iterable[str] | The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to DuckDB loading function. See https://duckdb.org/docs/data/parquet for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
read_postgres
read_postgres(self, uri, *, table_name=None, database='public')
Register a table from a postgres instance into a DuckDB table.
schema
to refer to database hierarchy.
A collection of table
is referred to as a database
. A collection of database
is referred to as a catalog
.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
uri | str | A postgres URI of the form postgres://user:password@host:port |
required |
table_name | str | None | The table to read | None |
database | str | PostgreSQL database (schema) where table_name resides |
'public' |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
read_sqlite
read_sqlite(self, path, *, table_name=None)
Register a table from a SQLite database into a DuckDB table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The path to the SQLite database | required |
table_name | str | None | The table to read | None |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table. |
Examples
>>> import ibis
>>> import sqlite3
>>> ibis.options.interactive = True
>>> con = sqlite3.connect("/tmp/sqlite.db")
>>> with con:
"DROP TABLE IF EXISTS t")
... con.execute("CREATE TABLE t (a INT, b TEXT)")
... con.execute(
... con.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c')"
...
... )<...>
>>> con.close()
>>> con = ibis.connect("duckdb://")
>>> t = con.read_sqlite(path="/tmp/sqlite.db", table_name="t")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤1 │ a │
│ 2 │ b │
│ 3 │ c │
│ └───────┴────────┘
reconnect
reconnect(self)
Reconnect to the database already configured with connect.
register_filesystem
register_filesystem(self, filesystem)
Register an fsspec
filesystem object with DuckDB.
This allow a user to read from any fsspec
compatible filesystem using read_csv
, read_parquet
, read_json
, etc.
Creating an fsspec
filesystem requires that the corresponding backend-specific fsspec
helper library is installed.
e.g. to connect to Google Cloud Storage, gcsfs
must be installed.
Parameters
Name | Type | Description | Default |
---|---|---|---|
filesystem | AbstractFileSystem | The fsspec filesystem object to register with DuckDB. See https://duckdb.org/docs/guides/python/filesystems for details. | required |
Examples
>>> import ibis
>>> import fsspec
>>> ibis.options.interactive = True
>>> gcs = fsspec.filesystem("gcs")
>>> con = ibis.duckdb.connect()
>>> con.register_filesystem(gcs)
>>> t = con.read_csv(
"gcs://ibis-examples/data/band_members.csv.gz",
... ="band_members",
... table_name
... )>>> t
┏━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ band ┃
┡━━━━━━━━╇━━━━━━━━━┩
│ string │ string │
├────────┼─────────┤
│ Mick │ Stones │
│ John │ Beatles │
│ Paul │ Beatles │ └────────┴─────────┘
register_options
register_options(cls)
Register custom backend options.
rename_table
rename_table(self, old_name, new_name)
Rename an existing table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
old_name | str | The old name of the table. | required |
new_name | str | The new name of the table. | required |
sql
sql(self, query, schema=None, dialect=None)
table
table(self, name, database=None)
Construct a table expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Table name | required |
database | str | None | Database name | None |
Returns
Name | Type | Description |
---|---|---|
Table | Table expression |
to_csv
to_csv(self, expr, path, *, params=None, header=True, **kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
path | str | Path | The data source. A string or Path to the CSV file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
header | bool | Whether to write the column names as the first line of the CSV file. | True |
**kwargs | Any | DuckDB CSV writer arguments. https://duckdb.org/docs/data/csv/overview.html#parameters | {} |
to_delta
to_delta(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to Delta Lake table. | required |
path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | Any | Additional keyword arguments passed to deltalake.writer.write_deltalake method | {} |
to_geo
to_geo(self, expr, path, *, format, layer_creation_options=None, params=None, limit=None, **kwargs)
Write the results of executing expr
to a geospatial output.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | Ibis expression to execute and persist to geospatial output. | required |
path | str | Path | A string or Path to the desired output file location. | required |
format | str | The format of the geospatial output. One of GDAL’s supported vector formats. The list of vector formats is located here: https://gdal.org/en/latest/drivers/vector/index.html | required |
layer_creation_options | Mapping[str, Any] | None | A mapping of layer creation options. | None |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. |
None |
kwargs | Any | Additional keyword arguments passed to the DuckDB COPY command. |
{} |
Examples
>>> import os
>>> import tempfile
>>> import ibis
>>> ibis.options.interactive = True
>>> from ibis import _
Load some geospatial data
>>> con = ibis.duckdb.connect()
>>> zones = ibis.examples.zones.fetch(backend=con)
>>> zones[["zone", "geom"]].head()
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ zone ┃ geom ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ geospatial:geometry │
├───────────────────────────────────────┼──────────────────────────────────────┤<POLYGON ((933100.918 192536.086, │
│ │ 933091.011 192572.175, 933088.585 │
│ Newark Airport │ 192604.9...> │
│ │ <MULTIPOLYGON (((1033269.244 │
│ │ 172126.008, 1033439.643 170883.946, │
│ Jamaica Bay │ 1033473.265...> │
│ │ <POLYGON ((1026308.77 256767.698, │
│ │ /Pelham Gardens │ 1026495.593 256638.616, 1026567.23 │
│ Allerton256589....> │
│ │ <POLYGON ((992073.467 203714.076, │
│ │ 992068.667 203711.502, 992061.716 │
│ Alphabet City │ 203711.7...> │
│ │ <POLYGON ((935843.31 144283.336, │
│ │ 936046.565 144173.418, 936387.922 │
│ Arden Heights │ 143967.75...> │
│ │ └───────────────────────────────────────┴──────────────────────────────────────┘
Write to a GeoJSON file
>>> with tempfile.TemporaryDirectory() as tmpdir:
... con.to_geo(
... zones,=os.path.join(tmpdir, "zones.geojson"),
... pathformat="geojson",
... ... )
Write to a Shapefile
>>> with tempfile.TemporaryDirectory() as tmpdir:
... con.to_geo(
... zones,=os.path.join(tmpdir, "zones.shp"),
... pathformat="ESRI Shapefile",
... ... )
to_json
to_json(self, expr, path, *, compression='auto', dateformat=None, timestampformat=None)
Write the results of expr
to a json file of [{column -> value}, …] objects.
This method is eager and will execute the associated expression immediately. See https://duckdb.org/docs/sql/statements/copy.html#json-options for more info.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to Delta Lake table. | required |
path | str | Path | URLs such as S3 buckets are supported. | required |
compression | Literal['auto', 'none', 'gzip', 'zstd'] | Compression codec to use. One of “auto”, “none”, “gzip”, “zstd”. | 'auto' |
dateformat | str | None | Date format string. | None |
timestampformat | str | None | Timestamp format string. | None |
to_pandas
to_pandas(self, expr, *, params=None, limit=None, **kwargs)
Execute an Ibis expression and return a pandas DataFrame
, Series
, or scalar.
This method is a wrapper around execute
.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . |
None |
kwargs | Any | Keyword arguments | {} |
to_pandas_batches
to_pandas_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)
Execute an Ibis expression and return an iterator of pandas DataFrame
s.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . |
None |
chunk_size | int | Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. |
1000000 |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
Iterator[pd.DataFrame] | An iterator of pandas DataFrame s. |
to_parquet
to_parquet(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
path | str | Path | The data source. A string or Path to the parquet file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | DuckDB Parquet writer arguments. See https://duckdb.org/docs/data/parquet/overview.html#writing-to-parquet-files for details. | {} |
Examples
Write out an expression to a single parquet file.
>>> import ibis
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
>>> con.to_parquet(penguins, "/tmp/penguins.parquet")
Write out an expression to a hive-partitioned parquet file.
>>> import tempfile
>>> penguins = ibis.examples.penguins.fetch()
>>> con = ibis.get_backend(penguins)
Partition on a single column.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by="year")
Partition on multiple columns.
>>> con.to_parquet(penguins, tempfile.mkdtemp(), partition_by=("year", "island"))
to_parquet_dir
to_parquet_dir(self, expr, directory, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file in a directory.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
directory | str | Path | The data source. A string or Path to the directory where the parquet file will be written. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | Additional keyword arguments passed to pyarrow.dataset.write_dataset | {} |
https | required |
to_polars
to_polars(self, expr, *, params=None, limit=None, **kwargs)
Execute expression and return results in as a polars DataFrame.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to polars. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . |
None |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
dataframe | A polars DataFrame holding the results of the executed expression. |
to_pyarrow
to_pyarrow(self, expr, *, params=None, limit=None, **_)
Execute expression and return results in as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to pyarrow | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py . |
None |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches
to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **_)
Return a stream of record batches.
The returned RecordBatchReader
contains a cursor with an unbounded lifetime.
For analytics use cases this is usually nothing to fret about. In some cases you may need to explicit release the cursor.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression | required |
params | Mapping[ir.Scalar, Any] | None | Bound parameters | None |
limit | int | str | None | Limit the result to this number of rows | None |
chunk_size | int | The number of rows to fetch per batch | 1000000 |
to_torch
to_torch(self, expr, *, params=None, limit=None, **_)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Parameters to substitute into the expression. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. |
None |
kwargs | Keyword arguments passed into the backend’s to_torch implementation. |
required |
Returns
Name | Type | Description |
---|---|---|
dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |
truncate_table
truncate_table(self, name, database=None)
Delete all rows from a table.
schema
to refer to database hierarchy.
A collection of tables is referred to as a database
. A collection of database
is referred to as a catalog
. These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Table name | required |
database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database") . |
None |