Amazon Athena
Install
Install Ibis and dependencies for the Athena backend:
Install with the athena extra:
pip install 'ibis-framework[athena]'And connect:
import ibis
con = ibis.athena.connect(s3_staging_dir="s3://...")s3_staging_dir argument must be provided.
This argument tells the underlying driver library–pyathena–and ultimately Athena itself where to dump query results.
- Adjust other connection parameters as needed.
Install for Athena:
conda install -c conda-forge ibis-athenaimport ibis
con = ibis.athena.connect(s3_staging_dir="s3://...")s3_staging_dir argument must be provided.
This argument tells the underlying driver library–pyathena–and ultimately Athena itself where to dump query results.
- Adjust other connection parameters as needed.
Install for Athena:
mamba install -c conda-forge ibis-athenaimport ibis
con = ibis.athena.connect(s3_staging_dir="s3://my-bucket/")s3_staging_dir argument must be provided.
This argument tells the underlying driver library–pyathena–and ultimately Athena itself where to dump query results.
- Adjust other connection parameters as needed.
Connect
ibis.athena.connect
con = ibis.athena.connect(
s3_staging_dir="s3://my-bucket/",
)s3_staging_dir argument must be provided.
This argument tells the underlying driver library–pyathena–and ultimately Athena itself where to dump query results.
Connection Parameters
do_connect
do_connect(self, *, s3_staging_dir, cursor_class=ArrowCursor, memtable_volume=None, schema_name='default', catalog_name='awsdatacatalog', **config)
Create an Ibis client connected to an Amazon Athena instance.
athena.Backend
begin
begin(self)
compile
compile(self, expr, /, *, limit=None, params=None, pretty=False)
Compile an expression to a SQL string.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | An ibis expression to compile. | required |
| limit | str | int | None | An integer to effect a specific row limit. A value of None means no limit. |
None |
| params | Mapping[ir.Expr, Any] | None | Mapping of scalar parameter expressions to value. | None |
| pretty | bool | Pretty print the SQL query during compilation. | False |
Returns
| Name | Type | Description |
|---|---|---|
| str | Compiled expression |
connect
connect(self, *args, **kwargs)
Connect to the database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| *args | Mandatory connection parameters, see the docstring of do_connect for details. |
() |
|
| **kwargs | Extra connection parameters, see the docstring of do_connect for details. |
{} |
Notes
This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.
Returns
| Name | Type | Description |
|---|---|---|
| BaseBackend | An instance of the backend |
create_database
create_database(self, name, /, *, location=None, catalog=None, force=False)
Create a database named name in catalog.
schema to refer to database hierarchy.
A collection of table is referred to as a database. A collection of database is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of the terminology the backend uses.
See the Table Hierarchy Concepts Guide for more info.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the database to create. | required |
| catalog | str | None | Name of the catalog in which to create the database. If None, the current catalog is used. |
None |
| force | bool | If False, an exception is raised if the database exists. |
False |
create_table
create_table(self, name, /, obj=None, *, schema=None, database=None, temp=False, overwrite=None, comment=None, properties=None, location=None, stored_as='PARQUET', partitioned_by=())
Create a table in Amazon Athena.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the table to create | required |
| obj | ir.Table | pd.DataFrame | pa.Table | pl.DataFrame | pl.LazyFrame | None | The data with which to populate the table; optional, but one of obj or schema must be specified |
None |
| schema | sch.SchemaLike | None | The schema of the table to create; optional, but one of obj or schema must be specified |
None |
| database | str | None | The database to insert the table into. If not provided, the current database is used. You can provide a single database name, like "mydb". For multi-level hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). |
None |
| temp | bool | This parameter is not yet supported in the Amazon Athena backend, because Amazon Athena doesn’t implement temporary tables | False |
| overwrite | bool | None | If True, replace the table if it already exists, otherwise fail if the table exists |
None |
| comment | str | None | Add a comment to the table | None |
| properties | Mapping[str, Any] | None | Table properties to set on creation | None |
| location | str | None | s3 location to store table data. Defaults to the s3_staging_dir bucket with the table name as the bucket key. |
None |
| stored_as | str | The file format in which to store table data. Defaults to parquet. | 'PARQUET' |
| partitioned_by | sch.SchemaLike | Iterable of column name and type pairs/mapping/schema by which to partition the table. | () |
create_view
create_view(self, name, /, obj, *, database=None, overwrite=False)
Create a view from an Ibis expression.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the view to create. | required |
| obj | ir.Table | The Ibis expression to create the view from. | required |
| database | str | None | The database that the view should be created in. | None |
| overwrite | bool | If True, replace an existing view with the same name. |
False |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | A table expression representing the view. |
disconnect
disconnect(self)
Disconnect from the backend.
drop_database
drop_database(self, name, /, *, catalog=None, force=False)
Drop the database with name in catalog.
schema to refer to database hierarchy.
A collection of table is referred to as a database. A collection of database is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of the terminology the backend uses.
See the Table Hierarchy Concepts Guide for more info.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the schema to drop. | required |
| catalog | str | None | Name of the catalog to drop the database from. If None, the current catalog is used. |
None |
| force | bool | If False, an exception is raised if the database does not exist. |
False |
drop_table
drop_table(self, name, /, *, database=None, force=False)
Drop a table from the backend.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the table to drop | required |
| database | tuple[str, str] | str | None | The database that the table is located in. | None |
| force | bool | If True, do not raise an error if the table does not exist. |
False |
drop_view
drop_view(self, name, /, *, database=None, force=False)
Drop a view from the backend.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the view to drop. | required |
| database | str | None | The database that the view is located in. | None |
| force | bool | If True, do not raise an error if the view does not exist. |
False |
execute
execute(self, expr, /, *, params=None, limit=None, **kwargs)
Execute an Ibis expression and return a pandas DataFrame, Series, or scalar.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to execute. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. |
None |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| DataFrame | Series | scalar | The result of the expression execution. |
from_connection
from_connection(cls, con, /, *, memtable_volume=None)
Create an Ibis client from an existing connection to an Amazon Athena instance.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| con | An existing connection to an Amazon Athena instance. | required | |
| memtable_volume | str | None | The volume to use for Ibis memtables. | None |
get_schema
get_schema(self, table_name, *, catalog=None, database=None)
Compute the schema of a table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | May not be fully qualified. Use database if you want to qualify the identifier. |
required |
| catalog | str | None | Catalog name | None |
| database | str | None | Database name | None |
Returns
| Name | Type | Description |
|---|---|---|
| sch.Schema | Ibis schema |
has_operation
has_operation(cls, operation, /)
Return whether the backend supports the given operation.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| operation | type[ops.Value] | Operation type, a Python class object. | required |
insert
insert(self, name, /, obj, *, database=None, overwrite=False)
Insert data into a table.
schema to refer to database hierarchy.
A collection of table is referred to as a database. A collection of database is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the table to which data needs will be inserted | required |
| obj | pd.DataFrame | ir.Table | list | dict | The source data or expression to insert | required |
| database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). |
None |
| overwrite | bool | If True then replace existing contents of table |
False |
list_catalogs
list_catalogs(self, *, like=None)
list_databases
list_databases(self, like=None, catalog=None)
List existing databases in the current connection.
schema to refer to database hierarchy.
A collection of table is referred to as a database. A collection of database is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of the terminology the backend uses.
See the Table Hierarchy Concepts Guide for more info.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| like | str | None | A pattern in Python’s regex format to filter returned database names. | None |
| catalog | str | None | The catalog to list databases from. If None, the current catalog is searched. |
None |
Returns
| Name | Type | Description |
|---|---|---|
| list[str] | The database names that exist in the current connection, that match the like pattern if provided. |
list_tables
list_tables(self, *, like=None, database=None)
The table names that match like in the given database.
For some backends, the tables may be files in a directory, or other equivalent entities in a SQL database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| like | str | None | A pattern in Python’s regex format. | None |
| database | tuple[str, str] | str | None | The database, or (catalog, database) from which to list tables. For backends that support a single-level table hierarchy, you can pass in a string like "bar". For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). If not provided, the current database (and catalog, if applicable for this backend) is used. See the Table Hierarchy Concepts Guide for more info. |
None |
Returns
| Name | Type | Description |
|---|---|---|
| list[str] | The list of the table names that match the pattern like. |
Examples
This example uses the DuckDB backend, but the list_tables API works the same for other backends.
>>> import ibis
>>> con = ibis.duckdb.connect()
>>> foo = con.create_table("foo", schema=ibis.schema(dict(a="int")))
>>> con.list_tables()
['foo']
>>> bar = con.create_view("bar", foo)
>>> con.list_tables()
['bar', 'foo']
>>> con.create_database("my_database")
>>> con.list_tables(database="my_database")
[]
>>> con.raw_sql("CREATE TABLE my_database.baz (a INTEGER)")
<duckdb.duckdb.DuckDBPyConnection object at 0x...>
>>> con.list_tables(database="my_database")
['baz']raw_sql
raw_sql(self, query, **kwargs)
read_csv
read_csv(self, path, /, *, table_name=None, **kwargs)
Register a CSV file as a table in the current backend.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The data source. A string or Path to the CSV file. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| **kwargs | Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
read_delta
read_delta(self, path, /, *, table_name=None, **kwargs)
Register a Delta Lake table in the current database.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The data source. Must be a directory containing a Delta Lake table. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| **kwargs | Any | Additional keyword arguments passed to the underlying backend or library. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table. |
read_json
read_json(self, path, /, *, table_name=None, **kwargs)
Register a JSON file as a table in the current backend.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The data source. A string or Path to the JSON file. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| **kwargs | Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
read_parquet
read_parquet(self, path, /, *, table_name=None, **kwargs)
Register a parquet file as a table in the current backend.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path | The data source. | required |
| table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
| **kwargs | Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The just-registered table |
reconnect
reconnect(self)
Reconnect to the database already configured with connect.
register_options
register_options(cls)
Register custom backend options.
rename_table
rename_table(self, old_name, new_name)
Rename an existing table.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| old_name | str | The old name of the table. | required |
| new_name | str | The new name of the table. | required |
sql
sql(self, query, /, *, schema=None, dialect=None)
Create an Ibis table expression from a SQL query.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | A SQL query string | required |
| schema | SchemaLike | None | The schema of the query. If not provided, Ibis will try to infer the schema of the query. | None |
| dialect | str | None | The SQL dialect of the query. If not provided, the backend’s dialect is assumed. This argument can be useful when the query is written in a different dialect from the backend. | None |
Returns
| Name | Type | Description |
|---|---|---|
| ir.Table | The table expression representing the query |
table
table(self, name, /, *, database=None)
Construct a table expression from the corresponding table in the backend.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Table name | required |
| database | tuple[str, str] | str | None | The database, or (catalog, database) from which to get the table. For backends that support a single-level table hierarchy, you can pass in a string like "bar". For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). If not provided, the current database (and catalog, if applicable for this backend) is used. See the Table Hierarchy Concepts Guide for more info. |
None |
Returns
| Name | Type | Description |
|---|---|---|
| Table | Table expression |
Examples
>>> import ibis
>>> backend = ibis.duckdb.connect()Get the “foo” table from the current database (and catalog, if applicable for this backend):
>>> backend.table("foo")Get the “foo” table from the “bar” database (in DuckDB’s language they would say the “bar” schema, in SQL this would be "bar"."foo")
>>> backend.table("foo", database="bar")Get the “foo” table from the “bar” database, within the “baz” catalog (in DuckDB’s language they would say the “bar” schema, and “baz” database, in SQL this would be "baz"."bar"."foo")
>>> backend.table("foo", database=("baz", "bar"))to_csv
to_csv(self, expr, /, path, *, params=None, **kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
| path | str | Path | The data source. A string or Path to the CSV file. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| kwargs | Any | Additional keyword arguments passed to pyarrow.csv.CSVWriter | {} |
| https | required |
to_delta
to_delta(self, expr, /, path, *, params=None, **kwargs)
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to Delta Lake table. | required |
| path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| kwargs | Any | Additional keyword arguments passed to deltalake.writer.write_deltalake method | {} |
to_json
to_json(self, expr, /, path, **kwargs)
Write the results of expr to a json file of [{column -> value}, …] objects.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to Delta Lake table. | required |
| path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
| kwargs | Any | Additional, backend-specifc keyword arguments. | {} |
to_pandas
to_pandas(self, expr, /, *, params=None, limit=None, **kwargs)
Execute an Ibis expression and return a pandas DataFrame, Series, or scalar.
This method is a wrapper around execute.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to execute. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. |
None |
| kwargs | Any | Keyword arguments | {} |
to_pandas_batches
to_pandas_batches(self, expr, /, *, params=None, limit=None, chunk_size=1000000, **kwargs)
Execute an Ibis expression and return an iterator of pandas DataFrames.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to execute. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. |
None |
| chunk_size | int | Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. |
1000000 |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| Iterator[pd.DataFrame] | An iterator of pandas DataFrames. |
to_parquet
to_parquet(self, expr, /, path, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
| path | str | Path | The data source. A string or Path to the parquet file. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| **kwargs | Any | Additional keyword arguments passed to pyarrow.parquet.ParquetWriter | {} |
| https | required |
to_parquet_dir
to_parquet_dir(self, expr, /, directory, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file in a directory.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
| directory | str | Path | The data source. A string or Path to the directory where the parquet file will be written. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| **kwargs | Any | Additional keyword arguments passed to pyarrow.dataset.write_dataset | {} |
| https | required |
to_polars
to_polars(self, expr, /, *, params=None, limit=None, **kwargs)
Execute expression and return results in as a polars DataFrame.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to export to polars. | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. |
None |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| dataframe | A polars DataFrame holding the results of the executed expression. |
to_pyarrow
to_pyarrow(self, expr, /, *, params=None, limit=None, **kwargs)
Execute expression to a pyarrow object.
This method is eager and will execute the associated expression immediately.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to export to pyarrow | required |
| params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. The default is in ibis/config.py. |
None |
| kwargs | Any | Keyword arguments | {} |
Returns
| Name | Type | Description |
|---|---|---|
| result | If the passed expression is a Table, a pyarrow table is returned. If the passed expression is a Column, a pyarrow array is returned. If the passed expression is a Scalar, a pyarrow scalar is returned. |
to_pyarrow_batches
to_pyarrow_batches(self, expr, /, *, params=None, limit=None, **_)
Return a stream of record batches.
The returned RecordBatchReader contains a cursor with an unbounded lifetime.
For analytics use cases this is usually nothing to fret about. In some cases you may need to explicit release the cursor.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression | required |
| params | Mapping[ir.Scalar, Any] | None | Bound parameters | None |
| limit | int | str | None | Limit the result to this number of rows | None |
to_torch
to_torch(self, expr, /, *, params=None, limit=None, **kwargs)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| expr | ir.Expr | Ibis expression to execute. | required |
| params | Mapping[ir.Scalar, Any] | None | Parameters to substitute into the expression. | None |
| limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. |
None |
| kwargs | Any | Keyword arguments passed into the backend’s to_torch implementation. |
{} |
Returns
| Name | Type | Description |
|---|---|---|
| dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |
truncate_table
truncate_table(self, name, /, *, database=None)
Delete all rows from a table.
schema to refer to database hierarchy.
A collection of tables is referred to as a database. A collection of database is referred to as a catalog. These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Table name | required |
| database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). |
None |