Polars
Install
Install Ibis and dependencies for the Polars backend:
Install with the polars extra:

```bash
pip install 'ibis-framework[polars]'
```

And connect:

```python
import ibis

con = ibis.polars.connect()
```

Adjust connection parameters as needed.
Install for Polars:

```bash
conda install -c conda-forge ibis-polars
```

And connect:

```python
import ibis

con = ibis.polars.connect()
```

Adjust connection parameters as needed.
Install for Polars:

```bash
mamba install -c conda-forge ibis-polars
```

And connect:

```python
import ibis

con = ibis.polars.connect()
```

Adjust connection parameters as needed.
Connect
ibis.polars.connect
```python
con = ibis.polars.connect()
```

ibis.polars.connect is a thin wrapper around ibis.backends.polars.Backend.do_connect.
Connection Parameters
do_connect
do_connect(self, tables=None)
Construct a client from a dictionary of polars LazyFrames and/or DataFrames.
Parameters
Name | Type | Description | Default |
---|---|---|---|
tables | Mapping[str, pl.LazyFrame | pl.DataFrame] | None | An optional mapping of string table names to polars LazyFrames. | None |
Examples
>>> import ibis
>>> import polars as pl
>>> ibis.options.interactive = True
>>> lazy_frame = pl.LazyFrame(
...     {"name": ["Jimmy", "Keith"], "band": ["Led Zeppelin", "Stones"]}
... )
>>> con = ibis.polars.connect(tables={"band_members": lazy_frame})
>>> t = con.table("band_members")
>>> t
┏━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ name ┃ band ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━┩
│ string │ string │
├────────┼──────────────┤
│ Jimmy │ Led Zeppelin │
│ Keith  │ Stones       │
└────────┴──────────────┘
polars.Backend
compile
compile(self, expr, params=None, **_)
Compile an expression.
connect
connect(self, *args, **kwargs)
Connect to the database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
*args |  | Mandatory connection parameters, see the docstring of do_connect for details. | () |
**kwargs |  | Extra connection parameters, see the docstring of do_connect for details. | {} |
Notes
This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.
Returns
Name | Type | Description |
---|---|---|
BaseBackend | An instance of the backend |
create_table
create_table(self, name, obj=None, *, schema=None, database=None, temp=None, overwrite=False)
Create a new table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the new table. | required |
obj | pd.DataFrame | pa.Table | ir.Table | None | An Ibis table expression or pandas table that will be used to extract the schema and the data of the new table. If not provided, schema must be given. | None |
schema | ibis.Schema | None | The schema for the new table. Only one of schema or obj can be provided. | None |
database | str | None | Name of the database where the table will be created, if not the default. | None |
temp | bool | Whether a table is temporary or not | False |
overwrite | bool | Whether to clobber existing data | False |
Returns
Name | Type | Description |
---|---|---|
Table | The table that was created. |
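As a minimal sketch (the table name and sample data below are hypothetical), creating a table from an in-memory pandas DataFrame might look like:

```python
import ibis
import pandas as pd

con = ibis.polars.connect()

# hypothetical sample data
df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# create a table named "points" from the DataFrame's schema and contents
t = con.create_table("points", df)
```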
create_view
create_view(self, name, obj, *, database=None, overwrite=False)
Create a new view from an expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the new view. | required |
obj | ir.Table | An Ibis table expression that will be used to create the view. | required |
database | str | None | Name of the database where the view will be created, if not provided the database’s default is used. | None |
overwrite | bool | Whether to clobber an existing view with the same name | False |
Returns
Name | Type | Description |
---|---|---|
Table | The view that was created. |
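A short sketch, reusing the hypothetical points table from the create_table example above:

```python
# persist a filtered expression as a named view
expr = t.filter(t.x > 1)
v = con.create_view("big_points", expr)
```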
disconnect
disconnect(self)
Close the connection to the backend.
drop_table
drop_table(self, name, *, force=False)
Drop a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to drop. | required |
database | str | None | Name of the database where the table exists, if not the default. | None |
force | bool | If False, an exception is raised if the table does not exist. | False |
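For example, with the hypothetical points table from earlier:

```python
# force=True suppresses the error when the table does not exist
con.drop_table("points", force=True)
```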
drop_view
drop_view(self, name, *, force=False)
Drop a view.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the view to drop. | required |
database | str | None | Name of the database where the view exists, if not the default. | None |
force | bool | If False, an exception is raised if the view does not exist. | False |
execute
execute(self, expr, params=None, limit=None, streaming=False, engine='cpu', **kwargs)
Execute an expression.
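A minimal sketch, assuming a connection con with a registered table t:

```python
# returns a pandas DataFrame for table expressions,
# and a Series or scalar for column or scalar expressions
df = con.execute(t.limit(10))
```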
get_schema
get_schema(self, table_name)
Return the schema of the given table.
has_operation
has_operation(cls, operation)
Return whether the backend implements support for operation.
Parameters
Name | Type | Description | Default |
---|---|---|---|
operation | type[ops.Value] | A class corresponding to an operation. | required |
Returns
Name | Type | Description |
---|---|---|
bool | Whether the backend implements the operation. |
Examples
>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True
list_tables
list_tables(self, like=None, database=None)
Return the list of table names in the current database.
For some backends, the tables may be files in a directory, or other equivalent entities in a SQL database.
Ibis does not use the word schema to refer to database hierarchy. A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | A pattern in Python’s regex format. | None |
database | tuple[str, str] | str | None | The database from which to list tables. If not provided, the current database is used. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). | None |
Returns
Name | Type | Description |
---|---|---|
list[str] | The list of table names that match the pattern like. |
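For example (the band_ prefix is hypothetical):

```python
con.list_tables()  # all table names in the current database

# the pattern is a Python regex, not a SQL LIKE pattern
con.list_tables(like="^band_")
```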
read_csv
read_csv(self, path, table_name=None, **kwargs)
Register a CSV file as a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | list[str | Path] | tuple[str | Path] | The data source. A string or Path to the CSV file. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to Polars loading function. See https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.scan_csv.html for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
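A sketch with hypothetical paths; keyword arguments are forwarded to polars.scan_csv (the separator parameter name below reflects recent polars versions):

```python
t = con.read_csv("data/events.csv", table_name="events")

# forward a non-default separator to polars.scan_csv
t2 = con.read_csv("data/events.tsv", separator="\t")
```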
read_delta
read_delta(self, path, table_name=None, **kwargs)
Register a Delta Lake as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | The data source(s). Path to a Delta Lake table directory. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to Polars loading function. See https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.scan_delta.html for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
read_json
read_json(self, path, table_name=None, **kwargs)
Register a JSON file as a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | A string or Path to a JSON file; globs are supported | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to Polars loading function. See https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.scan_ndjson.html for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
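A sketch with a hypothetical path; since loading goes through polars.scan_ndjson, the file is expected to contain newline-delimited JSON:

```python
t = con.read_json("data/logs.ndjson", table_name="logs")
```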
read_pandas
read_pandas(self, source, table_name=None, **kwargs)
Register a Pandas DataFrame or pyarrow Table as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source | pd.DataFrame | The data source. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to Polars loading function. See https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.from_pandas.html for more information. | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
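A minimal sketch, assuming an existing connection con:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
t = con.read_pandas(df, table_name="from_pandas")
```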
read_parquet
read_parquet(self, path, table_name=None, **kwargs)
Register a parquet file as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | Path | Iterable[str] | The data source(s). May be a path to a file, an iterable of files, or directory of parquet files. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to Polars loading function. See https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.scan_parquet.html for more information (if loading a single file or glob; when loading multiple files polars’ scan_pyarrow_dataset method is used instead). | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
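A sketch with hypothetical paths, showing both dispatch paths described above:

```python
# a single file or a glob goes through polars.scan_parquet
t = con.read_parquet("data/*.parquet", table_name="parts")

# an iterable of paths goes through polars' scan_pyarrow_dataset
t_multi = con.read_parquet(["data/a.parquet", "data/b.parquet"])
```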
reconnect
reconnect(self)
Reconnect to the database already configured with connect.
register
register(self, source, table_name=None, **kwargs)
Register a data source as a table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source | str | Path | Any | The data source(s). May be a path to a file, a parquet directory, or a pandas dataframe. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | Any | Additional keyword arguments passed to Polars loading functions for CSV or parquet. See https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.scan_csv.html and https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.scan_parquet.html for more information | {} |
Returns
Name | Type | Description |
---|---|---|
ir.Table | The just-registered table |
register_options
register_options(cls)
Register custom backend options.
rename_table
rename_table(self, old_name, new_name)
Rename an existing table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
old_name | str | The old name of the table. | required |
new_name | str | The new name of the table. | required |
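For example, with hypothetical names:

```python
con.rename_table("points", "coordinates")
```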
sql
sql(self, query, schema=None, dialect=None)
table
table(self, name)
Construct a table expression.
Ibis does not use the word schema to refer to database hierarchy. A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.
These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Table name | required |
database | tuple[str, str] | str | None | Database name. If not provided, the current database is used. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). | None |
Returns
Name | Type | Description |
---|---|---|
Table | Table expression |
to_csv
to_csv(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to CSV. | required |
path | str | Path | The data source. A string or Path to the CSV file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | Any | Additional keyword arguments passed to pyarrow.csv.CSVWriter | {} |
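A minimal sketch (the output path is hypothetical):

```python
# executes t immediately and writes the result to disk
con.to_csv(t, "out/result.csv")
```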
to_delta
to_delta(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to Delta Lake table. | required |
path | str | Path | The data source. A string or Path to the Delta Lake table. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | Any | Additional keyword arguments passed to deltalake.writer.write_deltalake method | {} |
to_pandas
to_pandas(self, expr, *, params=None, limit=None, **kwargs)
Execute an Ibis expression and return a pandas DataFrame, Series, or scalar.
This method is a wrapper around execute.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
kwargs | Any | Keyword arguments | {} |
to_pandas_batches
to_pandas_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)
Execute an Ibis expression and return an iterator of pandas DataFrames.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
chunk_size | int | Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. | 1000000 |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
Iterator[pd.DataFrame] | An iterator of pandas DataFrames. |
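A sketch, assuming a connection con and table t:

```python
# stream results in DataFrame chunks of up to 10,000 rows each
for batch in con.to_pandas_batches(t, chunk_size=10_000):
    print(len(batch))  # each batch is a pandas DataFrame
```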
to_parquet
to_parquet(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
path | str | Path | The data source. A string or Path to the parquet file. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | Additional keyword arguments passed to pyarrow.parquet.ParquetWriter | {} |
to_parquet_dir
to_parquet_dir(self, expr, directory, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file in a directory.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Table | The ibis expression to execute and persist to parquet. | required |
directory | str | Path | The data source. A string or Path to the directory where the parquet file will be written. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | Any | Additional keyword arguments passed to pyarrow.dataset.write_dataset | {} |
to_polars
to_polars(self, expr, params=None, limit=None, streaming=False, engine='cpu', **kwargs)
Execute an expression and return results as a polars DataFrame.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to polars. | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
dataframe | A polars DataFrame holding the results of the executed expression. |
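A minimal sketch:

```python
import polars as pl

pl_df = con.to_polars(t)
assert isinstance(pl_df, pl.DataFrame)
```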
to_pyarrow
to_pyarrow(self, expr, params=None, limit=None, **kwargs)
Execute an expression and return results as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to pyarrow | required |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches
to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)
Execute expression and return a RecordBatchReader.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to export to pyarrow | required |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
params | Mapping[ir.Scalar, Any] | None | Mapping of scalar parameter expressions to value. | None |
chunk_size | int | Maximum number of rows in each returned record batch. | 1000000 |
kwargs | Any | Keyword arguments | {} |
Returns
Name | Type | Description |
---|---|---|
results | RecordBatchReader |
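A sketch; the returned RecordBatchReader is iterable:

```python
reader = con.to_pyarrow_batches(t, chunk_size=100_000)
for batch in reader:  # each item is a pyarrow.RecordBatch
    print(batch.num_rows)
```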
to_torch
to_torch(self, expr, *, params=None, limit=None, **kwargs)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ir.Expr | Ibis expression to execute. | required |
params | Mapping[ir.Scalar, Any] | None | Parameters to substitute into the expression. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | Any | Keyword arguments passed into the backend’s to_torch implementation. | {} |
Returns
Name | Type | Description |
---|---|---|
dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |
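A minimal sketch, assuming torch is installed and the columns of t have types convertible to tensors:

```python
tensors = con.to_torch(t)
for name, tensor in tensors.items():
    print(name, tuple(tensor.shape))
```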