clickhouse.Backend
close
close(self)
Close the ClickHouse connection.
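Examples
A minimal sketch; the connection parameters are illustrative and assume a locally running ClickHouse server. Later examples reuse a con object created the same way.
>>> import ibis
>>> con = ibis.clickhouse.connect(host="localhost", database="default")
>>> con.close()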
compile
compile(self, expr, limit=None, params=None, **kwargs)
Compile an Ibis expression to a ClickHouse SQL string.
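Examples
A short sketch; the events table name is illustrative and con is an open connection as in the close example above.
>>> t = con.table("events")
>>> print(con.compile(t.count()))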
create_database
create_database(self, name, *, force=False, engine='Atomic')
Create a new database.
Parameters
name
str
Name of the new database.
required
force
bool
If False, an exception is raised if the database already exists.
False
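Examples
A minimal sketch; the database name is illustrative.
>>> con.create_database("staging")
>>> con.create_database("staging", force=True)  # no error if it already exists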
create_table
create_table(self, name, obj=None, *, schema=None, database=None, temp=False, overwrite=False, engine='MergeTree', order_by=None, partition_by=None, sample_by=None, settings=None)
Create a table in a ClickHouse database.
Parameters
name
str
Name of the table to create
required
obj
pandas.DataFrame | pyarrow.Table | ibis.Table | None
Optional data to create the table with
None
schema
ibis.Schema | None
Optional names and types of the table
None
database
str | None
Database to create the table in
None
temp
bool
Create a temporary table. This is not yet supported, and exists for API compatibility.
False
overwrite
bool
Whether to overwrite the table
False
engine
str
The table engine to use. See ClickHouse’s CREATE TABLE documentation for specifics. Defaults to MergeTree with ORDER BY tuple() because MergeTree is the most feature-complete engine.
'MergeTree'
order_by
collections.abc.Iterable[str] | None
String column names to order by. Required for some table engines like MergeTree.
None
partition_by
collections.abc.Iterable[str] | None
String column names to partition by
None
sample_by
str | None
String column name to sample by
None
settings
collections.abc.Mapping[str, typing.Any] | None
Key-value pairs of settings for table creation
None
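Examples
A hedged sketch creating an empty MergeTree table from a schema; the table and column names are illustrative.
>>> sch = ibis.schema({"user_id": "int64", "ts": "timestamp", "amount": "float64"})
>>> t = con.create_table(
...     "payments",
...     schema=sch,
...     engine="MergeTree",
...     order_by=["user_id", "ts"],
... )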
create_view
create_view(self, name, obj, *, database=None, overwrite=False)
Create a new view from an expression.
Parameters
name
str
Name of the new view.
required
obj
ibis.Table
An Ibis table expression that will be used to create the view.
required
database
str | None
Name of the database where the view will be created. If not provided, the connection’s default database is used.
None
overwrite
bool
Whether to clobber an existing view with the same name
False
Returns
Table
The view that was created.
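Examples
A minimal sketch building a view over the payments table created above.
>>> payments = con.table("payments")
>>> expr = payments.filter(payments.amount > 0)
>>> v = con.create_view("positive_payments", expr, overwrite=True)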
drop_database
drop_database(self, name, *, force=False)
Drop the database named name.
Parameters
name
str
Database to drop.
required
force
bool
If False, an exception is raised if the database does not exist.
False
drop_table
drop_table(self, name, database=None, force=False)
Drop a table.
Parameters
name
str
Name of the table to drop.
required
database
str | None
Name of the database where the table exists, if not the default.
None
force
bool
If False, an exception is raised if the table does not exist.
False
drop_view
drop_view(self, name, *, database=None, force=False)
Drop a view.
Parameters
name
str
Name of the view to drop.
required
database
str | None
Name of the database where the view exists, if not the default.
None
force
bool
If False, an exception is raised if the view does not exist.
False
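Examples
The three drop methods share the same calling pattern; the object names below are illustrative, and force=True suppresses the error raised when the object does not exist.
>>> con.drop_view("positive_payments", force=True)
>>> con.drop_table("payments", force=True)
>>> con.drop_database("staging", force=True)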
execute
execute(self, expr, limit='default', external_tables=None, **kwargs)
Execute an expression.
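Examples
A minimal sketch; for a table expression the result is typically a pandas DataFrame.
>>> t = con.table("events")
>>> df = con.execute(t.limit(10))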
get_schema
get_schema(self, table_name, database=None)
Return a Schema object for the indicated table and database.
Parameters
table_name
str
May not be fully qualified. Use database if you want to qualify the identifier.
required
database
str | None
Database name
None
Returns
ibis.Schema
Ibis schema
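Examples
A short sketch; the table name is illustrative.
>>> sch = con.get_schema("events", database="default")
>>> sch.names  # column names; sch.types holds the corresponding dtypes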
has_operation
has_operation(cls, operation)
Return whether the backend implements support for operation.
Parameters
operation
type[ibis.Value]
A class corresponding to an operation.
required
Returns
bool
Whether the backend implements the operation.
Examples
>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True
insert
insert(self, name, obj, settings=None, **kwargs)
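Insert data into a table.
Examples
A hedged sketch inserting a pandas DataFrame; the table and column names are illustrative.
>>> import pandas as pd
>>> df = pd.DataFrame({"user_id": [1, 2], "amount": [9.99, 5.0]})
>>> con.insert("payments", df)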
list_databases
list_databases(self, like=None)
List existing databases in the current connection.
Parameters
like
str | None
A pattern in Python’s regex format to filter returned database names.
None
Returns
list[str]
The database names that exist in the current connection, matching the like pattern if provided.
list_tables
list_tables(self, like=None, database=None)
Return the list of table names in the current database.
For some backends, the tables may be files in a directory, or other equivalent entities in a SQL database.
Parameters
like
str | None
A pattern in Python’s regex format.
None
database
str | None
The database from which to list tables. If not provided, the current database is used.
None
Returns
list[str]
The list of table names that match the pattern like.
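Examples
A short sketch covering both listing methods; the outputs shown are illustrative.
>>> con.list_databases(like="^def")
['default']
>>> con.list_tables(database="default")
['events', 'payments']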
raw_sql
raw_sql(self, query, external_tables=None, **kwargs)
Execute a SQL string query against the database.
Parameters
query
str | sqlglot.exp.Expression
Raw SQL string
required
external_tables
collections.abc.Mapping[str, pandas.DataFrame] | None
Mapping of table names to pandas DataFrames providing external data sources for the query
None
kwargs
Backend specific query arguments
{}
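Examples
A hedged sketch passing a pandas DataFrame as an external table; the table name ext is illustrative, and the return value is whatever the underlying driver produces for the query.
>>> import pandas as pd
>>> df = pd.DataFrame({"user_id": [1, 2], "amount": [9.99, 5.0]})
>>> con.raw_sql(
...     "SELECT user_id, sum(amount) AS total FROM ext GROUP BY user_id",
...     external_tables={"ext": df},
... )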
read_csv
read_csv(self, path, table_name=None, engine='MergeTree', **kwargs)
Register a CSV file as a table in the current backend.
Parameters
path
str | pathlib.Path
The data source. A string or Path to the CSV file.
required
table_name
str | None
An optional name to use for the created table. This defaults to a sequentially generated name.
None
**kwargs
typing.Any
Additional keyword arguments passed to the backend loading function.
{}
Returns
ibis.Table
The just-registered table
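Examples
A minimal sketch; the file path and table name are illustrative.
>>> t = con.read_csv("data/events.csv", table_name="events_raw")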
read_parquet
read_parquet(self, path, table_name=None, engine='MergeTree', **kwargs)
Register a parquet file as a table in the current backend.
Parameters
path
str | pathlib.Path
The data source.
required
table_name
str | None
An optional name to use for the created table. This defaults to a sequentially generated name.
None
**kwargs
typing.Any
Additional keyword arguments passed to the backend loading function.
{}
Returns
ibis.Table
The just-registered table
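Examples
A minimal sketch, mirroring read_csv; the path is illustrative.
>>> t = con.read_parquet("data/events.parquet")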
sql
sql(self, query, schema=None, dialect=None)
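Create an Ibis table expression from a raw SQL query string.
Examples
A hedged sketch; the query and column names are illustrative, and the result can be composed like any other table expression.
>>> t = con.sql("SELECT user_id, amount FROM payments")
>>> expr = t.group_by("user_id").aggregate(total=t.amount.sum())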
table
table(self, name, database=None)
Construct a table expression.
Parameters
name
str
Table name
required
database
str | None
Database name
None
to_pyarrow
to_pyarrow(self, expr, *, params=None, limit=None, external_tables=None, **kwargs)
Execute an expression and return the results as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
expr
ibis.Expr
Ibis expression to export to pyarrow
required
params
collections.abc.Mapping[ibis.Scalar, typing.Any] | None
Mapping of scalar parameter expressions to values.
None
limit
int | str | None
An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py.
None
kwargs
typing.Any
Keyword arguments
{}
Returns
pyarrow.Table
A pyarrow table holding the results of the executed expression.
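Examples
A minimal sketch; the table name is illustrative.
>>> tbl = con.to_pyarrow(con.table("events").limit(1000))
>>> tbl.num_rows  # a pyarrow.Table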
to_pyarrow_batches
to_pyarrow_batches(self, expr, *, limit=None, params=None, external_tables=None, chunk_size=1000000, **_)
Execute an expression and return an iterator of pyarrow record batches.
This method is eager and will execute the associated expression immediately.
Parameters
expr
ibis.Expr
Ibis expression to export to pyarrow
required
limit
int | str | None
An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py.
None
params
collections.abc.Mapping[ibis.Scalar, typing.Any] | None
Mapping of scalar parameter expressions to values.
None
external_tables
collections.abc.Mapping[str, typing.Any] | None
External data
None
chunk_size
int
Maximum number of rows to return in a single chunk
1000000
Returns
results
RecordBatchReader
Notes
There are a variety of ways to implement clickhouse -> record batches.
1. FORMAT ArrowStream -> record batches via raw_query. This has the same type conversion problem(s) as to_pyarrow. It’s harder to address due to the lack of cast on RecordBatch. However, this is a ClickHouse problem: we should be able to get string data out without a bunch of settings/permissions rigmarole.
2. Native -> Python objects -> pyarrow batches. This is what is implemented, using query_column_block_stream.
3. Native -> Python objects -> DataFrame chunks -> pyarrow batches. This is not implemented because it adds an unnecessary pandas step in between Python object -> arrow. We can go directly to record batches without pandas in the middle.
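Examples
A hedged sketch; process is a hypothetical placeholder for per-batch work, and the table name is illustrative.
>>> reader = con.to_pyarrow_batches(con.table("events"), chunk_size=100_000)
>>> for batch in reader:
...     process(batch)  # hypothetical per-batch handler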
truncate_table
truncate_table(self, name, database=None)
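Delete all rows from a table, keeping its definition intact.
Examples
A minimal sketch; the table name is illustrative.
>>> con.truncate_table("payments")
>>> con.table("payments").count().execute()
0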