ClickHouse

https://clickhouse.com

Install

Install Ibis and dependencies for the ClickHouse backend.

With pip, install the clickhouse extra:

pip install 'ibis-framework[clickhouse]'

With conda:

conda install -c conda-forge ibis-clickhouse

Or with mamba:

mamba install -c conda-forge ibis-clickhouse

Then connect:

import ibis

con = ibis.clickhouse.connect()

Adjust connection parameters as needed.

Connect

ibis.clickhouse.connect

con = ibis.clickhouse.connect(
    user="username",
    password="password",
    host="hostname",
)
Note

ibis.clickhouse.connect is a thin wrapper around ibis.backends.clickhouse.Backend.do_connect.

Connection Parameters

do_connect

do_connect(self, host='localhost', port=None, database='default', user='default', password='', client_name='ibis', secure=None, compression=True, **kwargs)

Create a ClickHouse client for use with Ibis.

Parameters
Name Type Description Default
host str Host name of the clickhouse server 'localhost'
port int | None ClickHouse HTTP server’s port. If not passed, the value depends on whether secure is True or False. None
database str Default database when executing queries 'default'
user str User to authenticate with 'default'
password str Password to authenticate with ''
client_name str Name of client that will appear in clickhouse server logs 'ibis'
secure bool | None Whether or not to use an authenticated endpoint None
compression str | bool The kind of compression to use for requests. See https://clickhouse.com/docs/en/integrations/python#compression for more information. True
kwargs typing.Any Client specific keyword arguments {}
Examples
>>> import ibis
>>> client = ibis.clickhouse.connect()
>>> client
<ibis.clickhouse.client.ClickhouseClient object at 0x...>

ibis.connect URL format

In addition to ibis.clickhouse.connect, you can also connect to ClickHouse by passing a properly-formatted ClickHouse connection URL to ibis.connect:

con = ibis.connect(f"clickhouse://{user}:{password}@{host}:{port}?secure={secure}")

ClickHouse playground

ClickHouse provides a free playground with several datasets that you can connect to using Ibis:

from ibis.interactive import *

con = ibis.connect("clickhouse://play:clickhouse@play.clickhouse.com:443?secure=True")
actors = con.table("actors")
actors
(Interactive table preview truncated.) The actors table has the following columns:

string: login, type, name, company, blog, location, email, bio, twitter_username
boolean: site_admin, hireable
int64: public_repos, public_gists, followers, following
timestamp(0): created_at, updated_at

clickhouse.Backend

close

close(self)

Close ClickHouse connection.

compile

compile(self, expr, limit=None, params=None, **kwargs)

Compile an Ibis expression to a ClickHouse SQL string.
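
A minimal sketch, assuming the con connection created above and a hypothetical table named events:

expr = con.table("events").limit(5)
print(con.compile(expr))  # prints the generated ClickHouse SQL without executing it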

create_database

create_database(self, name, *, force=False, engine='Atomic')

Create a new database.

Parameters

Name Type Description Default
name str Name of the new database. required
force bool If False, an exception is raised if the database already exists. False
engine str Database engine to use. 'Atomic'
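
A minimal sketch, assuming the con connection created above; the database name is hypothetical:

con.create_database("staging", force=True)  # no error if "staging" already exists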

create_table

create_table(self, name, obj=None, *, schema=None, database=None, temp=False, overwrite=False, engine='MergeTree', order_by=None, partition_by=None, sample_by=None, settings=None)

Create a table in a ClickHouse database.

Parameters

Name Type Description Default
name str Name of the table to create required
obj pandas.DataFrame | pyarrow.Table | ibis.Table | None Optional data to create the table with None
schema ibis.Schema | None Optional names and types of the table None
database str | None Database to create the table in None
temp bool Create a temporary table. This is not yet supported, and exists for API compatibility. False
overwrite bool Whether to overwrite the table False
engine str The table engine to use. See ClickHouse’s CREATE TABLE documentation for specifics. Defaults to MergeTree with ORDER BY tuple() because MergeTree is the most feature-complete engine. 'MergeTree'
order_by collections.abc.Iterable[str] | None String column names to order by. Required for some table engines like MergeTree. None
partition_by collections.abc.Iterable[str] | None String column names to partition by None
sample_by str | None String column names to sample by None
settings collections.abc.Mapping[str, typing.Any] | None Key-value pairs of settings for table creation None

Returns

Type Description
Table The new table
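
A minimal sketch, assuming the con connection created above; the table name, schema, and ordering column are hypothetical:

schema = ibis.schema({"id": "int64", "ts": "timestamp", "value": "float64"})
t = con.create_table(
    "measurements",
    schema=schema,
    engine="MergeTree",
    order_by=["id"],
    overwrite=True,
)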

create_view

create_view(self, name, obj, *, database=None, overwrite=False)

Create a new view from an expression.

Parameters

Name Type Description Default
name str Name of the new view. required
obj ibis.Table An Ibis table expression that will be used to create the view. required
database str | None Name of the database where the view will be created, if not provided the database’s default is used. None
overwrite bool Whether to clobber an existing view with the same name False

Returns

Type Description
Table The view that was created.
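
A minimal sketch, reusing the con connection and the hypothetical measurements table from the example above:

t = con.table("measurements")
v = con.create_view("positive_measurements", t.filter(t.value > 0), overwrite=True)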

drop_database

drop_database(self, name, *, force=False)

Drop a database with name name.

Parameters

Name Type Description Default
name str Database to drop. required
force bool If False, an exception is raised if the database does not exist. False
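
For example, dropping the hypothetical database created above without raising if it no longer exists:

con.drop_database("staging", force=True)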

drop_table

drop_table(self, name, database=None, force=False)

Drop a table.

Parameters

Name Type Description Default
name str Name of the table to drop. required
database str | None Name of the database where the table exists, if not the default. None
force bool If False, an exception is raised if the table does not exist. False
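
For example, with a hypothetical table name:

con.drop_table("measurements", force=True)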

drop_view

drop_view(self, name, *, database=None, force=False)

Drop a view.

Parameters

Name Type Description Default
name str Name of the view to drop. required
database str | None Name of the database where the view exists, if not the default. None
force bool If False, an exception is raised if the view does not exist. False
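
For example, with a hypothetical view name:

con.drop_view("positive_measurements", force=True)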

execute

execute(self, expr, limit='default', external_tables=None, **kwargs)

Execute an expression.
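
A minimal sketch, assuming the con connection created above and a hypothetical events table; for table expressions the result is typically returned as a pandas DataFrame:

expr = con.table("events").limit(10)
df = con.execute(expr)  # materializes the result client-side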

get_schema

get_schema(self, table_name, database=None)

Return a Schema object for the indicated table and database.

Parameters

Name Type Description Default
table_name str May not be fully qualified. Use database if you want to qualify the identifier. required
database str | None Database name None

Returns

Type Description
ibis.Schema Ibis schema
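
A minimal sketch, assuming the con connection created above and a hypothetical events table:

schema = con.get_schema("events", database="default")
print(schema)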

has_operation

has_operation(cls, operation)

Return whether the backend implements support for operation.

Parameters

Name Type Description Default
operation type[ibis.Value] A class corresponding to an operation. required

Returns

Type Description
bool Whether the backend implements the operation.

Examples

>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True

insert

insert(self, name, obj, settings=None, **kwargs)
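
Based on the signature, this presumably inserts the data in obj into the existing table name. A minimal sketch, assuming obj may be a pandas DataFrame and using hypothetical names:

import pandas as pd

rows = pd.DataFrame({"id": [1, 2], "value": [0.5, 1.5]})
con.insert("measurements", rows)  # assumes a table with a matching schema already exists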

list_databases

list_databases(self, like=None)

List existing databases in the current connection.

Parameters

Name Type Description Default
like str | None A pattern in Python’s regex format to filter returned database names. None

Returns

Type Description
list[str] The database names that exist in the current connection, that match the like pattern if provided.
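
For example:

con.list_databases()             # all databases
con.list_databases(like="^def")  # regex filter, matches e.g. "default"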

list_tables

list_tables(self, like=None, database=None)

Return the list of table names in the current database.

For some backends, the tables may be files in a directory, or other equivalent entities in a SQL database.

Parameters

Name Type Description Default
like str | None A pattern in Python’s regex format. None
database str | None The database from which to list tables. If not provided, the current database is used. None

Returns

Type Description
list[str] The list of the table names that match the pattern like.
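
For example, listing tables in ClickHouse's built-in system database whose names contain "query":

con.list_tables(database="system", like="query")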

raw_sql

raw_sql(self, query, external_tables=None, **kwargs)

Execute a SQL string query against the database.

Parameters

Name Type Description Default
query str | sqlglot.exp.Expression Raw SQL string required
external_tables collections.abc.Mapping[str, pandas.DataFrame] | None Mapping of table name to pandas DataFrames providing external datasources for the query None
kwargs Backend specific query arguments {}

Returns

Type Description
Cursor Clickhouse cursor
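
A minimal sketch, assuming the con connection created above; the returned object is a backend-specific cursor rather than an Ibis expression:

cursor = con.raw_sql("SELECT version()")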

read_csv

read_csv(self, path, table_name=None, engine='MergeTree', **kwargs)

Register a CSV file as a table in the current backend.

Parameters

Name Type Description Default
path str | pathlib.Path The data source. A string or Path to the CSV file. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs typing.Any Additional keyword arguments passed to the backend loading function. {}

Returns

Type Description
ibis.Table The just-registered table
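
A minimal sketch, assuming the con connection created above; the file path and table name are hypothetical:

t = con.read_csv("data/events.csv", table_name="events_raw")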

read_parquet

read_parquet(self, path, table_name=None, engine='MergeTree', **kwargs)

Register a parquet file as a table in the current backend.

Parameters

Name Type Description Default
path str | pathlib.Path The data source. required
table_name str | None An optional name to use for the created table. This defaults to a sequentially generated name. None
**kwargs typing.Any Additional keyword arguments passed to the backend loading function. {}

Returns

Type Description
ibis.Table The just-registered table
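
A minimal sketch, assuming the con connection created above; the file path and table name are hypothetical:

t = con.read_parquet("data/events.parquet", table_name="events_pq")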

sql

sql(self, query, schema=None, dialect=None)
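
Undocumented above; like the sql method on other Ibis backends, it presumably wraps a raw SQL query as a composable Ibis table expression. A minimal sketch using ClickHouse's system.numbers table:

t = con.sql("SELECT number AS n FROM system.numbers LIMIT 10")
expr = t.filter(t.n % 2 == 0)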

table

table(self, name, database=None)

Construct a table expression.

Parameters

Name Type Description Default
name str Table name required
database str | None Database name None

Returns

Type Description
Table Table expression
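
For example, referencing a hypothetical events table in the default database:

t = con.table("events", database="default")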

to_pyarrow

to_pyarrow(self, expr, *, params=None, limit=None, external_tables=None, **kwargs)

Execute an expression and return the results as a pyarrow table.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ibis.Expr Ibis expression to export to pyarrow required
params collections.abc.Mapping[ibis.Scalar, typing.Any] | None Mapping of scalar parameter expressions to value. None
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
kwargs typing.Any Keyword arguments {}

Returns

Type Description
Table A pyarrow table holding the results of the executed expression.
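
A minimal sketch, assuming the con connection created above and a hypothetical events table:

expr = con.table("events").limit(1_000)
arrow_table = con.to_pyarrow(expr)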

to_pyarrow_batches

to_pyarrow_batches(self, expr, *, limit=None, params=None, external_tables=None, chunk_size=1000000, **_)

Execute expression and return an iterator of pyarrow record batches.

This method is eager and will execute the associated expression immediately.

Parameters

Name Type Description Default
expr ibis.Expr Ibis expression to export to pyarrow required
limit int | str | None An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. None
params collections.abc.Mapping[ibis.Scalar, typing.Any] | None Mapping of scalar parameter expressions to value. None
external_tables collections.abc.Mapping[str, typing.Any] | None External data None
chunk_size int Maximum number of rows to return in a single chunk 1000000

Returns

Type Description
RecordBatchReader An iterator of pyarrow record batches.

Notes

There are a variety of ways to implement clickhouse -> record batches.

  1. FORMAT ArrowStream -> record batches via raw_query. This has the same type conversion problem(s) as to_pyarrow. It's harder to address due to the lack of cast on RecordBatch. However, this is a ClickHouse problem: we should be able to get string data out without a bunch of settings/permissions rigmarole.
  2. Native -> Python objects -> pyarrow batches. This is what is implemented, using query_column_block_stream.
  3. Native -> Python objects -> DataFrame chunks -> pyarrow batches. This is not implemented because it adds an unnecessary pandas step between Python objects and arrow. We can go directly to record batches without pandas in the middle.
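
A minimal sketch, assuming the con connection created above and a hypothetical events table:

reader = con.to_pyarrow_batches(con.table("events"), chunk_size=100_000)
for batch in reader:  # each item is a pyarrow.RecordBatch
    ...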

truncate_table

truncate_table(self, name, database=None)
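
Undocumented above; in the spirit of SQL's TRUNCATE TABLE, it presumably removes all rows while keeping the table definition. A minimal sketch with a hypothetical table name:

con.truncate_table("measurements")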
