


Install ibis and dependencies for the PySpark backend:

pip install 'ibis-framework[pyspark]'
conda install -c conda-forge ibis-pyspark
mamba install -c conda-forge ibis-pyspark



Create a client by passing a SparkSession to ibis.pyspark.connect.

See ibis.backends.pyspark.Backend.do_connect for connection parameter information.

ibis.pyspark.connect is a thin wrapper around ibis.backends.pyspark.Backend.do_connect.

Connection Parameters

do_connect(self, session)

Create a PySpark Backend for use with Ibis.


session: A SparkSession instance


import ibis
import pyspark
session = pyspark.sql.SparkSession.builder.getOrCreate()
ibis.pyspark.connect(session)
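
As a slightly fuller sketch of the connection step, the function below builds a local SparkSession, registers a small temporary view, and wraps the session with ibis.pyspark.connect. The function name connect_local, the application name, the local[1] master setting, and the demo view are assumptions made for illustration; they are not part of the Ibis API.

```python
def connect_local(app_name="ibis-pyspark-demo"):
    """Build a local SparkSession and wrap it in an ibis PySpark backend.

    Hypothetical helper for illustration; not part of ibis.
    """
    # Imports live inside the function so that defining it needs neither library.
    import ibis
    from pyspark.sql import SparkSession

    session = (
        SparkSession.builder
        .master("local[1]")   # assumption: a single-threaded local Spark
        .appName(app_name)    # hypothetical application name
        .getOrCreate()
    )
    # Register a tiny DataFrame as a temporary view so the backend has a table to see.
    df = session.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.createOrReplaceTempView("demo")

    # The connection call itself, as documented above.
    return ibis.pyspark.connect(session)
```

Calling con = connect_local() and then con.list_tables() should then include the demo view, provided PySpark and ibis are installed.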

Backend API

Backend (BaseSQLBackend)


current_database property readonly

Return the name of the current database.

Backends that don't support different databases will return None.


Returns:
str | None: Name of the current database.

version property readonly

Return the version of the backend engine.

For database servers, return the server version.

For others such as SQLite and pandas return the version of the underlying library or application.


Returns:
str: The backend version



close(self)

Close the Spark connection and drop any temporary objects.

compile(self, expr, timecontext=None, params=None, *args, **kwargs)

Compile an ibis expression to a PySpark DataFrame object.

compute_stats(self, name, database=None, noscan=False)

Issue a COMPUTE STATISTICS command for a given table.


Parameters:
name: Table name
database: Database name
noscan: If True, collect only basic statistics for the table (number of rows, size in bytes).
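
To make the effect of noscan concrete, the sketch below builds the Spark SQL statement that a call like this corresponds to (ANALYZE TABLE ... COMPUTE STATISTICS, optionally with NOSCAN). The helper compute_stats_sql is hypothetical and only illustrates the statement shape; how the backend assembles its command internally is not shown on this page.

```python
def compute_stats_sql(name, database=None, noscan=False):
    """Return the Spark SQL text for a COMPUTE STATISTICS command.

    Hypothetical helper for illustration; not part of ibis.
    """
    # Qualify the table name with the database when one is given.
    qualified = f"{database}.{name}" if database else name
    stmt = f"ANALYZE TABLE {qualified} COMPUTE STATISTICS"
    if noscan:
        # NOSCAN limits collection to statistics that need no table scan.
        stmt += " NOSCAN"
    return stmt


print(compute_stats_sql("my_table", database="operations", noscan=True))
```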

create_database(self, name, path=None, force=False)

Create a new Spark database.


Parameters:
name: Database name
path: Path where the database data is stored; if omitted, the Spark default is used

create_table(self, table_name, obj=None, schema=None, database=None, force=False, format='parquet')

Create a new table in Spark.


Parameters:
table_name: Table name
obj: If passed, creates the table from the results of a select statement
schema: Mutually exclusive with obj; creates an empty table with this schema
database: Database name
force: If True, create the table even if a table with the indicated name already exists
format: Table format


con.create_table('new_table_name', table_expr)

create_view(self, name, expr, database=None, can_exist=False, temporary=False)

Create a Spark view from a table expression.


Parameters:
name: View name
expr: Expression to use for the view
database: Database name
can_exist: Replace an existing view of the same name if it exists
temporary: Whether the view is temporary
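
A minimal sketch of how these parameters might be combined: the helper below creates (or replaces) a temporary view from an expression and returns a table expression for it. The name publish_view is hypothetical, and con is assumed to be a backend returned by ibis.pyspark.connect.

```python
def publish_view(con, name, expr):
    """Create or replace a temporary view and return it as a table expression.

    Hypothetical helper for illustration; `con` is assumed to be a PySpark backend.
    """
    # can_exist=True replaces an existing view of the same name;
    # temporary=True keeps the view scoped to the session.
    con.create_view(name, expr, can_exist=True, temporary=True)
    # Look the view up again so callers can keep building on it.
    return con.table(name)
```

Because can_exist=True is passed, the call can be repeated with the same name without first dropping the view.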

drop_database(self, name, force=False)

Drop a Spark database.


Parameters:
name: Database name
force: If False, Spark throws an exception if the database is not empty or does not exist
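
One way to pair create_database with drop_database is a context manager that creates a scratch database and always drops it afterwards. This is a sketch under assumptions: scratch_database is a hypothetical helper, con is assumed to be a PySpark backend, and force=True is used on the assumption (from the description above) that it suppresses the error for a non-empty or missing database.

```python
from contextlib import contextmanager


@contextmanager
def scratch_database(con, name, path=None):
    """Create a database, yield its name, and drop it on exit.

    Hypothetical helper for illustration; `con` is assumed to be a PySpark backend.
    """
    con.create_database(name, path=path)
    try:
        yield name
    finally:
        # force=True: do not fail if the database still holds tables
        # or has already been removed.
        con.drop_database(name, force=True)
```

Usage would look like: with scratch_database(con, 'tmp_db') as db: con.create_table('t', table_expr, database=db).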

drop_table(self, name, database=None, force=False)

Drop a table.

drop_table_or_view(self, name, database=None, force=False)

Drop a Spark table or view.


Parameters:
name: Table or view name
database: Database name
force: If False, the database may throw an exception if the table does not exist


table = 'my_table'
db = 'operations'
con.drop_table_or_view(table, db, force=True)

drop_view(self, name, database=None, force=False)

Drop a view.

execute(self, expr, timecontext=None, params=None, limit='default', **kwargs)

Execute an expression.

has_operation(cls, operation)

Return whether the backend implements support for operation.


Parameters:
operation: A class corresponding to an operation.

Returns:
bool: Whether the backend implements the operation.


import ibis
import ibis.expr.operations as ops
ibis.sqlite.has_operation(ops.ArrayIndex)    # False
ibis.postgres.has_operation(ops.ArrayIndex)  # True
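
The same check can be applied to several operations at once. The sketch below filters a list of operation classes down to those a backend reports as supported; supported_operations is a hypothetical helper, and passing ibis.pyspark as the backend is assumed to work by analogy with the sqlite and postgres calls above.

```python
def supported_operations(backend, candidates):
    """Return the names of the operation classes that `backend` implements.

    Hypothetical helper for illustration; `backend` only needs a
    has_operation(operation) method, as documented above.
    """
    return [op.__name__ for op in candidates if backend.has_operation(op)]


# Intended usage (requires ibis with the PySpark backend installed):
#   import ibis
#   import ibis.expr.operations as ops
#   supported_operations(ibis.pyspark, [ops.ArrayIndex])
```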

insert(self, table_name, obj=None, database=None, overwrite=False, values=None, validate=True)

Insert data into an existing table.


table = 'my_table'
con.insert(table, table_expr)

Completely overwrite contents:

con.insert(table, table_expr, overwrite=True)

list_databases(self, like=None)

List existing databases in the current connection.


Parameters:
like: A pattern in Python's regex format to filter returned database names.

Returns:
list[str]: The database names that exist in the current connection and match the like pattern, if provided.

list_tables(self, like=None, database=None)

Return the list of table names in the current database.

For some backends, the tables may be files in a directory, or other equivalent entities in a SQL database.


Parameters:
like (str, optional): A pattern in Python's regex format.
database (str, optional): The database to list tables of, if not the current one.

Returns:
list[str]: The list of the table names that match the pattern like.
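
Because like is a Python regular expression rather than a SQL LIKE pattern, a prefix match is written as ^sales_ and not sales_%. The standalone sketch below mimics that filtering on a plain list of names. It assumes the pattern is applied with search semantics (a match anywhere in the name); that detail, and the helper filter_names, are assumptions and not taken from this page.

```python
import re


def filter_names(names, like=None):
    """Filter names with a Python regex, in the spirit of the `like` parameter.

    Hypothetical stand-in for illustration; assumes search (not full-match) semantics.
    """
    if like is None:
        return sorted(names)
    pattern = re.compile(like)
    # Keep every name in which the pattern matches somewhere.
    return sorted(name for name in names if pattern.search(name))


tables = ["sales_2021", "sales_2022", "users", "old_sales"]
print(filter_names(tables, like=r"^sales_"))  # anchored prefix match
print(filter_names(tables, like=r"sales"))    # matches anywhere in the name
```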

raw_sql(self, query)

Execute a query string.

Could have unexpected results if the query modifies the behavior of the session in a way unknown to Ibis; be careful.


Parameters:
query: DML or DDL statement

Returns:
Any: Backend cursor

set_database(self, name)

DEPRECATED: set_database is deprecated as of v2.0; use a new connection to the database

table(self, name, database=None)

Return a table expression from a table or view in the database.


Parameters:
name: Table name
database: Database in which the table resides

Returns:
Table: Table named name from database
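
A short sketch of how table and execute fit together: look up a table expression, restrict it to a few rows, and run it. The helper preview is hypothetical, con is assumed to be a PySpark backend, and the result is typically a pandas DataFrame, though that return type is not stated on this page.

```python
def preview(con, name, database=None, n=5):
    """Fetch the first n rows of a table or view through the backend.

    Hypothetical helper for illustration; `con` is assumed to be a PySpark backend.
    """
    # table() returns a table expression; rows are not fetched here.
    t = con.table(name, database=database)
    # limit() narrows the expression, execute() runs it on Spark.
    return con.execute(t.limit(n))
```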

truncate_table(self, table_name, database=None)

Delete all rows from an existing table.


Parameters:
table_name: Table name
database: Database name

Last update: March 1, 2022