
PySpark

Install

Install Ibis and the dependencies for the PySpark backend using one of the following:

pip install 'ibis-framework[pyspark]'
conda install -c conda-forge ibis-pyspark
mamba install -c conda-forge ibis-pyspark

Connect

API

Create a client by passing a SparkSession instance to ibis.pyspark.connect.

See ibis.backends.pyspark.Backend.do_connect for connection parameter information.

ibis.pyspark.connect is a thin wrapper around ibis.backends.pyspark.Backend.do_connect.

Connection Parameters

do_connect(self, session)

Create a PySpark Backend for use with Ibis.

Parameters

session: A SparkSession instance.

Examples

>>> import ibis
>>> import pyspark
>>> session = pyspark.sql.SparkSession.builder.getOrCreate()
>>> con = ibis.pyspark.connect(session)

Backend API

Backend (BaseSQLBackend)

Attributes

current_database property readonly

Return the name of the current database.

Backends that don't support different databases will return None.

Returns

str | None Name of the current database.

version property readonly

Return the version of the backend engine.

For database servers, return the server version.

For others, such as SQLite and pandas, return the version of the underlying library or application.

Returns

str The backend version

Methods

close(self)

Close Spark connection and drop any temporary objects.

compile(self, expr, timecontext=None, params=None, *args, **kwargs)

Compile an Ibis expression to a PySpark DataFrame object.

compute_stats(self, name, database=None, noscan=False)

Issue a COMPUTE STATISTICS command for a given table.

Parameters

name: Table name.
database: Database name.
noscan: If True, collect only basic statistics for the table (number of rows, size in bytes).
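At the SQL level, Spark collects table statistics with ANALYZE TABLE ... COMPUTE STATISTICS, optionally followed by NOSCAN. The sketch below assumes compute_stats issues a statement of that shape; compute_stats_sql is a hypothetical helper, not part of Ibis, and the exact text Ibis generates may differ.

```python
from typing import Optional


def compute_stats_sql(name: str, database: Optional[str] = None, noscan: bool = False) -> str:
    """Hypothetical helper: the Spark SQL that compute_stats is assumed to issue."""
    # Qualify the table with its database when one is given.
    table = f"{database}.{name}" if database is not None else name
    # NOSCAN asks Spark to avoid scanning the whole table while collecting statistics.
    suffix = " NOSCAN" if noscan else ""
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS{suffix}"


print(compute_stats_sql("events", database="analytics", noscan=True))
# ANALYZE TABLE analytics.events COMPUTE STATISTICS NOSCAN
```

A string built this way could also be run directly through raw_sql.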

create_database(self, name, path=None, force=False)

Create a new Spark database.

Parameters

name: Database name.
path: Path where the database data is stored; if omitted, the Spark default is used.
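The corresponding Spark SQL statement is CREATE DATABASE, with an optional LOCATION clause for path. In this sketch, create_database_sql is a hypothetical helper, and mapping force to IF NOT EXISTS is an assumption, since the parameter list above does not describe force.

```python
from typing import Optional


def create_database_sql(name: str, path: Optional[str] = None, force: bool = False) -> str:
    """Hypothetical helper: the Spark SQL that create_database is assumed to issue."""
    # Assumption: force=True maps to IF NOT EXISTS, so an existing database is not an error.
    if_not_exists = " IF NOT EXISTS" if force else ""
    # Without a LOCATION clause, Spark places the database under its default warehouse directory.
    location = f" LOCATION '{path}'" if path is not None else ""
    return f"CREATE DATABASE{if_not_exists} {name}{location}"


print(create_database_sql("sales", path="/data/sales", force=True))
# CREATE DATABASE IF NOT EXISTS sales LOCATION '/data/sales'
```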

create_table(self, table_name, obj=None, schema=None, database=None, force=False, format='parquet')

Create a new table in Spark.

Parameters

table_name: Table name.
obj: If passed, create the table from the results of this select statement.
schema: Mutually exclusive with obj; create an empty table with this schema.
database: Database name.
force: If True, create the table even if a table with the same name already exists.
format: Table format.

Examples

>>> con.create_table('new_table_name', table_expr)
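The two mutually exclusive modes of create_table line up with two forms of Spark's CREATE TABLE: CREATE TABLE ... AS SELECT when obj is given, and an explicit column list when schema is given, each with a USING clause for format. In the sketch below, create_table_sql is hypothetical, select_sql stands in for a compiled obj expression, schema is simplified to (name, type) pairs, and force is left out.

```python
from typing import Optional, Sequence, Tuple


def create_table_sql(
    table_name: str,
    select_sql: Optional[str] = None,
    schema: Optional[Sequence[Tuple[str, str]]] = None,
    database: Optional[str] = None,
    format: str = "parquet",
) -> str:
    """Hypothetical helper: Spark SQL for the two create_table modes."""
    # obj (here select_sql) and schema are mutually exclusive.
    if select_sql is not None and schema is not None:
        raise ValueError("pass either select_sql or schema, not both")
    table = f"{database}.{table_name}" if database is not None else table_name
    if select_sql is not None:
        # CREATE TABLE ... AS SELECT: the table is populated from the query results.
        return f"CREATE TABLE {table} USING {format} AS {select_sql}"
    if schema is not None:
        # Empty table with explicit column definitions.
        columns = ", ".join(f"{col} {typ}" for col, typ in schema)
        return f"CREATE TABLE {table} ({columns}) USING {format}"
    raise ValueError("either select_sql or schema is required")


print(create_table_sql("t", schema=[("id", "BIGINT"), ("name", "STRING")], database="db"))
# CREATE TABLE db.t (id BIGINT, name STRING) USING parquet
```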

create_view(self, name, expr, database=None, can_exist=False, temporary=False)

Create a Spark view from a table expression.

Parameters

name: View name.
expr: Expression to use for the view.
database: Database name.
can_exist: If True, replace an existing view of the same name.
temporary: Whether the view is temporary.
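In Spark SQL, can_exist and temporary correspond naturally to the OR REPLACE and TEMPORARY keywords of CREATE VIEW. The hypothetical create_view_sql below assumes that mapping and takes the view's query as an already-compiled SQL string (select_sql), standing in for expr. Because Spark's temporary views are session-scoped rather than stored in a database, the sketch rejects a database prefix for them.

```python
from typing import Optional


def create_view_sql(
    name: str,
    select_sql: str,
    database: Optional[str] = None,
    can_exist: bool = False,
    temporary: bool = False,
) -> str:
    """Hypothetical helper: the Spark SQL that create_view is assumed to issue."""
    if temporary and database is not None:
        # Temporary views live in the session, not in a database.
        raise ValueError("temporary views cannot be qualified with a database")
    view = f"{database}.{name}" if database is not None else name
    # can_exist=True replaces an existing view of the same name.
    or_replace = " OR REPLACE" if can_exist else ""
    kind = " TEMPORARY" if temporary else ""
    return f"CREATE{or_replace}{kind} VIEW {view} AS {select_sql}"


print(create_view_sql("recent", "SELECT * FROM events", can_exist=True, temporary=True))
# CREATE OR REPLACE TEMPORARY VIEW recent AS SELECT * FROM events
```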

drop_database(self, name, force=False)

Drop a Spark database.

Parameters

name: Database name.
force: If False, Spark raises an exception if the database is not empty or does not exist.
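The force behavior described above matches two options of Spark's DROP DATABASE: IF EXISTS suppresses the error for a missing database, and CASCADE allows dropping a non-empty one (the default, RESTRICT, refuses). The sketch below assumes force turns on both; drop_database_sql is a hypothetical helper.

```python
def drop_database_sql(name: str, force: bool = False) -> str:
    """Hypothetical helper: the Spark SQL that drop_database is assumed to issue."""
    if force:
        # IF EXISTS tolerates a missing database; CASCADE also drops its tables and views.
        return f"DROP DATABASE IF EXISTS {name} CASCADE"
    # Default (RESTRICT) behavior: Spark raises if the database is missing or not empty.
    return f"DROP DATABASE {name}"


print(drop_database_sql("scratch", force=True))
# DROP DATABASE IF EXISTS scratch CASCADE
```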

drop_table(self, name, database=None, force=False)

Drop a table.

drop_table_or_view(self, name, database=None, force=False)

Drop a Spark table or view.

Parameters

name: Table or view name.
database: Database name.
force: If False, the database may raise an exception if the table does not exist.

Examples

>>> table = 'my_table'
>>> db = 'operations'
>>> con.drop_table_or_view(table, db, force=True)
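Spark uses separate statements for the two kinds of object, DROP TABLE and DROP VIEW, and force corresponds to an IF EXISTS clause. The hypothetical drop_sql helper below takes an explicit view flag; how Ibis itself chooses between the two statements is not described here.

```python
from typing import Optional


def drop_sql(name: str, database: Optional[str] = None, force: bool = False, view: bool = False) -> str:
    """Hypothetical helper: the Spark SQL for dropping a table or a view."""
    target = f"{database}.{name}" if database is not None else name
    # Spark uses DROP VIEW for views and DROP TABLE for tables.
    kind = "VIEW" if view else "TABLE"
    # IF EXISTS turns dropping a missing object into a no-op instead of an error.
    if_exists = " IF EXISTS" if force else ""
    return f"DROP {kind}{if_exists} {target}"


print(drop_sql("my_table", "operations", force=True))
# DROP TABLE IF EXISTS operations.my_table
```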

drop_view(self, name, database=None, force=False)

Drop a view.

execute(self, expr, timecontext=None, params=None, limit='default', **kwargs)

Execute an expression.

has_operation(cls, operation)

Return whether the backend implements support for operation.

Parameters

operation: A class corresponding to an operation.

Returns

bool Whether the backend implements the operation.

Examples

>>> import ibis
>>> import ibis.expr.operations as ops
>>> ibis.sqlite.has_operation(ops.ArrayIndex)
False
>>> ibis.postgres.has_operation(ops.ArrayIndex)
True
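has_operation is called on the backend class itself and receives an operation class, so it can answer without a connection. The toy model below illustrates one way such a check can work, by looking the class up in a per-backend registry of translation rules; every name in it (Operation, ArrayIndex, StringLength, ToyBackend) is hypothetical, and it does not reproduce Ibis's internals.

```python
from typing import Callable, Dict


class Operation:
    """Stand-in for an Ibis operation class (hypothetical)."""


class ArrayIndex(Operation):
    pass


class StringLength(Operation):
    pass


class ToyBackend:
    """Hypothetical backend: maps each supported operation class to a translation rule."""

    _registry: Dict[type, Callable[[str], str]] = {
        StringLength: lambda arg: f"length({arg})",
    }

    @classmethod
    def has_operation(cls, operation: type) -> bool:
        # Support is decided per operation class, so no instance or connection is needed.
        return operation in cls._registry


print(ToyBackend.has_operation(StringLength))  # True
print(ToyBackend.has_operation(ArrayIndex))  # False
```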

insert(self, table_name, obj=None, database=None, overwrite=False, values=None, validate=True)

Insert data into an existing table.

Examples

>>> table = 'my_table'
>>> con.insert(table, table_expr)

To completely overwrite the existing contents:

>>> con.insert(table, table_expr, overwrite=True)
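The overwrite flag corresponds to Spark's choice between INSERT INTO, which appends rows, and INSERT OVERWRITE, which replaces the table's contents. The sketch assumes insert issues one of these; insert_sql and its select_sql argument (standing in for a compiled obj) are hypothetical.

```python
from typing import Optional


def insert_sql(table_name: str, select_sql: str, database: Optional[str] = None, overwrite: bool = False) -> str:
    """Hypothetical helper: Spark SQL for appending to or overwriting a table."""
    table = f"{database}.{table_name}" if database is not None else table_name
    # INSERT INTO appends rows; INSERT OVERWRITE replaces the existing contents.
    mode = "OVERWRITE" if overwrite else "INTO"
    return f"INSERT {mode} TABLE {table} {select_sql}"


print(insert_sql("my_table", "SELECT * FROM staging", overwrite=True))
# INSERT OVERWRITE TABLE my_table SELECT * FROM staging
```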

list_databases(self, like=None)

List existing databases in the current connection.

Parameters

like: A pattern in Python's regex format used to filter the returned database names.

Returns

list[str] The names of the databases in the current connection that match the like pattern, if one is provided.
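Because like is a Python regular expression rather than a SQL LIKE pattern, SQL wildcards such as % have no special meaning. The sketch below assumes a name is kept when the pattern matches anywhere in it (re.search semantics) and sorts the result; filter_with_like is a hypothetical helper, and the same idea applies to the like parameter of list_tables.

```python
import re
from typing import Iterable, List, Optional


def filter_with_like(names: Iterable[str], like: Optional[str] = None) -> List[str]:
    """Hypothetical helper: keep names in which the regex `like` matches anywhere."""
    if like is None:
        # No pattern: keep everything (sorted for a stable result in this sketch).
        return sorted(names)
    pattern = re.compile(like)
    return sorted(name for name in names if pattern.search(name))


databases = ["default", "sales_2021", "sales_2022", "staging"]
print(filter_with_like(databases, like=r"^sales_\d+$"))
# ['sales_2021', 'sales_2022']
```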

list_tables(self, like=None, database=None)

Return the list of table names in the current database.

For some backends, the tables may be files in a directory, or other equivalent entities in a SQL database.

Parameters

like (str, optional): A pattern in Python's regex format.
database (str, optional): The database to list tables of, if not the current one.

Returns

list[str] The names of the tables that match the like pattern.

raw_sql(self, query)

Execute a query string.

Use with care: the query can have unexpected results if it modifies the behavior of the session in a way unknown to Ibis.

Parameters

query: DML or DDL statement.

Returns

Any Backend cursor

set_database(self, name)

DEPRECATED: set_database is deprecated as of v2.0; create a new connection to the database instead.

table(self, name, database=None)

Return a table expression from a table or view in the database.

Parameters

name: Table name.
database: Database in which the table resides.

Returns

Table Table named name from database

truncate_table(self, table_name, database=None)

Delete all rows from an existing table.

Parameters

table_name: Table name.
database: Database name.
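The Spark SQL counterpart is TRUNCATE TABLE, which removes the rows but keeps the table definition; Spark documents that the target must not be a view or an external or temporary table. truncate_table_sql below is a hypothetical helper that assumes truncate_table issues this statement.

```python
from typing import Optional


def truncate_table_sql(table_name: str, database: Optional[str] = None) -> str:
    """Hypothetical helper: Spark SQL that removes all rows but keeps the table."""
    table = f"{database}.{table_name}" if database is not None else table_name
    # TRUNCATE TABLE empties the table; its schema and metadata remain in place.
    return f"TRUNCATE TABLE {table}"


print(truncate_table_sql("events", database="analytics"))
# TRUNCATE TABLE analytics.events
```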


Last update: March 1, 2022