PySpark

Install

Install Ibis and the dependencies for the PySpark backend with any one of the following:

pip install 'ibis-framework[pyspark]'
conda install -c conda-forge ibis-pyspark
mamba install -c conda-forge ibis-pyspark

Connect

ibis.pyspark.connect

con = ibis.pyspark.connect(session=session)

ibis.pyspark.connect is a thin wrapper around ibis.backends.pyspark.Backend.do_connect.

The PySpark backend does not create SparkSession objects; you must create a SparkSession yourself and pass it to ibis.pyspark.connect.

Connection Parameters

do_connect(session)

Create a PySpark Backend for use with Ibis.

Parameters:

session : SparkSession
    A SparkSession instance. Required.

Examples:

>>> import ibis
>>> from pyspark.sql import SparkSession
>>> session = SparkSession.builder.getOrCreate()
>>> ibis.pyspark.connect(session)
<ibis.backends.pyspark.Backend at 0x...>

File Support

read_csv(source_list, table_name=None, **kwargs)

Register a CSV file as a table in the current database.

Parameters:

source_list : str | list[str] | tuple[str]
    The data source(s). May be a path to a file or directory of CSV files, or an iterable of CSV files. Required.

table_name : str | None
    An optional name to use for the created table. Defaults to a sequentially generated name.

kwargs : Any
    Additional keyword arguments passed through to PySpark.

read_parquet(source, table_name=None, **kwargs)

Register a Parquet file as a table in the current database.

Parameters:

source : str | Path
    The data source. May be a path to a file or directory of Parquet files. Required.

table_name : str | None
    An optional name to use for the created table. Defaults to a sequentially generated name.

kwargs : Any
    Additional keyword arguments passed through to PySpark.

Last update: June 2, 2023