Skip to content

image/svg+xml Ibis

Expressive analytics in Python at any scale.

Installation

pip install ibis-framework
conda install -c conda-forge ibis-framework
mamba install -c conda-forge ibis-framework

Try it out!

python -c 'import ibis; print(ibis.__version__)'

Features

SQL Coverage

Anything you can write in a SELECT statement you can write in Ibis.

SQL
SELECT f, sum(a + b) AS d
FROM t
GROUP BY f
Ibis
t.group_by("f").aggregate(d=t.a + t.b.sum())
SQL
SELECT exp(t.a) AS d
FROM t
LEFT SEMI JOIN s
  ON t.x = t.y
Ibis
t.semi_join(s, t.x == t.y).select([lambda t: t.a.exp().name("d")])
SQL
SELECT *, avg(x) OVER (PARTITION BY y) as z
FROM t
Ibis
t.group_by("y").mutate(z=t.x.avg())

Coming from SQL?

Check out Ibis for SQL Programmers

Abstract Over SQL Dialects

No more rewrites when scaling up or down.

con = ibis.sqlite.connect("my_sqlite.db")
con = ibis.postgres.connect(user="me", host="my_computer", port=9090)
con = ibis.bigquery.connect(project_id="my_project_id", dataset_id="my_dataset_id")
t = con.table("t")
t.group_by("y").mutate(z=t.x.avg())

Ecosystem

Ibis builds on top of and works with existing Python tools.

t.semi_join(s, t.x == t.y).select([lambda t: t.a.exp().name("d")]).head(2)
df = expr.execute()  # a pandas DataFrame!

Example

Let's compute the number of citizens per squared kilometer in Asia:

>>> import ibis
>>> db = ibis.sqlite.connect("geography.db")
>>> countries = db.table("countries")
>>> asian_countries = countries.filter(countries.continent == "AS")
>>> density_in_asia = asian_countries.population.sum() / asian_countries.area_km2.sum()
>>> density_in_asia.execute()
130.7019141926602

Learn more!

Learn more about Ibis in our tutorial.

Comparison to other tools

Coming from SQL?

Check out Ibis for SQL Programmers!

Ibis gives you the benefit of a programming language. You don't need to sacrifice maintainability to get to those insights!

docs/example.py
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
import ibis

con = ibis.sqlite.connect("crunchbase.db")

c = con.table("companies")
i = con.table("investments")

expr = (
    c.left_join(i, c.permalink == i.company_permalink)
    .group_by(investor_name=ibis.coalesce(i.investor_name, "NO INVESTOR"))
    .aggregate(
        num_investments=c.permalink.nunique(),
        acq_ipos=(
            c.status.isin(("ipo", "acquired"))
            .ifelse(c.permalink, ibis.NA)
            .nunique()
        ),
    )
    .mutate(acq_rate=lambda t: t.acq_ipos / t.num_investments)
    .sort_by(ibis.desc(2))
)
docs/example.sql
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
SELECT
  *,
  CAST(acq_ipos / num_investments AS FLOAT) AS acq_rate
FROM (
  SELECT
    COALESCE(i.investor_name, 'NO INVESTOR') AS investor_name,
    COUNT(DISTINCT c.permalink) AS num_investments,
    COUNT(
      DISTINCT
        CASE
          WHEN c.status IN ('ipo', 'acquired') THEN c.permalink
          ELSE NULL
        END
    ) AS acq_ipos
  FROM companies AS c
  LEFT JOIN investments AS i
    ON c.permalink = i.company_permalink
  GROUP BY 1
  ORDER BY 2 DESC
)

Ibis aims to be more concise and composable than SQLAlchemy when writing interactive analytics code.

Ibis ❤'s SQLAlchemy

Ibis generates SQLAlchemy expressions for some of our backends including the PostgreSQL and SQLite backends!

docs/example.py
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
import ibis

con = ibis.sqlite.connect("crunchbase.db")

c = con.table("companies")
i = con.table("investments")

expr = (
    c.left_join(i, c.permalink == i.company_permalink)
    .group_by(investor_name=ibis.coalesce(i.investor_name, "NO INVESTOR"))
    .aggregate(
        num_investments=c.permalink.nunique(),
        acq_ipos=(
            c.status.isin(("ipo", "acquired"))
            .ifelse(c.permalink, ibis.NA)
            .nunique()
        ),
    )
    .mutate(acq_rate=lambda t: t.acq_ipos / t.num_investments)
    .sort_by(ibis.desc(2))
)
docs/sqlalchemy_example.py
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
import sqlalchemy as sa

c = sa.table("companies")
i = sa.table("investments")

a = (
    sa.select(
        [
            sa.case(
                [(i.c.investor_name.is_(None), "NO INVESTOR")],
                else_=i.c.investor_name,
            ).label("investor_name"),
            sa.func.count(c.c.permalink.distinct()).label("num_investments"),
            sa.func.count(
                sa.case(
                    [(c.status.in_(("ipo", "acquired")), c.c.permalink)]
                ).distinct()
            ).label("acq_ipos"),
        ]
    )
    .select_from(
        c.join(
            i, onclause=c.c.permalink == i.c.company_permalink, isouter=True
        )
    )
    .group_by(1)
    .order_by(sa.desc(2))
)
expr = sa.select([(a.c.acq_ipos / a.c.num_investments).label("acq_rate")])

What's Next?

Need a specific backend?

Take a look at the backends documentation!

Interested in contributing?

Get started by setting up a development environment!


Last update: April 4, 2022