Using SQL strings with Ibis

While Ibis goes to great lengths to help you avoid the problems associated with hand-building raw SQL strings, there are few use cases where you may need to use SQL strings in your Ibis code:

You’re porting a large amount of existing code and need to get it working quickly without rewriting it all to ibis expressions
You need to access functionality that isn’t yet available in a public Ibis API
You need to access functionality that may never exist in Ibis, such as extremely backend-specific DDL.
You’re in a flow state and you can’t be bothered to figure out how the SQL your brain generates is spelled in Ibis.

For these situations and others, Ibis has you covered.

Setup

We’ll use DuckDB to illustrate the concepts here, but the ideas and code generalize to other SQL backends.

import ibis
from ibis import _

ibis.options.interactive = True

con = ibis.connect("duckdb://")

t = ibis.examples.penguins.fetch(backend=con, table_name="penguins")

1: Connect to an in-memory DuckDB database
2: Read in the penguins example with our DuckDB database, and name it penguins

`Table.sql`

At the highest level there’s the Table.sql method. This method allows you to run arbitrary SELECT statements against a table expression:

t.sql("SELECT * FROM penguins")

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Now, SELECT * on a table expressions is not particularly useful and in fact is a bit wasteful: there’s no need to reselect a table expression.

The real power of Table.sql is composing it with other ibis expressions:

(
   t.sql(
      """
      SELECT
        species,
        island,
        mad(bill_length_mm) AS bill_mad
      FROM penguins
      GROUP BY 1, 2
      """
   )
   .filter(_.bill_mad > 2)
   .order_by(_.bill_mad.desc())
)

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ species   ┃ island    ┃ bill_mad ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ string    │ string    │ float64  │
├───────────┼───────────┼──────────┤
│ Chinstrap │ Dream     │     2.45 │
│ Adelie    │ Torgersen │     2.20 │
│ Gentoo    │ Biscoe    │     2.10 │
└───────────┴───────────┴──────────┘

This method is powerful and you can mix and match Ibis expressions as you like:

(
   t.sql(
      """
      SELECT
        species,
        island,
        mad(bill_length_mm) AS bill_mad
      FROM penguins
      GROUP BY 1, 2
      """
   )
   .filter(_.bill_mad > 2)
   .alias("big_bills")  # note the alias call for subsequent '.sql' calls
   .sql("SELECT * FROM big_bills ORDER BY 3 DESC")
)

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ species   ┃ island    ┃ bill_mad ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ string    │ string    │ float64  │
├───────────┼───────────┼──────────┤
│ Chinstrap │ Dream     │     2.45 │
│ Adelie    │ Torgersen │     2.20 │
│ Gentoo    │ Biscoe    │     2.10 │
└───────────┴───────────┴──────────┘

Working with different SQL dialects

You can also pass SQL strings from SQL dialects that do not match the backend you’re using by passing a dialect name to the dialect argument of .sql.

For example, here’s MySQL syntax running against DuckDB (note the use of backticks for quoting).

(
   t.sql(
      """
      SELECT
        `species`,
        `island`,
        mad(`bill_length_mm`) AS bill_mad
      FROM `penguins`
      GROUP BY 1, 2
      """,
      dialect="mysql",
   )
   .filter(_.bill_mad > 2)
   .alias("big_bills")  # note the alias call for subsequent '.sql' calls
   .sql("SELECT * FROM big_bills ORDER BY 3 DESC")
)

1: By default the dialect is the backend’s native dialect.

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ species   ┃ island    ┃ bill_mad ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ string    │ string    │ float64  │
├───────────┼───────────┼──────────┤
│ Chinstrap │ Dream     │     2.45 │
│ Adelie    │ Torgersen │     2.20 │
│ Gentoo    │ Biscoe    │     2.10 │
└───────────┴───────────┴──────────┘

This feature is useful if you’re porting existing SQL from one backend to another.

`Backend.sql`

Backend.sql supports the dialect argument.

There’s also the Backend.sql method, which can handle arbitrary SELECT statements as well and returns an Ibis table expression.

The main difference with Table.sql is that Backend.sql can only refer to tables that already exist in the database, because the API is defined on Backend instances.

After calling Backend.sql, however, you’re able to mix and match similar to Table.sql:

(
   con.sql(
      """
      SELECT
        species,
        island,
        mad(bill_length_mm) AS bill_mad
      FROM penguins
      GROUP BY 1, 2
      """
   )
   .filter(_.bill_mad > 2)
   .alias("big_bills")  # note the alias call for subsequent '.sql' calls
   .sql("SELECT * FROM big_bills ORDER BY 3 DESC")
)

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ species   ┃ island    ┃ bill_mad ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ string    │ string    │ float64  │
├───────────┼───────────┼──────────┤
│ Chinstrap │ Dream     │     2.45 │
│ Adelie    │ Torgersen │     2.20 │
│ Gentoo    │ Biscoe    │     2.10 │
└───────────┴───────────┴──────────┘

`Backend.raw_sql`

At the lowest level there’s Backend.raw_sql which is for those situations where you need to run arbitrary SQL–like a CREATE statement–that cannot be modeled as a table expression.

Backend.raw_sql accepts a SQL string, executes it and returns the cursor associated with the SQL statement’s execution.

You may need to close the cursor returned from raw_sql to avoid leaking resources

Failure to do so can result in a variety of errors and hard-to-debug behaviors.

For DDL statements, you may not need to close the cursor since DDL statements do not produce results.

Depending on the backend you may have to experiment to see when closing the cursor is necessary.

In most cases a cursor returned from a SELECT statement requires a call to close().

The easiest way to do this is to use a context manager:

from contextlib import closing


with closing(con.raw_sql("SELECT * FROM RANGE(10)")) as c:
    ...  # do something with c if necessary