Using SQL strings with Ibis

While Ibis goes to great lengths to help you avoid the problems associated with hand-building raw SQL strings, there are few use cases where you may need to use SQL strings in your Ibis code:

For these situations and others, Ibis has you covered.

Setup

We’ll use DuckDB to illustrate the concepts here, but the ideas and code generalize to other SQL backends.

import ibis
from ibis import _

ibis.options.interactive = True

con = ibis.connect("duckdb://")

t = ibis.examples.penguins.fetch(backend=con, table_name="penguins")
1
Connect to an in-memory DuckDB database
2
Read in the penguins example with our DuckDB database, and name it penguins

Table.sql

At the highest level there’s the Table.sql method. This method allows you to run arbitrary SELECT statements against a table expression:

t.sql("SELECT * FROM penguins")
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.71813750male  2007 │
│ Adelie Torgersen39.517.41863800female2007 │
│ Adelie Torgersen40.318.01953250female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.31933450female2007 │
│ Adelie Torgersen39.320.61903650male  2007 │
│ Adelie Torgersen38.917.81813625female2007 │
│ Adelie Torgersen39.219.61954675male  2007 │
│ Adelie Torgersen34.118.11933475NULL2007 │
│ Adelie Torgersen42.020.21904250NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Now, SELECT * on a table expressions is not particularly useful and in fact is a bit wasteful: there’s no need to reselect a table expression.

The real power of Table.sql is composing it with other ibis expressions:

(
   t.sql(
      """
      SELECT
        species,
        island,
        mad(bill_length_mm) AS bill_mad
      FROM penguins
      GROUP BY 1, 2
      """
   )
   .filter(_.bill_mad > 2)
   .order_by(_.bill_mad.desc())
)
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ species    island     bill_mad ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringstringfloat64  │
├───────────┼───────────┼──────────┤
│ ChinstrapDream    2.45 │
│ Adelie   Torgersen2.20 │
│ Gentoo   Biscoe   2.10 │
└───────────┴───────────┴──────────┘

This method is powerful and you can mix and match Ibis expressions as you like:

(
   t.sql(
      """
      SELECT
        species,
        island,
        mad(bill_length_mm) AS bill_mad
      FROM penguins
      GROUP BY 1, 2
      """
   )
   .filter(_.bill_mad > 2)
   .alias("big_bills")  # note the alias call for subsequent '.sql' calls
   .sql("SELECT * FROM big_bills ORDER BY 3 DESC")
)
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ species    island     bill_mad ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringstringfloat64  │
├───────────┼───────────┼──────────┤
│ ChinstrapDream    2.45 │
│ Adelie   Torgersen2.20 │
│ Gentoo   Biscoe   2.10 │
└───────────┴───────────┴──────────┘

Working with different SQL dialects

You can also pass SQL strings from SQL dialects that do not match the backend you’re using by passing a dialect name to the dialect argument of .sql.

For example, here’s MySQL syntax running against DuckDB (note the use of backticks for quoting).

(
   t.sql(
      """
      SELECT
        `species`,
        `island`,
        mad(`bill_length_mm`) AS bill_mad
      FROM `penguins`
      GROUP BY 1, 2
      """,
      dialect="mysql",
   )
   .filter(_.bill_mad > 2)
   .alias("big_bills")  # note the alias call for subsequent '.sql' calls
   .sql("SELECT * FROM big_bills ORDER BY 3 DESC")
)
1
By default the dialect is the backend’s native dialect.
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ species    island     bill_mad ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringstringfloat64  │
├───────────┼───────────┼──────────┤
│ ChinstrapDream    2.45 │
│ Adelie   Torgersen2.20 │
│ Gentoo   Biscoe   2.10 │
└───────────┴───────────┴──────────┘

This feature is useful if you’re porting existing SQL from one backend to another.

Backend.sql

Backend.sql supports the dialect argument.

There’s also the Backend.sql method, which can handle arbitrary SELECT statements as well and returns an Ibis table expression.

The main difference with Table.sql is that Backend.sql can only refer to tables that already exist in the database, because the API is defined on Backend instances.

After calling Backend.sql, however, you’re able to mix and match similar to Table.sql:

(
   con.sql(
      """
      SELECT
        species,
        island,
        mad(bill_length_mm) AS bill_mad
      FROM penguins
      GROUP BY 1, 2
      """
   )
   .filter(_.bill_mad > 2)
   .alias("big_bills")  # note the alias call for subsequent '.sql' calls
   .sql("SELECT * FROM big_bills ORDER BY 3 DESC")
)
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ species    island     bill_mad ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ stringstringfloat64  │
├───────────┼───────────┼──────────┤
│ ChinstrapDream    2.45 │
│ Adelie   Torgersen2.20 │
│ Gentoo   Biscoe   2.10 │
└───────────┴───────────┴──────────┘

Backend.raw_sql

At the lowest level there’s Backend.raw_sql which is for those situations where you need to run arbitrary SQL–like a CREATE statement–that cannot be modeled as a table expression.

Backend.raw_sql accepts a SQL string, executes it and returns the cursor associated with the SQL statement’s execution.

You may need to close the cursor returned from raw_sql to avoid leaking resources

Failure to do so can result in a variety of errors and hard-to-debug behaviors.

For DDL statements, you may not need to close the cursor since DDL statements do not produce results.

Depending on the backend you may have to experiment to see when closing the cursor is necessary.

In most cases a cursor returned from a SELECT statement requires a call to close().

The easiest way to do this is to use a context manager:

from contextlib import closing


with closing(con.raw_sql("SELECT * FROM RANGE(10)")) as c:
    ...  # do something with c if necessary
Back to top