While Ibis goes to great lengths to help you avoid the problems associated with hand-building raw SQL strings, there are few use cases where you may need to use SQL strings in your Ibis code:
You’re porting a large amount of existing code and need to get it working quickly without rewriting it all to ibis expressions
You need to access functionality that isn’t yet available in a public Ibis API
You need to access functionality that may never exist in Ibis, such as extremely backend-specific DDL.
You’re in a flow state and you can’t be bothered to figure out how the SQL your brain generates is spelled in Ibis.
For these situations and others, Ibis has you covered.
Setup
We’ll use DuckDB to illustrate the concepts here, but the ideas and code generalize to other SQL backends.
Now, SELECT * on a table expressions is not particularly useful and in fact is a bit wasteful: there’s no need to reselect a table expression.
The real power of Table.sql is composing it with other ibis expressions:
( t.sql(""" SELECT species, island, mad(bill_length_mm) AS bill_mad FROM penguins GROUP BY 1, 2 """ ) .filter(_.bill_mad >2) .order_by(_.bill_mad.desc()))
This method is powerful and you can mix and match Ibis expressions as you like:
( t.sql(""" SELECT species, island, mad(bill_length_mm) AS bill_mad FROM penguins GROUP BY 1, 2 """ ) .filter(_.bill_mad >2) .alias("big_bills") # note the alias call for subsequent '.sql' calls .sql("SELECT * FROM big_bills ORDER BY 3 DESC"))
You can also pass SQL strings from SQL dialects that do not match the backend you’re using by passing a dialect name to the dialect argument of .sql.
For example, here’s MySQL syntax running against DuckDB (note the use of backticks for quoting).
( t.sql(""" SELECT `species`, `island`, mad(`bill_length_mm`) AS bill_mad FROM `penguins` GROUP BY 1, 2 """, dialect="mysql", ) .filter(_.bill_mad >2) .alias("big_bills") # note the alias call for subsequent '.sql' calls .sql("SELECT * FROM big_bills ORDER BY 3 DESC"))
1
By default the dialect is the backend’s native dialect.
This feature is useful if you’re porting existing SQL from one backend to another.
Backend.sql
Backend.sql supports the dialect argument.
There’s also the Backend.sql method, which can handle arbitrary SELECT statements as well and returns an Ibis table expression.
The main difference with Table.sql is that Backend.sqlcan only refer to tables that already exist in the database, because the API is defined on Backend instances.
After calling Backend.sql, however, you’re able to mix and match similar to Table.sql:
( con.sql(""" SELECT species, island, mad(bill_length_mm) AS bill_mad FROM penguins GROUP BY 1, 2 """ ) .filter(_.bill_mad >2) .alias("big_bills") # note the alias call for subsequent '.sql' calls .sql("SELECT * FROM big_bills ORDER BY 3 DESC"))
At the lowest level there’s Backend.raw_sql which is for those situations where you need to run arbitrary SQL–like a CREATE statement–that cannot be modeled as a table expression.
Backend.raw_sql accepts a SQL string, executes it and returns the cursor associated with the SQL statement’s execution.
You may need to close the cursor returned from raw_sql to avoid leaking resources
Failure to do so can result in a variety of errors and hard-to-debug behaviors.
For DDL statements, you may not need to close the cursor since DDL statements do not produce results.
Depending on the backend you may have to experiment to see when closing the cursor is necessary.
In most cases a cursor returned from a SELECT statement requires a call to close().
The easiest way to do this is to use a context manager:
from contextlib import closingwith closing(con.raw_sql("SELECT * FROM RANGE(10)")) as c: ... # do something with c if necessary