One of the most powerful features of Ibis is the separation of transformation logic from the execution engine, which allows you to “write once, execute everywhere”.
Unbound tables
In Ibis, you can define unbound tables. An unbound table has a specified schema but is not connected to a data source. You can think of it as an empty spreadsheet with just the header. Even though the spreadsheet is empty, you know what the data would look like.
Unbound tables allow you to write transformations for data as long as it conforms to the provided schema. You don’t need to connect to a data source until you’re ready to execute the expression and compute outputs.
Execute an unbound expression
Here’s how we can define an unbound table in Ibis:
UnboundTable: diamonds
carat float64
cut string
color string
clarity string
depth float64
table float64
price int64
x float64
y float64
z float64
So far, we have an empty diamonds table that contains 10 columns. Even though there is no data in the diamonds table right now, we can write transformations knowing that these are the columns available to us.
Given this table of diamonds of various carats, cuts, and colors, we’re interested in learning the average carat for each color of premium and ideal diamonds. In order to do this, we can first calculate the average carat for each color and cut of diamonds, then make a pivot table to show the results:
Now that we’re ready to compute results, we can connect to any of Ibis’ supported backends. The transformation logic can be reused as is; you don’t need to modify it for each backend.
This is a dataset that we can process locally. Let’s connect to DuckDB and load the data into a DuckDB table:
import ibis

parquet_dir = "diamonds.parquet"

# download data into a local file
ibis.examples.diamonds.fetch().to_parquet(parquet_dir)

con = ibis.duckdb.connect()
con.read_parquet(parquet_dir, table_name="diamonds")
DatabaseTable: diamonds
carat float64
cut string
color string
clarity string
depth float64
table float64
price int64
x float64
y float64
z float64
Executing the transformation on the data loaded into this DuckDB table is now as simple as:
con.to_pandas(expr)
  color     Ideal   Premium
0     G  0.700715  0.841488
1     I  0.913029  1.144937
2     D  0.565766  0.721547
3     E  0.578401  0.717745
4     J  1.063594  1.293094
5     H  0.799525  1.016449
6     F  0.655829  0.827036
Voilà!
If you want to continue to work with the data in DuckDB, you can create a new table and insert the outputs into it like so:
Because Ibis separates the transformation logic from the execution engine, you can easily reuse the written transformation for another backend. Here we use Polars as an example, but you can do the same for any of Ibis’ nearly 20 supported backends as long as that particular backend supports the operations (see the operation support matrix).