Ibis sneak peek: writing to files

Tags: blog, io, new feature, sneak peek
Author: Kae Suarez
Published: March 9, 2023

Ibis 5.0 is coming soon and will offer new functionality and fixes to users. To enhance clarity around this process, we’re sharing a sneak peek into what we’re working on.

In Ibis 4.0, we added the ability to read CSV and Parquet files via the Ibis interface. We felt this was important because, well, the ability to read files is simply necessary, whether you’re working at a local scale, with legacy data, with data not yet in a database, and so on. However, for a user, the natural next question was “can I go ahead and write when I’m done?” The answer was no. We didn’t like that, especially since we do care about file-based use cases.

So, we’ve gone ahead and fixed that for Ibis 5.0.

Files in, Files out

Before we can write a file, we need data — so let’s read in a file, to start this off:

import ibis

ibis.options.interactive = True

t = ibis.read_csv(
    "https://storage.googleapis.com/ibis-examples/data/penguins.csv.gz"
)
t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Of course, we could just write the table straight back out, but let’s do an operation first. How about using selectors, which have their own write-up elsewhere on this site? Self-promotion aside, here’s an operation:

from ibis import _
import ibis.selectors as s

expr = (
    t.group_by("species")
     .mutate(s.across(s.numeric() & ~s.cols("year"), (_ - _.mean()) / _.std()))
)
expr
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Gentoo  │ Biscoe │      -0.455854 │     -1.816223 │         -0.954050 │   -1.142626 │ female │  2007 │
│ Gentoo  │ Biscoe │      -0.975022 │     -0.287513 │         -0.491442 │   -0.448342 │ female │  2009 │
│ Gentoo  │ Biscoe │       0.387793 │     -0.898997 │         -1.108253 │   -1.241809 │ female │  2007 │
│ Gentoo  │ Biscoe │       0.809616 │      0.222056 │          0.125368 │    1.237778 │ male   │  2007 │
│ Gentoo  │ Biscoe │       0.030865 │     -0.491341 │         -0.337240 │    0.642677 │ male   │  2007 │
│ Gentoo  │ Biscoe │      -0.326062 │     -1.510481 │         -1.108253 │   -1.043442 │ female │  2007 │
│ Gentoo  │ Biscoe │      -0.682990 │     -0.389427 │         -0.954050 │   -0.547525 │ female │  2007 │
│ Gentoo  │ Biscoe │      -0.261167 │      0.323970 │          0.279571 │    0.245943 │ male   │  2007 │
│ Gentoo  │ Biscoe │      -1.364397 │     -1.612395 │         -1.262455 │   -1.340993 │ female │  2007 │
│ Gentoo  │ Biscoe │      -0.228719 │      0.425884 │         -0.337240 │    0.146759 │ male   │  2007 │
│ …       │ …      │              … │             … │                 … │           … │ …      │     … │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
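If the selector expression reads as a bit dense, here is a rough plain-Python sketch of what it computes for a single numeric column: a z-score taken within each group. The helper `zscore_by_group` and the toy rows are hypothetical, made up for illustration only. The sketch assumes a sample standard deviation (which we believe matches the default of `std()` in Ibis) and assumes nulls are skipped when computing the statistics and stay null in the output.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_group(rows, group_key, value_key):
    """Return per-row z-scores of `value_key`, computed within each `group_key`."""
    # Collect the non-null values for each group.
    groups = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            groups[row[group_key]].append(row[value_key])

    # Per-group mean and sample standard deviation.
    # Note: stdev() needs at least two values per group.
    stats = {g: (mean(v), stdev(v)) for g, v in groups.items()}

    result = []
    for row in rows:
        value = row[value_key]
        if value is None:
            result.append(None)  # nulls stay null
        else:
            mu, sigma = stats[row[group_key]]
            result.append((value - mu) / sigma)
    return result


# Hypothetical toy data, not taken from the penguins dataset.
rows = [
    {"species": "A", "mass": 1.0},
    {"species": "A", "mass": 2.0},
    {"species": "A", "mass": 3.0},
    {"species": "B", "mass": 10.0},
    {"species": "B", "mass": None},
    {"species": "B", "mass": 30.0},
]
scores = zscore_by_group(rows, "species", "mass")
```

The Ibis version applies the same idea to every numeric column except `year` in one go, and leaves the actual computation to the backend.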

Now, finally, time to do the exciting part:

expr.to_parquet("normalized.parquet")

Like many things in Ibis, this is as simple and plain-looking as it is important. Being able to create files from Ibis, instead of redirecting into other libraries first, enables operation at larger scales and in fewer steps. Where desired, you can address a backend directly to use its native export functionality; we want to make sure you have the flexibility to use Ibis or the backend as you see fit.

Wrapping Up

Ibis is an interface tool for analytical engines that can reach scales far beyond a laptop. Files are important to Ibis because:

  • Ibis also supports local execution, where files are the standard unit of data — we want to support all our users.
  • Files are useful for moving between platforms, and long-term storage that isn’t tied to a particular backend.
  • Files can move between our backends more easily than backend-specific database files, so we think this adds some convenience for the multi-backend use case.

We’re excited to release this functionality in Ibis 5.0.

Interested in Ibis? Docs are available on this very website, and the source code lives in the Ibis repository on GitHub.

Please feel free to reach out on GitHub!
