Overview
Welcome to the first public roadmap for the Ibis project! If you aren’t familiar with the background of Ibis or who supports it nowadays, we recommend reading why Voltron Data supports Ibis before the roadmap below.
Roadmap GitHub project
We have a public roadmap as a GitHub project! You can follow along with specific higher-level issues you are interested in there to keep track of our progress.
We are early in our use of this GitHub project so please pardon any disorganization as we get it up and running efficiently. We have:
- Roadmap view: consisting of meta-issues in their respective repositories for high-level objectives of the Ibis project
- Triage view: consisting of new issues across Ibis project repositories that need to be triaged
- Backlog view: consisting of issues that have been triaged (assigned a priority) and are on the backlog
- TODO view: consisting of issues that are in progress or ready to be worked on soon
- Label-specific views: consisting of issues for specific labels, like documentation or a large refactor
Right now, the team at Voltron Data sets the roadmap and priorities. Over time as more contributors and organizations join the project, we expect this process to diversify and become more community-driven. We’d love to have you involved in the process! Join us on Zulip or interact with us on GitHub to get involved and join the decision making process for Ibis.
Repeating from Why Voltron Data supports Ibis:
Ibis is independently governed and not owned by Voltron Data. Currently, four out of five members of the steering committee are employed by Voltron Data (the fifth being at Alphabet working on Google BigQuery). We are working toward a more diverse representation of companies and organizations as the project continues to grow.
Voltron Data also welcomes this dilution of power and influence! A healthy open source project is one that is not controlled by a single entity. This is true of Apache Arrow and other open source projects that Voltron Data employees have been instrumental in building.
Main themes
Our top five themes for 2024 include:
Ibis backends: Ibis is a Python frontend for many backends. To continue scaling to more backends, we need to complete a major rework of library internals and stabilize the API for backend authors. Related work in this area will make it easier than ever to create new Ibis backends and maintain them. This work will also include improving backend interfaces for operations like table creation, insertion, and upsertion. This theme allows Ibis to deliver on the promise of a single Python dataframe API that can be written once and run on any execution engine.
Ibis for ML: Increasingly, data projects are ML projects. Ibis can uniquely help with feature engineering and other ML tasks connecting your data where it lives to ML models. We will continue to improve Ibis for ML use cases. This theme allows Ibis to cover more of the data and MLOps lifecycle, with efficient feature engineering and handoff to ML training frameworks.
Ibis for streaming data: Ibis has only been for batch data until very recently. With the addition of the first streaming backends, we will continue to improve Ibis for streaming data use cases and bridge the gap between batch and streaming data. This theme allows Ibis to expand its promise of a single Python dataframe to stream processing, too.
Ibis for geospatial: Ibis has a rich set of geospatial expressions, but most backends do not implement them. We will continue to improve Ibis for geospatial use cases and bridge the gap between geospatial data and other data types. This theme allows Ibis to cover more of the data lifecycle for geospatial data.
Ibis community: Ibis is an open source project and we want to make it as easy as possible for new contributors to get involved. We will continue to improve the Ibis community and make it easier than ever to contribute to Ibis. This theme is critical for Ibis to continue to grow and thrive as an open source project. We aim to delight our community and make it easy to get involved.
We believe these themes will help Ibis as a standard Python interface for many backends and real-world data use cases.
The big refactor
The biggest item in Q1 2024 and primary focus of the core Ibis team right now is the big refactor – dubbed “the epic split” – continuing the great work completed by Krisztián in his PR splitting the relational operations. You can read more details in that PR, but the gist is that a new intermediary representation for Ibis expressions is being has been created that drastically simplifies the codebase.
With that refactor in place, each backend Ibis supports needs to be moved to the new relational model. As a consequence, we are also swapping out SQLAlchemy for SQLGlot. We are losing out on some of the things SQLAlchemy did for us automatically, but overall this gives us a lot more control over the SQL that is generated, reduces dependency overhead, and simplifies the codebase further. This was recently merged into main and will be released in Ibis 9.0.
Look out for a blog post dedicated to the refactor soon!
Ibis for ML preprocessing and feature engineering
Data projects are increasingly ML projects. pandas and scikit-learn are the default for Python users, but tend to lack scalability. Many projects look to address this and Ibis does not intend on duplicating effort here. Instead, we want to leverage what sets Ibis apart – the ability to have a single Python API that scales across many backends – to feature engineering and other ML preprocessing tasks ahead of model training.
Jim took this on over the last few months, building up the IbisML package to a usable (but still early) state. We will further invest in IbisML this year to get it to a production-ready state, bringing the power of Ibis to ML preprocessing and feature engineering.
We’re excited to welcome the (former) Claypot AI team to Voltron Data to help drive this work forward! Expect a release announcement for IbisML soon covering the majority of feature engineering operations and handoff to popular ML training frameworks.
I’ve been working on a new LLM integration for Ibis called ibis-birdbrain
. It’s highly experimental and still a work in progress, but keep an eye out for more details soon!
Streaming data backends
With the release of Ibis 8.0, we added support for Apache Flink in collaboration with Claypot AI, the first dedicated streaming data backend for Ibis. We are also excited to have the (former) Claypot team helping us drive this forward!
We’ve also collaborated with RisingWave on the second streaming backend, also released in Ibis 8.0. This backend is still early and fairly experimental, but demonstrates the ability for Ibis to quickly add new backends. We can now add batch and streaming backend with ease!
We will continue to improve Ibis for streaming data use cases and bridge the gap in the dataframe API experience between batch and streaming data. This may mean new backends, though we expect to focus on improving the existing backends and the API for streaming data. If you’re interested in contributing to Ibis and have expertise in a streaming engine, a new streaming backend is a great way to get involved!
Geospatial improvements
Ibis supports 50+ geospatial expressions in the API, but most backends do not implement them. We will continue to improve the coverage for geospatial operations across backends. Naty recently added support in the DuckDB backend, making Ibis a great option for local geospatial analysis.
This is a great opportunity for new contributors to get involved with Ibis! Let us know if you’re interested in adding geospatial support to your favorite backend.
Community engagement
Hello! Expect to see an increased presence from the Ibis project in the form of blogs, conference talks, video content, and more in 2024. Join us on Zulip to discuss ideas and get involved!
We would love to onboard new contributors to the project.
Other themes
We have a few other themes for 2024 that are lower priority.
New backends
Adding new backends is not a priority for the Ibis team at Voltron Data in Q1. Instead, we are focusing on the big refactor and other internal library improvements to get Ibis to the point where adding new backends is much easier and maintanable. That will take the form of stabilizing the new intermediary representation, separating out connection from compilation steps, and solidifying the API for backend authors. We will also introduce new documentation and possibly testing frameworks to ease the burden of adding new backends.
We are still happy to support new backends! With contributions from the community, we have recently added:
- Apache Flink
- Exasol
- RisingWave
Adding a new backend is a great way to get involved with Ibis! If you’re interested, join us on Zulip and let us know or open an issue on GitHub.
Logo and website design
We will likely redesign the logo (initially created by Tim Swast, thanks Tim! It has served us well!) and website theme. We aim to keep the website simple and focused on documentation that helps users, but want to deviate from the default themes in Quarto to make Ibis stand out.
Documentation
“When you’re
sellingdistributing free and open source software, the documentation is the product.” - old tech adage, origin unknown
A few months ago, we moved our documentation to Quarto and revamped most of the website along the way. We will continue improving the documentation with backend-specific getting started tutorials, how-to guides for common tasks, improved API references, improving the website search functionality, and more!
Improving the documentation is a great way to get involved with Ibis!
Beyond Q1 2024
This writeup of our roadmap is heavily focused on Q1 of 2024. Looking beyond, our priorities remain much the same. After the big refactor is done, we will continue improving our library internals, backend interface, and ensuring the longevity of Ibis. We’ll continue improving ML, streaming, and geospatial support.
Expect an updated roadmap blog in the second half of the year for more details!
Next steps
It’s never been a better time to get involved with Ibis. Join us on Zulip and introduce yourself!