9.4.0 (2024-09-03)
Features
- api: add `approx_quantiles` for computing approximate quantiles (dcdb7a7)
- api: add `DateValue.epoch` api for computing days since epoch (#9856) (8b0fb66)
- api: make the `null` function deferrable (0613ef1)
- api: support `SchemaLike` in `Backend.create_table()` (#9885) (949fbea)
- api: support deferred objects in `literal` (#9904) (0a07906)
- clickhouse: partition kwargs for compile and execution in `to_pyarrow` and `to_pandas` (2dd2c3f)
- clickhouse: support ms/us/ns truncate units (9881edb)
- decompile: make the decompiler run on TPCH query 1 (#9779) (0268044)
- exasol: implement `approx_nunique`, `std`, `var` (d9c3daa)
- exasol: implement `approx_nunique`, `std`, `var` (63c20c0)
- exasol: implement `cov`/`corr` (24f41b2)
- exasol: implement `median` and `approx_median` (3cfc344)
- exasol: implement `quantile` (ecbef94)
- exasol: implement `Table.nunique` (a24200c)
- exasol: implement `Table.nunique` (7ead7c7)
- flink: array sort (ca85ae2)
- flink: support `ArrayValue.collect` (eb857e6)
- impala: add `tbl_properties` to `create_table` (#9839) (e3d02bd)
- mssql: support connecting with a url (#9894) (8bb12e1), closes #9856
- oracle: implement `mode` aggregation (#9914) (9ee910d)
- output-formats: add support for to_parquet_dir (#9781) (80dfbe2)
- polars: array sort (9a2563b)
- polars: implement approx_nunique (3f3738d)
- pyspark: support `quantile` (26d8516)
- selectors: support naming deferreds in across (de1595c)
- snowflake: implement interval arithmetic (#9794) (41e10ca), closes #9783
- sql: enable cross-database joins (#9849) (c3ff6ae)
- sql: fuse `distinct` with other select nodes when possible (c31412b)
- sqlite: support most date/timestamp interval arithmetic (75f594d)
- sql: load parsed but unsupported types as unknown (#9868) (a76acfc)
- sql: support inserts with default constraints (#9844) (86a3c06)
- timestamps: add support for timestamp/date +/- intervals for additional backends (#9799) (79cef68)
- trino: support years and months in datetime arithmetic (1133973)
- trino: wrap `auth` strings with `BasicAuthentication` (#9960) (e0f54c9)
Bug Fixes
- bigquery: disallow column names longer than 300 characters (#9916) (ea97794), closes #8931
- clickhouse: workaround `EXCEPT` and `INTERSECT` generation in sqlglot; add tpcds query 87 (#9959) (910b8f5)
- datafusion: fix creation of SessionContext in datafusion 40.1.0 (eec5328)
- datafusion: handle `NULL`s in array `flatten` (ecc199f)
- deps: update dependency datafusion to v40 (4aa402a)
- deps: update dependency sqlglot to >=23.4,<25.11 (#9805) (84bfeb5)
- deps: update dependency sqlglot to >=23.4,<25.12 (#9834) (69a10d9)
- deps: update dependency sqlglot to >=23.4,<25.13 (#9851) (6780a6b)
- deps: update dependency sqlglot to >=23.4,<25.15 (#9864) (d182e9e)
- deps: update dependency sqlglot to >=23.4,<25.16 (#9875) (0a6765b)
- deps: update dependency sqlglot to >=23.4,<25.17 (#9907) (9e52edb)
- deps: update dependency sqlglot to >=23.4,<25.18 (#9935) (ee5116d)
- deps: update dependency sqlglot to >=23.4,<25.19 (#9962) (4c136d8)
- dot-sql: ensure that CTEs can be used in `.sql` (b63e0fd)
- duckdb: fix create_table() in databases with spaces in the name (#9817) (9da3c9f)
- exasol: properly handle returning BIGINT values (e20bdad)
- ir: convert analytic functions to window functions in filters (31295dd)
- mssql: remove sort key to keep order (#9848) (3780a13)
- mssql: support `.cache()` for caching tables (1de2f45)
- oracle: avoid double cursor closing by removing unnecessary `close` in `_fetch_from_cursor` (#9913) (a402095)
- oracle: implement current_catalog and current_database correctly (#9918) (4fdb707)
- pickle: make `Parameter` instances pickleable (#9798) (d772c80), closes #9793
- pivot-wider: handle the case of empty `id_cols` (#9912) (4a4bc64)
- polars: use `drop_table` when cleaning the cache and remove duplicated `_remove_table` method (#9922) (ce51941)
- polars: use `flatten` API for `ArrayFlatten` implementation to avoid large string upcast (#9997) (7a6af8d), closes #9995
- pyspark: suppress errors for unsupported Databricks serverless compute properties (#9830) (57f5ff6)
- repr: format scalar values with same logic as columnar values (5cd58fa)
- rewrite: avoid accumulating context state during rewriting (#9814) (9165255)
- strings: correct a typo in `startswith` docstring (#9994) (f98aec5), closes #1000 #2000 #3000
- trino: remove hack that silently breaks join transpilation (#9941) (c842453)
- ux: get rid of duplicated tracebacks (#10002) (7df4bdd)
Documentation
- add aggregate-udfs api page to index (#9789) (01dc81e)
- blog: ibis + clickhouse + shiny for better pypi stats (#9880) (4d8d352)
- blog: kaggle competition using IbisML (#9505) (7320c18)
- blog: mention dask in the pandeprecation post (#9899) (2f2c3ed)
- blog: minor edits to pandas blog (#9920) (69a44c4)
- blog: post on why we are dropping the pandas backend (#9896) (95104e5)
- datatypes: install rich traceback handler for prettier exception tracebacks (#10004) (4b77e4f)
- fix doctests for postgres backend (#9964) (497df15)
- presentations: positconf 2024 talk (#9822) (e8b89e7)
- snowflake: document using private key to connect to snowflake (c70f55a)
- snowflake: fix the syntax for passing schema (#9831) (c99cb4b)
- snowflake: remove duplicate schema docstring (#9829) (3183bef)
Refactors
- aliasing: remove the need for renaming after execution (#9996) (a0d7237)
- api: make unit required in IntegerValue.as_timestamp (7fa7395)
- internals: don’t cache table accessor to avoid a circular reference (fb604a5)
- intervals: consolidate interval conversion under `_make_interval` base compiler implementation (fe29210)
- make approximate ops subclasses of their non-approximate variants (9d218d1)
- remove base implementation for `quantile` (3c49c6a)
- selectors: remove janky `Predicate` class and unify `Selector`s under a single interface (#9917) (c15a229)
- simplify caching implementation (afba988)
Deprecations
- api: deprecate `StructValue.destructure` API (#9824) (98c5c76)
- api: deprecate type coercion `to_*` methods in favor of `as_*` methods (fb22e20), closes #9788 /github.com/ibis-project/ibis/pull/9843#discussion_r1717545190
- dask,pandas: deprecate the dask and pandas backends (910fa5c)
- expr-api: deprecate useless `has_name` method (#9901) (e0436aa)
9.3.0 (2024-08-07)
Features
- api: support `ignore_null` in `collect` (71271dd)
- api: support `ignore_null` in `first`/`last` (8d4f97f)
- api: support `order_by` in order-sensitive aggregates (`collect`/`group_concat`/`first`/`last`) (#9729) (a18cb5d)
- api: support quarterly truncation (#9715) (75b31c2), closes #9714
- array: implement min, max, any, all, sum, mean (#9704) (793efbc)
- bigquery: support timestamp bucket (fd61f2c)
- datafusion: `pivot_longer` (2330b0c)
- datafusion: enable array flatten, group concat, and timestamp now (4d110a0)
- datafusion: struct literals (a63cee9)
- datafusion: unnest (a706f54)
- duckdb: add support for passing a subset of column types to `read_csv` (#9776) (c1dcf67)
- duckdb: support arbitrary url prefixes (#9691) (11af489)
- mssql: support case-sensitive collations (#9700) (9382a0e)
- oracle: support group_concat operator (47d97ea)
- pyspark: add support for pyarrow and python UDFs (#9753) (02a1d48)
- snowflake: add `userinfo` URL parsing (524a2fa)
- ux: allow window functions in predicates and compile to `QUALIFY` where possible (#9787) (0370bcb)
Bug Fixes
- algolia: add parent class docstring to algolia index (#9739) (3bc9799)
- bigquery: repr geospatial values in interactive mode (#9712) (bd8c93f)
- case: fix dshape, error on noncomparable and empty cases (#9559) (ff2d019)
- compiler-internals: define unsupported operations after simple operations (#9755) (d9b6264)
- deps: update dependency atpublic to v5 (#9697) (a4f3940)
- deps: update dependency sqlglot to >=23.4,<25.10 (#9774) (7144257)
- deps: update dependency sqlglot to >=23.4,<25.8 (#9696) (d4a2ea2)
- deps: update dependency sqlglot to >=23.4,<25.9 (#9719) (b1d8b2e)
- drop: ignore order for `DropColumns` equality (#9677) (ae1e112)
- druid: get basic timestamp functionality working (#9692) (6cd3eee)
- duckdb: avoid literals casts that might defeat optimization (e4ff1bd)
- duckdb: ensure that array remove doesn’t remove `NULL`s (f0c3be4)
- duckdb: use `register` directly instead of calling `read_in_memory` (597817f)
- internals: ensure that CTEs are emitted in topological order (#9726) (acd7d82)
- polars: fix polars `std`/`var` to properly handle `sample`/`population` (f83d84f)
- polars: remove bogus minus-one-week truncation (ac519b2)
- postgres: handle enums by delegating to the parent class (#9769) (3f01075), closes #9295
- snowflake: bring back `where` filter support in `group_concat`; fix `array_agg` ordering (#9758) (6e7e4de)
- sql: only return tables in `current_database` (#9748) (c7f5717)
- types: fix histogram bin allocation (#9711) (6634864), closes #9687
Documentation
- algolia: add custom attributes to backend and core methods (#9730) (d9473cf)
- browser-repl: fix jupyterlite build (#9762) (f403aa1)
- fix spelling in pivot_longer explanation (#9780) (3201d8b)
- fix typo in `drop` method docstring (#9727) (4cf0014)
- presentations: update overview slides (#9685) (d3a2c0c)
- replace all double graves with single graves (#9679) (dd26d60)
Refactors
- dependencies: pandas and numpy are now optional for non-backend installs (#9564) (cff210a)
- duckdb: use replace to generate less sql (#9713) (f89aa32)
- internals: remove unnecessary dynamism in `drop` method (#9682) (5ac84c5)
- pandas: remove unreachable code in pandas backend (#9786) (dc6bfe2)
- polars: delete some dead versioning code (b23c5a3)
- polars: remove casting where possible; handle conversion on output (#9673) (8717629)
- polars: remove extra backwards compatibility code no longer in use after 1.0 upgrade (feb12f4)
- sql: make compilers usable with a base install (#9766) (84a786d)
- table_loc: return consistent object from catalog.db parsing (#9743) (1ae2a37)
Performance
Deprecations
9.2.0 (2024-07-22)
Features
- api: accept more input types in `ibis.range` (#9659) (310ad30)
- api: add `nulls_first=False` argument to `order_by` (#9385) (ce9011e)
- api: add `TableUnnest` operation to support cross-join unnest semantics as well as `offset` (#9423) (3352a84)
- api: add positional joins (#9533) (85ea9da)
- api: allow grouping by scalar values (#9451) (14f1821)
- api: support deferred or string column names in `cov`/`corr` methods (#9657) (4d135b3)
- api: support selectors in window function `order_by` and `group_by` (#9649) (0ad47de)
- backends: support creation from a DB-API con (#9603) (fc4d1e3)
- bigquery: implement CountDistinctStar (#9470) (273e4bc)
- caching: tie lifetime of cached tables to python refs (#9477) (f51546e)
- datafusion: datafusion enhancements (#9544) (f11ca43)
- dtypes: fall back to `dt.unknown` for unknown types (#9567) (6e0b5f5)
- dtypes: fall back to `dt.unknown` for unknown types (#9576) (56a10d2)
- duckdb: use `delta_scan` instead of reading pyarrow datasets (#9566) (0ff595e)
- flink: create views from more mem data types (#9622) (b83fc2b)
- geospatial: use geoarrow extension types when returning geometry columns as pyarrow (#9549) (cba7367)
- polars: add more accurate type mapping for timestamps (#8954) (3eafac4)
- polars: support version 1.0 and later (#9516) (62a1864)
- postgres: support basic jsonb type and existing operations (#9630) (7179cc6)
- pyarrow: support `__arrow_c_schema__` on `ibis.Schema` objects (#9665) (00a776e)
- pyspark: implement new experimental read/write directory methods (#9272) (adade5e)
Bug Fixes
- api: add support for using deferreds in the `argmin`/`argmax` `key` argument (#9652) (3f05cbc)
- bigquery: escape table names with spaces for bigquery backend (#9589) (ca21dbb)
- bigquery: support microseconds in time literals (#9610) (c876abc), closes #9609
- clickhouse: generate redundant aliases to workaround clickhouse naming behavior (#9525) (b44dac2), closes #9508
- clickhouse: support `Date32` database type (#9509) (efa6fb7)
- datatypes: proper handling of srid in geospatial datatypes (#9519) (a3ceb59)
- deps: update dependency datafusion to v39 (#9506) (21ef0a6)
- deps: update dependency fsspec to <2024.6.2 (#9463) (8e225ec)
- deps: update dependency geopandas to v1 (#9437) (fa1037b)
- deps: update dependency numpy to v2 (#9395) (3cb39a5)
- deps: update dependency pyarrow to v17 (#9614) (16998df)
- deps: update dependency sqlglot to >=23.4,<25.3 (#9401) (bdc1b3f)
- deps: update dependency sqlglot to >=23.4,<25.4 (#9427) (8e015b6)
- deps: update dependency sqlglot to >=23.4,<25.5 (#9472) (f6f80da)
- deps: update dependency sqlglot to >=23.4,<25.6 (#9523) (6a748c4)
- deps: update dependency sqlglot to >=23.4,<25.7 (#9628) (f5207ff)
- druid: handle typed nulls where possible (#9452) (33ec754)
- fix and improve shape inference in many ops (7a0b21e)
- ir: avoid deduplicating filters based solely on their name (#9476) (b35582e), closes #9474
- ir: repr iterables when constructing name of operations (#9480) (f5a541c)
- join: skip substitution of non-field references in join chains (#9595) (61ef0ed)
- mssql: always pass port to `pyodbc` in host string (#9656) (2e3fd9a)
- mssql: avoid calling `.commit()` unless a DDL operation is being performed (#9658) (69c5bf0), closes #9654
- mssql: fix temporary table creation and implement `cache` (#9434) (196d8a1)
- mysql: ensure that `port` is captured in MySQL `_from_url` implementation (#9421) (5bb4971), closes #9417
- oracle, clickhouse: ensure `port` is captured in `_from_url` implementation (#9507) (bd3009a)
- oracle: support connection using oracle connection string (#9435) (f3cd8b2)
- pandas, dask: fix `drop_table` handling of `force` keyword (#9503) (95048a4)
- polars: add workaround to compile Array correctly (#9484) (5a9d026)
- postgres: add dtype mapping for `citext` (f46979b)
- pyspark: run pre-execute hooks for `to_delta` (#8848) (fe0466a)
- pyspark: set catalog and database with `USE` instead of pyspark api (#9620) (6991f04)
- pyspark: set lower bound of pyspark to 3.3.3 to avoid maintenance burden of pytest collection hook (#9606) (97af53c), closes #9564
- sec: remove most instances of possible sql injection (#9404) (a555774)
- trino: allow passing the `auth` keyword (#9410) (560ddf6)
- uri-parsing: handle password with bracket in connection url (#9466) (c73bcf0)
- urls: standardize pass-through of parsed query parameters (#9482) (87cba01), closes #9456
Documentation
- add `deferred`, `range`, `e`, and `pi` to API reference (#9592) (43b116b)
- add ibis-bench blog (#9391) (2c9198d)
- add Shiny (for Python) to “works well with” section (#9558) (8862979), closes #1000 #2000 #3000
- algolia: add methods from backend pages to algolia index (#9608) (c098c70), closes #9600
- arbitrary: add example for `arbitrary` method docstring (#9596) (09cfd85)
- blog-axis-labels: set correct y axis in plot in Ibis benchmark blog post (#9445) (fce5bd3)
- blog: add 1tb challenge on a laptop post (#9487) (e33a4cc)
- blog: run Ibis on Snowflake (#9406) (1839c13)
- clickhouse: put the clickhouse tutorial into sidebar (#9624) (0fbec3f)
- fix spelling issues (#9563) (92eda02)
- how-to: add a guide on streaming operations (#9642) (f3ed10c)
- update link to the flink example repo (#9553) (87af588)
Refactors
- api: refactor the implementation of windowing (#9200) (eaa1301)
- api: remove `tuple` support in `SortKey` (#9416) (4dff6e2)
- api: remove unnecessary `select` from set operations (#9438) (88a2785)
- backends: remove redundant implementations of `_register_in_memory_tables` (5235a4b)
- caching: remove parameters that are always the same (#9532) (afa2848)
- compilers: move compilers out of the backend dependency path (#9590) (122330a)
- deps: make pyarrow optional for non-backend installs (#9552) (9047b26)
- polars: delete some dead code in the polars backend (#9389) (77fa811)
- polars: remove numpy usage entirely (#9607) (946f761)
- pyarrow: remove comparison of column names for renaming (#9616) (3b2a7ec)
- streamlit: update to use BaseConnection interface (#9550) (f5dd8fb)
- tests: add tpc ds setup and rearrange tpc setup (#9453) (b150635)
- tpc: add tpc-ds tests (#9467) (d2dff68)
- viz: avoid repeatedly rendering redundant schemas in graphviz output (#9518) (d53602b)
Performance
- bigquery: avoid running `list_tables` when registering memtables (#9425) (fbc79d2)
- bigquery: use `query_and_wait` for better performance on queries of small data with smaller result sets (#9418) (ad1e915)
- drop: speed up performance of drop (#9440) (1c6eb5c), closes #9111
- drop: use `_fast_bind` to speed up `drop` even more (#9646) (4f39d69)
- duckdb: speed up memtable registration (#9419) (7878d8c)
- duckdb: speedup timestamp conversion by avoiding conversion to object (#9556) (5923e1e)
- relocate: avoid redundant selector position computation (#9644) (cd58214)
- rename: avoid unnecessary rewrites and dereferencing in `rename` (#9641) (e56489e)
9.1.0 (2024-06-13)
Features
- all: enable passing in-memory data to create_table (#9251) (fa15c7d), closes #6593 #8863
- api: add `Table.value_counts` for easy group by count on multiple fields (aba913d)
- api: isoyear method (#9034) (4707c44)
- api: support `type` arg to ibis.null() (8db686e)
- api: support wider range of types in `where` arg to column reductions (582165f)
- api: support wider range of types in `where` arg to table reductions (7aba385)
- bigquery: implement a few URL ops (#9210) (3d0f9bc)
- bigquery: support filtering by `_TABLE_SUFFIX` when using a wildcard table name (#9375) (62a25c4), closes #9371
- datafusion: use pyarrow for type conversion (#9299) (5bef96a)
- drop Python 3.9 and test on Python 3.10/3.12 (#9213) (c06285e)
- duckdb: add catalog support to create_table (#9147) (07331b5)
- duckdb: allow to use named in-memory db (#9241) (67460aa), closes #9240
- duckdb: support and test 1.0 (#9297) (395c8b5)
- pandas,dask: implement ops.StructColumn (#9302) (ea81d85)
- polars: accept list of CSVs to read_csv (#9232) (7a272e3), closes #9230
- polars: implement `create_view`/`drop_view`/`drop_table` (#9263) (c4324f5)
- postgres: provide translation for `hash` ops (#9348) (57e2348)
- pyarrow: support Arrow PyCapsule interface on `ibis.Table` objects (1a262b9)
- pyspark: builtin udf support (#9191) (142c105)
- pyspark: provide a mode option to manage both batch and streaming connections (e425ad5)
- pyspark: support reading from and writing to Kafka (#9266) (1c7c6e3)
- selectors: parse Python types in `s.of_type` (#9356) (c0ebdc8)
- snowflake: implement array map and array filter (#9178) (9b42751)
- snowflake: implement support for `asof_join` API (#9180) (49c6ce3)
- snowflake: implement Table.sample (#9071) (307334b)
- ux: improve error message on unequal schemas during set ops (#9115) (5488896)
Bug Fixes
- api: treat `col == None` or `col == ibis.NA` as `col.isnull()` (#9114) (711bf9f)
- bigquery: only register memtable if obj is not None (#9268) (f175d0a)
- bigquery: quote all parts of table names (#9141) (e1338d5)
- bigquery: quote qualified memtable names (#9149) (878d0d5)
- bigquery: strip whitespace from bigquery field names (#9160) (8e5cc3b), closes #9112
- clickhouse: more explicitly disallow null structs (#9305) (fc1d00f)
- convert the uint64’s from some backends’ hash() to the desired int64 (900ecca)
- datatypes: manually cast the type of `pos` to `int16` for `table.info()` (#9139) (9eb1ed1)
- datatypes: manually cast the type of pos to int16 for `table.describe()` (#9314) (c7fcddf)
- ddl: use column names, not position, for insertion order (#9264) (3506f40)
- deps: remove pydruid sqlalchemy dependency (#9092) (a0df103)
- deps: update dependency datafusion to v37 (#9189) (49ecf8d)
- deps: update dependency datafusion to v38 (#9278) (77aaecd)
- deps: update dependency fsspec to <2024.5.1 (#9201) (15a5257)
- deps: update dependency fsspec to <2024.6.1 (#9304) (d600a0d)
- deps: update dependency sqlglot to >=23.4,<23.14 (#9118) (d8119fb)
- deps: update dependency sqlglot to >=23.4,<23.15 (#9151) (ac2201d)
- deps: update dependency sqlglot to >=23.4,<23.17 (#9209) (82a5f93)
- deps: update dependency sqlglot to >=23.4,<23.18 (#9212) (b92dd7b)
- deps: update dependency sqlglot to >=23.4,<24.2 (#9277) (98cb460)
- deps: update dependency sqlglot to >=23.4,<25.2 (#9368) (d65a752)
- deps: update dependency sqlglot to v24 (#9229) (a4918be)
- deps: update dependency sqlglot to v25 (#9316) (2b921f8)
- drop nulls in `.collect()` aggregation (b6e0c31)
- duckdb: clean up temp view junk when using memtables in `create_table` (#9107) (4e7a00c)
- duckdb: use existing table repr for settings view (#9155) (1892bfd)
- exclude null values from `first` and `last` aggregations (22fffc7)
- mysql: avoid creating any tables when using `.sql()` (#9363) (d2d5251), closes #9354
- mysql: support parametrized datetime types (#9294) (ccfcbbc)
- polars,mysql: avoid execution of query in `_get_schema_using_query` (#9290) (0348b9a)
- pyspark: plumb through `limit` and `params` in export functions (1f36552)
- replace NaNs with None in some backends when loading from pandas dataframe (#9094) (f2a7cd9), closes #9095 #8792
- snowflake: ensure that timestamp conversion from parquet files is correct (#9181) (1ba4c32)
- snowflake: properly pass schema and database for sqlglot generation (#9221) (1ecb319)
- to_sql: use default backend for sql generation when set (#9228) (c66d6aa), closes #9227
- trino: parse URL passed to ibis.connect (e3ee67b)
- typing: map() can take ArrayValues not just ArrayColumns (#9282) (3ad1183)
Documentation
- add API docs for operations (#9233) (11e0530)
- add probabl podcast to 9.0 blog (#9105) (9d20b85)
- api: document the ability to apply a different sort order using `across` (#9376) (f41c554)
- blog: new blog post on sqlmesh + ibis (#9218) (8e015a7)
- builtin: update url for packages.parquet file (#9132) (4f93a91)
- descheming: move callout note outside of parameter description (#9133) (bb7bdb3), closes #8712
- dev: update maintainers guide link (#9312) (379afac)
- fix jupyterlite build (#9090) (63dcb92)
- improve maintainers summit slides (#9207) (5999e2d)
- presentation: minor updates to overview presentation (#9145) (f7c2dbb)
- presentations: minor updates to history in overview presentation (#9341) (f5c7978)
- presentation: updates to overview presentation (#9126) (2ba1884)
- put DataType base class first in reference (eaf0e45)
- pyspark: remove outdated `connect()` callout (#9327) (10112bd)
- release blog for Ibis 9.0 (#8918) (0350815)
- remove GitHub-specific Markdown in README.md (#9370) (ce0f1f2)
- remove unrendered/unused top-level getting-started.qmd document (#9106) (66a67c0)
- rework the homepage (#9088) (c68f9d4), closes #8856
- search: append scraped API records to algolia index in CI (#9366) (05d9d7a)
- talks: pycon 2024 maintainers talk (#9193) (77d6cb6)
- update contribute index page content (#9349) (f130dae)
- update the code of conduct link (#9337) (fa2de4d)
- use interactive mode instead of execute for typed null docstrings (c27097b)
- use more idiomatic group_by in readme example (#9307) (2aca613)
- website: make icon grayscale for consistency (#9100) (fb81f92)
Refactors
- deprecate `fillna`/`dropna` methods in favor of `fill_null`/`drop_null` (df0e656)
- deprecate register api (#8863) (7a39bd3)
- ir: actually remove `analysis.py` (#9087) (8508e3d)
- pyspark: remove custom implementation of cursors (#9161) (9caa552)
- remove ibisNA (#9344) (83db19d), closes #9311
- snowflake: replace array repeat udf with builtin transform function (#9177) (b3abc9a)
- sql: add `LOWERED_OPS` mapping for cleaner handling of operations implemented by “lowering” to simpler operations (7a9b4b6)
- sql: extract aggregate handling out into common utility class (#9222) (56e0b38), closes #9170
- sql: rename `UNSUPPORTED_OPERATIONS` to `UNSUPPORTED_OPS` for consistency (9e11957)
- sql: use a rewrite rule to implement FillNa/DropNa (378251e)
Performance
9.0.0 (2024-04-30)
⚠ BREAKING CHANGES
- udf: The `schema` parameter for UDF definition has been removed. A new `catalog` parameter has been added. Ibis uses the word database to refer to a collection of tables, and the word catalog to refer to a collection of databases. You can use a combination of `catalog` and `database` to specify a hierarchical location for the UDF.
- pyspark: Arguments to `create_database`, `drop_database`, and `get_schema` are now keyword-only except for the `name` args. Calls to these functions that have relied on positional argument ordering need to be updated.
- dask: the dask backend no longer supports `cov`/`corr` with `how="pop"`.
- duckdb: Calling the `get` or `contains` method on `NULL` map values now returns `NULL`. Use `coalesce(map.get(...), default)` or `coalesce(map.contains(), False)` to get the previous behavior.
- api: Integer inputs to `select` and `mutate` are now always interpreted as literals. Columns can still be accessed by their integer index using square-bracket syntax.
- api: strings passed to table.mutate() are now interpreted as column references instead of literals, use `ibis.literal(string)` to pass the string as a literal
- ir: `Schema.apply_to()` is removed, use `ibis.formats.pandas.PandasConverter.convert_frame()` instead
- ddl: We are removing the word `schema` in its hierarchical sense. We use `database` to mean a collection of tables. The behavior of all `*_database` methods now applies only to collections of tables and never to collections of `database` (formerly `schema`)
  - `CanListDatabases` abstract methods now all refer to collections of tables.
  - `CanCreateDatabases` abstract methods now all refer to collections of tables.
  - `list_databases` now takes a kwarg `catalog`.
  - `create_database` now takes a kwarg `catalog`.
  - `drop_database` now takes a kwarg `catalog`.
  - `current_database` now refers to the current collection of tables.
  - `CanCreateSchema` is deprecated and `create_schema`, `drop_schema`, `list_schemas`, and `current_schema` are deprecated and redirect to the corresponding method/property ending in `database`.
  - We add a `CanListCatalog` and `CanCreateCatalog` that can list and create collections of `database`, respectively. The new methods are `list_catalogs`, `create_catalog`, and `drop_catalog`.
  - There is a new `current_catalog` property.
- api: timecontext feature is removed
- api: The `by` argument from `asof_join` is removed. Calls to `asof_join` that previously used `by` should pass those arguments to `predicates` instead.
- cleanup: Deprecated methods and properties `op`, `output_dtype`, and `output_shape` are removed. `op` is no longer needed; use `.dtype` and `.shape`, respectively, in place of the other two.
- api: expr.topk(…) now includes null counts. The row count of the topk call will not differ, but the number of nulls counted will no longer be zero. To drop the null row use the dropna method.
- api: `ibis.rows_with_max_lookback()` function and `ibis.window(max_lookback)` argument are removed
- strings: Backends that previously used initcap (analogous to str.title) to implement StringValue.capitalize() will produce different results when the input string contains multiple words (a word’s definition being backend-specific).
- impala: Impala UDFs no longer require explicit registration. Remove any calls to `Function.register`. If you were passing `database` to `Function.register`, pass that to `scalar_function` or `aggregate_function` as appropriate.
- pandas: the `timecontext` feature is not supported anymore
- api: the `on` parameter of `table.asof_join()` now only accepts a single predicate, use `predicates` to supply additional join predicates.
Features
- add to_date function to StringValue (#9030) (0701978), closes #8908
- api: add `.as_scalar()` method for turning expressions into scalar subqueries (#8350) (8130169)
- api: add `catalog` and `database` kwargs to `ibis.table` (#8801) (7d593c4)
- api: add `describe` method to compute summary stats of table expressions (#8739) (c8d98a1)
- api: add `ibis.today()` for retrieving the current date (#8664) (5e10d17)
- api: add a `to_polars()` method for returning query results as `polars` objects (53454c1)
- api: add a `uuid` function for returning a new uuid (#8438) (965b6d9)
- api: add API for unwrapping JSON values into backend-native values (#8958) (aebb5cf)
- api: add disconnect method (#8341) (32665af), closes #5940
- api: allow *arg syntax with GroupedTable methods (#8923) (489bb89)
- api: count nulls with topk (#8531) (54c2c70)
- api: expose common types in the top-level `ibis` namespace (#9008) (3f3ed27), closes #8717
- api: include bad type in NotImplementedError (#8291) (36da06b)
- api: natively support polars dataframes in `ibis.memtable` (464bebc)
- api: support `Table.order_by(*keys)` (6ade4e9)
- api: support all dtypes in MapGet and MapContains (#8648) (401e0a4)
- api: support converting ibis types & schemas to/from polars types & schemas (73add93)
- api: support Deferreds in Array.map and .filter (#8267) (8289d2c)
- api: support the inner join convenience to not repeat fields known to be equal (#8127) (798088d)
- api: support variadic arguments on `Table.group_by()` (#8546) (665bc4f)
- backends: introducing ibish the infinite scale backend you always wanted (#8785) (1d51243)
- bigquery: support polars memtables (26d103d)
- common: add `Dispatched` base class for convenient visitor pattern implementation (f80c5b3)
- common: add `Node.find_below()` methods to exclude the root node from filtering (#8861) (80d12a2)
- common: add a memory efficient `Node.map()` implementation (e3f2217)
- common: also traverse nodes used as dictionary keys (#9041) (02c6607)
- common: introduce `FrozenOrderedDict` (#9081) (f926995), closes #9063
- datafusion, flink, mssql: add uuid operation (#8545) (2f85a42)
- datafusion: add array and strings functions (#8895) (2f23223)
- datafusion: implement `arbitrary` (43a8f50)
- datafusion: port to new sqlglot backend (3aa109a)
- datatypes: convert money and small money datatype to decimal datatype (#8556) (ecc5d70)
- duckdb: add support for read_mysql (#8656) (4ea4a1d)
- duckdb: allow all-null columns in memtables (#8367) (b2ae64a)
- duckdb: support `asof` joins including `tolerance` parameter (104cb9b)
- exasol: add support for bit operations (#8741) (4aa721e)
- exasol: add support for DateDelta (dd639ca)
- exasol: add support for DayOfWeekName (#8589) (de4e988)
- exasol: add support for extract epoch seconds (#8726) (db79aae)
- exasol: add support for extract seconds (#8723) (fb7e533)
- exasol: add support for ExtractDayOfYear (#8578) (df2b69e)
- exasol: add support for extracting milliseconds from timestamps (#8722) (1778de5)
- exasol: add support for ExtractQuarter (#8587) (0d9b676)
- exasol: add support for ExtractWeekOfYear (#8588) (68925f6)
- exasol: add support for hexdigest (#8740) (76d8ef0)
- exasol: add support for TimestampNow (#8563) (94e79e4)
- flink: add map support (#8425) (68739a2)
- flink: implement support for array expansion (#8511) (a6e6564)
- flink: implement UDF support for the backend (#8142) (a3b1cc6)
- geo-duckdb: support casting binary to geometry (#9062) (1926eb4)
- geospatial: add support for duckdb operations on literals (#8570) (b4c4369)
- graphviz: node- and edge-specific custom attributes (#8527) (98c52aa)
- graphviz: support custom node and edge attributes in `ibis.visualize` (#8510) (ee821b1)
- ir: add `StringSlice` (#8832) (e4e3531)
- ir: add default implementation of pretty formatting nodes (#8880) (a696c70)
- ir: more flexible dereferencing support for join right hand side (#8992) (d7a31aa), closes #9043 #9041
- ir: support pretty printing arbitrary traversable objects (#9043) (68dfe39)
- ir: support showing variable names used to create an expression in `repr()` (#8630) (220085e)
- mssql: add datatype mapping for `hierarchyid` (#8397) (2fd2c30)
- mssql: use integrated auth when no user or password supplied (#8668) (0a78414)
- new backend issue template (#8449) (e4edc78)
- pandas, polars, dask, datafusion: enable create local backends with empty url (#8860) (9dabae0), closes #8450
- polars: add limited support for table dot sql (#8528) (b2a4fbb)
- polars: implement `arbitrary` (973c3d3)
- postgres: add mappings for more esoteric dtypes (#9055) (5cb83fc), closes #8845
- postgres: support loading tables with `pgvector` column types (#9037) (8846514)
- postgres: use port in connection string (d561c01)
- pyspark: add catalog support to pyspark (#9042) (2c1a58e), closes #9038
- pyspark: add support for PySpark 3.5 (65717f4)
- pyspark: support `ibis.pyspark.connect()` (#8515) (0f663e6)
- python: support python 3.12 (7056dea)
- risingwave: add streaming DDLs (#8239) (356e459)
- snowflake: allow empty url when using ibis.connect (#8428) (0275c9b), closes #8422
- snowflake: create an ibis backend from a snowpark session (#8962) (f15d033)
- snowflake: support connecting with no arguments (#8422) (543a2ec)
- sql: add option to enable/disable select merging (#9065) (4bc9314), closes #9064 #9058
- sql: extract common table expressions (0324372)
- sql: lower expressions to SQL-like relational operations (6f7f190)
- sql: use `SELECT *` for complete reprojections (#9075) (a9aa8a7)
- trino: implement existing json functionality (#8963) (964ac3e)
- trino: port to sqlglot (9c5a907)
- ux: add Table and Column.preview() (#7915) (1c03ad0), closes #7408 #7172
Bug Fixes
- api: forbid using `asc`/`desc` in selections (62992c3)
- api: improve error message raised on improper calls to array `map` or `filter` (#8602) (0236370)
- api: restore and deprecate `ir.Table.to_array()` (#8227) (22de674)
- api: return NULL when NULL is passed to `Array.zip` (#8652) (fac85f0)
- api: selection using a selector yielding multiple columns (#8215) (869889b)
- api: support passing literal booleans to `filter` (2aa31f4)
- backends: make string concat-with-null behavior consistent across backends (#8305) (2d97b8e), closes #8302
- bigquery: do not overwrite the entire default query job config (b42fb1c)
- bigquery: ensure session creation before creating temp tables (#8976) (314abe4), closes #8975
- bigquery: get literals working again (#8577) (6369734)
- bigquery: restore option to specify table path in table name (a9beadb)
- clickhouse: adjust for new timestamp behavior and regen sql (4bdc040)
- clickhouse: avoid forcing UTC to allow connection to servers that do not allow it (#8762) (52eeea9)
- clickhouse: make arrays non nullable (#8501) (1caf6de)
- clickhouse: use backwards compatible string search function (bb736fe)
- common: `Node.map_clear` should have return type annotation `Any` (#8564) (8d7baae)
- common: don’t match an `Object` pattern with more positional arguments defined than `__match_args__` has (2e63bba)
- common: intermediate result removal fails if there are duplicated dependencies (e3e17db)
- comparison: wrap isnull equality check in parens (#8366) (247e2f7)
- conversion: convert decimals to the exact precision and scale requested by the input type (8c1e6f4)
- dask: don’t call `compute` when executing `argmin`/`argmax` (1204c56)
- dask: don’t call `compute` when executing `cov`/`corr` (a876c47)
- dask: fix argmin/argmax implementation for dask (93834f1)
- dask: pin dask version to avoid automatically picking up dask-expr (#8629) (f1d0f65)
- datafusion: ensure that to_pyarrow_batches does do compute (d1a62d0)
- datatypes: always quote sqlglot struct fields (#8777) (18bb91b), closes #8771
- datatypes: convert UUIDs to strings (#8262) (6f32374)
- decompile: ensure that `SelfReference` is decompiled with a call to `.view()` (4a44c57)
- deps: bump dependencies’ lower bounds to reflect tested minimum version (#8977) (9c29f28), closes #8795
- deps: bump polars lower bound (#8841) (125e4ad)
- deps: bump sqlglot to pick up duckdb array fixes (#8682) (a3bd853)
- deps: support pandas 2.2 (#8758) (4b476ba)
- deps: update dependency datafusion to v36 (#8612) (5a67102)
- deps: update dependency pyarrow to v16 (#9033) (a687ec1)
- deps: update dependency sqlglot to >=22,<22.5 (#8635) (267f4bc)
- deps: update dependency sqlglot to >=23.4,<23.10 (#8787) (0f00101)
- deps: update dependency sqlglot to >=23.4,<23.11 (#8957) (2b7f7b1)
- deps: update dependency sqlglot to >=23.4,<23.12 (#9029) (1cace01)
- deps: update dependency sqlglot to >=23.4,<23.13 (#9056) (5dac34d)
- deps: update dependency sqlglot to v21 (#8272) (efaa365)
- deps: update dependency sqlglot to v22 (8aa4222)
- deps: update dependency sqlglot to v23 (#8688) (5041894)
- druid: array_string_join and to_polars extra column (2311d4f)
- druid: import pydruid.db module explicitly (#8782) (550ada0)
- duckdb-sql: ignore importlib package errors when importing ibis.snowflake for transpilation (#8389) (b968301)
- duckdb: add `flip_coordinates` translation to sqlglot duckdb backend (f7df510)
- duckdb: allow connection to motherduck via ibis.connect (#8357) (42f45fe), closes #8355
- duckdb: allow passing both overwrite and temp to create_table (b9b19e0)
- duckdb: ensure that create_schema and create_database are actually tested (ba31f82)
- duckdb: ensure that structs can be used with sqlglot 20.1.0 (17be43a)
- duckdb: generate struct fields with propertyeq instead of slice (d2c1316)
- duckdb: load extension when executing geospatial expressions (#9080) (1960d54)
- duckdb: parenthesize argument to StructField operation to support field access on CASE statements (#8486) (1371016)
- duckdb: pass global replace flag to `ops.RegexReplace` translation rule (e46260d)
- duckdb: udfs builtins taking zero args (ab39344)
- duckdb: workaround for duckdb Map NULL bugs (#8649) (75d32e5)
- duckdb: workaround remaining null map issues (#8985) (b6c71d7), closes #8632
- fix SQL backend has_operation to include operations supported through rewrite rules (133a1f1)
- flink: avoid non-existent sge.NULL (9f190eb)
- flink: cast map key lookups because flink requires exact type matches (#8724) (6893a5f)
- flink: fix compilation of memtable with nested data (#8751) (364a6ee), closes #8516
- flink: fix compilation of over aggregation query in flink backend (#8359) (de174a2)
- graphviz: show proper field attributes of accessed relations and do not display join link property accesses (#8521) (69d6c73)
- impala: remove no-longer-used temporary database and paths that may prevent connection success (#8489) (32fcce6), closes #8466
- ir: `asof` join `tolerance` parameter should post-filter and post-join instead of adding a predicate (e380e79)
- ir: accidentally remapping fields during bind() (#8988) (f4cee67), closes #8884
- ir: compute `InSubquery.shape` property from `needle` input (#8364) (13d675e), closes #8361
- ir: fix window boundaries being forcefully casted (#8400) (09b6ada)
- ir: make impure ibis.random() and ibis.uuid() functions return unique node instances (#8967) (741063a)
- ir: only dereference comparisons not generic binary operations (05ac73a)
- ir: self reference fields were incorrectly dereferenced to the parent relation (7bfebe2)
- make devcontainer work correctly (#9019) (a696c58), closes #9011
- mssql: don’t use the removed `sge.TRUE` and `sge.FALSE` literals (7e0b735)
- mssql: restore any, all and cumulative versions (#8409) (99a4022), closes #8073
- mssql: restore unbounded window functions (#8411) (0211d4f)
- mysql: remove not-allowed frame clause from rank window function (ee96cef)
- oracle: allow passing both overwrite and temp to create_table (3ce4766)
- oracle: clean up memtables at exit (dc34f61)
- oracle: enable dropping temporary tables (1dffd5e)
- oracle: map bare `NUMBER` to `int64` and consolidate data type mapping code for shared inference (#8626) (b5f9bbe)
- pandas: make case work for non-RangeIndex dataframes (#9083) (73dd685)
- pandas: map `date` type to `datetime64[s]` (#8667) (6bd965e)
- pandas: use mergesort for deterministic sorting (6042a71)
- polars: columns are picked from the correct side in case of conflicting names (#8134) (4273cef)
- polars: ensure `t.select(col=scalar)` results in `len(t)` rows (#8665) (6c00579)
- polars: ensure that reading from a compressed csv triggers in-memory read (b3bbde1)
- polars: force null sorting to match the rest of ibis (b475c36)
- polars: reference the correct field in the `ops.SelfReference` rule (a371274)
- polars: support order by computed column (ddf56cb)
- polars: use value type of array type for int_ranges construction (c24c54e)
- polars: use newer `drop` API in asof join implementation (c65d9f8)
- polars: use newer `drop` API to avoid deprecation warning (43424ea)
- postgres: fix compilation of array string join and map/struct field extraction (306d0fc)
- postgres: fix json type conversion in `to_pyarrow` output (#8439) (b338517), closes #8318
- postgres: pass through additional kwargs in pguri (7ab4fda)
- pyarrow: map date type to arrow date32 not date64 (05575b7)
- pyarrow: support accepting pyarrow dictionary types as inputs (#8276) (14c4226), closes #8207
- pyspark: don’t use the removed `sge.NULL`, `sge.TRUE` and `sge.FALSE` literals (dffb44a)
- pyspark: ensure that `to_delta` works and is tested (64af56a)
- pyspark: ensure that the output of zip matches the expected ibis schema (#9052) (be9d5da), closes #9049
- pyspark: force sqlglot to generate first/last (fbfc3c1)
- pyspark: remove use of attribute that prevents using spark connect (#9061) (b48f451), closes #9060
- pyspark: unwind catalog/database settings in same order they were set (#9067) (962ee00)
- rewrites: add missing filter arguments for `node.replace()` calls (196e716)
- rewrites: change `TableColumn` -> `Field` in rewrites (#8448) (3730eb6)
- risingwave: gen correct jsonb extract path function (7b0d6a9)
- risingwave: set implicit flush to true (#8929) (fe16877)
- snowflake-snowpark: disable the `reconnect` method (#8969) (e31eded)
- snowflake: bring back default nth behavior from before the-epic-split (e943667)
- snowflake: handle udf function naming (bec36ca)
- snowflake: import the connector at the correct scope (6bbb9c8)
- snowflake: initialize `_from_snowpark` variable in constructor to ensure it is defined (#8970) (5722a10)
- snowflake: initialize the parent class on construction (#8972) (de3a169)
- snowflake: manually construct quantile calls with `WITHIN GROUP` (#8846) (261a544)
- snowflake: set con outside of `_setup_session` call (#8979) (3b1a6ef)
- snowflake: use `_safe_raw_sql` for `insert` implementation (2ceb5a6)
- sql: avoid calling .subquery on subqueries (7ad32bd)
- sql: avoid excessive inlining during `Select` merge (#8825) (ba931da)
- sql: don’t generate table aliases for `ops.JoinLink` (3da1abf)
- sqlite: don’t use the removed `sge.NULL` literal (e8ed08a)
- sqlite: ensure `ibis.uuid()` generates a unique uuid per row (#8535) (c097a2d), closes #8532
- sql: look for CTEs under value expressions as well (#8633) (14358fe)
- sql: outer order by should take precedence over inner order by (4376c35)
- sql: overwrite the original sort key on successive order_by calls ordering by the same key (103dc68)
- sql: replace CTEs within CTEs (#8572) (182b6a5)
- sql: support set operations wrapping subqueries (#8414) (aab0c13)
- strings: make `StringValue.capitalize()` consistent across backends (#8270) (c4055d6), closes #8271
- structs: ensure that isin works with struct membership (#8978) (c0c508e)
- timestamps: use timezone aware objects instead of utcfromtimestamp (0b6ac2d)
- trino: compile property literal values directly instead of going through the pipeline (b2761c9)
- trino: generate first_value/last_value instead of arbitrary (710f8ac)
- trino: re-enable native TABLESAMPLE support (#8284) (75d154a)
- update error message when executing against unbound tables (#8695) (384b10f), closes #8677
Documentation
- 2024H1 roadmap and why VoDa supports Ibis (#8184) (7fa4334)
- add blog link on the homepage (ad7fa68)
- add blog on using duckdb for RAG (#8387) (fedf21b)
- add concepts guide on Datatypes and Datashapes (#8557) (cbeb6cf)
- add constants to width and height (#8640) (01b3f28)
- add explanation in rag blog and note on fixed length arrays (#8413) (ea58d22)
- add link to style and formatting in env setup section (#8936) (5095349)
- add lms for data post (#8222) (8f35010)
- add missing dev docker instructions for backends (#8352) (a20f44a)
- add missing import to `RowNumber` example (#8523) (ba21ec0)
- add Python + SQL section to why ibis (#8526) (211f336)
- add testimonial on combining Ibis with Kedro (5d23818)
- add v8 blog (#8200) (4adb5e3)
- added section on duckdb reading gcs files (#8651) (c2b06f6)
- altair: remove to_pandas from altair example (#8951) (eecbcea)
- api: expose relevant mixin members in temporal docs (cb97806)
- blog: add post introducing the Flink backend (#7912) (7bf764f), closes #1000 #2000 #3000 #7739
- blog: add post on stream-batch unification (#8293) (96b251d)
- blog: blog on why DuckDB is the default backend (#8378) (c916f6a), closes #8230
- blog: fix a formatting issue (#8473) (92bb6fd)
- blog: make point on stream-batch unification (#8316) (8ed4d4b)
- blog: needle haystack post (#8824) (ca97867)
- blog: portable dataflows with Ibis and Hamilton (#8798) (e97aa46)
- blog: update date on hamilton blog (#8851) (9cb4c8d)
- blog: wow analysis blog post (#8441) (6bd6c26)
- build: cache notebooks during render (#8950) (fd8858d)
- change date and remove links from why voda blog (#8287) (4d89300)
- clean up cursor closing and link to `Table.sql` API docs (#8417) (753a268), closes #8345
- colima: move section to its own header (865779b)
- concepts: remove outdated refs to SQLAlchemy (e932f6f)
- contribute: add a section to explain the different pytest markers that we use (3c74d84)
- create examples tab and populate with `ibis-examples` repo content (40db711)
- datafusion: update website link (#9072) (8df106f), closes /github.com/apache/datafusion/issues/9691#issuecomment-2063690461
- deps: pin itables to preserve grid styling (#8420) (fb91766)
- development: move pip to end of env creation tabset (d475660)
- dev: give the callout box a title (#8487) (c2a9ba9)
- duckdb: add short section on enabling duckdb geospatial (#8744) (c964bc5)
- env: give more actionable advice for using nix on apple m1 (#8716) (6639163)
- escape necessary characters (a4eb2d2)
- exasol: remove deprecated parameters from example (#8720) (becdca9)
- fix a couple of formatting issues (#8802) (7a8f65c)
- fix displaying the PyPI package name warning (#8237) (2d3297f)
- fix typo in `order_by` docstring (#8456) (ae3db0e)
- flink: remove conda warning from flink setup page (#8952) (dd9e928)
- flink: update conda-based installation guide (#8238) (f2ef90a)
- freeze clickhouse tutorial (#8312) (4bc7dc3)
- geospatial: update blog to use as_scalar and fix dependencies (#8543) (a845030)
- hierarchy: add brief doc listing backend hierarchy types (190b2c7)
- how-to: add a how-to guide for executing unbound expressions on backends (#8522) (66b4dc0), closes #8297
- how-to: add graphviz section (#8513) (f1bb060)
- improve nested type docstrings (#8358) (bc9d757)
- include return type in Interval component properties (#8193) (9b00657)
- links: fix release notes link in 8.0 release (#8315) (9387ac0)
- minor fixes to LM blog (#8236) (36f8904)
- presentations: fix quarto markup for Ibis origins slide in linkedin-meetup presentation (#9050) (e04c3e5)
- presentations: presentation for linkedin meetup 2024-04-24 (#8983) (01b521c)
- release-notes: fix up version header (#8941) (e583d2b)
- remove `all` extra installation from contributing docs (#9004) (03825b6)
- remove experimental indicator from all backends (586f979)
- rework why Ibis article to explain what Ibis is and other updates (#8490) (c44f357), closes #8251 #8488
- rewrite readme (#8524) (9a741f7), closes #8436
- risingwave: add backend docs page for RisingWave (a6e7920)
- roadmap 2024 H1 blog (#8362) (2750fe5)
- show how Array.unique() keeps NULLs (#8766) (7308249)
- snowflake: add blog post showing insertion into snowflake from postgres (#8426) (3a8c7cc)
- tutorial: add a tutorial for the Flink backend (#8085) (e2a3fb6)
- tutorial: add examples for complex topk (#8831) (2d9afe0)
- tutorial: deploy a jupyterlite repl on the website (#9009) (9355281)
- tutorial: update documentation regarding `topk` (#8583) (4115860)
- typo fix to why voda blog (882ade2)
- udfs: improve udf docstring examples (#9079) (fead40b)
- uniform connect via url across backends in why ibis (#8865) (e7f8ea8), closes #8860
- update contributing docs and env for m1/m2 devs (#8495) (1b84cd1)
- update environment setup instructions (#8476) (a18ab29)
- update links to duckdb csv parameters (#8733) (7cc67d3), closes #8732
- update outdated comments in Chaining Expressions docs (#8601) (306cdad)
- update sankey diagram (#9077) (b691959)
- update the geospatial blogpost to use explicit scalar subqueries (#8330) (aa74515)
- update the haystack blog publish date (#8949) (4ebeb19)
- update to_delta kwargs docs (#8623) (b77aeec), closes /github.com/ibis-project/ibis/blob/e94366700ae460052f675188674e24002b755ec5/ibis/backends/init.py#L536
- update TODOs in 8.0 blog (#8314) (4d4fdab)
- use algolia for searchability (#8410) (f8d8ca3)
Refactors
- add polars data mapper implementation (17f5e97)
- add polars format (40ada17)
- api: make input value coercion of `mutate()` identical to `select()` (#8878) (38e7e14)
- api: remove `by` of asof_join() in favor of `predicates` (#8700) (1a8eec8), closes #7869
- api: remove now unsupported `max_lookback` window attribute (99dda5b)
- api: remove the remnants of now unsupported `timecontext` feature (#8721) (0a00a05)
- api: remove unnecessary methods used to implement negate() (#8812) (f59f423)
- api: restrict arbitrary input nesting (#8917) (fd35b66)
- api: revamp `asof` join predicates (9fb3627)
- api: treat integer inputs as literals instead of column references (#8884) (feeb8ae), closes #8878
- backend-api: remove the `database()` API and implementation (#8406) (ff5d078), closes #8405
- backends: remove `_metadata` method in favor of `_get_schema_using_query` (#8627) (4ec7daf)
- backends: remove singledispatchmethod from the sql backends (#8338) (78dc393), closes #8283
- benchmarks: remove pandas benchmarking and replace with more-representative duckdb version (#8322) (e540575)
- bigquery: port to sqlglot (bcfd7e7)
- cleanup: remove deprecated methods and properties (#8701) (90c5a86)
- common: consolidate egraph implementation by using `FrozenSlotted` (#8702) (d244b75)
- common: make `FrozenDict` a subclass of `dict` (#8693) (32b7514), closes #8687 #8687
- common: support union types as well as forward references in the dispatch utilities (e4769de)
- compilers: consolidate StringJoin impl (28fb6ec)
- consolidate rewrite rule implementations (c9b8a08)
- dask: port the dask backend to the new execution model (#8005) (c925640)
- ddl: deprecate `schema` keyword in truncate_table (edde6a3)
- ddl: deprecate all `*_schema` methods (c43c0f1)
- delete some unused visit methods in sql compiler (f2e1465)
- deps: remove `multipledispatch` as a dependency (#8332) (d587166)
- druid: port to sqlglot (85e2b16)
- dtype: move all the castable logic to a single function (#8335) (580536c)
- duckdb/clickhouse: implement sqlglot backends and re-enable ci (b7821bf)
- duckdb: initial cut of sqlglot DuckDB compiler (ca95204)
- duckdb: remove the need for a specialized `_to_geodataframe` method (a8add4b)
- duckdb: use `.sql` instead of `.execute` in performance-sensitive locations (#8669) (aa6aa0c)
- exasol: add custom TimestampTruncate (#8590) (66f12c3)
- exasol: add temp kwarg to create_table for api consistency (4267c72)
- exasol: port to sqlglot (#8032) (b8bcbf2)
- flink: port to sqlglot (8fdc75d)
- formats: remove unnecessary schema argument from schema inference (#8814) (91ea332)
- get_schema: remove `schema` kwarg, add `catalog`, kw-only (6273e7e)
- impala: port to sqlglot (4a0be76)
- ir: accept any relation in `ops.ExistsSubquery` (#8264) (68287db)
- ir: add `JoinTable` operation unique to `JoinChain` instead of using the globally unique `SelfReference` (2a7ae3f)
- ir: dereference literal expressions (cd9219b)
- ir: give unbound tables namespaces (8ac2d2a)
- ir: impure values are never dereferencible (#9023) (33286f2)
- ir: loosen the join integrity checks (#8817) (2bc903d)
- ir: merge `ops.WindowFrame` node into `ops.WindowFunction` (#8779) (3cd5a1a)
- ir: remove `find_first_base_table()` analysis function (#8451) (5cbd472)
- ir: remove deprecated `Schema.apply_to()` method (06962b2)
- ir: remove now obsolete `__window_op__` property (8d27d14)
- ir: remove now obsolete `ops.Named` mixin (#8244) (3cff9f6)
- ir: remove the decimal precision promotion logic (0db3ec7)
- ir: split the relational operations (2a2f8c6)
- ir: stricter scalar subquery integrity checks (d269776)
- ir: support join of joins while avoiding nesting (40c1af7)
- ir: wrap `JoinChain.first` in `ops.SelfReference` similar to the rest of the join tables (15bd926)
- list_tables: deprecate `schema` keyword (ed69960)
- modules: remove unnecessary `base` submodule (#8405) (6946f98)
- mssql: port to sqlglot (e12faa5)
- mysql: port to sqlglot (#7926) (cba2f98)
- ops: use `catalog` and `database` kwargs for Namespace op (21f57d4)
- oracle: port to sqlglot (#8020) (fbdc909)
- oracle: remove dependency on private function for temp tables (#8480) (809912c)
- oracle: simplify oracle timestamp overrides (7d44d77)
- pandas: port the pandas backend with an improved execution model (#7797) (3b35d77), closes #7752
- pandas: simplify pandas helpers (41ddd41)
- polars: allow passing temp=False to polars create_table (89abf8e)
- polars: update the polars backend to use the new relational abstractions (#7868) (29b5b53)
- postgres: port to sqlglot (#7877) (6916c1d)
- pyspark: reimplement the backend using the new relational operations and spark SQL (32efbe7)
- pyspark: remove sqlalchemy dependency from pyspark (e9656fb)
- remove duplicated rewrite_sample rule (dd019e7)
- rename SupportSchema -> SchemaLike, fix type definition (#8427) (ad1f53a)
- risingwave: port to sqlglot (#8171) (1023e56)
- snowflake: adjust window frames to use `_minimize_spec` due to upstream snowflake changes (3a6c9f3)
- snowflake: remove no longer necessary pyarrow warnings filter (6a16899)
- snowflake: use sqlglot for the snowflake backend (83acf48)
- sql: automatically add simple ops implementations (#8349) (2c64b3f), closes #8338
- sql: deprecate `schema` kwarg in insert (92fcbdf)
- sql: deprecate schema in `_view` ddl methods (2af37e8)
- sqlglot: clean up `explode` usage (4d99314)
- sqlglot: make anonymous functions easier to use and remove `array_func` hack (5891546)
- sqlglot: remove duplicate `StringAscii` definitions (0bdeb8b)
- sqlglot: remove duplicated simple compilation rules and sort (d44001f)
- sqlglot: use a more backend-agnostic expression for non-finite constants (776079c)
- sqlglot: various sqlglot compiler and backend clean ups (#7904) (db45e41), closes #7871
- sqlite: port to SQLGlot (#8154) (4d24502)
- sql: move dialects to always-importable location (#8279) (c8f4afe)
- sql: remove old compiler (#8307) (88e8384)
- sql: remove sqlalchemy from the codebase (#8074) (22004ed)
- sql: remove temporary table creation when using inline sql (#8149) (ea428ba)
- sql: reorganize sqlglot rewrites (9f4851c)
- sql: simplify `FirstValue`/`LastValue` usage (#8568) (6ed2e39)
- table ddl: remove hierarchical schema from *_table methods (14b0944)
- table: deprecate `schema` (dfb8734)
- udf: remove hierarchical usage of schema (#9078) (f5d9084)
- udfs: consolidate builtin udf compilation and failure modes for unimplemented udfs (#8398) (0320d01)
- util: remove unused `get_logger` utility function (#8760) (0aceefc)
Performance
- common: improve equality caching by explicitly invalidating the entry on `__del__` (#8708) (ac86f91)
- common: improve the performance of replacing nodes using mappings (#8638) (a2e733a)
- common: reduce the average recursion depth in `_flatten_collections` (#8709) (3d52904)
- dask: avoid triggering compute for dynamic limit/offset (#8747) (b3e27eb)
- ir: avoid exponential growth on `name` attribute access (#8445) (7667328), closes #8432
- sql: don’t compile pretty sql by default (#8616) (af402f9)
- sql: prevent sqlglot from extensive deepcopying every time we create a sqlglot object (#8592) (461293b), closes #8484
Deprecations
8.0.0 (2024-02-05)
⚠ BREAKING CHANGES
- backends: Columns with Ibis `date` types are now returned as object dtype containing `datetime.date` objects when executing with the pandas backend.
- impala: Direct HDFS integration is removed, as is support for ingesting pandas DataFrames directly. The Impala backend still works with HDFS, but data in HDFS must be managed outside of ibis.
- api: replace `ibis.show_sql(expr)` calls with `print(ibis.to_sql(expr))` or, if using Jupyter or IPython, `ibis.to_sql(expr)`
- bigquery: `nullifzero` is removed; use `nullif(0)` instead
- bigquery: `zeroifnull` is removed; use `fillna(0)` instead
- bigquery: `list_databases` is removed; use `list_schemas` instead
- bigquery: the bigquery `current_database` method returns the `data_project` instead of the `dataset_id`. Use `current_schema` to retrieve `dataset_id`. To explicitly list tables in a given project and dataset, you can use `f"{con.current_database}.{con.current_schema}"`
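The BigQuery namespace change above can be sketched without a live connection. This is a minimal illustration only: `StandInBigQueryConnection` and `qualified_namespace` are hypothetical names invented for this sketch and are not part of ibis, and the project and dataset values are placeholders. It shows how the `project.dataset` string from the note is assembled once `current_database` reports the project and `current_schema` reports the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInBigQueryConnection:
    """Hypothetical stand-in for a BigQuery connection (not an ibis class)."""

    data_project: str
    dataset_id: str

    @property
    def current_database(self) -> str:
        # Per the 8.0.0 note, this now reports the project, not the dataset.
        return self.data_project

    @property
    def current_schema(self) -> str:
        # Per the 8.0.0 note, the dataset is retrieved from here.
        return self.dataset_id


def qualified_namespace(con) -> str:
    """Build the 'project.dataset' string quoted in the breaking-change note."""
    return f"{con.current_database}.{con.current_schema}"


# Placeholder values, chosen only for illustration.
con = StandInBigQueryConnection(data_project="my-project", dataset_id="my_dataset")
namespace = qualified_namespace(con)
print(namespace)
```

Code that previously read the dataset from `current_database` would, under this sketch, switch to `current_schema` and use the combined string wherever a fully qualified project and dataset is needed.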
Features
- api: define `RegexSplit` operation and `re_split` API (07beaed)
- api: support median and quantile on more types (#7810) (49c75a8)
- clickhouse: implement `RegexSplit` (e3c507e)
- datafusion: implement `ops.RegexSplit` using pyarrow UDF (37b6b7f)
- datafusion: set ops (37abea9)
- datatypes: add decimal and basic geospatial support to the sqlglot type parser/generator (59783b9)
- datatypes: make intervals round trip through sqlglot type mapper (d22f97a)
- duckdb-geospatial: add support for flipping coordinates (d47088b)
- duckdb-geospatial: enable use of literals (23ad256)
- duckdb: implement `RegexSplit` (229a1f4)
- examples: add `zones` geojson example (#8040) (2d562b7), closes #7958
- flink: add new temporal operators (dfef418)
- flink: add primary key support (da04679)
- flink: export result to pyarrow (9566263)
- flink: implement array operators (#7951) (80e13b4)
- flink: implement struct field, clean up literal, and adjust timecontext test markers (#7997) (2d5e108)
- impala: rudimentary date support (d4bcf7b)
- mssql: add hashbytes and test for binary output hash fns (#8107) (91f60cd), closes #8082 #8082
- mssql: use odbc (f03ad0c)
- polars: implement `ops.RegexSplit` using pyarrow UDF (a3bed10)
- postgres: implement `RegexSplit` (c955b6a)
- pyspark: implement `RegexSplit` (cfe0329)
- risingwave: init impl for Risingwave (#7954) (351747a), closes #8038
- snowflake: implement `RegexSplit` (2c1a726)
- snowflake: implement insert method (2162e3f)
- trino: implement `RegexSplit` (9d1295f)
Bug Fixes
- api: deferred values are not truthy (00b3ece)
- backends: ensure that returned date results are actually proper date values (0626fb2)
- backends: preserve `order_by` position in window function when subsequent expressions are duplicated (#7943) (89056b9), closes #7940
- common: do not convert callables to resolveable objects (9963705)
- datafusion: work around lack of support for uppercase units in intervals (ebb6cde)
- datatypes: ensure that array construction supports literals and infers their shape from its inputs (#8049) (899dce1), closes #8022
- datatypes: fix bad references in `to_numpy()` (6fd4550)
- deps: remove `filelock` from required dependencies (76dded5)
- deps: update dependency black to v24 (425f7b1)
- deps: update dependency datafusion to v34 (601f889)
- deps: update dependency datafusion to v35 (#8224) (a34af25)
- deps: update dependency oracledb to v2 (e7419ca)
- deps: update dependency pyarrow to v15 (ef6a9bd)
- deps: update dependency pyodbc to v5 (32044ea)
- docs: surround executable code blocks with interactive mode on/off (4c660e0)
- duckdb: allow table creation from expr with geospatial datatypes (#7818) (ecac322)
- duckdb: ensure that casting to floating point values produces valid types in generated sql (424b206)
- examples: use anonymous access when reading example data from GCS (8e5c0af)
- impala: generate memtables using `UNION ALL` to work around sqlglot bug (399a5ef)
- mutate/select: ensure that unsplatted dictionaries work in `mutate` and `select` APIs (#8014) (8ed19ea), closes #8013
- mysql: catch PyMySQL OperationalError exception (#7919) (f2c2664), closes #6010 #7918
- pandas: support non-string categorical columns (5de08c7)
- polars: avoid using unnecessary subquery for schema inference (0f43667)
- polars: handle integers coming out of high precision numpy datetime64 values (bcf36cb)
- postgres: ensure that no timezone conversion takes place on timestamptz columns when selecting them out (7b79ec8)
- repr: default to pa.binary for all geospatial dtypes (#7817) (066d3fc)
- repr: force exception message to console in IPython in interactive mode (414c49a)
- snowflake: insert into the correct object (5e1efe3)
- sqlalchemy: properly handle aliases of extracted subqueries (38aaf8f)
- sqlglot: stop using removed singletons for true, false, null (4fb0aad)
Documentation
- add composable data ecosystem concept (#7898) (d78a887), closes #6618
- add exasol to list of supported backends (4fae620)
- add ibis.join() to docs (#7913) (de2e282), closes #7895
- add image preview for index page (#7920) (ac2375a)
- add post about move to Zulip chat (#7889) (88f1ee8), closes #7888
- add quotes around install in 1brc post (#8065) (5998143)
- add user testimonials page (#7897) (c0714f8), closes #7341
- blog for the 1 billion row challenge (#8004) (141edea)
- blog-post: replicate spatial dev guru blog (4b73c3b)
- blog: redux array blog with equivalent duckdb and bq expressions (5bde8da)
- blog: show how to install geospatial dependencies (951a169)
- blog: update geospatial - no need to_array() (78434a0)
- contrib: add pull request template (effd461)
- deps: bump quarto version to pick up dashboard feature (79657db)
- dev: update maintainers guide (d67409c)
- document possible range of seed values to `Table.sample` (6a652ec)
- duckdb: correct wording for empty path logic (72b2cde)
- fix formatting for note on `_name`, `_dtype` (#7911) (e58be2e)
- fix rolling date on bigquery/duckdb array blog (#8059) (fb09b78)
- flink: add to the set of documented backends (83eab61)
- flink: override default install instructions (4fc8e75)
- geospatial: add examples for duckdb supported methods (#8128) (2a92306), closes #7959
- geospatial: fix flaky ci geo-literals doctests (417e81d)
- hyphenate “properly formatted” and add colon (5ab1c27)
- ibis-analytics blog post (#7990) (17a1ef2)
- improve UDF signature docs (#8194) (3cdc6ce)
- include American spelling usage in style guide (#8163) (ac72157), closes #8162
- kedro blog post link (#8150) (1ffe435)
- meta: add goatcounter to header of all quarto pages (fd2e6c9)
- minor edit to the who supports ibis doc (#7896) (d5a0779)
- minor update to composable data ecosystem concept (a46bd4a)
- pandas: fix format for kwarg warning callout (0f6d45d)
- pyspark: document `ibis.connect` using a URL (d6049f8)
- pyspark: mention using `ibis.connect` (33c855a)
- random: document behavior of repeated use of ibis.random() instance (f4b67e5)
- row_number always starts at 0 (#8209) (5a26c05)
- security: add a security policy (33e9f26)
- sql-tutorial: fix minor typo in union section of SQL user tutorial (ca6c2a5)
- style: add style guide to contributing (#8092) (b807555), closes #7094
- support-matrix: replace the backend info streamlit app with a static quarto dashboard (f9da637)
- update quickstart to use rename (#8196) (9ed4e92)
- update release date on Ibis geospatial dev guru post (175f141)
- who supports Ibis (#7892) (1a5a420), closes #7743
Refactors
- api: remove `show_sql` in favor of `print(to_sql)` (36da8c1)
- bigquery: remove `list_databases` (22e5ada)
- bigquery: remove `nullifzero` (8447b9a)
- bigquery: remove `zeroifnull` (8be3c25)
- bigquery: return data_project as database, not dataset_id (05608eb)
- deps: make `pins` an optional dependency through an `examples` extra (#7878) (3d6c3f1), closes #7844
- flink: expose `raw_sql` over `_exec_sql` (0b66b94)
- impala: modernize the impala backend (252833d)
Deprecations
- deprecate Value.least() and Value.greatest() (f711337)
7.2.0 (2023-12-18)
Features
- api: add `ArrayValue.flatten` method and operation (e6e995c)
- api: add `ibis.range` function for generating sequences (f5a0a5a)
- api: add timestamp range (c567fe0)
- base: add `to_pandas` method to BaseBackend (3d1cf66)
- clickhouse: implement array flatten support (d15c6e6)
- common: `node.replace()` now supports mappings for quick lookup-like substitutions (bbc93c7)
- common: add `node.find_topmost()` method to locate matching nodes without descending further to their children (15acf7d)
- common: allow matching on dictionaries in possibly nested patterns (1d314f7)
- common: expose `node.__children__` property to access the flattened list of children of a node (2e91476)
- duckdb: add initial support for geospatial functions (65f496c)
- duckdb: add read_geo function (b19a8ce)
- duckdb: enforce aswkb for projections, coerce to geopandas (33327dc)
- duckdb: implement array flatten support (0a0eecc)
- exasol: add exasol backend (295903d)
- export: allow passing keyword arguments to PyArrow `ParquetWriter` and `CSVWriter` (40558fd)
- flink: implement nested schema support (057fabc)
- flink: implement windowed computations (256767f)
- geospatial: add support for GeoTransform on duckdb (ec533c1)
- geospatial: update read_geo to support url (3baf509)
- pandas/dask: implement flatten (c2e8d9d)
- polars: add `streaming` kwarg to `to_pandas` (703507f)
- polars: implement array flatten support (19b2aa0)
- pyspark: enable multiple values in `.substitute` (291a290)
- pyspark: implement array flatten support (5d1fadf)
- snowflake: implement array flatten support (d3c754f)
- snowflake: read_csv with https (72752eb)
- snowflake: support udf arguments for reading from staged files (529a3a2)
- snowflake: use upstream `array_sort` (9624341)
- sqlalchemy: support expressions in window bounds (5dbb3b1)
- trino: implement array flatten support (0d1faaa)
Bug Fixes
- api: avoid casting to bool for `table.info()` `nullable` column (3b3bd7b)
- bigquery: escape the schema (project ID) for BQ builtin UDFs (8096552)
- bigquery: fully qualified memtable names in compile (a81e432)
- clickhouse: use backwards compatible methods of getting query metadata (975556f)
- datafusion: bring back UDF registration (43084fa)
- datafusion: ensure that non-matching re_search calls return bool values when patterns do not match (088b027)
- datafusion: support computed group by when the aggregation is count distinct (18bdb7e)
- decompile: handle isin (6857751)
- deferred: don’t pass expression in fstringified error message (724859d)
- deps: update dependency datafusion to v33 (57047a2)
- deps: update dependency sqlglot to v20 (13bc6e2)
- duckdb: ensure that already quoted identifiers are not erased (45ee391)
- duckdb: ensure that parameter names are unlikely to overlap with column names (d93dbe2)
- duckdb: gate geoalchemy import in duckdb geospatial (8f012c4)
- duckdb: render dates, times, timestamps and none literals correctly (5d8866a)
- duckdb: use functions for temporal literals (b1407f8)
- duckdb: use the UDF’s signature instead of arguments’ output type for generating a duckdb signature (233dce1)
- flink: add more test (33e1a31)
- flink: add os to the cache key (1b92b33)
- flink: add test cases for recreate table (1413de9)
- flink: customize the list of base identifiers (0b5d343)
- flink: fix recreating table/view issue on flink backend (0c9791f)
- flink: implement TypeMapper and SchemaMapper for Flink backend (f983bfa)
- flink: use lazy import to prevent premature loading of pyflink during gen_matrix (d042402)
- geospatial: pretty print data in interactive mode (afb04ed)
- ir: ensure that join projection columns are all always nullable (f5f35c6)
- ir: handle renaming for scalar operations (6f77f17)
- ir: handle the case of non-overlapping data and add a test (1c9ae1b)
- ir: implicitly convert `None` literals with `dt.Null` type to the requested type during value coercion (d51ec4e)
- ir: merge window frames for bound analytic window functions with a subsequent over call (e12ce8d)
- ir: raise if `Concrete.copy()` receives unexpected arguments (442199a)
- memtable: ensure column names match provided data (faf99df)
- memtables: disallow duplicate column names when constructing memtables (4937b48)
- mssql: compute the length of strings correctly (64d2957)
- mssql: render dates, times and timestamps correctly (aca30e1)
- mysql: render dates and timestamps correctly (19e878c)
- oracle: ensure that `.sql` metadata results are in column-definition order (26a3c1f)
- oracle: render dates and timestamps correctly (66fbad6)
- pandas-format: convert map keys (bb92e9f)
- pandas: ensure that empty arrays unnest to nothing (fa9831f)
- pandas: fix integer wraparound when extracting epoch seconds (e98fa3c)
- pandas: handle non-nullable type mapping (c6a6c56)
- parse_sql: parse IN clauses (8b1f7b5)
- polars: handle new categorical types (5d6d6ae)
- polars: handle the case of an empty `InValues` list (b26aa55)
- polars: project first when creating computed grouping keys (7f9fdd4)
- postgres: render dates, times, timestamps and none literals correctly (a3c1c07)
- pyarrow: avoid catching `ValueError` and hiding legitimate failures (b7f650c)
- pyspark,polars: add packaging extra (bdde3a4)
- pyspark: custom format converter to handle pyspark timestamps (758ec25)
- snowflake: convert arrays, maps and structs using the base class implementation (f361891)
- snowflake: convert path to `str` when checking for a prefix (c5f884c)
- snowflake: ensure that empty arrays unnest to nothing (28c2498)
- snowflake: fix array printing by using a pyarrow extension type (7d8fe5a)
- snowflake: fix creating table in a different database (9b65b48)
- snowflake: fix quoting across all apis (7bf8e84)
- substitute: allow mappings with `None` keys (4b28ff1)
Documentation
- add exasol to the backend coverage app (3575858)
- arrays: document behavior of unnest in the presence of empty array rows (5526c40)
- backends: include docs for inherited members (c04bf67)
- blog-post: add blog post comparing ibis to pandas and dask (a7fd32b)
- blog-post: add blogpost ibis duckdb geospatial (def8031)
- blog-post: pydata performance part 2; polars and datafusion (36e1db5)
- blog: add dbt-ibis post (d73c156)
- blog: add pypi compiled file extension blog (751cfcf)
- build: allow building individual docs without rendering api docs first (529ee6c)
- build: turn off interactive mode before every example (502b88c)
- fix minor typo in sql.qmd (17aa929)
- fix typo in `ir.Table` docstring (e3b9611)
- fix typos (9a4d1f8)
- make minor edits to duckdb-geospatial post (2365e10)
- name: improve docstring of `ibis.param` API (2f9ec90)
- name: improve docstring of `Value.name` API (dd66af2)
- perf: use an unordered list instead of an ordered one (297be44)
- pypi-metadata-post: add Fortran pattern and fix regex (12058f2)
- remove confusing backend page (c1d19c7)
- replace deprecated `relabel`s with `rename`s (6bc9e15)
- sql: emphasize the need to close a `raw_sql` cursor only when using `SELECT` statements (74379a8)
- tests: add API docs for the testing base classes (173e9a9)
- tests: document class variables in BackendTest (e814c6b)
Refactors
- analysis: always merge frames during windowization (66fd69c)
- bigquery: move `BigQueryType` to use sqlglot for type parsing and generation (6e3219f)
- clickhouse: clean up session timezone handling (66220c7)
- clickhouse: use isoformat instead of manual specification (a3fac3e)
- common: consolidate the finder and replacer inputs for the various graph methods (a1881eb)
- common: remove `traverse()` function’s `filter` argument since it can be expressed using the visitor (e4e2993)
- common: unify the `node.find()` and `node.match()` methods to transparently support types and patterns (3c14091)
- datafusion: simplify `execute` and `to_pyarrow` implementations (c572eab)
- duckdb: use pyarrow for all memtable registration (d6a2f09)
- formats: move the `TableProxy` object to formats from the operations (05964b1)
- pandas-format: move to classmethods to pickup super class behavior where possible (7bb0470)
- snowflake: use upstream map-from-arrays function instead of a custom UDF (318459c)
- tests: remove test rounding mixins (3b730d9)
- tests: remove UnorderedComparator class (ab0a8f6)
Performance
- common: improve the performance of replacing nodes by using a specialized `node.__recreate__()` method (f3da926)
7.1.0 (2023-11-16)
Features
- api: add `bucket` method for timestamps (ca0f7bc)
- api: add `Table.sample` method for sampling rows from a table (3ce2617)
- api: allow selectors in `order_by` (359fd5e)
- api: move analytic window functions to top-level (8f2ced1)
- api: support deferred in reduction filters (349f475)
- api: support specifying `signature` in udf definitions (764977e)
- bigquery: add `location` parameter (d652dbb)
- bigquery: add `read_csv`, `read_json`, `read_parquet` support (ff83110)
- bigquery: support temporary tables using sessions (eab48a9)
- clickhouse: add support for timestamp `bucket` (10a5916)
- clickhouse: support `Table.fillna` (5633660)
- common: better inheritance support for Slotted and FrozenSlotted (9165d41)
- common: make Slotted and FrozenSlotted pickleable (13cbce0)
- common: support `Self` annotations for `Annotable` (0c60146)
- common: use patterns to filter out nodes during graph traversal (3edd8f7)
- dask: add read_csv and read_parquet (e9260af)
- dask: enable pyarrow conversion (2d36722)
- dask: support `Table.sample` (09a7626)
- datafusion: add case and if-else statements (851d560)
- datafusion: add corr and covar (edc42be)
- datafusion: add isnull and isnan operations (0076c25)
- datafusion: add some array functions (0b96b68)
- datafusion: add StringLength, FindInSet, ArrayStringJoin (fd03831)
- datafusion: add TimestampFromUNIX and subtract/add operations (2bffa5a)
- datafusion: add TimestampTruncate / fix broken extract time part functions (940ed21)
- datafusion: support dropping schemas (cc6870c)
- duckdb: add `attach` and `detach` methods for adding and removing databases to the current duckdb session (162b058)
- duckdb: add `ntile` support (bf08a2a)
- duckdb: add dict-like for DuckDB settings (ea2d317)
- duckdb: add support for specific timestamp scales (3518b78)
- duckdb: allow users to register fsspec filesystem with DuckDB (6172f07)
- duckdb: expose option to force reinstall extension (98080d0)
- duckdb: implement `Table.sample` as a `TABLESAMPLE` query (3a80f3a)
- duckdb: implement partial json collection casting (aae28e9)
- flink: add remaining operators for Flink to pass/skip the common tests (b27adc6)
- flink: add several temporal operators (f758228)
- flink: implement the `ops.TryCast` operation (752e587)
- formats: map ibis JSON type to pyarrow strings (79b6eac)
- impala/pyspark: implement `to_pyarrow` (6b33454)
- impala: implement `Table.sample` (8e78dfc)
- implement window table valued functions (a35a756)
- improve generated column names for methods receiving intervals (c319ed3)
- mssql: add support for timestamp `bucket` (1ffac11)
- mssql: support cross-db/cross-schema table list (3e0f0fa)
- mysql: support `ntile` (9a14ba3)
- oracle: add fixes after running pre-commit (6538b70)
- oracle: add fixes after running pre-commit (e3d14b3)
- oracle: add support for loading Oracle RAW and BLOB types (c77eeb2)
- oracle: change parsing of Oracle NUMBER data type (649ab86)
- oracle: remove redundant brackets (2905484)
- pandas: add read_csv and read_parquet (34eeca6)
- pandas: support `Table.sample` (77215be)
- polars: add support for timestamp `bucket` (c59518c)
- postgres: add support for timestamp `bucket` (4d34afc)
- pyspark: support `Table.sample` (6aa897e)
- snowflake: support `ntile` (39eed1a)
- snowflake: support cross-db/cross-schema table list (2071897)
- snowflake: support timestamp bucketing (a95ffa9)
- sql: implement `Table.sample` as a `random()` filter across several SQL backends (e1870ea)
- trino: implement `Table.sample` as a `TABLESAMPLE` query (f3d044c)
- trino: support `ntile` (2978d1a)
- trino: support temporal operations (8b8e885)
- udf: improve mypy compatibility for udf functions (65b5bb7)
- use `to_pyarrow` instead of `to_pandas` in the interactive repr (72aa573)
- ux: fix long links, add repr links in vscode (734bd91)
- ux: implement recursive element conversion for nested types and json (8ddfa94)
- ux: render url strings as links in rich table output (1c7a9b6)
- ux: show syntax-highlighted SQL if pygments is installed (09881b0)
Bug Fixes
- bigquery: apply unnest transformation in other methods that execute SQL (2cc9d0e)
- bigquery: avoid trying to filter separator argument to `GroupConcat` operation (ed3b017)
- bigquery: ensure that the identifier is parsed according to the dialect (f5bb555)
- bigquery: move sql code to proper argument (abb0bdd)
- datafusion: do_connect: properly deal with config-is-actually-context (649480c)
- datafusion: fix some temporal operations (3206dbc)
- datatypes: correct uint upper bounds (5ca56d5)
- datatypes: correct unsigned integer bounds (1e40d4e)
- deps: bump pins lower bound to pickup transitive fsspec upper bound (983e23e)
- deps: bump sqlglot lower bound (a47be79)
- deps: pin pyspark to a working version (7eb8a19)
- deps: update dependency datafusion to v32 (1afbe9c)
- deps: update dependency pyarrow to v14 (bce86c4)
- deps: update dependency sqlglot to v19 (1f3ae07)
- duckdb: ensure proper quoting when compiling cross database/schema tables (8d7b5fa)
- duckdb: query table list directly instead of relying on sqlalchemy (5d7822c)
- duckdb: use connect instead of begin to avoid nesting transactions (6889543)
- flink: cast argument to integer for reduction (5059eed)
- flink: correct the filtered count translation (2cbca74)
- flink: re-implement `ops.ApproxCountDistinct` (2e3a5a0)
- ir: `ibis.parse_sql()` removes where clause (522f3a4)
- ir: coerce integers passed to `Value[dt.Floating]` annotated values as `dt.float64` (b8a924a)
- ir: ensure that windowization directly wraps the reduction/analytic function (772df36)
- mssql: support translation of ops.Neg() when projecting a field (ca49d2a)
- oracle: change filter inside select into case when (c743fa2)
- oracle: disable if_exists for Oracle drop view command (973133b)
- oracle: fix fallback column type inference (fb5d56d)
- pandas: drop `__index_level_N__` cols before applying schema (b53feac)
- patterns: `Object` pattern should match on positional arguments first (96c796f)
- patterns: `PatternList` should keep the original pattern’s type (6552639)
- polars: bump lower bound to 0.19.8 and clean up a bunch of backcompat code (462bd17)
- polars: various polars enhancements (5948dd6)
- repr: add dispatch for repr of GeoSpatialBinOps (843d086)
- snowflake: include views when listing tables for backwards compatibility (094881b)
- snowflake: support snowflake 3.3.0 (nanoarrow) (a0f24e8)
- sqlalchemy: ensure that limit on `.sql` calls works (a5e3062)
- sqlite: handle BLOB datatype (d36ed1c)
- sqlite: truncate week to previous week not following (6239794)
- sql: subtract one from ntile output in string-generating backends (1d264dc)
- support self joins on memtables (f24e355)
- trino: enable passing the database argument when accessing tables (e7ce43e)
- trino: ensure that a schema is not required upon connection when accessing tables with explicit schema (8bde3e0)
- use `pyarrow_hotfix` where necessary (0fa1e5d)
Documentation
- add .nullif() example (6d405df)
- add “similar to pandas …” to docstrings (cd7be29)
- add basic intro docstring to Table class (1a68f31)
- add callout note for `Table.sample` (51027d9)
- add copyright holders to license (ca97dfb)
- add deprecation to .nullifzero docstring (8502e81)
- add example to Value.hash() (501ae92)
- add examples to Value.typeof() (c146381)
- add more examples to Table.select() (735bbd0)
- add See Also sections to some APIs (be8938f)
- clickhouse: freeze clickhouse backend docs to avoid rate limit from upstream playground (e3a7eac)
- contribute: fix instructions for nix environment setup (013cedd)
- contribute: fix path to conda-lock files for contributors (ef5bdf9)
- dedupe 6.2.0 and 7.0.0 release notes (7ce4b1a)
- fix and improve .isin() docstring (063cfba)
- fix dask compile docstring typo (d38d2c4)
- fix link in Value.type() docstring (43b798c)
- fixup link (d4c97b0)
- flink: add backend back to support matrix df (e846e80)
- improve .between() docstring (a086134)
- improve .case() and .cases() docstrings (7fc89e8)
- improve cast() and try_cast() docstrings (0b686e8)
- improve cross-linking within reference (9e45194)
- improve examples for Table.order_by() (9465b2a)
- improve join() docstring (84c08c6), closes #7424
- improve re_replace docstring (f55d0db)
- improve Table.columns docstring (d50558b)
- mysql: render_do_connect mssql to mysql (3c2da6c)
- pandas: show methods from `BasePandasBackend` (20fd120)
- ranking: add ranking function docstrings (750bfeb)
- setup codespace configuration [skip ci] (5363b94)
- style: replace Black with Ruff in guidelines (1db3047)
- temporal: add `Literal` annotation to display possible units for `delta` method (ee94cb5)
- trino: add details for connecting to starburst (ca9873a)
- trino: add note about SSO configuration (457534b)
- udfs: fix udf interlink locations (c26e48b)
Refactors
- analysis: remove `_rewrite_filter()` in favor of using replacement patterns (4c0ac2e)
- analysis: remove `is_reduction()` (2acc31f)
- analysis: remove `pushdown_aggregation_filters()` (cf95ff7)
- analysis: remove `sub_for()`, `substitute()`, `find_toplevel_aggs()` (492b296)
- analysis: remove `substitute_parents()` (cd91a7e)
- analysis: remove `substitute_unbound()` since it is used at a single place (6a6ad19)
- analysis: simplify and improve `pushdown_selection_filters()` (2e47738)
- analysis: vastly simplify windowize_function (998bbaa)
- backends: move `read_delta` to base io handler (3d5a684)
- bigquery: add schema kwarg to list_tables (95be62f)
- bigquery: remove session use (60e7900)
- bigquery: remove unused `BigQueryTable` object (b83e60e)
- clean up `lit` usage (1bc6cee)
- clickhouse: apply repetitive transformations as pattern replacements (e966af8)
- clickhouse: replace `lit` with builtin sqlglot functions (221b630)
- clickhouse: use a pattern for one-to-zero index conversion of ranking window functions (732c031)
- clickhouse: use sqlglot for `create_table` implementation (ea0826d)
- common: remove `ibis.common.bases.Base` in favor of `Abstract` (8ed313c)
- datafusion: create registry of time udfs to create them only once (9ed0a89)
- docker-compose: clean up unused exposed ports and make envar spec uniform (7ee518d)
- duckdb: remove lit (6f77df9)
- flink: use `FILTER` syntax when counting (815c12f)
- imports: move pandas-importing object to method (103a524)
- ir: remove `ibis.expr.streaming` (70df318)
- ir: remove ops.Negatable, ops.NotAny, ops.NotAll, ops.UnresolvedNotExistsSubquery (e31e8fd)
- ir: unify `ibis.common.pattern` builders and `ibis.expr.deferred` (652ceab)
- make _WellKnownText not a NamedTuple (9a9e733)
- oracle: deprecate database for schema in list_tables (c8ea79f)
- patterns: support more flexible sequence matching (b8e463d)
- postgres: deprecate database for schema in list_tables (d622730)
- remove unused `*args` in udf functions (e22236c)
- sql: align logic for filtered reductions (0347036)
- temporal: remove unnecessary `Temporal*` classes (d3bcf73)
- trino: support better cross-db/cross-schema table list (d2cf1c9)
- use rewrite rules to handle fillna/dropna in sql backends (f5e06a6)
Performance
- bigquery: use more efficient representation for memtables (697d325)
7.0.0 (2023-10-02)
⚠ BREAKING CHANGES
- api: the `interpolation` argument was only supported in the dask and pandas backends; for interpolated quantiles use dask or pandas directly
- ir: Dask and Pandas only; cumulative operations that relied on implicit ordering from prior operations such as calls to `table.order_by` may no longer work; pass `order_by=...` into the appropriate cumulative method to achieve the same behavior.
- api: UUID, MACADDR and INET are no longer subclasses of strings. Cast those values to `string` to enable use of the string APIs.
- impala: `ImpalaTable.rename` is removed, use `Backend.rename_table` instead.
- pyspark: `PySparkTable.rename` is removed, use `Backend.rename_table` instead.
- clickhouse: `ClickhouseTable` is removed. This class only provided a single `insert` method. Use the Clickhouse backend’s `insert` method instead.
- datatypes: The minimum version of `sqlglot` is now 17.2.0, to support much faster and more robust backend type parsing.
- ir: ibis.expr.selectors module is removed, use ibis.selectors instead
- api: passing a tuple or a sequence of tuples to table.order_by() calls is not allowed anymore; use ibis.asc(key) or ibis.desc(key) instead
- ir: the `ibis.common.validators` module has been removed, along with all validation rules from `ibis.expr.rules`; use either typehints or patterns from `ibis.common.patterns` instead
Features
- api: add `.delta` method for computing difference in units between two temporal values (18617bf)
- api: add `ArrayIntersect` operation and corresponding `ArrayValue.intersect` API (76c95b2)
- api: add `Backend.rename_table` (0047143)
- api: add `levenshtein` edit distance API (ab211a8)
- api: add `relocate` table expression API for moving columns around based on selectors (ee8a86f)
- api: add `Table.rename`, with support for renaming via keyword arguments (917d7ec)
- api: add `to_pandas_batches` (740778f)
- api: add support for referencing backend-builtin functions (76f5f4b)
- api: implement negative slice indexing (caee5c1)
- api: improve repr for deferred expressions containing Column/Scalar values (6b1218a)
- api: improve repr of deferred functions (f2b3744)
- api: support deferred and literal values in `ibis.ifelse` (685dbc1)
- api: support deferred arguments in `ibis.case()` (6f9f7c5)
- api: support deferred arguments to `ibis.array` (b1b83f9)
- api: support deferred arguments to `ibis.map` (86c8669)
- api: support deferred arguments to `ibis.struct` (7ef870d)
- api: support deferred arguments to udfs (a49d259)
- api: support deferred expressions in `ibis.date` (f454a71)
- api: support deferred expressions in `ibis.time` (be1fd65)
- api: support deferred expressions in `ibis.timestamp` (0e71505)
- api: support deferred values in `ibis.coalesce`/`ibis.greatest`/`ibis.least` (e423480)
- bigquery: implement array functions (04f5a11)
- bigquery: use sqlglot to implement functional unnest to relational unnest (167c3bd)
- clickhouse: add `read_parquet` and `read_csv` (dc2ea25)
- clickhouse: add support for `.sql` methods (f1d004b)
- clickhouse: implement builtin agg functions (eea679a)
- clickhouse: support caching tables with the `.cache()` method (621bdac)
- clickhouse: support reading parquet and csv globs (4ea1834)
- common: match and replace graph nodes (78865c0)
- datafusion: add coalesce, nullif, ifnull, zeroifnull (1cc67c9)
- datafusion: add ExtractWeekOfYear, ExtractMicrosecond, ExtractEpochSeconds (5612d48)
- datafusion: add join support (e2c143a)
- datafusion: add temporal functions (6be6c2b)
- datafusion: implement builtin agg functions (0367069)
- duckdb: expose loading extensions (2feecf7)
- examples: name examples tables according to example name (169d889)
- flink: add batch and streaming mode test fixtures for Flink backend (49485f6)
- flink: allow translation of decimal literals (52f7032)
- flink: fine-tune numeric literal translation (2f2d0d9)
- flink: implement `ops.FloorDivide` operation (95474e6)
- flink: implement a minimal PyFlink `Backend` (46d0e33)
- flink: implement insert dml (6bdec79)
- flink: implement table-related ddl in Flink backend to support streaming connectors (8dabefd)
- flink: implement translation of `NULLIFZERO` (6ad1e96)
- flink: implement translation of `ZEROIFNULL` (31560eb)
- flink: support translating typed null values (83beb7e)
- impala: implement `Backend.rename_table` (309c999)
- introduce watermarks in ibis api (eaaebb8)
- just chat to open Zulip in terminal (95e164e)
- patterns: support building sequences in replacement patterns (f320c2e)
- patterns: support building sequences in replacement patterns (beab068)
- patterns: support calling methods on builders like a variable (58b2d0e)
- polars: implement new UDF API (becbf41)
- polars: implement support for builtin aggregate udfs (c383f62)
- polars: support reading ndjson (1bda3bd)
- postgres: implement array functions (fe41d57)
- postgres: implement array sort (4791cb4)
- postgres: implement array union (6d3d518)
- pyspark: enable reading csv and parquet globs and implement `read_json` (d487e10)
- pyspark: enable the new scalar UDF API (f29a8e7)
- pyspark: implement `Backend.rename_table` (0a8b201)
- selectors: support column references in column selector (d4fae08)
- snowflake: add `ArrayRemove` implementation (4f9d9f9)
- snowflake: allow disabling creation of object UDFs (569aa12)
- snowflake: handle glob patterns in `read_csv`, `read_parquet` and `read_json` (adb8f4c)
- snowflake: implement `ops.ArrayRepeat` (a93cbd6)
- snowflake: implement `read_csv` (3323156)
- snowflake: implement `read_json` (ec870a2)
- snowflake: implement `read_parquet` (e02888b)
- snowflake: implement array sort (465fae1)
- snowflake: support literal map key contains check (dbe7d4e)
- sql: add `database` argument to `list_schemas` (22ceba7)
- sqlalchemy: support builtin aggregate functions (3b27e23)
- sqlite: implement caching support (0677f8d)
- tests: support defining datatype nullability for hypothesis strategies (ff26fb8)
- trino: cross-schema table support (9c7c65f)
- udf: add support for builtin aggregate UDFs (8ee12bf)
- udf: support inputs without type annotations (99e531d)
- ux: promote lists of strings to `any_of` selectors (5e11529)
Bug Fixes
- api: ensure that deferred objects cannot be converted into literals (b37804a)
- api: ensure that normalization of boolean, ints and floats fail with readable error message (556f7cc)
- api: ensure the order of duplicate non-renamed columns in `relocate` is preserved (19a59aa)
- api: fail on trying to construct an iterable of a deferred object (89bf919)
- api: improve error message for bad arguments to `Table.select` (258a289)
- api: support passing `functools.partial` objects to array `.map`/`.filter` methods (28f45d0)
- bigquery: generate the correct temporal literal type based on the presence of timezone information (98a6ae0)
- bigquery: quote struct field names in memtable when necessary (b1fcde8)
- clickhouse: do not always prefix the table name with database, because temp tables cannot be assigned a database (5f88102)
- clickhouse: list temporary tables with `list_tables` (758a875)
- clickhouse: make sure that `array1.union(array2)` null handling matches across backends (8d42794)
- clickhouse: workaround clickhouse_connect usage of removed APIs in pandas 2.1.0 (577599a)
- clip: preserve nulls when clipping (c12dfa4)
- common: `pattern()` factory should construct a `CoercedTo(type)` pattern from coercible types (09be2cd)
- common: disallow plain string inputs for SequenceOf patterns (578980d)
- common: disallow type coercion when checking for generic type fields (df63e8b)
- common: support optional keyword-only parameters when validating callables (519a9e0)
- datafusion: cast division inputs to float64 before dividing (197342d)
- datatypes: decimal normalization failed for integers (5213958)
- deps: update dependency datafusion to v28 (1a8b223)
- deps: update dependency datafusion to v31 (fa0a8bd)
- deps: update dependency pyarrow to v13 (43dc1e1)
- deps: update dependency sqlglot to v18 (5fa0083)
- drop: support deferred objects in calls to drop (d27374b)
- druid: avoid double escaping percent-signs in strings (1d1f7bd)
- druid: convert type strings to lowercase before looking up (4a838f7)
- druid: ensure that string types are translated to `VARCHAR` (56e6ffc)
- dtypes: switch scale and timestamp parameter order when formatting a timestamp datatype (302b122)
- duckdb: load httpfs with read_csv from s3 (da1b95f)
- duckdb: make sure that `array1.union(array2)` null handling matches across backends (849dea4)
- duckdb: remove hack to workaround bug that was fixed upstream (310c521)
- duckdb: workaround aggressive importing on the duckdb side (105e2d6)
- flink: correct `ops.RegexSearch` translations (a3427a1)
- flink: correct the translation of `ops.Power` (42d2236)
- flink: correct translation of `ops.IfNull` op (85de81c)
- flink: fix the pandas conversion in `execute` (2f6564f)
- flink: fix translation of `ops.TimestampDiff` (580eff7)
- flink: implement an in-memory table formatter (217a14b)
- flink: remove broken, untested `epochseconds` (f18c760)
- flink: rewrite `ops.Clip` using if statements (b7153ea)
- flink: rewrite `ops.Date` as a cast operation (2470e81)
- flink: translate `ops.RandomScalar` to `rand` (c485a92)
- format: support rendering empty schemas (f8faada)
- histogram: ensure that the bin width calculation matches numpy (e6a0037)
- impala: allow arbitrary connection params (f251289)
- mysql: handle null literals (79788c7)
- oracle: clarify sid vs service_name handling and allow dsn (d4ea3bf)
- oracle: ensure that metadata queries use SQL and not sqlplus-specific syntax (2c1bf93)
- pandas: compatibility with 2.1 groupby behavior (ab3fc9e)
- patterns: fix pattern mismatch error for default `Pattern` (f68079a)
- patterns: support optional keyword arguments in CallableWith (a78aa60)
- patterns: support passing mappings to Getitem builder (25864cf)
- patterns: support string inputs for builder() (3610e52)
- polars: polars no longer panics on a value_counts-ed expression (e14185a)
- pyspark: default to inferring the schema of CSV files and assuming they have a header with `header=True` (0ffda75)
- pyspark: gate datediff op to restore pyspark 3.2 support (4a8d611)
- pyspark: gate other usage of DayTimeIntervalType for PySpark 3.2 (ab01de0)
- remove pandas license (476a659)
- repr: ensure that column expressions are not promoted to table when repring non-interactively (d57a162)
- selectors: error when trying to select a non-existent column with `s.c` (ae3e76e)
- snowflake: allow backend to choose how to prefix table names during compilation (933fb32)
- snowflake: disable filter and map (53bc22e)
- snowflake: ensure that lateral joins with newlines are also rewritten (dfd3c9b)
- snowflake: ensure the correct compilation of tables from other databases and schemas (0ee68e2)
- snowflake: fix timestamp scale inference (083bdae)
- snowflake: use ibis-defined `array_sort` until upstream lands (6f7e13d)
- sqlalchemy: ignore database when specified with `temp=True` (04461d5)
- sql: avoid reselecting relations that do not need it to prevent dropping order by clauses (8ae2f03)
- sqlglot: ensure back compat for `DataTypeParam` import (65851fc)
- struct-column: make `ops.StructColumn` dshape depend on its input (7086d58)
- trino: differentiate between a single column struct and a non-struct column (b1f1939)
- type hints: improvements to type hints in ibis.expr (297b449)
- type hints: remove notimplemented as type hints as not valid (57ea7a1)
- type hints: various improvements to type hints in common (ff00347)
Documentation
- add `functools.partial` and lambda closures to `ArrayValue.map` and `ArrayValue.filter` (e245e83)
- add `ibis.connect` to `top_level` API docs (0d197e8)
- add 404 page (8b2de41)
- add 6.2.0 release notes (f5a2aed)
- add back goatcounter to website (a0095bf)
- add docs issue to navbar (ea41e53)
- add extending how-to guides (0bad961)
- add how-to for working with raw sql strings (3e08556)
- add interactivity to altair example (0037fd8)
- add interactivity to altair example on homepage (217f080)
- add more redirects based on Google search console findings (fe890a4)
- add proper zulip icon (f854327)
- add some prose and move operation support matrix (48b7e34)
- add UDF API documentation (5354689)
- add v6.1.0 release blog (a66f7b7)
- arrays: update blog post to include `unnest` examples (e765712)
- backends: add support more supported IO types (124f085)
- backends: explain how to release a cursor and suggest using `.sql` instead (1e1a574)
- basic starburst galaxy tutorial (a7a49ca)
- blog: add bigquery arrays 7.0.0 blog post (8f2a40f)
- blog: add tags to blog posts (1527655)
- blog: embed the torch youtube video directly (70515ed)
- blog: snowflake io (ee8c512)
- bring back backend API documentation (df981d5)
- bring back versioning policy doc (9dc8966)
- clean up extending tutorials (8da58d4)
- community to contribute (f02c2fb)
- dark mode for life (2fa181a)
- default to short signature but show full path for top level functions (9793862)
- draft posts to draft and very minor edits (43499e3)
- ensure that all parameters elements overflow with a scrollbar (5002d9f)
- expose API under `ibis` when possible (bc05ced)
- fix edit this page button (284b48a)
- fix links from install to connect (9bf27fb)
- fix numerics and move connection apis elsewhere (95bb2e0)
- fix setuptools extra install style (d9ab537)
- fix zulip link for new members (3732f46)
- gitter -> zulip (ef79e64)
- give reference docs a more organized layout (583af94)
- hand roll datatypes.core APIs to avoid documenting private types (6ad8069)
- import ibis in doctests (7f340de)
- include full name of signatures (66a58e2), closes /github.com/ibis-project/ibis/pull/7159#issuecomment-1735845163
- install: point to connect anchor instead of do_connect (d680b40)
- IO: add missing word, add line breaks (f4fdfd3)
- language: de-simple-fy prose in docs (0617271)
- link to zulip in README.md (15112bb)
- major home page refactor (a4e4569)
- minor blog fixes/prose update (304edd1)
- minor blog update (df14997)
- minor consistency on capitalization of versioning concept (b838760)
- minor fix in starburst tutorial (54473ba)
- more redirects and add ibis 3.0.2 (8d900ee)
- move column selectors closer to relevant expression page (ea3a090)
- nix: add configuration notes to nix environment setup (1c60318)
- only render file support methods once (98b348c)
- port bigquery ci-analysis blog post to use the delta API (e543b1d)
- quarto: add quartodoc interlinks filter (9f7a1ef)
- quarto: make blog post titles visible in light mode (be2e95f)
- quarto: override api code blocks with custom renderer (b504fee)
- quarto: shorten method signature names (e37bfea)
- quarto: remove temporary eval false setting (0b9e3a3)
- refactor and move to quarto (487a5e5)
- reference: fix ibis.ifelse() docstring (a80bb75)
- reference: improve descriptions of sections (6a4924a)
- reference: move collections API from global (4780536)
- reference: move generic API from global (ba1f72e)
- reference: move numeric/bool API from global (f3f23ac)
- reference: move Table API from global (efcc2fb)
- reference: move temporal API from global (c452fbb)
- reference: move types API from global (662c509)
- reference: rename Complex to Collection (194afa7)
- reference: rename top-level to connection (9b9cd03)
- reference: simplify title of Generic section (ef86165)
- reference: soft-deprecate ibis.where (3c94f7b)
- reference: sort numeric before strings (0df8bba)
- remove duckdb code annotations (b0bcdde)
- remove extra bits in zulip links (29680a3)
- remove final instances of gitter (69f941a)
- remove keywords (53a9f9f)
- remove old S3 comment in impala docs (57c0596)
- remove old-style schema construction from examples and docstrings (1b1c33a)
- remove stray bracket (72c9039)
- remove streamlit app on front page (d6d498e)
- remove underscores that are not deferreds in doctest (5d300a9)
- remove warning on front page (e68ec90)
- set expectations for the impala backend (09c7678)
- some how-to updates (6627016)
- swap release notes and contribute (a242e31)
- update poetry version (15e77f7)
- update link to ‘example repository’ instead of ‘tutorial’ (c20d3ee)
- update link to sqlalchemy tutorial (047aef7)
- why ibis and other edits (a3c1c3f)
Refactors
- add deferrable decorator (b09d978)
- add type annotations to set operation functions (13f593b)
- add types to Case and Window Builders (b85b424)
- analysis: remove find_memtables function in favor of node.find() (c4658e7)
- analysis: remove find_phyisical_tables() function in favor of node.find() (4daf2df)
- analysis: remove is_analytic function in favor of node.find() (0452810)
- analysis: remove ScalarAggregate, reduction_to_aggregation and has_multiple_bases (ed75866)
- analysis: rewrite substitute_unbound to use the new pattern system (885d2ff)
- api: remove deprecated tuple syntax for order_by() (57733e0)
- api: remove interpolation argument (7c242af)
- api: remove string as a parent type from expression API (2db98fb)
- array-apply: adjust array map and array filter representation for easier non-recursive compilation (b91ecf0)
- backends: adjust backends to work with new array representation (90befb2)
- bigquery: make literals less messy (8d8ad87)
- clickhouse: move ClickhouseTable.insert method to clickhouse backend and remove ClickhouseTable class (c9c72ae)
- clickhouse: remove recursion from the compiler (ccbcdc0)
- clickhouse: use more sqlglot constructs (c7ca7cd)
- common: disallow None for Annotation.pattern in favor of using Any() (7434068)
- common: factor out base classes to ibis.common.bases from ibis.common.grounds (01671d2)
- common: ibis.common.patterns.match() should return with the matched value rather than the context (cbb9b2f)
- common: improve error messages raised during validation (f95613a)
- common: remove ibis.collections.DotDict (fedd4b1)
- common: remove Validator mixin for better clarity (4697e7d)
- common: remove ibis.common.parse since it is only used by the datatype parser (557414f)
- common: restrict implicit traversals to common builtin collections (8531347)
- common: turn annotations into slotted classes (0770e92)
- datatypes: use sqlglot for parsing backend specific types (fe7ba24)
- delete unexposed ibis.api.category_label function (24ac5e7)
- examples: replace pooch with lighter weight pins (521669c)
- flink: reorder registry to match SQL one (93dad5b)
- flink: use built-in DEGREES, RADIANS (33518e9)
- formats: turn TypeParser into a TypeMapper implementation for sqlglot (468bed1)
- ir: construct ArrayContains instead of Contains for value.isin(array_value) (e826037)
- ir: decompose Contains into InValues and InColumn (fe9a289)
- ir: glue patterns and rules together (c20ba7f)
- ir: remove deprecated ibis.expr.selectors module (d4161d7)
- ir: rename .output_dtype and .output_shape to .dtype and .shape respectively (f9d5403)
- ir: replace Cumulative operations by adding where, group_by and order_by kwargs to cumulative APIs (26ffc68)
- ir: rewrite ibis.expr.format using node.map() (94ee679)
- ir: use @annotated decorator to coerce Selection.order_by and Aggregation.order_by arguments (8b841c1)
- mysql: use describe temporary table to retrieve ibis schema from query (a723637)
- rename ops.Where to ops.IfElse (a64b7ad)
- replace deprecated classes of type hints (25946f9)
- snowflake: get query schema using describe of last query id (890d54a)
- snowflake: remove unnecessary schema setting (9b0e6c8)
- snowflake: replace custom temp table ddl for memtables with read_parquet (41df410)
- snowflake: sort column names in the database instead of on the client (fb52814)
- tests: move test_visualize.py to ibis/expr/tests (46d74ee)
- tests: reorganize ibis.expr.decompile and ibis.expr.sql test files to be under the ibis.expr subpackage (d0d006e)
- tests: reorganize operation related tests from ibis.tests.exprs to ibis.expr.operations.tests (3cbe2f3)
- tests: simplify pattern matching tests on Value operations (d87e65a)
- traverse builtin collections for in deferrable (b5ee8f4)
- use deferrable to implement deferred case statements (5577d51)
Performance
- common: improve Concrete construction performance (2cb1a55)
- duckdb: improve to_pyarrow performance (5970cfe)
- duckdb: speed up metadata access to support the many-columns use case (2854143)
- duckdb: use information_schema instead of describe select (ef7f69f)
- introduce quicker abstract base classes (47822c6)
- ops: early return if two nodes do not hash to the same value (b0b62cc)
- ops: store schema on relation ops to avoid large traversals (0b49c96)
- snowflake: speed up metadata accesses from the existing schema and database (f2ef129)
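The hash-based early return named in the ops entry above can be illustrated with a minimal sketch. This is not the ibis implementation: the `Node` class and its fields are hypothetical, and the sketch only assumes that each node caches its hash at construction, so that unequal hashes rule out equality before any deep comparison runs.

```python
class Node:
    """Hypothetical immutable expression node (not the ibis class)."""

    __slots__ = ("name", "args", "_hash")

    def __init__(self, name, *args):
        self.name = name
        self.args = args
        # Compute the hash once; child nodes contribute their cached hashes.
        self._hash = hash((name, args))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if self is other:
            return True
        if not isinstance(other, Node):
            return NotImplemented
        # Early return: nodes with different hashes cannot be equal,
        # so the recursive comparison below is skipped.
        if self._hash != other._hash:
            return False
        return self.name == other.name and self.args == other.args


a = Node("add", Node("col", "x"), Node("lit", 1))
b = Node("add", Node("col", "x"), Node("lit", 1))
c = Node("add", Node("col", "x"), Node("lit", 2))
```

Here `a == b` still walks the full structure because the hashes agree, while `a == c` normally stops at the hash check. If two different nodes happen to collide, the deep comparison still gives the right answer, so the shortcut only affects speed.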
Deprecations
- api: deprecate ibis.negate in favor of negate method (47cdbe8)
- api: deprecate ibis.where in favor of ibis.ifelse (995c1bc)
- api: deprecate Table.relabel in favor of Table.rename (dcd9772)
- api: deprecate top-level ibis.geo_* functions in favor of their corresponding methods (71b7106)
- api: replace nullifzero with ifnull and zeroifnull with fillna (ac85d11)
6.2.0 (2023-08-31)
Features
- trino: add source application to trino backend (cf5fdb9)
Bug Fixes
- bigquery,impala: escape all ASCII escape sequences in string literals (402f5ca)
- bigquery: correctly escape ASCII escape sequences in regex patterns (a455203)
- release: pin conventional-changelog-conventionalcommits to 6.1.0 (d6526b8)
- trino: ensure that list_databases look at all catalogs not just the current one (cfbdbf1)
- trino: override incorrect base sqlalchemy list_schemas implementation (84d38a1)
Documentation
- trino: add connection docstring (507a00e)
6.1.0 (2023-08-03)
Features
- api: add ibis.dtype top-level API (867e5f1)
- api: add table.nunique() for counting unique table rows (adcd762)
- api: allow mixing literals and columns in ibis.array (3355dd8)
- api: improve efficiency of __dataframe__ protocol (15e27da)
- api: support boolean literals in join API (c56376f)
- arrays: add concat method equivalent to __add__/__radd__ (0ed0ab1)
- arrays: add repeat method equivalent to __mul__/__rmul__ (b457c7b)
- backends: add current_schema API (955a9d0)
- bigquery: fill out CREATE TABLE DDL options including support for overwrite (5dac7ec)
- datafusion: add count_distinct, median, approx_median, stddev and var aggregations (45089c4)
- datafusion: add extract url fields functions (4f5ea98)
- datafusion: add functions sign, power, nullifzero, log (ef72e40)
- datafusion: add RegexSearch, StringContains and StringJoin (4edaab5)
- datafusion: implement in-memory table (d4ec5c2)
- flink: add tests and translation rules for additional operators (fc2aa5d)
- flink: implement translation rules and tests for over aggregation in Flink backend (e173cd7)
- flink: implement translation rules for literal expressions in flink compiler (a8f4880)
- improved error messages when missing backend dependencies (2fe851b)
- make output of to_sql a proper str subclass (084bdb9)
- pandas: add ExtractURLField functions (e369333)
- polars: implement ops.SelfReference (983e393)
- pyspark: read/write delta tables (d403187)
- refactor ddl for create_database and add create_schema where relevant (d7a857c)
- sqlite: add scalar python udf support to sqlite (92f29e6)
- sqlite: implement extract url field functions (cb1956f)
- trino: implement support for .sql table expression method (479bc60)
- trino: support table properties when creating a table (b9d65ef)
Bug Fixes
- api: allow scalar window order keys (3d3f4f3)
- backends: make current_database implementation and API consistent across all backends (eeeeee0)
- bigquery: respect the fully qualified table name at the init (a25f460)
- clickhouse: check dispatching instead of membership in the registry for has_operation (acb7f3f)
- datafusion: always quote column names to prevent datafusion from normalizing case (310db2b)
- deps: update dependency datafusion to v27 (3a311cd)
- druid: handle conversion issues from string, binary, and timestamp (b632063)
- duckdb: avoid double escaping backslashes for bind parameters (8436f57)
- duckdb: cast read_only to string for connection (27e17d6)
- duckdb: deduplicate results from list_schemas() (172520e)
- duckdb: ensure that current_database returns the correct value (2039b1e)
- duckdb: handle conversion from duckdb_engine unsigned int aliases (e6fd0cc)
- duckdb: map hugeint to decimal to avoid information loss (4fe91d4)
- duckdb: run pre-execute-hooks in duckdb before file export (5bdaa1d)
- duckdb: use regexp_matches to ensure that matching checks containment instead of a full match (0a0cda6)
- examples: remove example datasets that are incompatible with case-insensitive file systems (4048826)
- exprs: ensure that left_semi and semi are equivalent (bbc1eb7)
- forward arguments through __dataframe__ protocol (50f3be9)
- ir: change “it not a” to “is not a” in errors (d0d463f)
- memtable: implement support for translation of empty memtable (05b02da)
- mysql: fix UUID type reflection for sqlalchemy 2.0.18 (12d4039)
- mysql: pass-through kwargs to connect_args (e3f3e2d)
- ops: ensure that name attribute is always valid for ops.SelfReference (9068aca)
- polars: ensure that pivot_longer works with more than one column (822c912)
- polars: fix collect implementation (c1182be)
- postgres: by default use domain socket (e44fdfb)
- pyspark: make has_operation method a @classmethod (c1b7dbc)
- release: use @google/semantic-release-replace-plugin@1.2.0 to avoid module loading bug (673aab3)
- snowflake: fix broken unnest functionality (207587c)
- snowflake: reset the schema and database to the original schema after creating them (54ce26a)
- snowflake: reset to original schema when resetting the database (32ff832)
- snowflake: use regexp_instr != 0 instead of REGEXP keyword (06e2be4)
- sqlalchemy: add support for sqlalchemy string subclassed types (8b33b35)
- sql: handle parsing aliases (3645cf4)
- trino: handle all remaining common datatype parsing (b3778c7)
- trino: remove filter index warning in Trino dialect (a2ae7ae)
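The duckdb regexp_matches entry above turns on the difference between containment and full-match semantics. A minimal sketch of that distinction, using Python's standard `re` module as a stand-in: it does not call DuckDB, and the sample pattern and string are made up for illustration.

```python
import re

pattern = r"\d+"
text = "order 42 shipped"

# Containment: the pattern may match anywhere inside the string.
contains = re.search(pattern, text) is not None

# Full match: the pattern must consume the entire string.
full = re.fullmatch(pattern, text) is not None
```

With these inputs the containment check succeeds and the full-match check fails, which is the gap the fix closes.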
Documentation
- add conda/mamba install instructions for specific backends (c643fca)
- add docstrings to DataType.is_* methods (ed40fdb)
- backend-matrix: add ability to select a specific subset of backends (f663066)
- backends: document memtable support and performance for each backend (b321733)
- blog: v6.0.0 release blog (21fc5da)
- document versioning policy (242ea15)
- dot-sql: add examples of mixing ibis expressions and SQL strings (5abd30e)
- dplyr: small fixes to the dplyr getting started guide (4b57f7f)
- expand docstring for dtype function (39b7a24)
- fix functions names in examples of extract url fields (872445e)
- fix heading in 6.0.0 blog (0ad3ce2)
- oracle: add note about old password checks in oracle (470b90b)
- postgres: fix postgres memtable docs (7423eb9)
- release-notes: fix typo (a319e3a)
- social: add social media preview cards (e98a0a6)
- update imports/exports for pyspark backend (16d73c4)
Refactors
- pyarrow: remove unnecessary calls to combine_chunks (c026d2d)
- pyarrow: use schema.empty_table() instead of manually constructing empty tables (c099302)
- result-handling: remove result_handler in favor of expression specific methods (3dc7143)
- snowflake: enable multiple statements and clean up duplicated parameter setting code (75824a6)
- tests: clean up backend test setup to make non-data-loading steps atomic (16b4632)
6.0.0 (2023-07-05)
⚠ BREAKING CHANGES
- imports: Use of ibis.udf as a module is removed. Use ibis.legacy.udf instead.
- The minimum supported Python version is now Python 3.9.
- api: group_by().count() no longer automatically names the count aggregation count. Use relabel to rename columns.
- backends: Backend.ast_schema is removed. Use expr.as_table().schema() instead.
- snowflake/postgres: Postgres UDFs now use the new @udf.scalar.python API. This should be a low-effort replacement for the existing API.
- ir: ops.NullLiteral is removed.
- datatypes: dt.Interval no longer has a default unit; dt.interval is removed.
- deps: snowflake-connector-python’s lower bound was increased to 3.0.2, the minimum version needed to avoid a high-severity vulnerability. Please upgrade snowflake-connector-python to at least version 3.0.2.
- api: Table.difference(), Table.intersection(), and Table.union() now require at least one argument.
- postgres: Ibis no longer automatically defines first/last reductions on connection to the postgres backend. Use DDL shown in https://wiki.postgresql.org/wiki/First/last_(aggregate) or one of the pgxn implementations instead.
- api: ibis.examples.<example-name>.fetch no longer forwards arbitrary keyword arguments to read_csv/read_parquet.
- datatypes: dt.Interval.value_type attribute is removed.
- api: Table.count() is no longer automatically named "count". Use Table.count().name("count") to achieve the previous behavior.
- trino: The trino backend now requires at least version 0.321 of the trino Python package.
- backends: removed AlchemyTable, AlchemyDatabase, DaskTable, DaskDatabase, PandasTable, PandasDatabase, PySparkDatabaseTable; use ops.DatabaseTable instead.
- dtypes: temporal unit enums are now available under ibis.common.temporal instead of ibis.common.enums.
- clickhouse: external_tables can no longer be passed in ibis.clickhouse.connect. Pass external_tables directly in raw_sql/execute/to_pyarrow/to_pyarrow_batches().
- datatypes: dt.Set is now an alias for dt.Array.
- bigquery: Previously, an ibis timestamp mapped to the BigQuery TIMESTAMP type with no timezone support. That was incorrect: BigQuery TIMESTAMP carries a UTC timezone, while DATETIME is the timezone-free type. The mapping therefore changes: an ibis timestamp with the UTC timezone maps to BigQuery TIMESTAMP, and an ibis timestamp with no timezone maps to BigQuery DATETIME.
- impala: Cursors are no longer returned from DDL operations to prevent resource leakage. Use raw_sql if you need specialized operations that return a cursor. Additionally, table-based DDL operations now return the table they’re operating on.
- api: Column.first()/Column.last() are now reductions by default. Code running these expressions in isolation will no longer be windowed over the entire table. Code using this function in select-based APIs should function unchanged.
- bigquery: when using the bigquery backend, casting float to int will no longer round floats to the nearest integer.
- ops.Hash: The hash method on table columns no longer accepts the how argument. The hashing functions available are highly backend-dependent and the intention of the hash operation is to provide a fast, consistent (on the same backend, only) integer value. If you have been passing in a value for how, you can remove it and you will get the same results as before, as there were no backends with multiple hash functions working.
- duckdb: Some CSV files may now have headers that did not have them previously. Set header=False to get the previous behavior.
- deps: New environments will have a different default setting for compression in the ClickHouse backend due to removal of optional dependencies. Ibis is still capable of using the optional dependencies but doesn’t include them by default. Install clickhouse-cityhash and lz4 to preserve the previous behavior.
- api: Table.set_column() is removed; use Table.mutate(name=expr) instead.
- api: the suffixes argument in all join methods has been removed in favor of lname/rname args. The default renaming scheme for duplicate columns has also changed. To get the exact same behavior as before, pass in lname="{name}_x", rname="{name}_y".
- ir: IntervalType.unit is now an enum instead of a string.
- type-system: Inferred types of Python objects may be slightly different. Ibis now uses pyarrow to infer the column types of pandas DataFrame and other types.
- backends: path argument of Backend.connect() is removed, use the database argument instead.
- api: removed Table.sort_by() and Table.groupby(), use .order_by() and .group_by() respectively.
- datatypes: DataType.scalar and column class attributes are now strings.
- backends: Backend.load_data(), Backend.exists_database() and Backend.exists_table() are removed.
- ir: Value.summary() and NumericValue.summary() are removed.
- schema: Schema.merge() is removed, use the union operator schema1 | schema2 instead.
- api: ibis.sequence() is removed.
- drop support for Python 3.8 (747f4ca)
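The lname/rname templates in the join entry above are format strings in which {name} stands for the colliding column name. The sketch below is a plain-Python illustration of how such templates resolve, assuming str.format-style substitution. `resolve_collisions` is a hypothetical helper, not an ibis function, and the real join logic may differ in which columns it renames.

```python
def resolve_collisions(left, right, lname="{name}_x", rname="{name}_y"):
    """Rename columns present on both sides using the given templates.

    Hypothetical helper for illustration only; not part of ibis.
    """
    overlap = set(left) & set(right)
    new_left = [lname.format(name=c) if c in overlap else c for c in left]
    new_right = [rname.format(name=c) if c in overlap else c for c in right]
    return new_left, new_right


left_cols, right_cols = resolve_collisions(["id", "value"], ["id", "score"])
```

Only the shared column is rewritten; columns unique to one side keep their names.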
Features
- add dask windowing (9cb920a)
- add easy type hints to GroupBy (da330b1)
- add microsecond method to TimestampValue and TimeValue (e9df2da)
- api: add __dataframe__ implementation (b3d9619)
- api: add ALL_CAPS option to Table.relabel (c0b30e2)
- api: add first/last reduction APIs (8c01980)
- api: add zip operation and api (fecf695)
- api: allow passing multiple keyword arguments to ibis.interval (22ee854)
- api: better repr and pickle support for deferred expressions (2b1ec9c)
- api: exact median (c53031c)
- api: raise better error on column name collision in joins (e04c38c)
- api: replace suffixes in join with lname/rname (3caf3a1)
- api: support abstract type names in selectors.of_type (f6d2d56)
- api: support list of strings and single strings in the across selector (a6b60e7)
- api: use create_table to load example data (42e09a4)
- bigquery: add client and storage_client params to connect (4cf1354)
- bigquery: enable group_concat over windows (d6a1117)
- cast: add table-level try_cast (5e4d16b)
- clickhouse: add array zip impl (efba835)
- clickhouse: move to clickhouse supported Python client (012557a)
- clickhouse: set default engine to native file (29815fa)
- clickhouse: support pyarrow decimal types (7472dd5)
- common: add a pure python egraph implementation (aed2ed0)
- common: add pattern matchers (b515d5c)
- common: add support for start parameter in StringFind (31ce741)
- common: add Topmost and Innermost pattern matchers (90b48fc)
- common: implement copy protocol for Immutable base class (e61c66b)
- create_table: support pyarrow Table in table creation (9dbb25c)
- datafusion: add string functions (66c0afb)
- datafusion: add support for scalar pyarrow UDFs (45935b7)
- datafusion: minimal decimal support (c550780)
- datafusion: register tables and datasets in datafusion (cb2cc58)
- datatypes: add support for decimal values with arrow-based APIs (b4ba6b9)
- datatypes: support creating Timestamp from units (66f2ff0)
- deps: load examples lazily (4ea0ddb)
- duckdb: add attach_sqlite method (bd32649)
- duckdb: add support for native and pyarrow UDFs (7e56fc4)
- duckdb: expand map support to .values() and map concatenation (ad49a09)
- duckdb: set header=True by default (e4b515d)
- duckdb: support 0.8.0 (ae9ae7d)
- duckdb: support array zip operation (2d14ccc)
- duckdb: support motherduck (053dc7e)
- duckdb: warn when querying an already consumed RecordBatchReader (5a013ff)
- flink: add initial flink SQL compiler (053a6d2)
- formats: support timestamps in delta output; default to micros for pyarrow conversion (d8d5710)
- implement read_delta and to_delta for some backends (74fc863)
- implement read_delta for datafusion (eb4602f)
- implement try_cast for a few backends (f488f0e)
- io: add to_torch API (685c8fc)
- io: add az/gs prefixes to normalize_filename in utils (e9eebba)
- mysql: add re_extract (5ed40e1)
- oracle: add oracle backend (c9b038b)
- oracle: support temporary tables (6e64cd0)
- pandas: add approx_median (6714b9f)
- pandas: support passing memtables to create_table (3ea9a21)
- polars: add any and all reductions (0bd3c01)
- polars: add argmin and argmax (78562d3)
- polars: add correlation operation (05ff488)
- polars: add polars support for identical_to (aab3bae)
- polars: add support for offset, binary literals, and dropna(how='all') (d2298e9)
- polars: allow seamless connection for DataFrame as well as LazyFrame (a2a3e45)
- polars: implement .sql methods (86f2a34)
- polars: lower-latency column return for non-temporal results (b009563)
- polars: support pyarrow decimal types (7e6c365)
- polars: support SQL dialect translation (c87f695)
- polars: support table registration from multiple parquet files (9c0a8be)
- postgres: add ApproxMedian aggregation (887f572)
- pyspark: add zip array impl (6c00cbc)
- snowflake/postgres: scalar UDFs (dbf5b62)
- snowflake: implement array zip (839e1f0)
- snowflake: implement proper approx median (b15a6fe)
- snowflake: support SSO and other forms of passwordless authentication (23ac53d)
- snowflake: use the client python version as the UDF runtime where possible (69a9101)
- sql: allow any SQL dialect accepted by sqlglot in Table.sql and Backend.sql (f38c447)
- sqlite: add argmin and argmax functions (c8af9d4)
- sqlite: add arithmetic mode aggregation (6fcac44)
- sqlite: add ops.DateSub, ops.DateAdd, ops.DateDiff (cfd65a0)
- streamlit: add support for streamlit connection interface (05c9449)
- trino: implement zip (cd11daa)
Bug Fixes
- add issue write permission to assign.yml (9445cee)
- alchemy: close the cursor on error during dataframe construction (cc7dffb)
- backends: fix capitalize to lowercase subsequent characters (49978f9)
- backends: fix notall/notany translation (56b56b3)
- bigquery: add srid=4326 to the geography dtype mapping (57a825b)
- bigquery: allow passing both schema and obj in create_table (49cc2c4)
- bigquery: bigquery timestamp and datetime dtypes (067e8a5)
- bigquery: ensure that bigquery temporal ops work with the new timeunit/dateunit/intervalunit enums (0e00d86)
- bigquery: ensure that generated names are used when compiling columns and allow flexible column names (c7044fe)
- bigquery: fix table naming from count rename removal refactor (5b009d2)
- bigquery: raise OperationNotDefinedError for IntervalAdd and IntervalSubtract (501aaf7)
- bigquery: support capture group functionality (3f4f05b)
- bigquery: truncate when casting float to int (267d8e1)
- ci: use mariadb-admin instead of mysqladmin in mariadb 11.x (d4ccd3d)
- clickhouse: avoid generating names for structs (5d11f48)
- clickhouse: clean up external tables per query to avoid leaking them across queries (6d32edd)
- clickhouse: close cursors more aggressively (478a40f)
- clickhouse: use correct functions for milli and micro extraction (49b3136)
- clickhouse: use named rather than positional group by (1f7e309)
- clickhouse: use the correct dialect to generate subquery string for Contains operation (f656bd5)
- common: fix bug in re_extract (6ebaeab), closes #6167
- core: interval resolution should upcast to smallest unit (f7f844d), closes #6139
- datafusion: fix incorrect order of predicate -> select compilation (0092304)
- deps: make pyarrow a required dependency (b217cde)
- deps: prevent vulnerable snowflake-connector-python versions (6dedb45)
- deps: support multipledispatch version 1 (805a7d7)
- deps: update dependency atpublic to v4 (3a44755)
- deps: update dependency datafusion to v22 (15d8d11)
- deps: update dependency datafusion to v23 (e4d666d)
- deps: update dependency datafusion to v24 (c158b78)
- deps: update dependency datafusion to v25 (c3a6264)
- deps: update dependency datafusion to v26 (7e84ffe)
- deps: update dependency deltalake to >=0.9.0,<0.11.0 (9817a83)
- deps: update dependency pyarrow to v12 (3cbc239)
- deps: update dependency sqlglot to v12 (5504bd4)
- deps: update dependency sqlglot to v13 (1485dd0)
- deps: update dependency sqlglot to v14 (9c40c06)
- deps: update dependency sqlglot to v15 (f149729)
- deps: update dependency sqlglot to v16 (46601ef)
- deps: update dependency sqlglot to v17 (9b50fb4)
- docs: fix failing doctests (04b9f19)
- docs: typo in code without selectors (b236893)
- docs: typo in docstrings and comments (0d3ed86)
- docs: typo in snowflake do_connect kwargs (671bc31)
- duckdb: better types for null literals (7b9d85e)
- duckdb: disable map values and map merge for columns (b5472b3)
- duckdb: ensure to_timestamp returns a UTC timestamp (0ce0b9f)
- duckdb: ensure connection lifetime is greater than or equal to record batch reader lifetime (6ed353e)
- duckdb: ensure that quoted struct field names work (47de1c3)
- duckdb: ensure that types are inferred correctly across duckdb_engine versions (9c3d173)
- duckdb: fix check for literal maps (b2b229b)
- duckdb: fix exporting pyarrow record batches by bumping duckdb to 0.8.1 (aca52ab)
- duckdb: fix read_csv problem with kwargs (6f71735), closes #6190
- examples: move lockfile creation to data directory (b8f6e6b)
- examples: use filelock to prevent pooch from clobbering files when fetching concurrently (e14662e)
- expr: fix graphviz rendering (6d4a34f)
- impala: do not cast ca_cert None value to string (bfdfb0e)
- impala: expose hdfs_connect function as ibis.impala.hdfs_connect (27a0d12)
- impala: more aggressively clean up cursors internally (bf5687e)
- impala: replace time_mapping with TIME_MAPPING and backwards compatible check (4c3ca20)
- ir: force an alias if projecting or aggregating columns (9fb1e88)
- ir: raise Exception for group by with no keys (845f7ab), closes #6237
- mssql: dont yield from inside a cursor (4af0731)
- mysql: do not fail when we cannot set the session timezone (930f8ab)
- mysql: ensure enum string functions are coerced to the correct type (e499c7f)
- mysql: ensure that floats and double do not come back as Python Decimal objects (a3c329f)
- mysql: fix binary literals (e081252)
- mysql: handle the zero timestamp value (9ac86fd)
- operations: ensure that self refs have a distinct name from the table they are referencing (bd8eb88)
- oracle: disable autoload when cleaning up temp tables (b824142)
- oracle: disable statement cache (41d3857)
- oracle: disable temp tables to get inserts working (f9985fe)
- pandas, dask: allow overlapping non-predicate columns in asof join (09e26a0)
- pandas: fix first and last over windows (9079bc4), closes #5417
- pandas: fix string translate function (12b9569), closes #6157
- pandas: grouped aggregation using a case statement (d4ac345)
- pandas: preserve RHS values in asof join when column names collide (4514668)
- pandas: solve problem with first and last window function (dfdede5), closes #4918
- polars: avoid implode deprecation warning (ce3bdad)
- polars: ensure that to_pyarrow is called from the backend (41bacf2)
- polars: make list column operations backwards compatible (35fc5f7)
- postgres: ensure that alias method overwrites view even if types are different (7d5845b)
- postgres: ensure that backend still works when create/drop first/last aggregates fails (eb5d534)
- pyspark: enable joining on columns with different names as well as complex predicates (dcee821)
- snowflake: always use pyarrow for memtables (da34d6f)
- snowflake: ensure connection lifetime is greater than or equal to record batch reader lifetime (34a0c59)
- snowflake: ensure that _pandas_converter attribute is resolved correctly (9058bbe)
- snowflake: ensure that temp tables are only created once (43b8152)
- snowflake: ensure unnest works for nested struct/object types (fc6ffc2)
- snowflake: ensure use of the right timezone value (40426bf)
- snowflake: fix tmpdir construction for python <3.10 (a507ae2)
- snowflake: fix incorrect arguments to snowflake regexp_substr (9261f70)
- snowflake: fix invalid attribute access when using pyarrow (bfd90a8)
- snowflake: handle broken upstream behavior when a table can’t be found (31a8366)
- snowflake: resolve import error from interval datatype refactor (3092012)
- snowflake: use convert_timezone for timezone conversion instead of invalid postgres AT TIME ZONE syntax (1595e7b)
- sqlalchemy: ensure that backends don’t clobber tables needed by inputs (76e38a3)
- sqlalchemy: ensure that union_all-generated memtables use the correct column names (a4f546b)
- sqlalchemy: prepend the table’s schema when querying metadata (d8818e2)
- sqlalchemy: quote struct field names (f5c91fc)
- tests: ensure that record batch readers are cleaned up (d230a8d)
- trino: bump lower bound to avoid having to handle experimental_python_types (bf6eeab)
- trino: ensure that nested array types are inferred correctly (030f76d)
- trino: fix incorrect version computation (04d3a89)
- trino: support trino 0.323 special tuple type for struct results (ea1529d)
- type-system: infer in-memory object types using pyarrow (f7018ee)
- typehint: update type hint for class instance (2e1e14f)
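The bigquery float-to-int entries (the breaking change and fix 267d8e1 above) describe a switch from rounding to truncation. A small plain-Python sketch of the difference, using `round` and `math.trunc` as stand-ins for the two behaviors: it does not run against BigQuery, and the sample values are arbitrary. Exact .5 ties are avoided because tie-breaking rules vary between systems.

```python
import math

values = [2.7, -2.7, 0.2]

# Old behavior (sketch): round to the nearest integer.
rounded = [round(v) for v in values]

# New behavior (sketch): drop the fractional part, moving toward zero.
truncated = [math.trunc(v) for v in values]
```

The two lists differ whenever the fractional part is above one half, so results of float-to-int casts can shift by one after upgrading.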
Documentation
- across: add documentation for across (b8941d3)
- add allowed input for memtable constructor (69cdee5)
- add disclaimer on no row order guarantees (75dd8b0)
- add examples to if_any and if_all (5015677)
- add platform comment in conda env creation (e38eacb)
- add read_delta and related to backends docs (90eaed2)
- api: ensure all top-level items have a description (c83d783)
- api: hide dunder methods in API docs (6724b7b)
- api: manually add inherited mixin methods to timey classes (7dbc96d)
- api: show source for classes to allow dunder method inspection (4cef0f8)
- backends: fix typo in pip install command (6a7207c)
- bigquery: add connection explainer to bigquery backend docs (84caa5b)
- blog: add Ibis + PyTorch + DuckDB blog post (1ad946c)
- change plural variable name cols to col (c33a3ed), closes #6115
- clarify map refers to Python Mapping container (f050a61)
- css: enable code block copy button, don’t select prompt (3510abe)
- de-template remaining backends (except pandas, dask, impala) (82b7408)
- describe NULL differences with pandas (688b293)
- dev-env: remove python 3.8 from environment support matrix (4f89565)
- drop docker-compose install for conda dev env setup (e19924d)
- duckdb: add quick explainer on connecting to motherduck (4ef710e)
- file support: add badge and docstrings for read_* methods (0767b7c)
- fill out more docstrings (dc0289c)
- fix errors and add ‘table’ before ‘expression’ (096b568)
- fix some redirects (3a23c1f)
- fix typo in Table.relabel return description (05cc51e)
- generic: add docstring examples in types/generic (1d87292)
- guides: add brief installation instructions at top of notebooks (dc3e694)
- guides: update ibis-for-dplyr-users.ipynb with latest (1aa172e), closes #6125
- improve docstrings for BooleanValue and BooleanColumn (30c1009)
- improve docstrings to map types (72a49b0)
- install: add quotes to all bracketed installs for shell compatibility (bb5c075)
- intersphinx: add mapping to autolink pyarrow and pandas refs (cd92019)
- intro: create Ibis for dplyr users document (e02a6f2)
- introguides: use DuckDB for intro pandas notebook, remove iris (a7e845a)
- link to Ibis for dplyr users (6e7c6a2)
- make pandas.md filename lowercase (4937d45)
- more group_by() and NULL in pandas guide (486b696)
- more spelling fixes (564abbe)
- move API docs to top-level (dcc409f)
- numeric: add examples to numeric methods (39b470f)
- oracle: add basic backend documentation (c871790)
- oracle: add oracle to matrix (89aecf2)
- python-versions: document how we decide to drop support for Python versions (3474dbc)
- redirect Pandas to pandas (4074284)
- remove trailing whitespace (63db643)
- reorder sections in pandas guide (3b66093)
- restructure and consistency (351d424)
- snowflake: add connection explainer to snowflake backend docs (a62bbcd)
- streamlit: fix ibis-framework install (a8cf773)
- update copyright and some minor edits (b9aed44)
- update notany/notall docstrings with arg (a5ec986), closes #5993
- update structs and fix constructor docstrings (493437a)
- use lowercase pandas (19b5d10)
- use to_pandas instead of execute (882949e)
Refactors
- alchemy: abstract out custom type mapping and fix sqlite (d712e2e)
- api: consolidate ibis.date(), ibis.time() and ibis.timestamp() functions (20f71bf)
- api: enforce at least one argument for Table set operations (57e948f)
- api: remove automatic count name from relations (2cb19ec)
- api: remove automatic group by count naming (15d9e50)
- api: remove deprecated ibis.sequence() function (de0bf69)
- api: remove deprecated Table.set_column() method (aa5ed94)
- api: remove deprecated Table.sort_by() and Table.groupby() methods (1316635)
- backends: remove ast_schema method (51b5ef8)
- backends: remove backend specific DatabaseTable operations (d1bab97)
- backends: remove deprecated Backend.load_data(), .exists_database() and .exists_table() methods (755555f)
- backends: remove deprecated path argument of Backend.connect() (6737ea8)
- bigquery: align datatype conversions with the new convention (70b8232)
- bigquery: support a broader range of interval units in temporal binary operations (f78ce73)
- common: add sanity checks for creating ENodes and Patterns (fc89cc3)
- common: cleanup unit conversions (73de24e)
- common: disallow unit conversions between days and hours (5619ce0)
- common: move ibis.collections.DisjointSet to ibis.common.egraph (07dde21)
- common: move tests for re_extract to general suite (acd1774)
- common: use an enum as a sentinel value instead of NoMatch class (6674353), closes #6049
- dask/pandas: align datatype conversions with the new convention (cecc24c)
- datatypes: make pandas conversion backend specific if needed (544d27c)
- datatypes: normalize interval values to integers (80a40ab)
- datatypes: remove Set() in favor of Array() datatype (30a4f7e)
- datatypes: remove value_type parametrization of the Interval datatype (463cdc3)
- datatypes: remove direct ir dependency from datatypes (d7f0be0)
- datatypes: use typehints instead of rules (704542e)
- deps: remove optional dependency on clickhouse-cityhash and lz4 (736fe26)
- dtypes: add normalize_datetime() and normalize_timezone() common utilities (c00ab38)
- dtypes: turn dt.dtype() into lazily dispatched factory function (5261003)
- formats: consolidate the dataframe conversion logic (53ed88e)
- formats: encapsulate conversions to TypeMapper, SchemaMapper and DataMapper subclasses (ab35311)
- formats: introduce a standalone subpackage to deal with common in-memory formats (e8f45f5)
- impala: rely on impyla cursor for _wait_synchronous (a1b8736)
- imports: move old UDF implementation to ibis.legacy module (cf93d5d)
- ir: encapsulate temporal unit handling in enums (1b8fa7b)
- ir: remove rlz.column_from, rlz.base_table_of and rlz.function_of rules (ed71d51)
- ir: remove deprecated Value.summary() and NumericValue.summary() expression methods (6cd8050)
- ir: remove redundant ops.NullLiteral() operation (a881703)
- ir: simplify Expr._find_backends() implementation by using the ibis.common.graph utilities (91ff8d4)
- ir: use dt.normalize() to construct literals (bf72f16)
- ops.Hash: remove how from backend-specific hash operation (46a55fc)
- pandas: solve and remove stale TODOs (92d979e)
- polars: align datatype conversion functions with the new convention (5d61159)
- postgres: fail at execute time for UDFs to avoid db connections in .compile() (e3a4d4d)
- pyspark: align datatype conversion functions with the new convention (3437bb6)
- pyspark: remove useless window branching in compiler (ad08da4)
- replace custom _merge using pd.merge (fe74f76)
- schema: remove deprecated Schema.merge() method (d307722)
- schema: use type annotations instead of rules (98cd539)
- snowflake: add flags to supplemental JavaScript UDFs (054add4)
- sql: align datatype conversions with the new convention (0ef145b)
- sqlite: remove roundtripping for DayOfWeekIndex and DayOfWeekName (b5a2bc5)
- test: cleanup test data (7ae2b24)
- to-pyarrow-batches: ensure that batch readers are always closed and exhausted (35a391f)
- trino: always clean up prepared statements created when accessing query metadata (4f3a4cd)
- util: use base32 to compress uuid table names (ba039a3)
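The last entry above (ba039a3) shortens generated table names by encoding a UUID in base32 rather than hexadecimal. As a rough illustration of the idea, and not the actual ibis implementation, the sketch below uses only the Python standard library; the function name short_table_name and the tmp_ prefix are hypothetical.

```python
import base64
import uuid


def short_table_name(prefix: str = "tmp_") -> str:
    """Return a unique name built from a base32-encoded UUID (illustrative only)."""
    raw = uuid.uuid4().bytes  # 16 random bytes
    # base32 packs 5 bits per character, so 16 bytes become 26 characters
    # once the trailing "=" padding is stripped, versus 32 hex characters.
    encoded = base64.b32encode(raw).decode("ascii").rstrip("=").lower()
    return prefix + encoded


name = short_table_name()
```

The encoded part uses only lowercase letters and the digits 2 to 7, which keeps the name a plain alphanumeric string.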
Performance
Deprecations
- api: deprecate tuple syntax for order by keys (5ed5110)
5.1.0 (2023-04-11)
Features
- api: expand distinct API for dropping duplicates based on column subsets (3720ea5)
- api: implement pyarrow memtables (9d4fbbd)
- api: support passing a format string to Table.relabel (0583959)
- api: thread kwargs around properly to support more complex connection arguments (7e0e15b)
- backends: add more array functions (5208801)
- bigquery: make to_pyarrow_batches() smarter (42f5987)
- bigquery: support bignumeric type (d7c0f49)
- default repr to showing all columns in Jupyter notebooks (91a0811)
- druid: add re_search support (946202b)
- duckdb: add map operations (a4c4e77)
- duckdb: support sqlalchemy 2 (679bb52)
- mssql: implement ops.StandardDev, ops.Variance (e322f1d)
- pandas: support memtable in pandas backend (6e4d621), closes #5467
- polars: implement count distinct (aea4ccd)
- postgres: implement ops.Arbitrary (ee8dbab)
- pyspark: pivot_longer (f600c90)
- pyspark: add ArrayFilter operation (2b1301e)
- pyspark: add ArrayMap operation (e2c159c)
- pyspark: add DateDiff operation (bfd6109)
- pyspark: add partial support for interval types (067120d)
- pyspark: add read_csv, read_parquet, and register (7bd22af)
- pyspark: implement count distinct (db29e10)
- pyspark: support basic caching (ab0df7a)
- snowflake: add optional ‘connect_args’ param (8bf2043)
- snowflake: native pyarrow support (ce3d6a4)
- sqlalchemy: support unknown types (fde79fa)
- sqlite: implement ops.Arbitrary (9bcdf77)
- sql: use temp views where possible (5b9d8c0)
- table: implement pivot_wider API (60e7731)
- ux: move ibis.expr.selectors to ibis.selectors and deprecate for removal in 6.0 (0ae639d)
Bug Fixes
- api: disambiguate attribute errors from a missing resolve method (e12c4df)
- api: support filter on literal followed by aggregate (68d65c8)
- clickhouse: do not render aliases when compiling aggregate expression components (46caf3b)
- clickhouse: ensure that clickhouse depends on sqlalchemy for make_url usage (ea10a27)
- clickhouse: ensure that truncate works (1639914)
- clickhouse: fix create_table implementation (5a54489)
- clickhouse: workaround sqlglot issue with calling match (762f4d6)
- deps: support pandas 2.0 (4f1d9fe)
- duckdb: branch to avoid unnecessary dataframe construction (9d5d943)
- duckdb: disable the progress bar by default (1a1892c)
- duckdb: drop use of experimental parallel csv reader (47d8b92)
- duckdb: generate SIMILAR TO instead of tilde to workaround sqlglot issue (434da27)
- improve typing signature of .dropna() (e11de3f)
- mssql: improve aggregation on expressions (58aa78d)
- mssql: remove invalid aggregations (1ce3ef9)
- polars: backwards compatibility for the time_zone and time_unit properties (3a2c4df)
- postgres: allow inference of unknown types (343fb37)
- pyspark: fail when aggregation contains a having filter (bd81a9f)
- pyspark: raise proper error when trying to generate sql (51afc13)
- snowflake: fix new array operations; remove ArrayRemove operation (772668b)
- snowflake: make sure ephemeral tables follow backend quoting rules (9a845df)
- snowflake: make sure pyarrow is used when possible (01f5154)
- sql: ensure that set operations resolve to a single relation (3a02965)
- sql: generate consistent pivot_longer semantics in the presence of multiple unnests (6bc301a)
- sqlglot: work with newer versions (6f7302d)
- trino,duckdb,postgres: make cumulative notany/notall aggregations work (c2e985f)
- trino: only support how='first' with arbitrary reduction (315b5e7)
- ux: use guaranteed length-1 characters for NULL values (8618789)
Refactors
- api: remove explicit use of .projection in favor of the shorter .select (73df8df)
- cache: factor out ref counted cache (c816f00)
- duckdb: simplify to_pyarrow_batches implementation (d6235ee)
- duckdb: source loaded and installed extensions from duckdb (fb06262)
- duckdb: use native duckdb parquet reader unless auth required (e9f57eb)
- generate uuid-based names for temp tables (a1164df)
- memtable: clean up dispatch code (9a19302)
- memtable: dedup table proxy code (3bccec0)
- sqlalchemy: remove unused _meta instance attributes (523e198)
Deprecations
- api: deprecate Table.set_column in favor of Table.mutate (954a6b7)
Documentation
- add a getting started guide (8fd03ce)
- add warning about comparisons to None (5cf186a)
- blog: add campaign finance blog post (383c708)
- blog: add campaign finance to SUMMARY.md (0bdd093)
- clean up agg argument descriptions and add join examples (93d3059)
- comparison: add a “why ibis” page (011cc19)
- move conda before nix in dev setup instructions (6b2cbaa)
- nth: improve docstring for nth() (fb7b34b)
- patch docs build to fix anchor links (51be459)
- penguins: add citation for palmer penguins data (679848d)
- penguins: change to flipper (eec3706)
- refresh environment setup pages (b609571)
- selectors: make doctests more complete and actually run them (c8f2964)
- style and review fixes in getting started guide (3b0f8db)
5.0.0 (2023-03-15)
⚠ BREAKING CHANGES
- api: Snowflake identifiers are now kept as is from the database. Many table names and column names may now be in SHOUTING CASE. Adjust code accordingly.
- backend: Backends now raise ibis.common.exceptions.UnsupportedOperationError in more places during compilation. You may need to catch this error type instead of the previous type, which differed between backends.
- ux: Table.info now returns an expression
- ux: Passing a sequence of column names to Table.drop is removed. Replace drop(cols) with drop(*cols).
- The spark plugin alias is removed. Use pyspark instead
- ir: removed ibis.expr.scope and ibis.expr.timecontext modules, access them under ibis.backends.base.df.<module>
- some methods have been removed from the top-level ibis.<backend> namespaces, access them on a connected backend instance instead.
- common: removed ibis.common.geospatial, import the functions from ibis.backends.base.sql.registry.geospatial
- datatypes: JSON is no longer a subtype of String
- datatype: Category, CategoryValue/Column/Scalar are removed. Use string types instead.
- ux: The metric_name argument to value_counts is removed. Use Table.relabel to change the metric column’s name.
- deps: the minimum version of parsy is now 2.0
- ir/backends: removed the following symbols:
  - ibis.backends.duckdb.parse_type() function
  - ibis.backends.impala.Backend.set_database() method
  - ibis.backends.pyspark.Backend.set_database() method
  - ibis.backends.impala.ImpalaConnection.ping() method
  - ibis.expr.operations.DatabaseTable.change_name() method
  - ibis.expr.operations.ParseURL class
  - ibis.expr.operations.Value.to_projection() method
  - ibis.expr.types.Table.get_column() method
  - ibis.expr.types.Table.get_columns() method
  - ibis.expr.types.StringValue.parse_url() method
- schema: Schema.from_dict(), .delete() and .append() methods are removed
- datatype: struct_type.pairs is removed, use struct_type.fields instead
- datatype: Struct(names, types) is not supported anymore, pass a dictionary to Struct constructor instead
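Two of the migrations listed above are mechanical and can be sketched in plain Python. The drop function below is a hypothetical stand-in with a variadic signature, not ibis code, and the column and type names are invented for illustration; only the calling pattern is the point.

```python
# Plain-Python sketch; nothing here imports or calls ibis.

def drop(*names: str) -> list:
    """Hypothetical stand-in for a variadic drop; returns the remaining columns."""
    columns = ["a", "b", "c"]
    return [c for c in columns if c not in names]


cols = ["a", "b"]
# Unpack the sequence at the call site instead of passing it as one argument.
remaining = drop(*cols)  # previously written as drop(cols)

# Pair the old separate names/types sequences into a single mapping,
# which is the shape the note above says to pass to the Struct constructor.
names = ("x", "y")
types = ("int64", "string")
fields = dict(zip(names, types))
```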
Features
- add max_columns option for table repr (a3aa236)
- add examples API (b62356e)
- api: add map/array accessors for easy conversion of JSON to stronger-typed values (d1e9d11)
- api: add array to string join operation (74de349)
- api: add builtin support for relabeling columns to snake case (1157273)
- api: add support for passing a mapping to ibis.map (d365fd4)
- api: allow single argument set operations (bb0a6f0)
- api: implement to_pandas() API for ecosystem compatibility (cad316c)
- api: implement isin (ac31db2)
- api: make cache evaluate only once per session per expression (5a8ffe9)
- api: make create_table uniform (833c698)
- api: more selectors (5844304)
- api: upcast pandas DataFrames to memtables in rlz.table rule (8dcfb8d)
- backends: implement ops.Time for sqlalchemy backends (713cd33)
- bigquery: add BIGNUMERIC type support (5c98ea4)
- bigquery: add UUID literal support (ac47c62)
- bigquery: enable subqueries in select statements (ef4dc86)
- bigquery: implement create and drop table method (5f3c22c)
- bigquery: implement create_view and drop_view method (a586473)
- bigquery: support creating tables from in-memory tables (c3a25f1)
- bigquery: support in-memory tables (37e3279)
- change Rich repr of dtypes from blue to dim (008311f)
- clickhouse: implement ArrayFilter translation (f2144b6)
- clickhouse: implement ops.ArrayMap (45000e7)
- clickhouse: implement ops.MapLength (fc82eaa)
- clickhouse: implement ops.Capitalize (914c64c)
- clickhouse: implement ops.ExtractMillisecond (ee74e3a)
- clickhouse: implement ops.RandomScalar (104aeed)
- clickhouse: implement ops.StringAscii (a507d17)
- clickhouse: implement ops.TimestampFromYMDHMS, ops.DateFromYMD (05f5ae5)
- clickhouse: improve error message for invalid types in literal (e4d7799)
- clickhouse: support asof_join (7ed5143)
- common: add abstract mapping collection with support for set operations (7d4aa0f)
- common: add support for variadic positional and variadic keyword annotations (baea1fa)
- common: hold typehint in the annotation objects (b3601c6)
- common: support Callable arguments and return types in Validator.from_annotable() (ae57c36)
- common: support positional only and keyword only arguments in annotations (340dca1)
- dask/pandas: raise OperationNotDefinedError exc for not defined operations (2833685)
- datafusion: implement ops.Degrees, ops.Radians (7e61391)
- datafusion: implement ops.Exp (7cb3ade)
- datafusion: implement ops.Pi, ops.E (5a74cb4)
- datafusion: implement ops.RandomScalar (5d1cd0f)
- datafusion: implement ops.StartsWith (8099014)
- datafusion: implement ops.StringAscii (b1d7672)
- datafusion: implement ops.StrRight (016a082)
- datafusion: implement ops.Translate (2fe3fc4)
- datafusion: support substr without end (a19fd87)
- datatype/schema: support datatype and schema declaration using type annotated classes (6722c31)
- datatype: enable inference of Decimal type (8761732)
- datatype: implement Mapping abstract base class for StructType (5df2022)
- deps: add Python 3.11 support and tests (6f3f759)
- druid: add Apache Druid backend (c4cc2a6)
- druid: implement bitwise operations (3ac7447)
- druid: implement ops.Pi, ops.Modulus, ops.Power, ops.Log10 (090ff03)
- druid: implement ops.Sign (35f52cc)
- druid: implement ops.StringJoin (42cd9a3)
- duckdb: add support for reading tables from sqlite databases (9ba2211)
- duckdb: add UUID type support (5cd6d76)
- duckdb: implement ArrayFilter translation (5f35d5c)
- duckdb: implement ops.ArrayMap (063602d)
- duckdb: implement create_view and drop_view method (4f73953)
- duckdb: implement ops.Capitalize (b17116e)
- duckdb: implement ops.TimestampDiff, ops.IntervalAdd, ops.IntervalSubtract (a7fd8fb)
- duckdb: implement uuid result type (3150333)
- duckdb: support dt.MACADDR, dt.INET as string (c4739c7)
- duckdb: use read_json_auto when reading json (4193867)
- examples: add imdb dataset examples (3d63203)
- examples: add movielens small dataset (5f7c15c)
- examples: add wowah_data data to examples (bf9a7cc)
- examples: enable progressbar and faster hashing (4adfe29)
- impala: implement ops.Clip (279fd78)
- impala: implement ops.Radians, ops.Degrees (a794ace)
- impala: implement ops.RandomScalar (874f2ff)
- io: add to_parquet, to_csv to backends (fecca42)
- ir: add ArrayFilter operation (e719d60)
- ir: add ArrayMap operation (49e5f7a)
- mysql: support in-memory tables (4dfabbd)
- pandas/dask: implement bitwise operations (4994add)
- pandas/dask: implement ops.Pi, ops.E (091be3c)
- pandas: add basic unnest support (dd36b9d)
- pandas: implement ops.StartsWith, ops.EndsWith (2725423)
- pandas: support more pandas extension dtypes (54818ef)
- polars: implement ops.Union (17c6011)
- polars: implement ops.Pi, ops.E (6d8fc4a)
- postgres: allow connecting with an explicit schema (39c9ea8)
- postgres: fix interval literal (c0fa933)
- postgres: implement argmin/argmax (82668ec)
- postgres: parse tsvector columns as strings (fac8c47), closes #5402
- pyspark: add support for ops.ArgMin and ops.ArgMax (a3fa57c)
- pyspark: implement ops.Between (ed83465)
- return Table from create_table(), create_view() (e4ea597)
- schema: implement Mapping abstract base class for Schema (167d85a)
- selectors: support ranges (e10caf4)
- snowflake: add support for alias in snowflake (b1b947a)
- snowflake: add support for bulk upload for temp tables in snowflake (6cc174f)
- snowflake: add UUID literal support (436c781)
- snowflake: implement argmin/argmax (8b998a5)
- snowflake: implement ops.BitwiseAnd, ops.BitwiseNot, ops.BitwiseOr, ops.BitwiseXor (1acd4b7)
- snowflake: implement ops.GroupConcat (2219866)
- snowflake: implement remaining map functions (c48c9a6)
- snowflake: support binary variance reduction with filters (eeabdee)
- snowflake: support cross-database table access (79cb445)
- sqlalchemy: generalize unnest to work on backends that don’t support it (5943ce7)
- sqlite: add sqlite type support (addd6a9)
- sqlite: support in-memory tables (1b24848)
- sql: support for creating temporary tables in sql based backends (466cf35)
- tables: cast table using schema (96ce109)
- tables: implement pivot_longer API (11c5736)
- trino: enable MapLength operation (a7ad1db)
- trino: implement ArrayFilter translation (50f6fcc)
- trino: implement ops.ArrayMap (657bf61)
- trino: implement ops.Between (d70b9c0)
- trino: support sqlalchemy 2 (0d078c1)
- ux: accept selectors in Table.drop (325140f)
- ux: allow creating unbound tables using annotated class definitions (d7bf6a2)
- ux: easy interactive setup (6850146)
- ux: expose between, rows and range keyword arguments in value.over() (5763063)
Bug Fixes
- analysis: extract Limit subqueries (62f6e14)
- api: add a name attribute to backend proxy modules (d6d8e7e)
- api: fix broken __radd__ array concat operation (121d9a0)
- api: only include valid python identifiers in struct tab completion (8f33775)
- api: only include valid python identifiers in table tab completion (031a48c)
- backend: provide useful error if default backend is unavailable (1dbc682)
- backends: fix capitalize implementations across all backends (d4f0275)
- backends: fix null literal handling (7f46342)
- bigquery: ensure that memtables are translated correctly (d6e56c5)
- bigquery: fix decimal literals (4a04c9b)
- bigquery: regenerate negative string index sql snapshots (3f02c73)
- bigquery: regenerate sql for predicate pushdown fix (509806f)
- cache: remove bogus schema argument and validate database argument type (c4254f6)
- ci: fix invalid test id (f70de1d)
- clickhouse: fix decimal literal (4dcd2cb)
- clickhouse: fix set ops with table operands (86bcf32)
- clickhouse: raise OperationNotDefinedError if operation is not supported (71e2570)
- clickhouse: register in-memory tables in pyarrow-related calls (09a045c)
- clickhouse: use a bool type supported by clickhouse_driver (ab8f064)
- clickhouse: workaround sqlglot’s insistence on uppercasing (6151f37)
- compiler: generate aliases in a less clever way (04a4aa5)
- datafusion: support sum aggregation on bool column (9421400)
- deps: bump duckdb to 0.7.0 (38d2276)
- deps: bump snowflake-connector-python upper bound (b368b04)
- deps: ensure that pyspark depends on sqlalchemy (60c7382)
- deps: update dependency pyarrow to v11 (2af5d8d)
- deps: update dependency sqlglot to v11 (e581e2f)
- don’t expose backend methods on ibis.<backend> directly (5a16431)
- druid: remove invalid operations (19f214c)
- duckdb: add null to duckdb datatype parser (07d2a86)
- duckdb: ensure that temp_directory exists (00ba6cb)
- duckdb: explicitly set timezone to UTC on connection (6ae4a06)
- duckdb: fix blob type in literal (f66e8a1)
- duckdb: fix memtable to_pyarrow/to_pyarrow_batches (0e8b066)
- duckdb: in-memory objects registered with duckdb show up in list_tables (7772f79)
- duckdb: quote identifiers if necessary in struct_pack (6e598cc)
- duckdb: support casting to unsigned integer types (066c158)
- duckdb: treat g re_replace flag as literal text (aa3c31c)
- duckdb: workaround an ownership bug at the interaction of duckdb, pandas and pyarrow (2819cff)
- duckdb: workaround duckdb bug that prevents multiple substitutions (0e09220)
- imports: remove top-level import of sqlalchemy from base backend (b13cf25)
- io: add read_parquet and read_csv to base backend mixin (ce80d36), closes #5420
- ir: incorrect predicate pushdown (9a9204f)
- ir: make find_subqueries return in topological order (3587910)
- ir: properly raise error if literal cannot be coerced to a datatype (e16b91f)
- ir: reorder the right schema of set operations to align with the left schema (58e60ae)
- ir: use rlz.map_to() rule instead of isin to normalize temporal units (a1c46a2)
- ir: use static connection pooling to prevent dropping temporary state (6d2ae26)
- mssql: set sqlglot to tsql (1044573)
- mysql: remove invalid operations (8f34a2b)
- pandas/dask: handle non numpy scalar results in wrap_case_result (a3b82f7)
- pandas: don’t try to dispatch on arrow dtype if not available (d22ae7b)
- pandas: handle casting to arrays with None elements (382b90f)
- pandas: handle NAs in array conversion (06bd15d)
- polars: back compat for concat_str separator argument (ced5a61)
- polars: back compat for the reverse/descending argument (f067d81)
- polars: polars execute respect limit kwargs (d962faf)
- polars: properly infer polars categorical dtype (5a4707a)
- polars: use metric name in aggregate output to dedupe columns (234d8c1)
- pyspark: fix incorrect ops.EndsWith translation rule (4c0a5a2)
- pyspark: fix isnan and isinf to work on bool (8dc623a)
- snowflake: allow loose casting of objects and arrays (1cf8df0)
- snowflake: ensure that memtables are translated correctly (b361e07)
- snowflake: ensure that null comparisons are correct (9b83699)
- snowflake: ensure that quoting matches snowflake behavior, not sqlalchemy (b6b67f9)
- snowflake: ensure that we do not try to use a None schema or database (03e0265)
- snowflake: handle the case where pyarrow isn’t installed (b624fa3)
- snowflake: make array_agg preserve nulls (24b95bf)
- snowflake: quote column names on construction of sa.Column (af4db5c)
- snowflake: remove broken pyarrow fetch support (c440adb)
- snowflake: return NULL when trying to call map functions on non-object JSON (d85fb28)
- snowflake: use _flatten to avoid overriding unrelated function in other backends (8c31594)
- sqlalchemy: ensure that isin contains full column expression (9018eb6)
- sqlalchemy: get builtin dialects working; mysql/mssql/postgres/sqlite (d2356bc)
- sqlalchemy: make strip family of functions behave like Python (dd0a04c)
- sqlalchemy: reflect most recent schema when view is replaced (62c8dea)
- sqlalchemy: use sa.true instead of Python literal (8423eba)
- sqlalchemy: use indexed group by key references everywhere possible (9f1ddd8)
- sql: ensure that set operations generate valid sql in the presence of additional constructs such as sort keys (3e2c364)
- sqlite: explicitly disallow array in literal (de73b37)
- sqlite: fix random scalar range (26d0dde)
- support negative string indices (f84a54d)
- trino: workaround broken dialect (b502faf)
- types: fix argument types of Table.order_by() (6ed3a97)
- util: make convert_unit work with python types (cb3a90c)
- ux: give the value_counts aggregate column a better name (abab1d7)
- ux: make string range selectors inclusive (7071669)
- ux: make top level set operations work (f5976b2)
Performance
Refactors
- analysis: slightly simplify find_subqueries() (ab3712f)
- backend: normalize exceptions (065b66d)
- clickhouse: clean up parsing rules (6731772)
- common: move frozendict and DotDict to ibis.common.collections (4451375)
- common: move the geospatial module to the base SQL backend (3e7bfa3)
- dask: remove unneeded create_table() (86885a6)
- datatype: clean up parsing rules (c15fb5f)
- datatype: remove Category type and related APIs (bb0ee78)
- datatype: remove StructType.pairs property in favor of identical fields attribute (6668122)
- datatypes: move sqlalchemy datatypes to specific backend (d7b49eb)
- datatypes: remove String parent type from JSON type (34f3898)
- datatype: use a dictionary to store StructType fields rather than names and types tuples (84455ac)
- datatype: use lazy dispatch when inferring pandas Timedelta objects (e5280ea)
- drop limit kwarg from to_parquet/to_csv (a54460c)
- duckdb: clean up parsing rules (30da8f9)
- duckdb: handle parsing timestamp scale (16c1443)
- duckdb: remove unused list<...> parsing rule (f040b86)
- duckdb: use a proper sqlalchemy construct for structs and reduce casting (8daa4a1)
- ir/api: introduce window frame operation and revamp the window API (2bc5e5e)
- ir/backends: remove various deprecated functions and methods (a8d3007)
- ir: reorganize the scope and timecontext utilities (80bd494)
- ir: update ArrayMap to use the new callable_with validation rule (560474e)
- move pretty repr tests back to their own file (4a75988)
- nix: clean up marker argument construction (12eb916)
- postgres: clean up datatype parsing (1f61661)
- postgres: clean up literal arrays (21b122d)
- pyspark: remove another private function (c5081cf)
- remove unnecessary top-level rich console (8083a6b)
- rules: remove unused non_negative_integer and pair rules (e00920a)
- schema: remove deprecated Schema.from_dict(), .delete() and .append() methods (8912b24)
- snowflake: remove the need for parsy (c53403a)
- sqlalchemy: set session parameters once per connection (ed4b476)
- sqlalchemy: use backend-specific startswith/endswith implementations (6101de2)
- test_sqlalchemy.py: move to snapshot testing (96998f0)
- tests: reorganize rules test file to the ibis.expr subpackage (47f0909)
- tests: reorganize schema test file to the ibis.expr subpackage (40033e1)
- tests: reorganize datatype test files to the datatypes subpackage (16199c6)
- trino: clean up datatype parsing (84c0e35)
- ux: return expression from Table.info (71cc0e0)
Deprecations
Documentation
- add a bunch of string expression examples (18d3112)
- add Apache Druid to backend matrix (764d9c3)
- add CNAME file to mkdocs source (6d19111)
- add druid to the backends index docs page (ad0b6a3)
- add missing DataFusion entry to the backends in the README (8ce025a)
- add redirects for common old pages (c9087f2)
- api: document deferred API and its pitfalls (8493604)
- api: improve collect method API documentation (b4fcef1)
- array expression examples (6812c17)
- backends: document default backend configuration (6d917d3)
- backends: link to configuration from the backends list (144044d)
- blob: blog on ibis + substrait + duckdb (5dc7a0a)
- blog: adds examples sneak peek blog + assets folder (fcbb3d5)
- blog: adds to file sneak peek blog (128194f)
- blog: specify parsy 2.0 in substrait blog article (c264477)
- bump query engine count in README and use project-preferred names (11169f7)
- don’t sort backends by coverage percentage by default (68f73b1)
- drop docs versioning (d7140e7)
- duckdb: fix broken docstring examples (51084ad)
- enable light/dark mode toggle in docs (b9e812a)
- fill out table API with working examples (16fc8be)
- fix notebook logging example (04b75ef)
- how-to: fix sessionize.md to use ibis.read_parquet (ff9cbf7)
- improve Expr.substitute() docstring (b954edd)
- improve/update pandas walkthrough (80b05d8)
- io: doc/ux improvements for read_parquet and friends (2541556), closes #5420
- io: update README.md to recommend installing duckdb as default backend (0a72ec0), closes #5423 #5420
- move tutorial from docs to external ibis-examples repo (11b0237)
- parquet: add docstring examples for to_parquet incl. partitioning (8040164)
- point to ibis-examples repo in the README (1205636)
- README.md: clean up readme, fix typos, alter the example (383a3d3)
- remove duplicate “or” (b6ef3cc)
- remove duplicate spark backend in install docs (5954618)
- render __dunder__ method API documentation (b532c63)
- rerender ci-analysis notebook with new table header colors (50507b6)
- streamlit: fix url for support matrix (594199b)
- tutorial: remove impala from sql tutorial (7627c13)
- use teal for primary & accent colors (24be961)
4.1.0 (2023-01-25)
Features
- add ibis.get_backend function (2d27df8)
- add py.typed to allow mypy to type check packages that use ibis (765d42e)
- api: add ibis.set_backend function (e7fabaf)
- api: add selectors for easier selection of columns (306bc88)
- bigquery: add JS UDF support (e74328b)
- bigquery: add SQL UDF support (db24173)
- bigquery: add to_pyarrow method (30157c5)
- bigquery: implement bitwise operations (55b69b1)
- bigquery: implement ops.Typeof (b219919)
- bigquery: implement ops.ZeroIfNull (f4c5607)
- bigquery: implement struct literal (c5f2a1d)
- clickhouse: properly support native boolean types (31cc7ba)
- common: add support for annotating with coercible types (ae4a415)
- common: make frozendict truly immutable (1c25213)
- common: support annotations with typing.Literal (6f89f0b)
- common: support generic mapping and sequence type annotations (ddc6603)
- dask: support connect() with no arguments (67eed42)
- datatype: add optional timestamp scale parameter (a38115a)
- datatypes: add as_struct method to convert schemas to structs (64be7b1)
- duckdb: add read_json function for consuming newline-delimited JSON files (65e65c1)
- mssql: add a bunch of missing types (c698d35)
- mssql: implement inference for DATETIME2 and DATETIMEOFFSET (aa9f151)
- nicer repr for Backend.tables (0d319ca)
- pandas: support connect() with no arguments (78cbbdd)
- polars: allow ibis.polars.connect() to function without any arguments (d653a07)
- polars: handle casting to scaled timestamps (099d1ec)
- postgres: add Map(string, string) support via the built-in HSTORE extension (f968f8f)
- pyarrow: support conversion to pyarrow map and struct types (54a4557)
- snowflake: add more array operations (8d8bb70)
- snowflake: add more map operations (7ae6e25)
- snowflake: any/all/notany/notall reductions (ba1af5e)
- snowflake: bitwise reductions (5aba997)
- snowflake: date from ymd (035f856)
- snowflake: fix array slicing (bd7af2a)
- snowflake: implement ArrayCollect (c425f68)
- snowflake: implement NthValue (0dca57c)
- snowflake: implement ops.Arbitrary (45f4f05)
- snowflake: implement ops.StructColumn (41698ed)
- snowflake: implement StringSplit (e6acc09)
- snowflake: implement StructField and struct literals (286a5c3)
- snowflake: implement TimestampFromUNIX (314637d)
- snowflake: implement TimestampFromYMDHMS (1eba8be)
- snowflake: implement typeof operation (029499c)
- snowflake: implement exists/not exists (7c8363b)
- snowflake: implement extract millisecond (3292e91)
- snowflake: make literal maps and params work (dd759d3)
- snowflake: regex extract, search and replace (9c82179)
- snowflake: string to timestamp (095ded6)
- sqlite: implement _get_schema_using_query in SQLite backend (7ff84c8)
- trino: compile timestamp types with scale (67683d3)
- trino: enable ops.ExistsSubquery and ops.NotExistsSubquery (9b9b315)
- trino: map parameters (53bd910)
- ux: improve error message when column is not found (b527506)
Bug Fixes
- backend: read the default backend setting in _default_backend (11252af)
- bigquery: move connection logic to do_connect (42f2106)
- bigquery: remove invalid operations from registry (911a080)
- bigquery: resolve deprecation warnings for StructType and Schema (c9e7078)
- clickhouse: fix position call (702de5d)
- correctly visualize array type (26b0b3f)
- deps: make sure pyarrow is not an implicit dependency (10373f4)
- duckdb: make read_csv on URLs work (9e61816)
- duckdb: only try to load extensions when necessary for csv (c77bde7)
- duckdb: remove invalid operations from registry (ba2ec59)
- fallback to default backend with to_pyarrow/to_pyarrow_batches (a1a6902)
- impala: remove broken alias elision (32b120f)
- ir: error for order_by on nonexistent column (57b1dd8)
- ir: ops.Where output shape should consider all arguments (6f87064)
- mssql: infer bit as boolean everywhere (24f9d7c)
- mssql: pull nullability from column information (490f8b4)
- mysql: fix mysql query schema inference (12f6438)
- polars: remove non-working Binary and Decimal literal inference (0482d15)
- postgres: use permanent views to avoid connection pool defeat (49a4991)
- pyspark: fix substring constant translation (40d2072)
- set ops: raise if no tables passed to set operations (bf4bdde)
- snowflake: bring back bitwise operations (260facd)
- snowflake: don’t always insert a cast (ee8817b)
- snowflake: implement working TimestampNow (42d95b0)
- snowflake: make sqlalchemy 2.0 compatible (8071255)
- snowflake: re-enable ops.TableArrayView (a1ad2b7)
- snowflake: remove invalid operations from registry (2831559)
- sql: add typeof test and bring back implementations (7dc5356)
- sqlalchemy: 2.0 compatibility (837a736)
- sqlalchemy: fix view creation with select stmts that have bind parameters (d760e69)
- sqlalchemy: handle correlated exists sanely (efa42bd)
- sqlalchemy: handle generic geography/geometry by name instead of geotype (23c35e1)
- sqlalchemy: use exec_driver_sql in view teardown (2599c9b)
- sqlalchemy: use the backend’s compiler instead of AlchemyCompiler (9f4ff54)
- sql: fix broken call to ibis.map (045edc7)
- sqlite: interpolate pathlib.Path correctly in attach (0415bd3)
- trino: ensure connecting works with trino 0.321 (07cee38)
- trino: remove invalid operations from registry (665265c)
- ux: remove extra trailing newline in expression repr (ee6d58a)
Documentation
- add BigQuery backend docs (09d8995)
- add streamlit app for showing the backend operation matrix (3228f64)
- allow deselecting geospatial ops in backend support matrix (012da8c)
- api: document more public expression APIs (337018f)
- backend-info: prevent app from trying install duckdb extensions (3d94082)
- clean up gen_matrix.py after adding streamlit app (deb80f2)
- duckdb: add to_pyarrow_batches documentation (ec1ffce)
- embed streamlit operation matrix app to docs (469a50d)
- make firefox render the proper iframe height (ff1d4dc)
- publish raw data for operation matrix (62e68da)
- re-order when to download test data (8ce8c16)
- release: update breaking changes in the release notes for 4.0.0 (4e91401)
- remove trailing parenthesis (4294397)
- update ibis-version-4.0.0-release.md (f6701df)
- update links to contributing guides (da615e4)
Refactors
- bigquery: explicitly disallow INT64 in JS UDF (fb33bf9)
- datatype: add custom sqlalchemy nested types for backend differentiation (dec70f5)
- datatype: introduce to_sqla_type dispatching on dialect (a8bbc00)
- datatypes: remove Geography and Geometry types in favor of GeoSpatial (d44978c)
- datatype: use a mapping to store StructType fields rather than names and types tuples (ff34c7b)
- dtypes: expose nbytes property for integer and floating point datatypes (ccf80fd)
- duckdb: remove .raw_sql call (abc939e)
- duckdb: use sqlalchemy-views to reduce string hacking (c162750)
- ir: remove UnnamedMarker (dd352b1)
- postgres: use a bindparam for metadata queries (b6b4669)
- remove empty unused file (9d63fd6)
- schema: use a mapping to store Schema fields rather than names and types tuples (318179a)
- simplify _find_backend implementation (60f1a1b)
- snowflake: remove unnecessary parse_json call in ops.StructField impl (9e80231)
- snowflake: remove unnecessary casting (271554c)
- snowflake: use unary instead of fixed_arity(..., 1) (4a1c7c9)
- sqlalchemy: clean up quoting implementation (506ce01)
- sqlalchemy: generalize handling of failed type inference (b0f4e4c)
- sqlalchemy: move _get_schema_using_query to base class (296cd7d)
- sqlalchemy: remove the need for deferred columns (e4011aa)
- sqlalchemy: remove use of deprecated isnot (4ec53a4)
- sqlalchemy: use exec_driver_sql everywhere (e8f96b6)
- sql: finally remove _CorrelatedRefCheck (f49e429)
Deprecations
4.0.0 (2023-01-09)
⚠ BREAKING CHANGES
- functions, methods and classes marked as deprecated are removed now
- ir: replace HLLCardinality with ApproxCountDistinct and CMSMedian with ApproxMedian operations.
- backends: the datatype of returned execution results now more closely matches that of the ibis expression’s type. Downstream code may need to be adjusted.
- ir: the JSONB type is replaced by the JSON type.
- dev-deps: expression types have been removed from ibis.expr.api. Use import ibis.expr.types as ir to access these types.
- common: removed @immutable_property decorator, use @attribute.default instead
- timestamps: the timezone argument to to_timestamp is gone. This was only supported in the BigQuery backend. Append %Z to the format string and the desired time zone to the input column if necessary.
- deps: ibis now supports at minimum duckdb 0.3.3. Please upgrade your duckdb install as needed.
- api: previously ibis.connect would return a Table object when calling connect on a parquet/csv file. This now returns a backend containing a single table created from that file. When possible users may use ibis.read instead to read files into ibis tables.
- api: histogram()’s closed argument no longer exists because it never had any effect. Remove it from your histogram method calls.
- pandas/dask: the pandas and Dask backends now interpret casting ints to/from timestamps as seconds since the unix epoch, matching other backends.
- datafusion: register_csv and register_parquet are removed. Pass filename to register method instead.
- ir: ops.NodeList and ir.List are removed. Use tuples to represent sequence of expressions instead.
- api: re_extract now follows re.match behavior. In particular, the 0th group is now the entire string if there’s a match, otherwise the groups are 1-based.
- datatypes: enums are now strings. Likely no action needed since no functionality existed.
- ir: Replace t[t.x.topk(...)] with t.semi_join(t.x.topk(...), "x").
- ir: ir.Analytic.type() and ir.TopK.type() methods are removed.
- api: the default limit for table/column expressions is now None (meaning no limit).
- ir: join changes: previously all column names that collided between left and right tables were renamed with an appended suffix. Now for the case of inner joins with only equality predicates, colliding columns that are known to be equal due to the join predicates aren’t renamed.
- impala: kerberos support is no longer installed by default for the impala backend. To add support you’ll need to install the kerberos package separately.
- ir: ops.DeferredSortKey is removed. Use ops.SortKey directly instead.
- ir: ibis.common.grounds.Annotable is mutable by default now
- ir: node.has_resolved_name() is removed, use isinstance(node, ops.Named) instead; node.resolve_name() is removed use node.name instead
- ir: removed ops.Node.flat_args(), directly use node.args property instead
- ir: removed ops.Node.inputs property, use the multipledispatched get_node_arguments() function in the pandas backend
- ir: Node.blocks() method has been removed.
- ir: HasSchema mixin class is no longer available, directly subclass ops.TableNode and implement schema property instead
- ir: Removed Node.output_type property in favor of abstractmethod Node.to_expr() which now must be explicitly implemented
- ir: Expr(Op(Expr(Op(Expr(Op))))) is now represented as Expr(Op(Op(Op))), so code using ibis internals must be migrated
- pandas: Use timezone conversion functions to compute the original machine localized value
- common: use ibis.common.validators.{Parameter, Signature} instead
- ir: ibis.expr.lineage.lineage() is now removed
- ir: removed ir.DestructValue, ir.DestructScalar and ir.DestructColumn, use table.unpack() instead
- ir: removed Node.root_tables() method, use ibis.expr.analysis.find_immediate_parent_tables() instead
- impala: use other methods for pinging the database
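The re_extract change in the list above follows the group numbering of Python's re module. A minimal sketch using only the standard library shows what index 0 and the 1-based indexes refer to. The sample value and pattern are hypothetical, chosen for illustration, and this is plain Python rather than ibis code:

```python
import re

# Hypothetical sample value and pattern, used only to illustrate group numbering.
value = "2023-01-09"
pattern = r"(\d{4})-(\d{2})-(\d{2})"

m = re.match(pattern, value)
if m is None:
    raise ValueError("pattern did not match")

whole = m.group(0)  # index 0: the entire matched text
year = m.group(1)   # capture groups are numbered starting at 1
month = m.group(2)
day = m.group(3)

print(whole, year, month, day)
```

Code that previously treated index 0 as the first capture group would need each index shifted up by one under this numbering.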
Features
- add experimental decorator (791335f)
- add to_pyarrow and to_pyarrow_batches (a059cf9)
- add unbind method to expressions (4b91b0b), closes #4536
- add way to specify sqlglot dialect on backend (f1c0608)
- alchemy: implement json getitem for sqlalchemy backends (7384087)
- api: add agg alias for aggregate (907583f)
- api: add agg alias to group_by (6b6367c)
- api: add ibis.read top level API function (e67132c)
- api: add JSON __getitem__ operation (3e2efb4)
- api: implement __array__ (1402347)
- api: make drop variadic (1d69702)
- api: return object from to_sql to support notebook syntax highlighting (87c9833)
- api: use rich for interactive __repr__ (04758b8)
- backend: make ArrayCollect filterable (1e1a5cf)
- backends/mssql: add backend support for Microsoft Sql Server (fc39323)
- bigquery: add ops.DateFromYMD, ops.TimeFromHMS, ops.TimestampFromYMDHMS (a4a7936)
- bigquery: add ops.ExtractDayOfYear (30c547a)
- bigquery: add support for correlation (4df9f8b)
- bigquery: implement argmin and argmax (40c5f0d)
- bigquery: implement pi and e (b91370a)
- bigquery: implement array repeat (09d1e2f)
- bigquery: implement JSON getitem functionality (9c0e775)
- bigquery: implement ops.ArraySlice (49414ef)
- bigquery: implement ops.Capitalize (5757bb0)
- bigquery: implement ops.Clip (5495d6d)
- bigquery: implement ops.Degrees, ops.Radians (5119b93)
- bigquery: implement ops.ExtractWeekOfYear (477d287)
- bigquery: implement ops.RandomScalar (5dc8482)
- bigquery: implement ops.StructColumn, ops.ArrayColumn (2bbf73c)
- bigquery: implement ops.Translate (77a4b3e)
- bigquery: implement ops.NthValue (b43ba28)
- bigquery: move bigquery backend back into the main repo (cd5e881)
- clickhouse: handle more options in parse_url implementation (874c5c0)
- clickhouse: implement INTERSECT ALL/EXCEPT ALL (f65fbc3)
- clickhouse: implement quantile/multiquantile (96d7d1b)
- common: support function annotations with both typehints and rules (7e23f3e)
- dask: implement mode aggregation (017f07a)
- dask: implement json getitem (381d805)
- datafusion: convert column expressions to pyarrow (0a888de)
- datafusion: enable topk (d44903f)
- datafusion: implement Limit (1ddc876)
- datafusion: implement ops.StringConcat (6bb5b4f)
- decompile: support rendering ibis expression as python code (7eebc67)
- deps: support shapely 2.0 (68dff10)
- display qualified names in deprecation warnings (a6e2a49)
- docs: first draft of Ibis for pandas users (7f7c9b5)
- duckdb: enable registration of parquet files from s3 (fced465)
- duckdb: implement mode aggregation (36fd152)
- duckdb: implement to_timestamp (26ca1e4)
- duckdb: implement quantile/multiquantile (fac9705)
- duckdb: overwrite views when calling register (ae07438)
- duckdb: pass through kwargs to file loaders (14fa2aa)
- duckdb: support out of core execution for in-memory connections (a4d4ba2)
- duckdb: support registering external postgres tables with duckdb (8633e6b)
- expr: split ParseURL operation into multiple URL extract operations (1f0fcea)
- impala: implement strftime (d3ede8d)
- impala: support date literals (cd334c4)
- insert: add support for list+dict to sqlalchemy backends (15d399e)
- ir/pandas/dask/clickhouse: revamp Map type support (62b6f2d)
- ir: add is_* methods to DataTypes (79f5c2b)
- ir: prototype for parsing SQL into an ibis expression (1301183)
- ir: support python 3.10 pattern matching on Annotable nodes (eca93eb)
- mssql: add window function support (ef1be45)
- mssql: detect schema from SQL (ff79928)
- mssql: extract quarter (7d04266)
- mssql: implement ops.DayOfWeekIndex (4125593)
- mssql: implement ops.ExtractDayOfYear (ae026d5)
- mssql: implement ops.ExtractEpochSeconds (4f49b5b)
- mssql: implement ops.ExtractWeekOfYear (f1394bc)
- mssql: implement ops.Ln, ops.Log, ops.Log2, ops.Log10 (f8ee1d8)
- mssql: implement ops.RandomScalar (4149450)
- mssql: implement ops.TimestampTruncate, ops.DateTruncate (738e496)
- mssql: implement ops.DateFromYMD, ops.TimestampFromYMDHMS, ops.TimeFromHMS (e84f2ce)
- open *.db files with sqlite in ibis.connect (37baf05)
- pandas: implement mode aggregation (fc023b5)
- pandas: implement RegexReplace for str (23713cc)
- pandas: implement json getitem (8fa1190)
- pandas: implement quantile/multiquantile (cd4dcaa)
- pandas: support histogram API (5bfc0fe)
- polars: enable topk (8bfb16a)
- polars: implement mode aggregation (7982ba2)
- polars: initial support for polars backend (afecb0a)
- postgres: implement mode aggregation (b2f1c2d)
- postgres: implement quantile and multiquantile (82ed4f5)
- postgres: prettify array literals (cdc60d5)
- pyspark: add support for struct operations (ce05987)
- pyspark: enable topk (0f748e0)
- pyspark: implement pi and e (fea81c6)
- pyspark: implement json getitem (9bfb748)
- pyspark: implement quantile and multiquantile (743f411)
- pyspark: support histogram API (8f4808c)
- snowflake: enable day-of-week column expression (6fd9c33)
- snowflake: handle date and timestamp literals (ec2392d)
- snowflake: implement mode aggregation (f35915e)
- snowflake: implement parse_url (a9746e3)
- snowflake: implement rowid scalar (7e1425a)
- snowflake: implement time literal (068fc50)
- snowflake: implement scalar (cc07d91)
- snowflake: initial commit for snowflake backend (a8687dd)
- snowflake: support reductions in window functions via automatic ordering (0234e5c)
- sql: add ops.StringSQLILike (7dc4924)
- sqlalchemy: implement ops.Where using IF/IFF functions (4cc9c15)
- sqlalchemy: in-memory tables have name in generated SQL (01b4c60)
- sql: improve error message in fixed_arity helper (891a1ad)
- sqlite: add type_map arg to override type inference (1961bad)
- sqlite: fix impl for missing pi and e functions (24b6d2f)
- sqlite: support con.sql with explicit schema specified (7ca82f3)
- sqlite: support wider range of datetime formats (f65093a)
- support both postgresql:// and postgres:// in ibis.connect (2f7a7b4)
- support deferred predicates in join (b51a64b)
- support more operations with unsigned integers (9992953)
- support passing callable to relabel (0bceefd)
- support tab completion for getitem access of table columns (732dba4)
- support Table.fillna for SQL backends (26d4cac)
- trino: add bit_xor aggregation (830acf4)
- trino: add EXTRACT-based functionality (6549657)
- trino: add millisecond scale to *_trunc function (3065248)
- trino: add some basic aggregation ops (7ecf7ab)
- trino: extract milliseconds (09517a5)
- trino: implement approx_median (1cba8bd)
- trino: implement parse_url (2bc87fc)
- trino: implement round, cot, pi, and e (c0e8736)
- trino: implement arbitrary first support (0c7d3b3)
- trino: implement array collect support (dfeb600)
- trino: implement array column support (dadf9a8)
- trino: implement array concat (240c55d)
- trino: implement array index (c5f3a96)
- trino: implement array length support (2d7cc65)
- trino: implement array literal support (2182177)
- trino: implement array repeat (2ee3d10)
- trino: implement array slicing (643792e)
- trino: implement basic struct operations (cc3c937)
- trino: implement bitwise agg support (5288b35)
- trino: implement bitwise scalar/column ops (ac4876c)
- trino: implement default precision and scale (37f8a47)
- trino: implement group concat support (5c41439)
- trino: implement json getitem support (7c41566)
- trino: implement map operations (4efc5ce)
- trino: implement more generic and numeric ops (63b45c8)
- trino: implement ops.Capitalize (dff14fc)
- trino: implement ops.DateFromYMD (edd2994)
- trino: implement ops.DateTruncate, ops.TimestampTruncate (32f4862)
- trino: implement ops.DayOfWeekIndex, ops.DayOfWeekName (a316d6d)
- trino: implement ops.ExtractDayOfYear (b0a3465)
- trino: implement ops.ExtractEpochSeconds (10b82f1)
- trino: implement ops.ExtractWeekOfYear (cf719b8)
- trino: implement ops.Repeat (e9f6851)
- trino: implement ops.Strftime (a436823)
- trino: implement ops.StringAscii (93fd32d)
- trino: implement ops.StringContains (d5cb2ec)
- trino: implement ops.StringSplit (62d79a6)
- trino: implement ops.StringToTimestamp (b766f62)
- trino: implement ops.StrRight (691b39c)
- trino: implement ops.TimeFromHMS (e5cacc2)
- trino: implement ops.TimestampFromUNIX (ce5d726)
- trino: implement ops.TimestampFromYMDHMS (9fa7304)
- trino: implement ops.TimestampNow (c832e4c)
- trino: implement ops.Translate (410ae1e)
- trino: implement quantile/multiquantile (bc7fdab)
- trino: implement regex functions (9e493c5)
- trino: implement window function support (5b6cc45)
- trino: initial trino backend (c367865)
- trino: support string date scalar parameter (9092530)
- trino: use proper approx_distinct function (3766fff)
Bug Fixes
- ibis.connect always returns a backend (2d5b155)
- allow inserting memtable with alchemy backends (c02fcc3)
- always display at least one column in the table repr (5ea9e5a)
- analysis: only lower sort keys that are in an agg’s output (6bb4f66)
- api: allow arbitrary sort keys (a980b34)
- api: allow boolean scalars in predicate APIs (2a2636b)
- api: allow deferred instances as input to ibis.desc and ibis.asc (6861347)
- api: ensure that window functions are propagated (4fb1106)
- api: make re_extract conform to semantics of Python’s re.match (5981227)
- auto-register csv and parquet with duckdb using ibis.connect (67c4f87)
- avoid renaming known equal columns for inner joins with equality predicates (5d4b0ed)
- backends: fix casting and execution result types in many backends (46c21dc)
- bigquery: don’t try to parse database when name is already fully qualified (ae3c113)
- bigquery: fix integer to timestamp casting (f5bacad)
- bigquery: normalize W frequency in *_trunc (893cd49)
- catch TypeError instead of more specific error (6db19d8)
- change default limit to None (8d1526a)
- clarify and normalize behavior of Table.rowid (92b03d6)
- clickhouse: ensure that correlated subqueries’ columns can be referenced (708d682)
- clickhouse: fix list_tables to use database name (edc3511)
- clickhouse: make any/all filterable and reduce code size (99b10e2)
- clickhouse: use clickhouse’s dbapi (bd0da12)
- common: support copying variadic annotable instances (ee0d9ad)
- dask: make filterable reductions work (0f759fc)
- dask: raise TypeError with informative message in ibis.dask.connect (4e67f7a)
- define to_pandas/to_pyarrow on DataType/Schema classes directly (22f3b4d)
- deps: bound shapely to a version that doesn’t segfault (be5a779)
- deps: update dependency datafusion to >=0.6,<0.8 (4c73870)
- deps: update dependency geopandas to >=0.6,<0.13 (58a32dc)
- deps: update dependency packaging to v22 (e0b6177)
- deps: update dependency rich to v13 (4f313dd)
- deps: update dependency sqlglot to v10 (db19d43)
- deps: update dependency sqlglot to v9 (cf330ac)
- docs: make sure data can be downloaded when building notebooks (fa7da17)
- don’t fuse filters & selections that contain window functions (d757069)
- drop snowflake support for RowID (dd378f1)
- duckdb: drop incorrect translate implementation (8690151)
- duckdb: fix bug in json getitem for duckdb (49ce739)
- duckdb: keep ibis.now() type semantics (eca4a2c)
- duckdb: make array repeat actually work (021f4de)
- duckdb: replace all in re_replace (c138f0f)
- duckdb: rereflect sqla table on re-registration (613b311), closes #4729
- duckdb: s3 priority (a2d03d1)
- duckdb: silence duckdb-engine warnings (359adc3)
- ensure numpy ops dont accidentally cast ibis types (a7ca6c8)
- exclude geospatial ops from pandas/dask/polars has_operation (6f1d265)
- fix table.mutate with deferred named expressions (5877d0b)
- fix bug when disabling show_types in interactive repr (2402506)
- fix expression repr for table -> value operations (dbf92f5)
- handle dimensionality of empty outputs (3a88170)
- improve rich repr support (522db9c)
- ir: normalize date types (39056b5)
- ir: normalize timestamps to datetime.datetime values (157efde)
- make col.day_of_week not an expr (96e1580)
- mssql: fix integer to timestamp casting (9122eef)
- mssql: fix ops.TimeFromHMS (d2188e1)
- mssql: fix ops.TimestampFromUNIX (ec28add)
- mssql: fix round without argument (52a60ce)
- mssql: use double-dollar sign to prevent from interpolating a value (b82da5d)
- mysql: fix mysql startswith/endswith to be case sensitive (d7469cc)
- mysql: handle out of bounds timestamps and fix milliseconds calculation (1f7649a)
- mysql: upcast bool agg args (8c5f9a5)
- pandas/dask now cast int<->timestamp as seconds since epoch (bbfe998)
- pandas: drop RowID implementation (05f5016)
- pandas: make quantile/multiquantile with filter work (6b5abd6)
- pandas: support substr with no length (b2c2922)
- pandas: use localized UTC time for now operation (f6d7327)
- pandas: use the correct context when aggregating over a window (e7fa5c0)
- polars: fix polars startswith to call the right method (9e6f397)
- polars: workaround passing pl.Null to the null type (fd9633b)
- postgres/duckdb: fix negative slicing by copying the trino impl (39e3962)
- postgres: fix array repeat to work with literals (3c46eb1)
- postgres: fix array_index operation (63ef892)
- postgres: make any/all translation rules use reduction helper (78bfd1d)
- pyspark: handle datetime.datetime literals (4f94abe)
- remove kerberos extra for impala dialect (6ed3e5f)
- repr: don’t repeat value in repr for literals (974eeb6)
- repr: fix off by one in repr (322c8dc)
- s3: fix quoting and autonaming for s3 (ce09266)
- select: raise error on attempt to select no columns in projection (94ac10e)
- snowflake: fix extracting query parameter by (75af240)
- snowflake: fix failing snowflake url extraction functions (2eee50b)
- snowflake: fix snowflake list_databases (680cd24)
- snowflake: handle schema when getting table (f6fff5b)
- snowflake: snowflake now likes Tuesdays (1bf9d7c)
- sqlalchemy: allow passing pd.DataFrame to create (1a083f6)
- sqlalchemy: ensure that arbitrary expressions are valid sort keys (cb1a013)
- sql: avoid generating cartesian products yet again (fdc52a2)
- sqlite: fix sqlite startswith/endswith to be case sensitive (fd4a88d)
- standardize list_tables signature everywhere (abafe1b), closes #2877
- support arbitrary with no arguments (45156f5)
- support dtype in __array__ methods (1294b76)
- test: ensure that file-based url tests don’t expect data to exist (c2b635a)
- trino: fix integer to timestamp casting (49321a6)
- trino: make filterable any/all reductions work (992bd18)
- truncate columns in repr for wide tables (aadcba1)
- typo: in StringValue helpstr (b2e2093)
- ux: improve error messages for rlz.comparable failures (5ca41d2)
- ux: prevent infinite looping when formatting a floating column of all nans (b6afe98)
- visualize(label_edges=True) works for NodeList ops (a91ceae)
- visualize: dedup nodes and edges and add verbose argument for debugging (521e188)
- visualize: handle join predicates in visualize (d63cb57)
- window: allow window range tuples in preceding or following (77172b3)
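For the pandas/dask casting fix listed above (bbfe998), integers are read as seconds since the unix epoch. A small standard-library sketch of that interpretation follows. It assumes UTC, and the helper names are hypothetical and not part of ibis:

```python
from datetime import datetime, timezone


def seconds_to_timestamp(seconds: int) -> datetime:
    """Interpret an integer as seconds since 1970-01-01T00:00:00Z (hypothetical helper)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def timestamp_to_seconds(ts: datetime) -> int:
    """Inverse direction: whole seconds since the unix epoch (hypothetical helper)."""
    return int(ts.timestamp())


ts = seconds_to_timestamp(1_673_222_400)
roundtrip = timestamp_to_seconds(ts)
print(ts.isoformat(), roundtrip)
```

Under this reading the value 1673222400 corresponds to 2023-01-09 00:00:00 UTC, and converting back yields the same integer.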
Deprecations
Performance
- add benchmark for known-slow table expression (e9617f0)
- expr: traverse nodes only once during compilation (69019ed)
- fix join performance by avoiding Projection construction (ed532bf)
- node: give Nodes the default Python repr (eb26b11)
- ux: remove pandas import overhead from import ibis (ea452fc)
- deps: bump duckdb lower bound (4539683)
- dev-deps: replace flake8 et al with ruff and fix lints (9c1b282)
Refactors
- add lazy_singledispatch utility (180ecff)
- add rlz.lazy_instance_of (4e30480)
- add Temporal base class for temporal data types (694eec4)
- api: add deprecated Node.op() #4519 (2b0826b)
- avoid roundtripping to expression for IFF (3068ae2)
- clean up cot implementations to have one less function call (0f304e5)
- clean up timezone support in ops.TimestampFromYMDHMS (2e183a9)
- cleanup str method docstrings (36bd36c)
- clickhouse: implement sqlglot-based compiler (5cc5d4b)
- clickhouse: simplify Quantile and MultiQuantile implementation (9e16e9e)
- common: allow traversal and substitution of tuple and dictionary arguments (60f4806)
- common: enforce slots definitions for Base subclasses (6c3df91)
- common: move Parameter and Signature to validators.py (da20537)
- common: reduce implementation complexity of annotations (27cee71)
- datafusion: align register API across backends (08046aa)
- datafusion: get name from expr (fea3e5b)
- datatypes: remove Enum (145e706)
- dev-deps: remove unnecessary poetry2nix overrides (5ed95bc)
- don’t sort new columns in mutate (72ec96a)
- duckdb: use lambda to define backend operations (5d14de6)
- impala: move impala SQL tests to snapshots (927bf65)
- impala: replace custom pooling with sqlalchemy QueuePool (626cdca)
- ir: ops.List -> ops.NodeList (6765bd2)
- ir: better encapsulate graph traversal logic, schema and datatype objects are not traversable anymore (1a07725)
- ir: generalize handling and traversal of node sequences (e8bcd0f)
- ir: make all value operations ‘Named’ for more consistent naming semantics (f1eb4d2)
- ir: move random() to api.py (e136f1b)
- ir: remove ops.DeferredSortKey (e629633)
- ir: remove ops.TopKNode and ir.TopK (d4dc544)
- ir: remove Analytic expression’s unused type() method (1864bc1)
- ir: remove DecimalValue.precision(), DecimalValue.scale() method (be975bc)
- ir: remove DestructValue expressions (762d384)
- ir: remove duplicated literal creation code (7dfb56f)
- ir: remove intermediate expressions (c6fb0c0)
- ir: remove lin.lineage() since it’s not used anywhere (120b1d7)
- ir: remove node.blocks() in favor of more explicit type handling (37d8ce4)
- ir: remove Node.inputs since it is an implementation detail of the pandas backend (6d2c49c)
- ir: remove node.root_tables() and unify parent table handling (fbb07c1)
- ir: remove ops.AggregateSelection in favor of an.simplify_aggregation (ecf6ed3)
- ir: remove ops.NodeList and ir.List in favor of builtin tuples (a90ce35)
- ir: remove pydantic dependency and make grounds more composable (9da0f41)
- ir: remove sch.HasSchema and introduce ops.Projection base class for ops.Selection (c3b0139)
- ir: remove unnecessary complexity introduced by variadic annotation (698314b)
- ir: resolve circular imports so operations can be globally imported for types (d2a3919)
- ir: simplify analysis.substitute_unbound() (a6c7406)
- ir: simplify SortKey construction using rules (4d63280)
- ir: simplify switch-case builders (9acf717)
- ir: split datatypes package into multiple submodules (cce6535)
- ir: split out table count into CountStar operation (e812e6e)
- ir: support replacing nodes in the tree (6a0df5a)
- ir: support variadic annotable arguments and add generic graph traversal routines (5d6a289)
- ir: unify aggregation construction to use AggregateSelection (c7d6a6f)
- make quantile, any, and all reductions filterable (1bafc9e)
- make sure value_counts always has a projection (a70a302)
- mssql: use lambda to define backend operations (1437cfb)
- mysql: dedup extract code (d551944)
- mysql: use lambda to define backend operations (d10bff8)
- polars: match duckdb registration api (ac59dac)
- postgres: use lambda to define backend operations (4c85d7b)
- remove dead compat.py module (eda0fdb)
- remove deprecated approximate aggregation classes (53fc6cb)
- remove deprecated functions and classes (be1cdda)
- remove duplicate _random_identifier calls (26e7942)
- remove setup.py and related infrastructure (adfcce1)
- remove the JSONB type (c4fc0ec)
- rename some infer methods for consistency (a8f5579)
- replace isinstance dtype checking with is_* methods (386adc2)
- rework registration / file loading (c60e30d)
- rules: generalize field referencing using rlz.ref() (0afb8b9)
- simplify ops.ArrayColumn in postgres backend (f9677cc)
- simplify histogram implementation by using window functions (41cbc29)
- simplify ops.ArrayColumn in alchemy backend (28ff4a8)
- snowflake: use lambda to define backend operations (cb33fce)
- split up custom nix code; remove unused derivations (57dff10)
- sqlite: use lambda to define backend operations (b937391)
- test: make clickhouse tests use pytest-snapshot (413dbd2)
- tests: move sql output to golden dir (6a6a453)
- test: sort regex test cases by name instead of posix-ness (0dfb0e7)
- tests: replace sqlgolden with pytest-snapshot (5700eb0)
- timestamps: remove timezone argument to to_timestamp API (eb4762e)
- trino: use lambda to define backend operations (dbd61a5)
- uncouple MultiQuantile class from Quantile (9c48f8c)
- use rlz.lazy_instance_of to delay shapely import (d14badc)
- use lazy dispatch for dt.infer (2e56540)
Documentation
- add backend_sensitive decorator (836f237)
- add pip install poetry dev env setup step (69940b1)
- add bigquery ci data analysis notebook (2b1d4e5)
- add how to sessionize guide (18989dd)
- add issue templates (4480c18)
- add missing argument descriptions (ea757fa)
- add mssql backend page (63c0f19)
- added 4.0 release blog post (bcc0eca)
- added memtable howto guide (5dde9bd)
- backends: add duckdb and mssql to the backend index page (7b13218)
- bring back git revision localized date plugin (e4fc2c9)
- created how to guide for deferred expressions (2a9f6ab)
- dev: python-duckdb now available for windows with conda (7f76b09)
- document how to create a table from a pandas dataframe using ibis.memtable (c6521ec)
- fix backends label in feature request issue form (cf852d3)
- fix broken docstrings; reduce docstring noise; workaround griffe (bd1c637)
- fix docs for building docs (23af567)
- fix feature-request issue template (6fb62f5)
- fix installation section for conda (7af6ac1)
- fix landing page links (1879362)
- fix links to make docs work locally and remotely (13c7810)
- fix pyarrow batches docstring (dba9594)
- fix single line docstring summaries (8028201)
- fix snowflake doc link in readme.md (9aff68e)
- fix the inline example for ibis.dask.do_connect (6a533f0)
- fix tutorial link on install page (b34811a)
- fix typo in first example of the homepage (9a8a25a)
- formatting and syntax highlighting fixes (50864da)
- front page rework (24b795a)
- how-to: use parquet data source for sessionization, fix typos, more deferred usage (974be37)
- improve the docstring of the generic connect method (ee87802)
- issue template cleanups (fed37da)
- list (e331247)
- polars: add backend docs page (e303b68)
- remove hrs (4c30de4)
- renamed how to guides to be more consistent (1bdc5bd)
- sentence structure in the Notes section (ac20232)
- show interactive prompt for python (5d7d913)
- split out geospatial operations in the support matrix docs (0075c28)
- trino: add backend docs (2f262cd)
- typo (6bac645)
- typos headers and formatting (9566cbb)
- udf: examples in pandas have the incorrect import path (49028b8)
- update filename (658a296)
- update line (4edfce0)
- update readme (19a3f3c)
- use buf/feat prefix only (2561a29)
- use components instead of pieces (179ca1e)
- use heading instead of bulleted bold (99b044e)
- use library instead of project (fd2d915)
- use present tense for use cases and “why” section (6cc7416)
- www: fix frontpage example (7db39e8)
3.2.0 (2022-09-15)
Features
- add api to get backend entry points (0152f5e)
- api: add `and_` and `or_` helpers (94bd4df)
- api: add `argmax` and `argmin` column methods (b52216a)
- api: add `distinct` to `Intersection` and `Difference` operations (cd9a34c)
- api: add `ibis.memtable` API for constructing in-memory table expressions (0cc6948)
- api: add `ibis.sql` to easily get a formatted SQL string (d971cc3)
- api: add `Table.unpack()` and `StructValue.lift()` APIs for projecting struct fields (ced5f53)
- api: allow transmute-style select method (d5fc364)
- api: implement all bitwise operators (7fc5073)
- api: promote `psql` to a `show_sql` public API (877a05d)
- clickhouse: add dataframe external table support for memtables (bc86aa7)
- clickhouse: add enum, ipaddr, json, lowcardinality to type parser (8f0287f)
- clickhouse: enable support for working window functions (310a5a8)
- clickhouse: implement `argmin` and `argmax` (ee7c878)
- clickhouse: implement bitwise operations (348cd08)
- clickhouse: implement struct scalars (1f3efe9)
- dask: implement `StringReplace` execution (1389f4b)
- dask: implement ungrouped `argmin` and `argmax` (854aea7)
- deps: support duckdb 0.5.0 (47165b2)
- duckdb: handle query parameters in `ibis.connect` (fbde95d)
- duckdb: implement `argmin` and `argmax` (abf03f1)
- duckdb: implement bitwise xor (ca3abed)
- duckdb: register tables from pandas/pyarrow objects (36e48cc)
- duckdb: support unsigned integer types (2e67918)
- impala: implement bitwise operations (c5302ab)
- implement dropna for SQL backends (8a747fb)
- log: make BaseSQLBackend._log print by default (12de5bb)
- mysql: register BLOB types (1e4fb92)
- pandas: implement `argmin` and `argmax` (bf9b948)
- pandas: implement `NotContains` on grouped data (976dce7)
- pandas: implement `StringReplace` execution (578795f)
- pandas: implement Contains with a group by (c534848)
- postgres: implement bitwise xor (9b1ebf5)
- pyspark: add option to treat nan as null in aggregations (bf47250)
- pyspark: implement `ibis.connect` for pyspark (a191744)
- pyspark: implement `Intersection` and `Difference` (9845a3c)
- pyspark: implement bitwise operators (33cadb1)
- sqlalchemy: implement bitwise operator translation (bd9f64c)
- sqlalchemy: make `ibis.connect` with sqlalchemy backends (b6cefb9)
- sqlalchemy: properly implement `Intersection` and `Difference` (2bc0b69)
- sql: implement `StringReplace` translation (29daa32)
- sqlite: implement bitwise xor and bitwise not (58c42f9)
- support `table.sort_by(ibis.random())` (693005d)
- type-system: infer pandas’ string dtype (5f0eb5d)
- ux: add duckdb as the default backend (8ccb81d)
- ux: use `rich` to format `Table.info()` output (67234c3)
- ux: use `sqlglot` for pretty printing SQL (a3c81c5)
- variadic union, intersect, & difference functions (05aca5a)
Bug Fixes
- api: make sure column names that are already inferred are not overwritten (6f1cb16)
- api: support deferred objects in existing API functions (241ce6a)
- backend: ensure that chained limits respect prior limits (02a04f5)
- backends: ensure select after filter works (e58ca73)
- backends: only recommend installing ibis-foo when foo is a known backend (ac6974a)
- base-sql: fix String-generating backend string concat implementation (3cf78c1)
- clickhouse: add IPv4/IPv6 literal inference (0a2f315)
- clickhouse: cast repeat `times` argument to `UInt64` (b643544)
- clickhouse: fix listing tables from databases with no tables (08900c3)
- compilers: make sure memtable rows have names in the SQL string compilers (18e7f95)
- compiler: use `repr` for SQL string `VALUES` data (75af658)
- dask: ensure predicates are computed before projections (5cd70e1)
- dask: implement timestamp-date binary comparisons (48d5058)
- dask: set dask upper bound due to large scale test breakage (796c645), closes #9221
- decimal: add decimal type inference (3fe3fd8)
- deps: update dependency duckdb-engine to >=0.1.8,<0.4.0 (113dc8f)
- deps: update dependency duckdb-engine to >=0.1.8,<0.5.0 (ef97c9d)
- deps: update dependency parsy to v2 (9a06131)
- deps: update dependency shapely to >=1.6,<1.8.4 (0c787d2)
- deps: update dependency shapely to >=1.6,<1.8.5 (d08c737)
- deps: update dependency sqlglot to v5 (f210bb8)
- deps: update dependency sqlglot to v6 (5ca4533)
- duckdb: add missing types (59bad07)
- duckdb: ensure that in-memory connections remain in their creating thread (39bc537)
- duckdb: use `fetch_arrow_table()` to be able to handle big timestamps (85a76eb)
- fix bug in pandas & dask `difference` implementation (88a78fa)
- fix dask `where` implementation (49f8845)
- impala: add date column dtype to impala to ibis type dict (c59e94e), closes #4449
- pandas where supports scalar for `left` (48f6c1e)
- pandas: fix anti-joins (10a659d)
- pandas: implement timestamp-date binary comparisons (4fc666d)
- pandas: properly handle empty groups when aggregating with `GroupConcat` (6545f4d)
- pyspark: fix broken `StringReplace` implementation (22cb297)
- pyspark: make sure `ibis.connect` works with pyspark (a7ab107)
- pyspark: translate predicates before projections (b3d1c80)
- sqlalchemy: fix float64 type mapping (8782773)
- sqlalchemy: handle reductions with multiple arguments (5b2039b)
- sqlalchemy: implement `SQLQueryResult` translation (786a50f)
- sql: fix sql compilation after making `InMemoryTable` a subclass of `PhysicalTable` (aac9524)
- squash several bugs in `sort_by` asc/desc handling (222b2ba)
- support chained set operations in SQL backends (227aed3)
- support filters on InMemoryTable exprs (abfaf1f)
- typo: in BaseSQLBackend.compile docstring (0561b13)
Deprecations
Performance
Documentation
- add `to_sql` (e2821a5)
- add back constraints for transitive doc dependencies and fix docs (350fd43)
- add coc reporting information (c2355ba)
- add community guidelines documentation (fd0893f)
- add HeavyAI to the readme (4c5ca80)
- add how-to bfill and ffill (ff84027)
- add how-to for ibis+duckdb register (73a726e)
- add how-to section to docs (33c4b93)
- duckdb: add installation note for duckdb >= 0.5.0 (608b1fb)
- fix `memtable` docstrings (72bc0f5)
- fix flake8 line length issues (fb7af75)
- fix markdown (4ab6b95)
- fix relative links in tutorial (2bd075f), closes #4064 #4201
- make attribution style uniform across the blog (05561e0)
- move the blog out to the top level sidebar for visibility (417ba64)
- remove underspecified UDF doc page (0eb0ac0)
3.1.0 (2022-07-26)
Features
- add `__getattr__` support to `StructValue` (75bded1)
- allow selection subclasses to define new node args (2a7dc41)
- api: accept `Schema` objects in public `ibis.schema` (0daac6c)
- api: add `.tables` accessor to `BaseBackend` (7ad27f0)
- api: add `e` function to public API (3a07e70)
- api: add `ops.StructColumn` operation (020bfdb)
- api: add cume_dist operation (6b6b185)
- api: add toplevel ibis.connect() (e13946b)
- api: handle literal timestamps with timezone embedded in string (1ae976b)
- api: ibis.connect() default to duckdb for parquet/csv extensions (ff2f088)
- api: make struct metadata more convenient to access (3fd9bd8)
- api: support tab completion for backends (eb75fc5)
- api: underscore convenience api (81716da)
- api: unnest (98ecb09)
- backends: allow column expressions from non-foreign tables on the right side of `isin`/`notin` (e1374a4)
- base-sql: implement trig and math functions (addb2c1)
- clickhouse: add ability to pass arbitrary kwargs to Clickhouse do_connect (583f599)
- clickhouse: implement `ops.StructColumn` operation (0063007)
- clickhouse: implement array collect (8b2577d)
- clickhouse: implement ArrayColumn (1301f18)
- clickhouse: implement bit aggs (f94a5d2)
- clickhouse: implement clip (12dfe50)
- clickhouse: implement covariance and correlation (a37c155)
- clickhouse: implement degrees (7946c0f)
- clickhouse: implement proper type serialization (80f4ab9)
- clickhouse: implement radians (c7b7f08)
- clickhouse: implement strftime (222f2b5)
- clickhouse: implement struct field access (fff69f3)
- clickhouse: implement trig and math functions (c56440a)
- clickhouse: support subsecond timestamp literals (e8698a6)
- compiler: restore `intersect_class` and `difference_class` overrides in base SQL backend (2c46a15)
- dask: implement trig functions (e4086bb)
- dask: implement zeroifnull (38487db)
- datafusion: implement negate (69dd64d)
- datafusion: implement trig functions (16803e1)
- duckdb: add register method to duckdb backend to load parquet and csv files (4ccc6fc)
- duckdb: enable find_in_set test (377023d)
- duckdb: enable group_concat test (4b9ad6c)
- duckdb: implement `ops.StructColumn` operation (211bfab)
- duckdb: implement approx_count_distinct (03c89ad)
- duckdb: implement approx_median (894ce90)
- duckdb: implement arbitrary first and last aggregation (8a500bc)
- duckdb: implement NthValue (1bf2842)
- duckdb: implement strftime (aebc252)
- duckdb: return the `ir.Table` instance from DuckDB’s `register` API (0d05d41)
- mysql: implement FindInSet (e55bbbf)
- mysql: implement StringToTimestamp (169250f)
- pandas: implement bitwise aggregations (37ff328)
- pandas: implement degrees (25b4f69)
- pandas: implement radians (6816b75)
- pandas: implement trig functions (1fd52d2)
- pandas: implement zeroifnull (48e8ed1)
- postgres/duckdb: implement covariance and correlation (464d3ef)
- postgres: implement ArrayColumn (7b0a506)
- pyspark: implement approx_count_distinct (1fe1d75)
- pyspark: implement approx_median (07571a9)
- pyspark: implement covariance and correlation (ae818fb)
- pyspark: implement degrees (f478c7c)
- pyspark: implement nth_value (abb559d)
- pyspark: implement nullifzero (640234b)
- pyspark: implement radians (18843c0)
- pyspark: implement trig functions (fd7621a)
- pyspark: implement Where (32b9abb)
- pyspark: implement xor (550b35b)
- pyspark: implement zeroifnull (db13241)
- pyspark: topk support (9344591)
- sqlalchemy: add degrees and radians (8b7415f)
- sqlalchemy: add xor translation rule (2921664)
- sqlalchemy: allow non-primitive arrays (4e02918)
- sqlalchemy: implement approx_count_distinct as count distinct (4e8bcab)
- sqlalchemy: implement clip (8c02639)
- sqlalchemy: implement trig functions (34c1514)
- sqlalchemy: implement Where (7424704)
- sqlalchemy: implement zeroifnull (4735e9a)
- sqlite: implement BitAnd, BitOr and BitXor (e478479)
- sqlite: implement cotangent (01e7ce7)
- sqlite: implement degrees and radians (2cf9c5e)
Bug Fixes
- api: bring back null datatype parsing (fc131a1)
- api: compute the type from both branches of `Where` expressions (b8f4120)
- api: ensure that `Deferred` objects work in aggregations (bbb376c)
- api: ensure that nulls can be cast to any type to allow caller promotion (fab4393)
- api: make ExistSubquery and NotExistsSubquery pure boolean operations (dd70024)
- backends: make execution transactional where possible (d1ea269)
- clickhouse: cast empty result dataframe (27ae68a)
- clickhouse: handle empty IN and NOT IN expressions (2c892eb)
- clickhouse: return null instead of empty string for group_concat when values are filtered out (b826b40)
- compiler: fix bool bool comparisons (1ac9a9e)
- dask/pandas: allow limit to be `None` (9f91d6b)
- dask: aggregation with multi-key groupby fails on dask backend (4f8bc70)
- datafusion: handle predicates in aggregates (4725571)
- deps: update dependency datafusion to >=0.4,<0.7 (f5b244e)
- deps: update dependency duckdb to >=0.3.2,<0.5.0 (57ee818)
- deps: update dependency duckdb-engine to >=0.1.8,<0.3.0 (3e379a0)
- deps: update dependency geoalchemy2 to >=0.6.3,<0.13 (c04a533)
- deps: update dependency geopandas to >=0.6,<0.12 (b899c37)
- deps: update dependency Shapely to >=1.6,<1.8.3 (87a49ad)
- deps: update dependency toolz to >=0.11,<0.13 (258a641)
- don’t mask udf module in `__init__.py` (3e567ba)
- duckdb: ensure that paths with non-extension `.` chars are parsed correctly (9448fd3)
- duckdb: fix struct datatype parsing (5124763)
- duckdb: force string_agg separator to be a constant (21cdf2f)
- duckdb: handle multiple dotted extensions; quote names; consolidate implementations (1494246)
- duckdb: remove timezone function invocation (33d38fc)
- geospatial: ensure that later versions of numpy are compatible with geospatial code (33f0afb)
- impala: a delimited table explicitly declare stored as textfile (04086a4), closes #4260
- impala: remove broken nth_value implementation (dbc9cc2)
- ir: don’t attempt fusion when projections aren’t exactly equivalent (3482ba2)
- mysql: cast mysql timestamp literals to ensure correct return type (8116e04)
- mysql: implement integer to timestamp using `from_unixtime` (1b43004)
- pandas/dask: look at pre_execute for has_operation reporting (cb44efc)
- pandas: execute negate on bool as `not` (330ab4f)
- pandas: fix struct inference from dict in the pandas backend (5886a9a)
- pandas: force backend options registration on trace.enable() calls (8818fe6)
- pandas: handle empty boolean column casting in Series conversion (f697e3e)
- pandas: handle struct columns with NA elements (9a7c510)
- pandas: handle the case of selection from a join when remapping overlapping column names (031c4c6)
- pandas: perform correct equality comparison (d62e7b9)
- postgres/duckdb: cast after milliseconds computation instead of after extraction (bdd1d65)
- pyspark: handle predicates in Aggregation (842c307)
- pyspark: prevent spark from trying to convert timezone of naive timestamps (dfb4127)
- pyspark: remove xpassing test for #2453 (c051e28)
- pyspark: specialize implementation of `has_operation` (5082346)
- pyspark: use empty check for collect_list in GroupConcat rule (df66acb)
- repr: allow DestructValue selections to be formatted by fmt (4b45d87)
- repr: when formatting DestructValue selections, use struct field names as column names (d01fe42)
- sqlalchemy: fix parsing and construction of nested array types (e20bcc0)
- sqlalchemy: remove unused second argument when creating temporary views (8766b40)
- sqlite: register conversion to isoformat for `pandas.Timestamp` (fe95dca)
- sqlite: test case with whitespace at the end of the line (7623ae9)
- sql: use isoformat for timestamp literals (70d0ba6)
- type-system: infer null datatype for empty sequence of expressions (f67d5f9)
- use bounded precision for decimal aggregations (596acfb)
Performance Improvements
Reverts
- ci: install sqlite3 on ubuntu (1f2705f)
3.0.2 (2022-04-28)
Bug Fixes
- docs: fix tempdir location for docs build (dcd1b22)
3.0.1 (2022-04-28)
Bug Fixes
- build: replace version before exec plugin runs (573139c)
3.0.0 (2022-04-25)
⚠ BREAKING CHANGES
- ir: The following are breaking changes due to simplifying expression internals
  - `ibis.expr.datatypes.DataType.scalar_type` and `DataType.column_type` factory methods have been removed; `DataType.scalar` and `DataType.column` class fields can be used to directly construct a corresponding expression instance (though prefer to use `operation.to_expr()`)
  - `ibis.expr.types.ValueExpr._name` and `ValueExpr._dtype` fields are not accessible anymore. While these were not supposed to be used directly, now `ValueExpr.has_name()`, `ValueExpr.get_name()` and `ValueExpr.type()` methods are the only way to retrieve the expression’s name and datatype.
  - `ibis.expr.operations.Node.output_type` is a property now, not a method; decorate those methods with `@property`
  - `ibis.expr.operations.Value` subclasses must define `output_shape` and `output_dtype` properties from now on (note the datatype abbreviation `dtype` in the property name)
  - `ibis.expr.rules.cast()`, `scalar_like()` and `array_like()` rules have been removed
- api: Replace `t["a"].distinct()` with `t[["a"]].distinct()`.
- deps: The sqlalchemy lower bound is now 1.4
- ir: Schema.names and Schema.types attributes now have tuple type rather than list
- expr: Columns that were added or used in an aggregation or mutation would be alphabetically sorted in compiled SQL outputs. This was a vestige from when Python dicts didn’t preserve insertion order. Now columns will appear in the order in which they were passed to `aggregate` or `mutate`
- api: `dt.float` is now `dt.float64`; use `dt.float32` for the previous behavior.
- ir: Relation-based `execute_node` dispatch rules must now accept tuples of expressions.
- ir: removed ibis.expr.lineage.{roots,find_nodes} functions
- config: Use `ibis.options.graphviz_repr = True` to enable
- hdfs: Use `fsspec` instead of HDFS from ibis
- udf: Vectorized UDF coercion functions are no longer a public API.
- The minimum supported Python version is now Python 3.8
- config: `register_option` is no longer supported, please submit option requests upstream
- backends: Read tables with pandas.read_hdf and use the pandas backend
- The CSV backend is removed. Use Datafusion for CSV execution.
- backends: Use the datafusion backend to read parquet files
- `Expr() -> Expr.pipe()`
- coercion functions previously in expr/schema.py are now in udf/vectorized.py
- api: `materialize` is removed. Joins with overlapping columns now have suffixes.
- kudu: use impala instead: https://kudu.apache.org/docs/kudu_impala_integration.html
- Any code that was relying implicitly on string-y behavior from UUID datatypes will need to add an explicit cast first.
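
The column-ordering change in the `expr` entry above can be illustrated without ibis. The sketch below is plain Python and only demonstrates the underlying language behavior; `mutate_columns` is a hypothetical stand-in, not an ibis function, and the column names are made up.

```python
# Illustrative only: `mutate_columns` is a hypothetical helper, not part of ibis.

def mutate_columns(**new_columns):
    """Return the new column names in the order the caller passed them."""
    # Keyword arguments arrive as a dict, and dicts preserve insertion order
    # (a language guarantee since Python 3.7), so the call order is kept.
    return list(new_columns)


passed = mutate_columns(total=1, average=2, count=3)

# Behavior before this release, as described above: alphabetical order.
old_order = sorted(passed)

# Behavior from this release on: the order in which columns were passed.
new_order = passed

print(old_order)
print(new_order)
```

Code that relied on the alphabetical order of compiled columns would need to pass columns in the desired order, or select them explicitly afterwards.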
Features
- add `_repr_html_` for expressions to print as tables in ipython (cd6fa4e)
- add duckdb backend (667f2d5)
- allow construction of decimal literals (3d9e865)
- api: add `ibis.asc` expression (efe177e), closes #1454
- api: add has_operation API to the backend (4fab014)
- api: implement type for SortExpr (ab19bd6)
- clickhouse: implement string concat for clickhouse (1767205)
- clickhouse: implement StrRight operation (67749a0)
- clickhouse: implement table union (e0008d7)
- clickhouse: implement trim, pad and string predicates (a5b7293)
- datafusion: implement Count operation (4797a86)
- datatypes: unbounded decimal type (f7e6f65)
- date: add ibis.date(y,m,d) functionality (26892b6), closes #386
- duckdb/postgres/mysql/pyspark: implement `.sql` on tables for mixing sql and expressions (00e8087)
- duckdb: add functionality needed to pass integer to interval test (e2119e8)
- duckdb: implement _get_schema_using_query (93cd730)
- duckdb: implement now() function (6924f50)
- duckdb: implement regexp replace and extract (18d16a7)
- implement `force` argument in sqlalchemy backend base class (9df7f1b)
- implement coalesce for the pyspark backend (8183efe)
- implement semi/anti join for the pandas backend (cb36fc5)
- implement semi/anti join for the pyspark backend (3e1ba9c)
- implement the remaining clickhouse joins (b3aa1f0)
- ir: rewrite and speed up expression repr (45ce9b2)
- mysql: implement _get_schema_from_query (456cd44)
- mysql: move string join impl up to alchemy for mysql (77a8eb9)
- postgres: implement _get_schema_using_query (f2459eb)
- pyspark: implement Distinct for pyspark (4306ad9)
- pyspark: implement log base b for pyspark (527af3c)
- pyspark: implement percent_rank and enable testing (c051617)
- repr: add interval info to interval repr (df26231)
- sqlalchemy: implement ilike (43996c0)
- sqlite: implement date_truncate (3ce4f2a)
- sqlite: implement ISO week of year (714ff7b)
- sqlite: implement string join and concat (6f5f353)
- support of arrays and tuples for clickhouse (db512a8)
- ver: dynamic version identifiers (408f862)
Bug Fixes
- added wheel to pyproject toml for venv users (b0b8e5c)
- allow major version changes in CalVer dependencies (9c3fbe5)
- annotable: allow optional arguments at any position (778995f), closes #3730
- api: add ibis.map and .struct (327b342), closes #3118
- api: map string multiplication with integer to repeat method (b205922)
- api: thread suffixes parameter to individual join methods (31a9aff)
- change TimestampType to Timestamp (e0750be)
- clickhouse: disconnect from clickhouse when computing version (11cbf08)
- clickhouse: use a context manager for execution (a471225)
- combine windows during windowization (7fdd851)
- conform epoch_seconds impls to expression return type (18a70f1)
- context-adjustment: pass scope when calling adjust_context in pyspark backend (33aad7b), closes #3108
- dask: fix asof joins for newer version of dask (50711cc)
- dask: workaround dask bug (a0f3bd9)
- deps: update dependency atpublic to v3 (3fe8f0d)
- deps: update dependency datafusion to >=0.4,<0.6 (3fb2194)
- deps: update dependency geoalchemy2 to >=0.6.3,<0.12 (dc3c361)
- deps: update dependency graphviz to >=0.16,<0.21 (3014445)
- duckdb: add casts to literals to fix binding errors (1977a55), closes #3629
- duckdb: fix array column type discovery on leaf tables and add tests (15e5412)
- duckdb: fix log with base b impl (4920097)
- duckdb: support both 0.3.2 and 0.3.3 (a73ccce)
- enforce the schema’s column names in `apply_to` (b0f334d)
- expose ops.IfNull for mysql backend (156c2bd)
- expr: add more binary operators to char list and implement fallback (b88184c)
- expr: fix formatting of table info using tabulate (b110636)
- fix float vs real data type detection in sqlalchemy (24e6774)
- fix list_schemas argument (69c1abf)
- fix postgres udfs and re-enable ci tests (7d480d2)
- fix tablecolumn execution for filter following join (064595b)
- format: remove some newlines from formatted expr repr (ed4fa78)
- histogram: cross_join needs onclause=True (5d36a58), closes #622
- ibis.expr.signature.Parameter is not pickleable (828fd54)
- implement coalesce properly in the pandas backend (aca5312)
- implement count on tables for pyspark (7fe5573), closes #2879
- infer coalesce types when a non-null expression occurs after the first argument (c5f2906)
- mutate: do not lift table column that results from mutate (ba4e5e5)
- pandas: disable range windows with order by (e016664)
- pandas: don’t reassign the same column to silence SettingWithCopyWarning warning (75dc616)
- pandas: implement percent_rank correctly (d8b83e7)
- prevent unintentional cross joins in mutate + filter (83eef99)
- pyspark: fix range windows (a6f2aa8)
- regression in Selection.sort_by with resolved_keys (c7a69cd)
- regression in sort_by with resolved_keys (63f1382), closes #3619
- remove broken csv pre_execute (93b662a)
- remove importorskip call for backend tests (2f0bcd8)
- remove incorrect fix for pandas regression (339f544)
- remove passing schema into register_parquet (bdcbb08)
- repr: add ops.TimeAdd to repr binop lookup table (fd94275)
- repr: allow ops.TableNode in fmt_value (6f57003)
- reverse the predicate pushdown substitution (f3cd358)
- sort_index to satisfy pandas 1.4.x (6bac0fc)
- sqlalchemy: ensure correlated subqueries FROM clauses are rendered (3175321)
- sqlalchemy: use corresponding_column to prevent spurious cross joins (fdada21)
- sqlalchemy: use replace selectables to prevent semi/anti join cross join (e8a1a71)
- sql: retain column names for named ColumnExprs (f1b4b6e), closes #3754
- sql: walk right join trees and substitute joins with right-side joins with views (0231592)
- store schema on the pandas backend to allow correct inference (35070be)
Performance Improvements
- datatypes: speed up str and hash (262d3d7)
- fast path for simple column selection (d178498)
- ir: global equality cache (13c2bb2)
- ir: introduce CachedEqMixin to speed up equality checks (b633925)
- repr: remove full tree repr from rule validator error message (65885ab)
- speed up attribute access (89d1c05)
- use assign instead of concat in projections when possible (985c242)
Miscellaneous Chores
Code Refactoring
- api: make primitive types more cohesive (71da8f7)
- api: remove distinct ColumnExpr API (3f48cb8)
- api: remove materialize (24285c1)
- backends: remove the hdf5 backend (ff34f3e)
- backends: remove the parquet backend (b510473)
- config: disable graphviz-repr-in-notebook by default (214ad4e)
- config: remove old config code and port to pydantic (4bb96d1)
- dt.UUID inherits from DataType, not String (2ba540d)
- expr: preserve column ordering in aggregations/mutations (668be0f)
- hdfs: replace HDFS with `fsspec` (cc6eddb)
- ir: make Annotable immutable (1f2b3fa)
- ir: make schema annotable (b980903)
- ir: remove unused lineage `roots` and `find_nodes` functions (d630a77)
- ir: simplify expressions by not storing dtype and name (e929f85)
- kudu: remove support for use of kudu through kudu-python (36bd97f)
- move coercion functions from schema.py to udf (58eea56), closes #3033
- remove blanket call for Expr (3a71116), closes #2258
- remove the csv backend (0e3e02e)
- udf: make coerce functions in ibis.udf.vectorized private (9ba4392)
2.1.1 (2022-01-12)
Bug Fixes
- setup.py: set the correct version number for 2.1.0 (f3d267b)
2.1.0 (2022-01-12)
Bug Fixes
- consider all packages’ entry points (b495cf6)
- datatypes: infer bytes literal as binary #2915 (#3124) (887efbd)
- deps: bump minimum dask version to 2021.10.0 (e6b5c09)
- deps: constrain numpy to ensure wheels are used on windows (70c308b)
- deps: update dependency clickhouse-driver to ^0.1 || ^0.2.0 (#3061) (a839d54)
- deps: update dependency geoalchemy2 to >=0.6,<0.11 (4cede9d)
- deps: update dependency pyarrow to v6 (#3092) (61e52b5)
- don’t force backends to override do_connect until 3.0.0 (4b46973)
- execute materialized joins in the pandas and dask backends (#3086) (9ed937a)
- literal: allow creating ibis literal with uuid (#3131) (b0f4f44)
- restore the ability to have more than two option levels (#3151) (fb4a944)
- sqlalchemy: fix correlated subquery compilation (43b9010)
- sqlite: defer db connection until needed (#3127) (5467afa), closes #64
Features
- allow column_of to take a column expression (dbc34bb)
- ci: More readable workflow job titles (#3111) (d8fd7d9)
- datafusion: initial implementation for Arrow Datafusion backend (3a67840), closes #2627
- datafusion: initial implementation for Arrow Datafusion backend (75876d9), closes #2627
- make dayofweek impls conform to pandas semantics (#3161) (9297828)
Reverts
- “ci: install gdal for fiona” (8503361)
2.0.0 (2021-10-06)
Features
- Serialization-deserialization of Node via pickle is now byte compatible between different processes (#2938)
- Support joining on different columns in ClickHouse backend (#2916)
- Support summarization of empty data in pandas backend (#2908)
- Unify implementation of fillna and isna in Pyspark backend (#2882)
- Support binary operation with Timedelta in Pyspark backend (#2873)
- Add `group_concat` operation for Clickhouse backend (#2839)
- Support comparison of ColumnExpr to timestamp literal (#2808)
- Make op schema a cached property (#2805)
- Implement `.insert()` for SQLAlchemy backends (#2613, #2613)
- Infer categorical and decimal Series to more specific Ibis types in pandas backend (#2792)
- Add `startswith` and `endswith` operations (#2790)
- Allow more flexible return type for UDFs (#2776, #2797)
- Implement Clip in the Pyspark backend (#2779)
- Use `ndarray` as array representation in pandas backend (#2753)
- Support Spark filter with window operation (#2687)
- Support context adjustment for udfs for pandas backend (#2646)
- Add `auth_local_webserver`, `auth_external_data`, and `auth_cache` parameters to BigQuery connect method. Set `auth_local_webserver` to use a local server instead of copy-pasting an authorization code. Set `auth_external_data` to true to request additional scopes required to query Google Drive and Sheets. Set `auth_cache` to `reauth` or `none` to force reauthentication. (#2655)
- Add `bit_and`, `bit_or`, and `bit_xor` integer column aggregates (BigQuery and MySQL backends) (#2641)
- Backends are defined as entry points (#2379)
- Add `ibis.array` for creating array expressions (#2615)
- Implement Not operation in PySpark backend (#2607)
- Added support for case/when in PySpark backend (#2610)
- Add support for np.array as literals for backends that already support lists as literals (#2603)
Bugs
- Fix data races in impala connection pool accounting (#2991)
- Fix null literal compilation in the Clickhouse backend (#2985)
- Fix order of limit and offset parameters in the Clickhouse backend (#2984)
- Replace the `equals` operation for geospatial datatypes with `geo_equals` (#2956)
- Fix .drop(fields). The argument can now be either a list of strings or a string. (#2829)
- Fix projection on differences and intersections for SQL backends (#2845)
- Backends are loaded in a lazy way, so third-party backends can import Ibis without circular imports (#2827)
- Disable aggregation optimization due to N squared performance (#2830)
- Fix `.cast()` to array outputting list instead of np.array in pandas backend (#2821)
- Fix aggregation with mixed reduction datatypes (array + scalar) on Dask backend (#2820)
- Fix error when using reduction UDF that returns np.array in a grouped aggregation (#2770)
- Fix time context trimming error for multi column udfs in pandas backend (#2712)
- Fix error during compilation of range_window in base_sql backends (issue #2608) (#2710)
- Fix wrong row indexing in the result for ‘window after filter’ for timecontext adjustment (#2696)
- Fix `aggregate` exploding the output of Reduction ops that return a list/ndarray (#2702)
- Fix issues with context adjustment for filter with PySpark backend (#2693)
- Add temporary struct col in pyspark backend to ensure that UDFs are executed only once (#2657)
- Fix BigQuery connect bug that ignored project ID parameter (#2588)
- Fix overwrite logic to account for DestructColumn inside mutate API (#2636)
- Fix fusion optimization bug that incorrectly changes operation order (#2635)
- Fixes a NPE issue with substr in PySpark backend (#2610)
- Fixes binary data type translation into BigQuery bytes data type (#2354)
- Make StructValue picklable (#2577)
Support
- Improvement of the backend API. The former `Client` subclasses have been replaced by a `Backend` class that must subclass `ibis.backends.base.BaseBackend`. The `BaseBackend` class contains abstract methods for the minimum subset of methods that backends must implement, and their signatures have been standardized across backends. The Ibis compiler has been refactored, and backends don’t need to implement all compiler classes anymore if the default works for them. Only a subclass of `ibis.backends.base.sql.compiler.Compiler` is now required. Backends now need to register themselves as entry points. (#2678)
- Deprecate `exists_table(table)` in favor of `table in list_tables()` (#2905)
- Remove handwritten type parser; parsing errors that were previously `IbisTypeError` are now `parsy.ParseError`. `parsy` is now a hard requirement. (#2977)
- Methods `current_database` and `list_databases` raise an exception for backends that do not support databases (#2962)
- Method `set_database` has been deprecated, in favor of creating a new connection to a different database (#2913)
- Removed `log` method of clients, in favor of `verbose_log` option (#2914)
- Output of `Client.version` returned as a string, instead of a setuptools `Version` (#2883)
- Deprecated `list_schemas` in SQLAlchemy backends in favor of `list_databases` (#2862)
- Deprecated `ibis.<backend>.verify()` in favor of capturing exception in `ibis.<backend>.compile()` (#2865)
- Simplification of data fetching. Backends don’t need to implement `Query` anymore (#2789)
- Move BigQuery backend to a separate repository (https://github.com/ibis-project/ibis-bigquery). The backend will be released separately, use `pip install ibis-bigquery` or `conda install ibis-bigquery` to install it, and then use as before. (#2665)
- Supporting SQLAlchemy 1.4, and requiring minimum 1.3 (#2689)
- Namespace time_col config, fix type check for trim_with_timecontext for pandas window execution (#2680)
- Remove deprecated `ibis.HDFS`, `ibis.WebHDFS` and `ibis.hdfs_connect` (#2505)
1.4.0 (2020-11-07)
Features
- Add Struct.from_dict (#2514)
- Add hash and hashbytes support for BigQuery backend (#2310)
- Support reduction UDF without groupby to return multiple columns for pandas backend (#2511)
- Support analytic and reduction UDF to return multiple columns for pandas backend (#2487)
- Support elementwise UDF to return multiple columns for pandas and PySpark backend (#2473)
- FEAT: Support Ibis interval for window in pyspark backend (#2409)
- Use Scope class for scope in pyspark backend (#2402)
- Add PySpark support for ReductionVectorizedUDF (#2366)
- Add time context in scope in execution for pandas backend (#2306)
- Add start_point and end_point to PostGIS backend. (#2081)
- Add set difference to general ibis api (#2347)
- Add rowid expression, supported by SQLite and OmniSciDB (#2251)
- Add intersection to general ibis api (#2230)
- Add application_name argument to ibis.bigquery.connect to allow attributing Google API requests to projects that use Ibis. (#2303)
- Add support for casting category dtype in pandas backend (#2285)
- Add support for Union in the PySpark backend (#2270)
- Add support for implementing custom window object for pandas backend (#2260)
- Implement two level dispatcher for execute_node (#2246)
- Add ibis.pandas.trace module to log time and call stack information. (#2233)
- Validate that the output type of a UDF is a single element (#2198)
- ZeroIfNull and NullIfZero implementation for OmniSciDB (#2186)
- IsNan implementation for OmniSciDB (#2093)
- [OmnisciDB] Support add_columns and drop_columns for OmnisciDB table (#2094)
- Create ExtractQuarter operation and add its support to Clickhouse, CSV, Impala, MySQL, OmniSciDB, pandas, Parquet, PostgreSQL, PySpark, SQLite and Spark (#2175)
- Add translation rules for isnull() and notnull() for pyspark backend (#2126)
- Add window operations support to SQLite (#2232)
- Implement read_csv for omniscidb backend (#2062)
- [OmniSciDB] Add support to week extraction (#2171)
- Date, DateDiff and TimestampDiff implementations for OmniSciDB (#2097)
- Create ExtractWeekOfYear operation and add its support to Clickhouse, CSV, MySQL, pandas, Parquet, PostgreSQL, PySpark and Spark (#2177)
- Add initial support for ibis.random function (#2060)
- Added epoch_seconds extraction operation to Clickhouse, CSV, Impala, MySQL, OmniSciDB, pandas, Parquet, PostgreSQL, PySpark, SQLite, Spark and BigQuery (issue #2273) (#2178)
- [OmniSciDB] Add “method” parameter to load_data (#2165)
- Add non-nullable info to schema output (#2117)
- fillna and nullif implementations for OmnisciDB (#2083)
- Add load_data to sqlalchemy’s backends and fix database parameter for load/create/drop when database parameter is the same as the current database (#1981)
- [OmniSciDB] Add support for within, d_fully_within and point (#2125)
- OmniSciDB - Refactor DDL and Client; Add temporary parameter to create_table and “force” parameter to drop_view (#2086)
- Create ExtractDayOfYear operation and add its support to Clickhouse, CSV, MySQL, OmniSciDB, pandas, Parquet, PostgreSQL, PySpark, SQLite and Spark (#2173)
- Implementations of Log Log2 Log10 for OmniSciDB backend (#2095)
Bugs
- Table expressions do not recognize inet datatype (Postgres backend) (#2462)
- Table expressions do not recognize macaddr datatype (Postgres backend) (#2461)
- Fix aggcontext.Summarize not always producing scalar (pandas backend) (#2410)
- Fix same window op with different window size on table leading to incorrect results for pyspark backend (#2414)
- Fix same column with multiple aliases not showing properly in repr (#2229)
- Fix reduction UDFs over ungrouped, bounded windows on pandas backend (#2395)
- FEAT: Support rolling window UDF with non numeric inputs for pandas backend. (#2386)
- Fix scope get to use hashmap lookup instead of list lookup (#2386)
- Fix equality behavior for Literal ops (#2387)
- Fix analytic ops over ungrouped and unordered windows on pandas backend (#2376)
- Fix the covariance operator in the BigQuery backend. (#2367)
- Update impala kerberos dependencies (#2342)
- Added verbose logging to SQL backends (#1320)
- Fix issue with sql_validate call to OmnisciDB. (#2256)
- Add missing float types to pandas backend (#2237)
- Allow group_by and order_by as window operation input in pandas backend (#2252)
- Fix PySpark compiler error when elementwise UDF output_type is Decimal or Timestamp (#2223)
- Fix interactive mode returning an expression instead of the value when used in Jupyter (#2157)
- Fix PySpark error when doing alias after selection (#2127)
- Fix millisecond issue for OmniSciDB (issue #2167), MySQL (issue #2169), PostgreSQL (issue #2166), pandas (issue #2168), BigQuery (issue #2273) backends (#2170)
- [OmniSciDB] Fix TopK when used as filter (#2134)
Support
- Move ibis.HDFS, ibis.WebHDFS and ibis.hdfs_connect to ibis.impala.* (#2497)
- Drop support for Python 3.6 (#2288)
- Simplifying tests directories structure (#2351)
- Update google-cloud-bigquery dependency minimum version to 1.12.0 (#2304)
- Remove “experimental” mentions for OmniSciDB and pandas backends (#2234)
- Use an OmniSciDB image stable on CI (#2244)
- Added fragment_size to table creation for OmniSciDB (#2107)
- Added round() support for OmniSciDB (#2096)
- Enabled cumulative ops support for OmniSciDB (#2113)
1.3.0 (2020-02-27)
Features
- Improve many arguments UDF performance in pandas backend. (#2071)
- Add DenseRank, RowNumber, MinRank, Count, PercentRank/CumeDist window operations to OmniSciDB (#1976)
- Introduce a top level vectorized UDF module (experimental). Implement element-wise UDF for pandas and PySpark backend. (#2047)
- Add support for multi arguments window UDAF for the pandas backend (#2035)
- Clean up window translation logic in pyspark backend (#2004)
- Add docstring check to CI for an initial subset files (#1996)
- Pyspark backend bounded windows (#2001)
- Add more POSTGIS operations (#1987)
- SQLAlchemy Default precision and scale to decimal types for PostgreSQL and MySQL (#1969)
- Add support for array operations in PySpark backend (#1983)
- Implement sort, if_null, null_if and notin for PySpark backend (#1978)
- Add support for date/time operations in PySpark backend (#1974)
- Add support for params, query_schema, and sql in PySpark backend (#1973)
- Implement join for PySpark backend (#1967)
- Validate AsOfJoin tolerance and attempt interval unit conversion (#1952)
- filter for PySpark backend (#1943)
- window operations for pyspark backend (#1945)
- Implement IntervalSub for pandas backend (#1951)
- PySpark backend string and column ops (#1942)
- PySpark backend (#1913)
- DDL support for Spark backend (#1908)
- Support timezone aware arrow timestamps (#1923)
- Add shapely geometries as input for literals (#1860)
- Add geopandas as output for omniscidb (#1858)
- Spark UDFs (#1885)
- Add support for Postgres UDFs (#1871)
- Spark tests (#1830)
- Spark client (#1807)
- Use pandas rolling apply to implement rows_with_max_lookback (#1868)
Bugs
- Pin “clickhouse-driver” to “>=0.1.3” (#2089)
- Fix load data stage for Linux CI (#2069)
- Fix datamgr.py fail if IBIS_TEST_OMNISCIDB_DATABASE=omnisci (#2057)
- Change pymapd connection parameter from “session_id” to “sessionid” (#2041)
- Fix pandas backend to treat trailing_window preceding arg as window bound rather than window size (e.g. preceding=0 now indicates current row rather than window size 0) (#2009)
- Fix handling of Array types in Postgres UDF (#2015)
- Fix pydocstyle config (#2010)
- Pinning clickhouse-driver<0.1.2 (#2006)
- Fix CI log for database (#1984)
- Fixes explain operation (#1933)
- Fix incorrect assumptions about attached SQLite databases (#1937)
- Upgrade to JDK11 (#1938)
- sql method doesn’t work when the query uses LIMIT clause (#1903)
- Fix union implementation (#1910)
- Fix failing com imports on master (#1912)
- OmniSci/MapD - Fix reduction for bool (#1901)
- Pass scope to grouping execution in the pandas backend (#1899)
- Fix various Spark backend issues (#1888)
- Make Nodes enforce the proper signature (#1891)
- Fix according to bug in pd.to_datetime when passing the unit flag (#1893)
- Fix small formatting buglet in PR merge tool (#1883)
- Fix the case where we do not have an index when using preceding with intervals (#1876)
- Fixed issues with geo data (#1872)
- Remove -x from pytest call in linux CI (#1869)
- Fix return type of Struct.from_tuples (#1867)
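The trailing_window fix for the pandas backend (#2009) changes what the preceding argument means: it is a bound on how far back the window reaches, so preceding=0 selects only the current row. The sketch below illustrates that interpretation in plain Python. trailing_windows is a hypothetical helper written for this illustration and is not the pandas backend's implementation:

```python
def trailing_windows(values, preceding):
    """Return, for each row, the rows that fall in its trailing window.

    `preceding` is treated as a bound: each window holds the current row
    plus at most `preceding` rows before it.
    """
    return [values[max(0, i - preceding): i + 1] for i in range(len(values))]


rows = [10, 20, 30, 40]

# preceding=0 means "current row only", not "a window of size 0".
current_row_only = trailing_windows(rows, preceding=0)

# preceding=1 means "the current row and the one before it".
two_row_windows = trailing_windows(rows, preceding=1)
```

Under the old reading (window size), preceding=0 would have described an empty window, which is why the bound interpretation is the useful one.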
Support
- Add support for Python 3.8 (#2066)
- Pin back version of isort (#2079)
- Use user-defined port variables for Omnisci and PostgreSQL tests (#2082)
- Change omniscidb image tag from v5.0.0 to v5.1.0 on docker-compose recipe (#2077)
- [Omnisci] The same SRIDs for test_geo_spatial_binops (#2051)
- Unpin rtree version (#2078)
- Link pandas issues with xfail tests in pandas/tests/test_udf.py (#2074)
- Disable Postgres tests on Windows CI. (#2075)
- use conda for installation black and isort tools (#2068)
- CI: Fix CI builds related to new pandas 1.0 compatibility (#2061)
- Fix data map for int8 on OmniSciDB backend (#2056)
- Add possibility to run tests for separate backend via make test BACKENDS=[YOUR BACKEND] (#2052)
- Fix “cudf” import on OmniSciDB backend (#2055)
- CI: Drop table only if it exists (OmniSciDB) (#2050)
- Add initial documentation for OmniSciDB, MySQL, PySpark and SparkSQL backends, add initial documentation for geospatial methods and add links to Ibis wiki page (#2034)
- Implement covariance for bigquery backend (#2044)
- Add Spark to supported backends list (#2046)
- Pin dependency of rtree to fix CI failure (#2043)
- Drop support for Python 3.5 (#2037)
- HTML escape column names and types in png repr. (#2023)
- Add geospatial tutorial notebook (#1991)
- Change omniscidb image tag from v4.7.0 to v5.0.0 on docker-compose recipe (#2031)
- Pin “semantic_version” to “<2.7” in the docs build CI, fix “builddoc” and “doc” section inside “Makefile” and skip mysql tzinfo on CI to allow to run MySQL using docker container on a hard disk drive. (#2030)
- Fixed impala start up issues (#2012)
- cache all ops in translate() (#1999)
- Add black step to CI (#1988)
- Json UUID any (#1962)
- Add log for database services (#1982)
- Fix BigQuery backend fixture so batting and awards_players fixture re… (#1972)
- Disable BigQuery explicitly in all/test_join.py (#1971)
- Re-formatting all files using pre-commit hook (#1963)
- Disable codecov report upload during CI builds (#1961)
- Developer doc enhancements (#1960)
- Missing geospatial ops for OmniSciDB (#1958)
- Remove pandas deprecation warnings (#1950)
- Add developer docs to get docker setup (#1948)
- More informative IntegrityError on duplicate columns (#1949)
- Improve geospatial literals and smoke tests (#1928)
- PostGIS enhancements (#1925)
- Rename mapd to omniscidb backend (#1866)
- Fix failing BigQuery tests (#1926)
- Added missing null literal op (#1917)
- Update link to Presto website (#1895)
- Removing linting from windows (#1896)
- Fix link to NUMFOCUS CoC (#1884)
- Added CoC section (#1882)
- Remove pandas exception for rows_with_max_lookback (#1859)
- Move CI pipelines to Azure (#1856)
1.2.0 (2019-06-24)
Features
Bugs
Support
- Skip SQLAlchemy backend tests in connect method in backends.py (#1847)
- Validate order_by when using rows_with_max_lookback window (#1848)
- Generate release notes from commits (#1845)
- Raise exception on backends where rows_with_max_lookback can’t be implemented (#1844)
- Tighter version spec for pytest (#1840)
- Allow passing a branch to ci/feedstock.py (#1826)
1.1.0 (2019-06-09)
Features
- Consolidate trailing window functions (#1809)
- Call to_interval when casting integers to intervals (#1766)
- Add session feature to mapd client API (#1796)
- Add min periods parameter to Window (#1792)
- Allow strings for types in pandas UDFs (#1785)
- Add missing date operations and struct field operation for the pandas backend (#1790)
- Add window operations to the OmniSci backend (#1771)
- Reimplement the pandas backend using topological sort (#1758)
- Add marker for xfailing specific backends (#1778)
- Enable window function tests where possible (#1777)
- is_computable_arg dispatcher (#1743)
- Added float32 and geospatial types for create table from schema (#1753)
Bugs
- Fix group_concat test and implementations (#1819)
- Fix failing strftime tests on Python 3.7 (#1818)
- Remove unnecessary (and erroneous in some cases) frame clauses (#1757)
- Chained mutate operations are buggy (#1799)
- Allow projections from joins to attempt fusion (#1783)
- Fix Python 3.5 dependency versions (#1798)
- Fix compatibility and bugs associated with pandas toposort reimplementation (#1789)
- Fix outer_join generating LEFT join instead of FULL OUTER (#1772)
- NullIf should enforce that its arguments are castable to a common type (#1782)
- Fix conda create command in documentation (#1775)
- Fix preceding and following with None (#1765)
- PostgreSQL interval type not recognized (#1661)
Support
- Remove decorator hacks and add custom markers (#1820)
- Add development deps to setup.py (#1814)
- Fix design and developer docs (#1805)
- Pin sphinx version to 2.0.1 (#1810)
- Add pep8speaks integration (#1793)
- Fix typo in UDF signature specification (#1821)
- Clean up most xpassing tests (#1779)
- Update omnisci container version (#1781)
- Constrain PyMapD version to get passing builds (#1776)
- Remove warnings and clean up some docstrings (#1763)
- Add StringToTimestamp as unsupported (#1638)
- Add isort pre-commit hooks (#1759)
- Add Python 3.5 testing back to CI (#1750)
- Re-enable CI for building step (#1700)
- Update README reference to MapD to say OmniSci (#1749)
1.0.0 (2019-03-26)
Features
- Add black as a pre-commit hook (#1735)
- Add support for the arbitrary aggregate in the mapd backend (#1680)
- Add SQL method for the MapD backend (#1731)
- Clean up merge PR script and use the actual merge feature of GitHub (#1744)
- Add cross join to the pandas backend (#1723)
- Implement default handler for multiple client pre_execute (#1727)
- Implement BigQuery auth using pydata_google_auth (#1728)
- Timestamp literal accepts a timezone parameter (#1712)
- Remove support for passing integers to ibis.timestamp (#1725)
- Add find_nodes to lineage (#1704)
- Remove a bunch of deprecated APIs and clean up warnings (#1714)
- Implement table distinct for the pandas backend (#1716)
- Implement geospatial functions for MapD (#1678)
- Implement geospatial types for MapD (#1666)
- Add pre commit hook (#1685)
- Getting started with mapd, mysql and pandas (#1686)
- Support column names with special characters in mapd (#1675)
- Allow operations to hide arguments from display (#1669)
- Remove implicit ordering requirements in the PostgreSQL backend (#1636)
- Add cross join operator to MapD (#1655)
- Fix UDF bugs and add support for non-aggregate analytic functions (#1637)
- Support string slicing with other expressions (#1627)
- Publish the ibis roadmap (#1618)
- Implement approx_median in BigQuery (#1604)
- Make ibis node instances hashable (#1611)
- Add range_window and trailing_range_window to docs (#1608)
Bugs
- Make dev/merge-pr.py script handle PR branches (#1745)
- Fix NULLIF implementation for the pandas backend (#1742)
- Fix casting to float in the MapD backend (#1737)
- Fix testing for BigQuery after auth flow update (#1741)
- Fix skipping for new BigQuery auth flow (#1738)
- Fix bug in TableExpr.drop (#1732)
- Filter the raw warning from newer pandas to support older pandas (#1729)
- Fix BigQuery credentials link (#1706)
- Add Union as an unsupported operation for MapD (#1639)
- Fix visualizing an ibis expression when showing a selection after a table join (#1705)
- Fix MapD exception for toDateTime (#1659)
- Use == to compare strings (#1701)
- Resolves joining with different column names (#1647)
- Fix map get with compatible types (#1643)
- Fixed where operator for MapD (#1653)
- Remove parameters from mapd (#1648)
- Make sure we cast when NULL is else in CASE expressions (#1651)
- Fix equality (#1600)
Support
- Do not build universal wheels (#1748)
- Remove tag prefix from versioneer (#1747)
- Use releases to manage documentation (#1746)
- Use cudf instead of pygdf (#1694)
- Fix multiple CI issues (#1696)
- Update mapd ci to v4.4.1 (#1681)
- Enabled mysql CI on azure pipelines (#1672)
- Remove support for Python 2 (#1670)
- Fix flake8 and many other warnings (#1667)
- Update README.md for impala and kudu (#1664)
- Remove defaults as a channel from azure pipelines (#1660)
- Fixes a very typo in the pandas/core.py docstring (#1658)
- Unpin clickhouse-driver version (#1657)
- Add test for reduction returning lists (#1650)
- Fix Azure VM image name (#1646)
- Updated MapD server-CI (#1641)
- Add TableExpr.drop to API documentation (#1645)
- Fix Azure deployment step (#1642)
- Set up CI with Azure Pipelines (#1640)
- Fix conda builds (#1609)
0.14 (2018-08-23)
This release brings refactored, more composable core components and rule system to ibis. We also focused quite heavily on the BigQuery backend this release.
New Features
- Allow keyword arguments in Node subclasses (#968)
- Splat args into Node subclasses instead of requiring a list (#969)
- Add support for UNION in the BigQuery backend (#1408, #1409)
- Support for writing UDFs in BigQuery (#1377). See the BigQuery UDF docs for more details.
- Support for cross-project expressions in the BigQuery backend. (#1427, #1428)
- Add strftime and to_timestamp support for BigQuery (#1422, #1410)
- Require google-cloud-bigquery >=1.0 (#1424)
- Limited support for interval arithmetic in the pandas backend (#1407)
- Support for subclassing TableExpr (#1439)
- Fill out pandas backend operations (#1423)
- Add common DDL APIs to the pandas backend (#1464)
- Implement the sql method for BigQuery (#1463)
- Add to_timestamp for BigQuery (#1455)
- Add the mapd backend (#1419)
- Implement range windows (#1349)
- Support for map types in the pandas backend (#1498)
- Add mean and sum for boolean types in BigQuery (#1516)
- All recent versions of SQLAlchemy are now supported (#1384)
- Add support for NUMERIC types in the BigQuery backend (#1534)
- Speed up grouped and rolling operations in the pandas backend (#1549)
- Implement TimestampNow for BigQuery and pandas (#1575)
Bug Fixes
- Nullable property is now propagated through value types (#1289)
- Implicit casting between signed and unsigned integers checks boundaries
- Fix precedence of case statement (#1412)
- Fix handling of large timestamps (#1440)
- Fix identical_to precedence (#1458)
- pandas 0.23 compatibility (#1458)
- Preserve timezones in timestamp-typed literals (#1459)
- Fix incorrect topological ordering of UNION expressions (#1501)
- Fix projection fusion bug when attempting to fuse columns of the same name (#1496)
- Fix output type for some decimal operations (#1541)
API Changes
- The previous, private rules API has been rewritten (#1366)
- Defining input arguments for operations happens in a more readable fashion instead of the previous input_type list.
- Removed support for async query execution (only Impala supported)
- Remove support for Python 3.4 (#1326)
- BigQuery division defaults to using IEEE_DIVIDE (#1390)
- Add tolerance parameter to asof_join (#1443)
0.13 (2018-03-30)
This release brings new backends, including support for executing against files, MySQL, pandas user defined scalar and aggregations along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.
New Backends
New Features
- Support for Unsigned Integer Types (#1194)
- Support for Interval types and expressions with support for execution on the Impala and Clickhouse backends (#1243)
- Isnan, isinf operations for float and double values (#1261)
- Support for an interval with a quarter period (#1259)
- ibis.pandas.from_dataframe convenience function (#1155)
- Remove the restriction on ROW_NUMBER() requiring it to have an ORDER BY clause (#1371)
- Add .get() operation on a Map type (#1376)
- Allow visualization of custom defined expressions
- Add experimental support for pandas UDFs/UDAFs (#1277)
- Functions can be used as groupby keys (#1214, #1215)
- Generalize the use of the where parameter to reduction operations (#1220)
- Support for interval operations thanks to @kszucs (#1243, #1260, #1249)
- Support for the PARTITIONTIME column in the BigQuery backend (#1322)
- Add arbitrary() method for selecting the first non null value in a column (#1230, #1309)
- Windowed MultiQuantile operation in the pandas backend thanks to @DiegoAlbertoTorres (#1343)
- Rules for validating table expressions thanks to @DiegoAlbertoTorres (#1298)
- Complete end-to-end testing framework for all supported backends (#1256)
- contains/not contains now supported in the pandas backend (#1210, #1211)
- CI builds are now reproducible locally thanks to @kszucs (#1121, #1237, #1255, #1311)
- isnan/isinf operations thanks to @kszucs (#1261)
- Framework for generalized dtype and schema inference, and implicit casting thanks to @kszucs (#1221, #1269)
- Generic utilities for expression traversal thanks to @kszucs (#1336)
- day_of_week API (#306, #1047)
- Design documentation for ibis (#1351)
Bug Fixes
- Unbound parameters were failing in the simple case of an ibis.expr.types.TableExpr.mutate call with no operation (#1378)
- Fix parameterized subqueries (#1300, #1331, #1303, #1378)
- Fix subquery extraction, which wasn’t happening in topological order (#1342)
- Fix parenthesization of isnull (#1307)
- Calling drop after mutate did not work (#1296, #1299)
- SQLAlchemy backends were missing an implementation of ibis.expr.operations.NotContains.
- Support REGEX_EXTRACT in PostgreSQL 10 (#1276, #1278)
API Changes
- Fixing #1378 required the removal of the name parameter to the ibis.param function. Use the ibis.expr.types.Expr.name method instead.
0.12 (2017-10-28)
This release brings Clickhouse and BigQuery SQL support along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.
New Backends
New Features
- Add support for Binary data type (#1183)
- Allow users of the BigQuery client to define their own API proxy classes (#1188)
- Add support for HAVING in the pandas backend (#1182)
- Add struct field tab completion (#1178)
- Add expressions for Map/Struct types and columns (#1166)
- Support Table.asof_join (#1162)
- Allow right side of arithmetic operations to take over (#1150)
- Add a data_preload step in pandas backend (#1142)
- expressions in join predicates in the pandas backend (#1138)
- Scalar parameters (#1075)
- Limited window function support for pandas (#1083)
- Implement Time datatype (#1105)
- Implement array ops for pandas (#1100)
- support for passing multiple quantiles in .quantile() (#1094)
- support for clip and quantile ops on DoubleColumns (#1090)
- Enable unary math operations for pandas, sqlite (#1071)
- Enable casting from strings to temporal types (#1076)
- Allow selection of whole tables in pandas joins (#1072)
- Implement comparison for string vs date and timestamp types (#1065)
- Implement isnull and notnull for pandas (#1066)
- Allow like operation to accept a list of conditions to match (#1061)
- Add a pre_execute step in pandas backend (#1189)
Bug Fixes
- Remove global expression caching to ensure repeatable code generation (#1179, #1181)
- Fix ORDER BY generation without a GROUP BY (#1180, #1181)
- Ensure that ibis.expr.datatypes.DataType and subclasses hash properly (#1172)
- Ensure that the pandas backend can deal with unary operations in groupby (#1182)
- Incorrect impala code generated for NOT with complex argument (#1176)
- BUG/CLN: Fix predicates on Selections on Joins (#1149)
- Don't use SET LOCAL to allow redshift to work (#1163)
- Allow empty arrays as arguments (#1154)
- Fix column renaming in groupby keys (#1151)
- Ensure that we only cast if timezone is not None (#1147)
- Fix location of conftest.py (#1107)
- TST/Make sure we drop tables during postgres testing (#1101)
- Fix misleading join error message (#1086)
- BUG/TST: Make hdfs an optional dependency (#1082)
- Memoization should include expression name where available (#1080)
Performance Enhancements
Contributors
The following people contributed to the 0.12.0 release:
$ git shortlog -sn --no-merges v0.11.2..v0.12.0
63 Phillip Cloud
8 Jeff Reback
2 Krisztián Szűcs
2 Tory Haavik
1 Anirudh
1 Szucs Krisztian
1 dlovell
1 kwangin
0.11 (2017-06-28)
This release brings initial pandas backend support along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.
New Features
- Experimental pandas backend to allow execution of ibis expression against pandas DataFrames
- Graphviz visualization of ibis expressions. Implements _repr_png_ for Jupyter Notebook functionality
- Ability to create a partitioned table from an ibis expression
- Support for missing operations in the SQLite backend: sqrt, power, variance, and standard deviation, regular expression functions, and missing power support for PostgreSQL
- Support for schemas inside databases with the PostgreSQL backend
- Appveyor testing on core ibis across all supported Python versions
- Add year/month/day methods to date types
- Ability to sort, group by and project columns according to positional index rather than only by name
- Added a type parameter to ibis.literal to allow user specification of literal types
Bug Fixes
- Fix broken conda recipe
- Fix incorrectly typed fillna operation
- Fix postgres boolean summary operations
- Fix kudu support to reflect client API changes
- Fix equality of nested types and construction of nested types when the value type is specified as a string
API Changes
- Deprecate passing integer values to the ibis.timestamp literal constructor; this will be removed in 0.12.0
- Added the admin_timeout parameter to the kudu client connect function
Contributors
$ git shortlog --summary --numbered v0.10.0..v0.11.0
58 Phillip Cloud
1 Greg Rahn
1 Marius van Niekerk
1 Tarun Gogineni
1 Wes McKinney
0.8 (2016-05-19)
This release brings initial PostgreSQL backend support along with a number of critical bug fixes and usability improvements. As several correctness bugs with the SQL compiler were fixed, we recommend that all users upgrade from earlier versions of Ibis.
New Features
- Initial PostgreSQL backend contributed by Phillip Cloud.
- Add groupby as an alias for group_by to table expressions
Bug Fixes
- Fix an expression error when filtering based on a new field
- Fix Impala's SQL compilation of using OR with compound filters
- Various fixes with the having(...) function in grouped table expressions
- Fix CTE (WITH) extraction inside UNION ALL expressions.
- Fix ImportError on Python 2 when mock library not installed
API Changes
- The deprecated ibis.impala_connect and ibis.make_client APIs have been removed
0.7 (2016-03-16)
This release brings initial Kudu-Impala integration and improved Impala and SQLite support, along with several critical bug fixes.
New Features
- Apache Kudu (incubating) integration for Impala users. Will add some documentation here when possible.
- Add use_https option to ibis.hdfs_connect for WebHDFS connections in secure (Kerberized) clusters without SSL enabled.
- Correctly compile aggregate expressions involving multiple subqueries.
To explain this last point in more detail, suppose you had:
table = ibis.table([('flag', 'string'),
                    ('value', 'double')],
                   'tbl')

flagged = table[table.flag == '1']
unflagged = table[table.flag == '0']

fv = flagged.value
uv = unflagged.value

expr = (fv.mean() / fv.sum()) - (uv.mean() / uv.sum())
The last expression now generates the correct Impala or SQLite SQL:
SELECT t0.`tmp` - t1.`tmp` AS `tmp`
FROM (
SELECT avg(`value`) / sum(`value`) AS `tmp`
FROM tbl
WHERE `flag` = '1'
) t0
CROSS JOIN (
SELECT avg(`value`) / sum(`value`) AS `tmp`
FROM tbl
WHERE `flag` = '0'
) t1
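To see what the generated query computes, the SQL above can be run against a small in-memory SQLite table using Python's standard sqlite3 module. This is only a sketch: the sample rows are invented for illustration, and it checks the SQL text shown in these notes rather than invoking Ibis itself.

```python
import sqlite3

# In-memory table mirroring the schema from the example: flag (string), value (double).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (flag TEXT, value REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    # Invented sample rows: two flagged, three unflagged.
    [("1", 1.0), ("1", 3.0), ("0", 2.0), ("0", 2.0), ("0", 4.0)],
)

# The SQL from the release note, unchanged apart from whitespace.
query = """
SELECT t0.`tmp` - t1.`tmp` AS `tmp`
FROM (
  SELECT avg(`value`) / sum(`value`) AS `tmp`
  FROM tbl
  WHERE `flag` = '1'
) t0
CROSS JOIN (
  SELECT avg(`value`) / sum(`value`) AS `tmp`
  FROM tbl
  WHERE `flag` = '0'
) t1
"""

# Each subquery yields exactly one row, so the cross join yields one row too.
(result,) = conn.execute(query).fetchone()
```

Each aggregate is evaluated in its own subquery with its own filter, which is what lets the two differently filtered means and sums coexist in one expression.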
Bug Fixes
- CHAR(n) and VARCHAR(n) Impala types now correctly map to Ibis string expressions
- Fix inappropriate projection-join-filter expression rewrites resulting in incorrect generated SQL.
- ImpalaClient.create_table correctly passes STORED AS PARQUET for format='parquet'.
- Fixed several issues with Ibis dependencies (impyla, thriftpy, sasl, thrift_sasl), especially for secure clusters. Upgrading will pull in these new dependencies.
- Do not fail in ibis.impala.connect when trying to create the temporary Ibis database if no HDFS connection passed.
- Fix join predicate evaluation bug when column names overlap with table attributes.
- Fix handling of fully-materialized joins (aka select * joins) in SQLAlchemy / SQLite.
Contributors
Thank you to all who contributed patches to this release.
$ git log v0.6.0..v0.7.0 --pretty=format:%aN | sort | uniq -c | sort -rn
21 Wes McKinney
1 Uri Laserson
1 Kristopher Overholt
0.6 (2015-12-01)
This release brings expanded pandas and Impala integration, including support for managing partitioned tables in Impala. See the new Ibis for Impala Users guide for more on using Ibis with Impala.
The Ibis for SQL Programmers guide also was written since the 0.5 release.
This release also includes bug fixes affecting generated SQL correctness. All users should upgrade as soon as possible.
New Features
- New integrated Impala functionality. See Ibis for Impala Users for more details on these things.
  - Improved Impala-pandas integration. Create tables or insert into existing tables from pandas DataFrame objects.
  - Partitioned table metadata management API. Add, drop, alter, and insert into table partitions.
  - Add is_partitioned property to ImpalaTable.
  - Added support for LOAD DATA DDL using the load_data function, also supporting partitioned tables.
  - Modify table metadata (location, format, SerDe properties etc.) using ImpalaTable.alter
  - Interrupting Impala expression execution with Control-C will attempt to cancel the running query with the server.
  - Set the compression codec (e.g. snappy) used with ImpalaClient.set_compression_codec.
  - Get and set query options for a client session with ImpalaClient.get_options and ImpalaClient.set_options.
  - Add ImpalaTable.metadata method that parses the output of the DESCRIBE FORMATTED DDL to simplify table metadata inspection.
  - Add ImpalaTable.stats and ImpalaTable.column_stats to see computed table and partition statistics.
  - Add CHAR and VARCHAR handling
  - Add refresh, invalidate_metadata DDL options and add incremental option to compute_stats for COMPUTE INCREMENTAL STATS.
- Add substitute method for performing multiple value substitutions in an array or scalar expression.
- Division is by default true division like Python 3 for all numeric data. This means for SQL systems that use C-style division semantics, the appropriate CAST will be automatically inserted in the generated SQL.
- Easier joins on tables with overlapping column names. See Ibis for SQL Programmers.
- Expressions like string_expr[:3] now work as expected.
- Add coalesce instance method to all value expressions.
- Passing limit=None to the execute method on expressions disables any default row limits.
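The true-division item above can be illustrated with an engine that uses C-style integer division. The sketch below uses SQLite through Python's standard sqlite3 module purely as a convenient example of such an engine. The exact CAST that Ibis emits is not shown in these notes, so the SQL here is an assumption chosen to demonstrate the semantics, not the generated output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# C-style semantics: dividing two integers discards the fractional part.
(c_style,) = conn.execute("SELECT 7 / 2").fetchone()

# Casting one operand to a floating-point type yields true division.
(true_div,) = conn.execute("SELECT CAST(7 AS REAL) / 2").fetchone()

# Python 3 separates the two behaviours into distinct operators.
# For these positive operands, // matches the C-style result.
python_true, python_floor = 7 / 2, 7 // 2
```

Inserting the cast in the generated SQL is what lets an expression such as an integer column divided by an integer behave the same way across engines.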
API Changes
- ImpalaTable.rename no longer mutates the calling table expression.
Contributors
$ git log v0.5.0..v0.6.0 --pretty=format:%aN | sort | uniq -c | sort -rn
46 Wes McKinney
3 Uri Laserson
1 Phillip Cloud
1 mariusvniekerk
1 Kristopher Overholt
0.5 (2015-09-10)
Highlights in this release are the SQLite, Python 3, Impala UDA support, and an asynchronous execution API. There are also many usability improvements, bug fixes, and other new features.
New Features
- SQLite client and built-in function support
- Ibis now supports Python 3.4 as well as 2.6 and 2.7
- Ibis can utilize Impala user-defined aggregate (UDA) functions
- SQLAlchemy-based translation toolchain to enable more SQL engines having SQLAlchemy dialects to be supported
- Many window function usability improvements (nested analytic functions and deferred binding conveniences)
- More convenient aggregation with keyword arguments in `aggregate` functions
- Built preliminary wrapper API for MADLib-on-Impala
- Add `var` and `std` aggregation methods and support in Impala
- Add `nullifzero` numeric method for all SQL engines
- Add `rename` method to Impala tables (for renaming tables in the Hive metastore)
- Add `close` method to `ImpalaClient` for session cleanup (#533)
- Add `relabel` method to table expressions
- Add `insert` method to Impala tables
- Add `compile` and `verify` methods to all expressions to test compilation and ability to compile (since many operations are unavailable in SQLite, for example)
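As an illustration of what a `nullifzero`-style operation is typically used for, here is a plain-Python sketch of the semantics, with `None` standing in for SQL `NULL`. It is not the Ibis implementation; both function names below are local to this example.

```python
from typing import Optional


def nullifzero(value: Optional[float]) -> Optional[float]:
    """Return None (SQL NULL) when value is zero, otherwise the value unchanged."""
    return None if value == 0 else value


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Divide, yielding None instead of raising when the denominator is zero."""
    denominator = nullifzero(denominator)
    return None if denominator is None else numerator / denominator


print(safe_ratio(6, 3))
print(safe_ratio(1, 0))
```

The common use is guarding a division: a zero denominator becomes null, so the quotient is null rather than an error.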
API Changes
- Impala Ibis client creation now uses only `ibis.impala.connect`, and `ibis.make_client` has been deprecated
Contributors
$ git log v0.4.0..v0.5.0 --pretty=format:%aN | sort | uniq -c | sort -rn
55 Wes McKinney
9 Uri Laserson
1 Kristopher Overholt
0.4 (2015-08-14)
New Features
- Add tooling to use Impala C++ scalar UDFs within Ibis (#262, #195)
- Support and testing for Kerberos-enabled secure HDFS clusters
- Many table functions can now accept functions as parameters (invoked on the calling table) to enhance composability and emulate late-binding semantics of languages (like R) that have non-standard evaluation (#460)
- Add `any`, `all`, `notany`, and `notall` reductions on boolean arrays, as well as `cumany` and `cumall`
- Using `topk` now produces an analytic expression that is executable (as an aggregation) but can also be used as a filter as before (#392, #91)
- Added experimental database object "usability layer", see `ImpalaClient.database`.
- Add `TableExpr.info`
- Add `compute_stats` API to table expressions referencing physical Impala tables
- Add `explain` method to `ImpalaClient` to show query plan for an expression
- Add `chmod` and `chown` APIs to `HDFS` interface for superusers
- Add `convert_base` method to strings and integer types
- Add option to `ImpalaClient.create_table` to create empty partitioned tables
- `ibis.cross_join` can now join more than 2 tables at once
- Add `ImpalaClient.raw_sql` method for running naked SQL queries
- `ImpalaClient.insert` now validates schemas locally prior to sending query to cluster, for better usability.
- Add conda installation recipes
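The item above about table functions accepting functions as parameters can be sketched with a toy class. This is a hypothetical stand-in, not the Ibis table expression: `Table`, its `rows` attribute, and the list-of-booleans predicate are all assumptions made for this example.

```python
class Table:
    """Toy stand-in for a table expression (hypothetical, not the Ibis class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, predicate):
        # A callable is invoked on the calling table and must return
        # one boolean per row; a plain list of booleans is used as-is.
        if callable(predicate):
            predicate = predicate(self)
        return Table(r for r, keep in zip(self.rows, predicate) if keep)


t = Table([{"x": 1}, {"x": 5}, {"x": 9}])

# Each lambda is bound only when filter() calls it, so the second one
# sees the already-filtered table without it needing a name.
result = t.filter(lambda tbl: [r["x"] > 3 for r in tbl.rows]).filter(
    lambda tbl: [r["x"] < 9 for r in tbl.rows]
)
print(result.rows)
```

Deferring the binding this way is what lets calls be chained without assigning each intermediate table to a variable.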
Contributors
$ git log v0.3.0..v0.4.0 --pretty=format:%aN | sort | uniq -c | sort -rn
38 Wes McKinney
9 Uri Laserson
2 Meghana Vuyyuru
2 Kristopher Overholt
1 Marius van Niekerk
0.3 (2015-07-20)
First public release. See https://ibis-project.org for more.
New Features
- Implement window / analytic function support
- Enable non-equijoins (join clauses with operations other than `==`).
- Add remaining string functions supported by Impala.
- Add `pipe` method to tables (hat-tip to the pandas dev team).
- Add `mutate` convenience method to tables.
- Fleshed out `WebHDFS` implementations: get/put directories, move files, etc. See the full HDFS API.
- Add `truncate` method for timestamp values
- `ImpalaClient` can execute scalar expressions not involving any table.
- Can also create internal Impala tables with a specific HDFS path.
- Make Ibis's temporary Impala database and HDFS paths configurable (see `ibis.options`).
- Add `truncate_table` function to client (if the user's Impala cluster supports it).
- Python 2.6 compatibility
- Enable Ibis to execute concurrent queries in multithreaded applications (earlier versions were not thread-safe).
- Test data load script in `scripts/load_test_data.py`
- Add an internal operation type signature API to enhance developer productivity.
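To make the timestamp `truncate` item above concrete, here is a minimal sketch of timestamp truncation using only the standard library. The unit names and the helper itself are assumptions for illustration; the units accepted by Ibis and their spelling may differ.

```python
from datetime import datetime

# Fields reset to their minimum for each unit (illustrative subset of units).
_RESET_FIELDS = {
    "year": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    "month": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "minute": dict(second=0, microsecond=0),
}


def truncate(ts: datetime, unit: str) -> datetime:
    """Round ts down to the start of the given unit."""
    if unit not in _RESET_FIELDS:
        raise ValueError(f"unsupported unit: {unit!r}")
    return ts.replace(**_RESET_FIELDS[unit])


ts = datetime(2015, 7, 20, 13, 45, 30)
print(truncate(ts, "month"))
print(truncate(ts, "hour"))
```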
Contributors
$ git log v0.2.0..v0.3.0 --pretty=format:%aN | sort | uniq -c | sort -rn
59 Wes McKinney
29 Uri Laserson
4 Isaac Hodes
2 Meghana Vuyyuru
0.2 (2015-06-16)
New Features
- `insert` method on Ibis client for inserting data into existing tables.
- `parquet_file`, `delimited_file`, and `avro_file` client methods for querying datasets not yet available in Impala
- New `ibis.hdfs_connect` method and `HDFS` client API for WebHDFS for writing files and directories to HDFS
- New timedelta API and improved timestamp data support
- New `bucket` and `histogram` methods on numeric expressions
- New `category` logical datatype for handling bucketed data, among other things
- Add `summary` API to numeric expressions
- Add `value_counts` convenience API to array expressions
- New string methods `like`, `rlike`, and `contains` for fuzzy and regex searching
- Add `options.verbose` option and configurable `options.verbose_log` callback function for improved query logging and visibility
- Support for new SQL built-in functions
  - `ibis.coalesce`
  - `ibis.greatest` and `ibis.least`
  - `ibis.where` for conditional logic (see also `ibis.case` and `ibis.cases`)
  - `nullif` method on value expressions
  - `ibis.now`
- New aggregate functions: `approx_median`, `approx_nunique`, and `group_concat`
- `where` argument in aggregate functions
- Add `having` method to `group_by` intermediate object
- Added group-by convenience `table.group_by(exprs).COLUMN_NAME.agg_function()`
- Add default expression names to most aggregate functions
- New Impala database client helper methods
  - `create_database`
  - `drop_database`
  - `exists_database`
  - `list_databases`
  - `set_database`
- Client `list_tables` searching / listing method
- Add `add`, `sub`, and other explicit arithmetic methods to value expressions
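The idea behind bucketing numeric values into categories, as in the `bucket` item above, can be sketched in plain Python. This is not the Ibis `bucket` method: the function signature, the left-closed bins, and the treatment of out-of-range values as `None` are assumptions made for this example, and Ibis may handle the extreme edges differently.

```python
from bisect import bisect_right


def bucket(value, edges):
    """Return the index of the bin [edges[i], edges[i+1]) containing value.

    Values outside [edges[0], edges[-1]) and missing values map to None.
    """
    if value is None or value < edges[0] or value >= edges[-1]:
        return None
    # bisect_right counts the edges <= value; subtract 1 for a 0-based bin index.
    return bisect_right(edges, value) - 1


edges = [0, 10, 25, 50]
for v in [0, 9.5, 10, 49.9, 50, -1]:
    print(v, bucket(v, edges))
```

The resulting integer codes play the role of a categorical value: each code identifies one bin, and a null marks a value that fell outside every bin.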
API Changes
- New Ibis client and Impala connection workflow. The client is now composed of an Impala connection and an optional HDFS connection
Bug Fixes
- Numerous expression API bug fixes and rough edges fixed
Contributors
$ git log v0.1.0..v0.2.0 --pretty=format:%aN | sort | uniq -c | sort -rn
71 Wes McKinney
1 Juliet Hougland
1 Isaac Hodes
0.1 (2015-03-26)
First Ibis release.
- Expression DSL design and type system
- Expression to ImpalaSQL compiler toolchain
- Impala built-in function wrappers
$ git log 84d0435..v0.1.0 --pretty=format:%aN | sort | uniq -c | sort -rn
78 Wes McKinney
1 srus
1 Henry Robinson