Release Notes

3.0.2 (2022-04-28)

Bug Fixes

  • docs: fix tempdir location for docs build (dcd1b22)

3.0.1 (2022-04-28)

Bug Fixes

  • build: replace version before exec plugin runs (573139c)

3.0.0 (2022-04-25)

⚠ BREAKING CHANGES

  • ir: The following are breaking changes due to simplifying expression internals
  • ibis.expr.datatypes.DataType.scalar_type and DataType.column_type factory methods have been removed; the DataType.scalar and DataType.column class fields can be used to directly construct a corresponding expression instance (though prefer to use operation.to_expr())
  • ibis.expr.types.ValueExpr._name and ValueExpr._dtype fields are not accessible anymore. While these were not supposed to be used directly, the ValueExpr.has_name(), ValueExpr.get_name() and ValueExpr.type() methods are now the only way to retrieve the expression's name and datatype.
  • ibis.expr.operations.Node.output_type is now a property, not a method; decorate existing output_type methods with @property
  • ibis.expr.operations.ValueOp subclasses must define output_shape and output_dtype properties from now on (note the datatype abbreviation dtype in the property name)
  • ibis.expr.rules.cast(), scalar_like() and array_like() rules have been removed
  • api: Replace t["a"].distinct() with t[["a"]].distinct().
  • deps: The sqlalchemy lower bound is now 1.4
  • ir: Schema.names and Schema.types attributes now have tuple type rather than list
  • expr: Previously, columns that were added or used in an aggregation or mutation were alphabetically sorted in compiled SQL outputs. This was a vestige from when Python dicts didn't preserve insertion order. Columns now appear in the order in which they were passed to aggregate or mutate.
  • api: dt.float is now dt.float64; use dt.float32 for the previous behavior.
  • ir: Relation-based execute_node dispatch rules must now accept tuples of expressions.
  • ir: removed ibis.expr.lineage.{roots,find_nodes} functions
  • config: The graphviz repr in notebooks is now disabled by default. Use ibis.options.graphviz_repr = True to enable it.
  • hdfs: Use fsspec instead of HDFS from ibis
  • udf: Vectorized UDF coercion functions are no longer a public API.
  • The minimum supported Python version is now Python 3.8
  • config: register_option is no longer supported, please submit option requests upstream
  • backends: Read tables with pandas.read_hdf and use the pandas backend
  • The CSV backend is removed. Use Datafusion for CSV execution.
  • backends: Use the datafusion backend to read parquet files
  • Expr() -> Expr.pipe()
  • coercion functions previously in expr/schema.py are now in udf/vectorized.py
  • api: materialize is removed. Joins with overlapping columns now have suffixes.
  • kudu: use impala instead: https://kudu.apache.org/docs/kudu_impala_integration.html
  • Any code that was relying implicitly on string-y behavior from UUID datatypes will need to add an explicit cast first.
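
For custom operations, the output_type, output_shape and output_dtype items above amount to exposing type information as properties instead of methods. The sketch below uses plain Python stand-ins, not real ibis classes: OldStyleOp, NewStyleOp and the string return values are hypothetical placeholders that only illustrate the decorator change.

```python
# Hypothetical stand-ins; these are NOT ibis classes and the
# returned strings are placeholders for real ibis objects.


class OldStyleOp:
    # Pre-3.0 style: output_type is a regular method and must be called.
    def output_type(self):
        return "int64 column"


class NewStyleOp:
    # 3.0 style: the same information is exposed through properties.
    @property
    def output_dtype(self):
        return "int64"

    @property
    def output_shape(self):
        return "columnar"


old = OldStyleOp()
new = NewStyleOp()

print(old.output_type())                   # method: needs parentheses
print(new.output_dtype, new.output_shape)  # properties: plain attribute access
```

Code that previously called output_type() on a node would drop the parentheses after this kind of change.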

Features

  • add repr_html for expressions to print as tables in ipython (cd6fa4e)
  • add duckdb backend (667f2d5)
  • allow construction of decimal literals (3d9e865)
  • api: add ibis.asc expression (efe177e), closes #1454
  • api: add has_operation API to the backend (4fab014)
  • api: implement type for SortExpr (ab19bd6)
  • clickhouse: implement string concat for clickhouse (1767205)
  • clickhouse: implement StrRight operation (67749a0)
  • clickhouse: implement table union (e0008d7)
  • clickhouse: implement trim, pad and string predicates (a5b7293)
  • datafusion: implement Count operation (4797a86)
  • datatypes: unbounded decimal type (f7e6f65)
  • date: add ibis.date(y,m,d) functionality (26892b6), closes #386
  • duckdb/postgres/mysql/pyspark: implement .sql on tables for mixing sql and expressions (00e8087)
  • duckdb: add functionality needed to pass integer to interval test (e2119e8)
  • duckdb: implement _get_schema_using_query (93cd730)
  • duckdb: implement now() function (6924f50)
  • duckdb: implement regexp replace and extract (18d16a7)
  • implement force argument in sqlalchemy backend base class (9df7f1b)
  • implement coalesce for the pyspark backend (8183efe)
  • implement semi/anti join for the pandas backend (cb36fc5)
  • implement semi/anti join for the pyspark backend (3e1ba9c)
  • implement the remaining clickhouse joins (b3aa1f0)
  • ir: rewrite and speed up expression repr (45ce9b2)
  • mysql: implement _get_schema_from_query (456cd44)
  • mysql: move string join impl up to alchemy for mysql (77a8eb9)
  • postgres: implement _get_schema_using_query (f2459eb)
  • pyspark: implement Distinct for pyspark (4306ad9)
  • pyspark: implement log base b for pyspark (527af3c)
  • pyspark: implement percent_rank and enable testing (c051617)
  • repr: add interval info to interval repr (df26231)
  • sqlalchemy: implement ilike (43996c0)
  • sqlite: implement date_truncate (3ce4f2a)
  • sqlite: implement ISO week of year (714ff7b)
  • sqlite: implement string join and concat (6f5f353)
  • support of arrays and tuples for clickhouse (db512a8)
  • ver: dynamic version identifiers (408f862)

Bug Fixes

  • added wheel to pyproject toml for venv users (b0b8e5c)
  • allow major version changes in CalVer dependencies (9c3fbe5)
  • annotable: allow optional arguments at any position (778995f), closes #3730
  • api: add ibis.map and .struct (327b342), closes #3118
  • api: map string multiplication with integer to repeat method (b205922)
  • api: thread suffixes parameter to individual join methods (31a9aff)
  • change TimestampType to Timestamp (e0750be)
  • clickhouse: disconnect from clickhouse when computing version (11cbf08)
  • clickhouse: use a context manager for execution (a471225)
  • combine windows during windowization (7fdd851)
  • conform epoch_seconds impls to expression return type (18a70f1)
  • context-adjustment: pass scope when calling adjust_context in pyspark backend (33aad7b), closes #3108
  • dask: fix asof joins for newer version of dask (50711cc)
  • dask: workaround dask bug (a0f3bd9)
  • deps: update dependency atpublic to v3 (3fe8f0d)
  • deps: update dependency datafusion to >=0.4,<0.6 (3fb2194)
  • deps: update dependency geoalchemy2 to >=0.6.3,<0.12 (dc3c361)
  • deps: update dependency graphviz to >=0.16,<0.21 (3014445)
  • duckdb: add casts to literals to fix binding errors (1977a55), closes #3629
  • duckdb: fix array column type discovery on leaf tables and add tests (15e5412)
  • duckdb: fix log with base b impl (4920097)
  • duckdb: support both 0.3.2 and 0.3.3 (a73ccce)
  • enforce the schema's column names in apply_to (b0f334d)
  • expose ops.IfNull for mysql backend (156c2bd)
  • expr: add more binary operators to char list and implement fallback (b88184c)
  • expr: fix formatting of table info using tabulate (b110636)
  • fix float vs real data type detection in sqlalchemy (24e6774)
  • fix list_schemas argument (69c1abf)
  • fix postgres udfs and reenable ci tests (7d480d2)
  • fix tablecolumn execution for filter following join (064595b)
  • format: remove some newlines from formatted expr repr (ed4fa78)
  • histogram: cross_join needs onclause=True (5d36a58), closes #622
  • ibis.expr.signature.Parameter is not pickleable (828fd54)
  • implement coalesce properly in the pandas backend (aca5312)
  • implement count on tables for pyspark (7fe5573), closes #2879
  • infer coalesce types when a non-null expression occurs after the first argument (c5f2906)
  • mutate: do not lift table column that results from mutate (ba4e5e5)
  • pandas: disable range windows with order by (e016664)
  • pandas: don't reassign the same column to silence SettingWithCopyWarning warning (75dc616)
  • pandas: implement percent_rank correctly (d8b83e7)
  • prevent unintentional cross joins in mutate + filter (83eef99)
  • pyspark: fix range windows (a6f2aa8)
  • regression in Selection.sort_by with resolved_keys (c7a69cd)
  • regression in sort_by with resolved_keys (63f1382), closes #3619
  • remove broken csv pre_execute (93b662a)
  • remove importorskip call for backend tests (2f0bcd8)
  • remove incorrect fix for pandas regression (339f544)
  • remove passing schema into register_parquet (bdcbb08)
  • repr: add ops.TimeAdd to repr binop lookup table (fd94275)
  • repr: allow ops.TableNode in fmt_value (6f57003)
  • reverse the predicate pushdown substitution (f3cd358)
  • sort_index to satisfy pandas 1.4.x (6bac0fc)
  • sqlalchemy: ensure correlated subqueries FROM clauses are rendered (3175321)
  • sqlalchemy: use corresponding_column to prevent spurious cross joins (fdada21)
  • sqlalchemy: use replace selectables to prevent semi/anti join cross join (e8a1a71)
  • sql: retain column names for named ColumnExprs (f1b4b6e), closes #3754
  • sql: walk right join trees and substitute joins with right-side joins with views (0231592)
  • store schema on the pandas backend to allow correct inference (35070be)

Performance Improvements

  • datatypes: speed up str and hash (262d3d7)
  • fast path for simple column selection (d178498)
  • ir: global equality cache (13c2bb2)
  • ir: introduce CachedEqMixin to speed up equality checks (b633925)
  • repr: remove full tree repr from rule validator error message (65885ab)
  • speed up attribute access (89d1c05)
  • use assign instead of concat in projections when possible (985c242)

Miscellaneous Chores

  • deps: increase sqlalchemy lower bound to 1.4 (560854a)
  • drop support for Python 3.7 (0afd138)

Code Refactoring

  • api: make primitive types more cohesive (71da8f7)
  • api: remove distinct ColumnExpr API (3f48cb8)
  • api: remove materialize (24285c1)
  • backends: remove the hdf5 backend (ff34f3e)
  • backends: remove the parquet backend (b510473)
  • config: disable graphviz-repr-in-notebook by default (214ad4e)
  • config: remove old config code and port to pydantic (4bb96d1)
  • dt.UUID inherits from DataType, not String (2ba540d)
  • expr: preserve column ordering in aggregations/mutations (668be0f)
  • hdfs: replace HDFS with fsspec (cc6eddb)
  • ir: make Annotable immutable (1f2b3fa)
  • ir: make schema annotable (b980903)
  • ir: remove unused lineage roots and find_nodes functions (d630a77)
  • ir: simplify expressions by not storing dtype and name (e929f85)
  • kudu: remove support for use of kudu through kudu-python (36bd97f)
  • move coercion functions from schema.py to udf (58eea56), closes #3033
  • remove blanket call for Expr (3a71116), closes #2258
  • remove the csv backend (0e3e02e)
  • udf: make coerce functions in ibis.udf.vectorized private (9ba4392)
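
The column-ordering change (668be0f, described under BREAKING CHANGES) can be pictured with ordinary Python dicts. This is only an illustration of the ordering rule, not ibis code: the metrics mapping and its string values are made up for the example.

```python
# Hypothetical named expressions, written in the order a user might
# pass them to an aggregate or mutate call.
metrics = {"total": "sum(x)", "average": "mean(x)", "count": "count(x)"}

# Old behavior: output columns were sorted alphabetically by name.
old_order = sorted(metrics)

# New behavior: output columns follow the order they were passed in,
# which matches dict insertion order in modern Python.
new_order = list(metrics)

print(old_order)  # ['average', 'count', 'total']
print(new_order)  # ['total', 'average', 'count']
```

Code that relied on positional access to the alphabetically sorted output would need to select columns by name instead.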

2.1.1 (2022-01-12)

Bug Fixes

  • setup.py: set the correct version number for 2.1.0 (f3d267b)

2.1.0 (2022-01-12)

Bug Fixes

  • consider all packages' entry points (b495cf6)
  • datatypes: infer bytes literal as binary #2915 (#3124) (887efbd)
  • deps: bump minimum dask version to 2021.10.0 (e6b5c09)
  • deps: constrain numpy to ensure wheels are used on windows (70c308b)
  • deps: update dependency clickhouse-driver to ^0.1 || ^0.2.0 (#3061) (a839d54)
  • deps: update dependency geoalchemy2 to >=0.6,<0.11 (4cede9d)
  • deps: update dependency pyarrow to v6 (#3092) (61e52b5)
  • don't force backends to override do_connect until 3.0.0 (4b46973)
  • execute materialized joins in the pandas and dask backends (#3086) (9ed937a)
  • literal: allow creating ibis literal with uuid (#3131) (b0f4f44)
  • restore the ability to have more than two option levels (#3151) (fb4a944)
  • sqlalchemy: fix correlated subquery compilation (43b9010)
  • sqlite: defer db connection until needed (#3127) (5467afa), closes #64

Features

  • allow column_of to take a column expression (dbc34bb)
  • ci: More readable workflow job titles (#3111) (d8fd7d9)
  • datafusion: initial implementation for Arrow Datafusion backend (3a67840), closes #2627
  • datafusion: initial implementation for Arrow Datafusion backend (75876d9), closes #2627
  • make dayofweek impls conform to pandas semantics (#3161) (9297828)

Reverts

  • "ci: install gdal for fiona" (8503361)

2.0.0 (2021-10-06)

Features

  • Serialization-deserialization of Node via pickle is now byte compatible between different processes (#2938)
  • Support joining on different columns in ClickHouse backend (#2916)
  • Support summarization of empty data in Pandas backend (#2908)
  • Unify implementation of fillna and isna in Pyspark backend (#2882)
  • Support binary operation with Timedelta in Pyspark backend (#2873)
  • Add group_concat operation for Clickhouse backend (#2839)
  • Support comparison of ColumnExpr to timestamp literal (#2808)
  • Make op schema a cached property (#2805)
  • Implement .insert() for SQLAlchemy backends (#2613, #2613)
  • Infer categorical and decimal Series to more specific Ibis types in Pandas backend (#2792)
  • Add startswith and endswith operations (#2790)
  • Allow more flexible return type for UDFs (#2776, #2797)
  • Implement Clip in the Pyspark backend (#2779)
  • Use ndarray as array representation in Pandas backend (#2753)
  • Support Spark filter with window operation (#2687)
  • Support context adjustment for udfs for pandas backend (#2646)
  • Add auth_local_webserver, auth_external_data, and auth_cache parameters to BigQuery connect method. Set auth_local_webserver to use a local server instead of copy-pasting an authorization code. Set auth_external_data to true to request additional scopes required to query Google Drive and Sheets. Set auth_cache to reauth or none to force reauthentication. (#2655)
  • Add bit_and, bit_or, and bit_xor integer column aggregates (BigQuery and MySQL backends) (#2641)
  • Backends are defined as entry points (#2379)
  • Add ibis.array for creating array expressions (#2615)
  • Implement Not operation in PySpark backend (#2607)
  • Added support for case/when in PySpark backend (#2610)
  • Add support for np.array as literals for backends that already support lists as literals (#2603)

Bugs

  • Fix data races in impala connection pool accounting (#2991)
  • Fix null literal compilation in the Clickhouse backend (#2985)
  • Fix order of limit and offset parameters in the Clickhouse backend (#2984)
  • Replace equals operation for geospatial datatype to geo_equals (#2956)
  • Fix .drop(fields). The argument can now be either a list of strings or a string. (#2829)
  • Fix projection on differences and intersections for SQL backends (#2845)
  • Backends are loaded in a lazy way, so third-party backends can import Ibis without circular imports (#2827)
  • Disable aggregation optimization due to N squared performance (#2830)
  • Fix .cast() to array outputting list instead of np.array in Pandas backend (#2821)
  • Fix aggregation with mixed reduction datatypes (array + scalar) on Dask backend (#2820)
  • Fix error when using reduction UDF that returns np.array in a grouped aggregation (#2770)
  • Fix time context trimming error for multi column udfs in pandas backend (#2712)
  • Fix error during compilation of range_window in base_sql backends (issue #2608) (#2710)
  • Fix wrong row indexing in the result for 'window after filter' for timecontext adjustment (#2696)
  • Fix aggregate exploding the output of Reduction ops that return a list/ndarray (#2702)
  • Fix issues with context adjustment for filter with PySpark backend (#2693)
  • Add temporary struct col in pyspark backend to ensure that UDFs are executed only once (#2657)
  • Fix BigQuery connect bug that ignored project ID parameter (#2588)
  • Fix overwrite logic to account for DestructColumn inside mutate API (#2636)
  • Fix fusion optimization bug that incorrectly changes operation order (#2635)
  • Fixes a NPE issue with substr in PySpark backend (#2610)
  • Fixes binary data type translation into BigQuery bytes data type (#2354)
  • Make StructValue picklable (#2577)

Support

  • Improvement of the backend API. The former Client subclasses have been replaced by a Backend class that must subclass ibis.backends.base.BaseBackend. The BaseBackend class contains abstract methods for the minimum subset of methods that backends must implement, and their signatures have been standardized across backends. The Ibis compiler has been refactored, and backends don't need to implement all compiler classes anymore if the default works for them. Only a subclass of ibis.backends.base.sql.compiler.Compiler is now required. Backends now need to register themselves as entry points. (#2678)
  • Deprecate exists_table(table) in favor of table in list_tables() (#2905)
  • Remove handwritten type parser; parsing errors that were previously IbisTypeError are now parsy.ParseError. parsy is now a hard requirement. (#2977)
  • Methods current_database and list_databases raise an exception for backends that do not support databases (#2962)
  • Method set_database has been deprecated, in favor of creating a new connection to a different database (#2913)
  • Removed log method of clients, in favor of verbose_log option (#2914)
  • Output of Client.version returned as a string, instead of a setuptools Version (#2883)
  • Deprecated list_schemas in SQLAlchemy backends in favor of list_databases (#2862)
  • Deprecated ibis.<backend>.verify() in favor of capturing exception in ibis.<backend>.compile() (#2865)
  • Simplification of data fetching. Backends don't need to implement Query anymore (#2789)
  • Move the BigQuery backend to a separate repository (https://github.com/ibis-project/ibis-bigquery). The backend will be released separately; use pip install ibis-bigquery or conda install ibis-bigquery to install it, and then use it as before. (#2665)
  • Supporting SQLAlchemy 1.4, and requiring minimum 1.3 (#2689)
  • Namespace time_col config, fix type check for trim_with_timecontext for pandas window execution (#2680)
  • Remove deprecated ibis.HDFS, ibis.WebHDFS and ibis.hdfs_connect (#2505)
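
Since backends now register themselves as entry points (#2678, #2379), installed backends can in principle be listed with the standard library. The sketch below is a minimal example under one assumption: the entry point group name "ibis.backends" is not stated in these notes and should be checked against the packaging metadata of the installed version. The helper name discover_backend_names is hypothetical.

```python
from importlib.metadata import entry_points


def discover_backend_names(group="ibis.backends"):
    """List the names of entry points registered under `group`.

    The default group name is an assumption, not taken from these notes.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a dict mapping group name to entry points.
        selected = eps.get(group, [])
    return sorted({ep.name for ep in selected})


# Prints an empty list when no package registers that group.
print(discover_backend_names())
```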

1.4.0 (2020-11-07)

Features

  • Add Struct.from_dict (#2514)
  • Add hash and hashbytes support for BigQuery backend (#2310)
  • Support reduction UDF without groupby to return multiple columns for Pandas backend (#2511)
  • Support analytic and reduction UDF to return multiple columns for Pandas backend (#2487)
  • Support elementwise UDF to return multiple columns for Pandas and PySpark backend (#2473)
  • FEAT: Support Ibis interval for window in pyspark backend (#2409)
  • Use Scope class for scope in pyspark backend (#2402)
  • Add PySpark support for ReductionVectorizedUDF (#2366)
  • Add time context in scope in execution for pandas backend (#2306)
  • Add start_point and end_point to PostGIS backend. (#2081)
  • Add set difference to general ibis api (#2347)
  • Add rowid expression, supported by SQLite and OmniSciDB (#2251)
  • Add intersection to general ibis api (#2230)
  • Add application_name argument to ibis.bigquery.connect to allow attributing Google API requests to projects that use Ibis. (#2303)
  • Add support for casting category dtype in pandas backend (#2285)
  • Add support for Union in the PySpark backend (#2270)
  • Add support for implementing custom window object for pandas backend (#2260)
  • Implement two level dispatcher for execute_node (#2246)
  • Add ibis.pandas.trace module to log time and call stack information. (#2233)
  • Validate that the output type of a UDF is a single element (#2198)
  • ZeroIfNull and NullIfZero implementation for OmniSciDB (#2186)
  • IsNan implementation for OmniSciDB (#2093)
  • [OmnisciDB] Support add_columns and drop_columns for OmnisciDB table (#2094)
  • Create ExtractQuarter operation and add its support to Clickhouse, CSV, Impala, MySQL, OmniSciDB, Pandas, Parquet, PostgreSQL, PySpark, SQLite and Spark (#2175)
  • Add translation rules for isnull() and notnull() for pyspark backend (#2126)
  • Add window operations support to SQLite (#2232)
  • Implement read_csv for omniscidb backend (#2062)
  • [OmniSciDB] Add support to week extraction (#2171)
  • Date, DateDiff and TimestampDiff implementations for OmniSciDB (#2097)
  • Create ExtractWeekOfYear operation and add its support to Clickhouse, CSV, MySQL, Pandas, Parquet, PostgreSQL, PySpark and Spark (#2177)
  • Add initial support for ibis.random function (#2060)
  • Added epoch_seconds extraction operation to Clickhouse, CSV, Impala, MySQL, OmniSciDB, Pandas, Parquet, PostgreSQL, PySpark, SQLite, Spark and BigQuery (issue #2273) (#2178)
  • [OmniSciDB] Add "method" parameter to load_data (#2165)
  • Add non-nullable info to schema output (#2117)
  • fillna and nullif implementations for OmnisciDB (#2083)
  • Add load_data to sqlalchemy's backends and fix database parameter for load/create/drop when database parameter is the same as the current database (#1981)
  • [OmniSciDB] Add support for within, d_fully_within and point (#2125)
  • OmniSciDB - Refactor DDL and Client; Add temporary parameter to create_table and "force" parameter to drop_view (#2086)
  • Create ExtractDayOfYear operation and add its support to Clickhouse, CSV, MySQL, OmniSciDB, Pandas, Parquet, PostgreSQL, PySpark, SQLite and Spark (#2173)
  • Implementations of Log Log2 Log10 for OmniSciDB backend (#2095)

Bugs

  • Table expressions do not recognize inet datatype (Postgres backend) (#2462)
  • Table expressions do not recognize macaddr datatype (Postgres backend) (#2461)
  • Fix aggcontext.Summarize not always producing scalar (Pandas backend) (#2410)
  • Fix incorrect results in the pyspark backend when the same window op is applied to a table with different window sizes (#2414)
  • Fix same column with multiple aliases not showing properly in repr (#2229)
  • Fix reduction UDFs over ungrouped, bounded windows on Pandas backend (#2395)
  • FEAT: Support rolling window UDF with non numeric inputs for pandas backend. (#2386)
  • Fix scope get to use hashmap lookup instead of list lookup (#2386)
  • Fix equality behavior for Literal ops (#2387)
  • Fix analytic ops over ungrouped and unordered windows on Pandas backend (#2376)
  • Fix the covariance operator in the BigQuery backend. (#2367)
  • Update impala kerberos dependencies (#2342)
  • Added verbose logging to SQL backends (#1320)
  • Fix issue with sql_validate call to OmnisciDB. (#2256)
  • Add missing float types to pandas backend (#2237)
  • Allow group_by and order_by as window operation input in pandas backend (#2252)
  • Fix PySpark compiler error when elementwise UDF output_type is Decimal or Timestamp (#2223)
  • Fix interactive mode returning an expression instead of the value when used in Jupyter (#2157)
  • Fix PySpark error when doing alias after selection (#2127)
  • Fix millisecond issue for OmniSciDB (issue #2167), MySQL (issue #2169), PostgreSQL (issue #2166), Pandas (issue #2168), BigQuery (issue #2273) backends (#2170)
  • [OmniSciDB] Fix TopK when used as filter (#2134)

Support

  • Move ibis.HDFS, ibis.WebHDFS and ibis.hdfs_connect to ibis.impala.* (#2497)
  • Drop support for Python 3.6 (#2288)
  • Simplifying tests directories structure (#2351)
  • Update google-cloud-bigquery dependency minimum version to 1.12.0 (#2304)
  • Remove "experimental" mentions for OmniSciDB and Pandas backends (#2234)
  • Use an OmniSciDB image stable on CI (#2244)
  • Added fragment_size to table creation for OmniSciDB (#2107)
  • Added round() support for OmniSciDB (#2096)
  • Enabled cumulative ops support for OmniSciDB (#2113)

1.3.0 (2020-02-27)

Features

  • Improve many arguments UDF performance in pandas backend. (#2071)
  • Add DenseRank, RowNumber, MinRank, Count, PercentRank/CumeDist window operations to OmniSciDB (#1976)
  • Introduce a top level vectorized UDF module (experimental). Implement element-wise UDF for pandas and PySpark backend. (#2047)
  • Add support for multi arguments window UDAF for the pandas backend (#2035)
  • Clean up window translation logic in pyspark backend (#2004)
  • Add docstring check to CI for an initial subset files (#1996)
  • Pyspark backend bounded windows (#2001)
  • Add more POSTGIS operations (#1987)
  • SQLAlchemy Default precision and scale to decimal types for PostgreSQL and MySQL (#1969)
  • Add support for array operations in PySpark backend (#1983)
  • Implement sort, if_null, null_if and notin for PySpark backend (#1978)
  • Add support for date/time operations in PySpark backend (#1974)
  • Add support for params, query_schema, and sql in PySpark backend (#1973)
  • Implement join for PySpark backend (#1967)
  • Validate AsOfJoin tolerance and attempt interval unit conversion (#1952)
  • filter for PySpark backend (#1943)
  • window operations for pyspark backend (#1945)
  • Implement IntervalSub for pandas backend (#1951)
  • PySpark backend string and column ops (#1942)
  • PySpark backend (#1913)
  • DDL support for Spark backend (#1908)
  • Support timezone aware arrow timestamps (#1923)
  • Add shapely geometries as input for literals (#1860)
  • Add geopandas as output for omniscidb (#1858)
  • Spark UDFs (#1885)
  • Add support for Postgres UDFs (#1871)
  • Spark tests (#1830)
  • Spark client (#1807)
  • Use pandas rolling apply to implement rows_with_max_lookback (#1868)

Bugs

  • Pin "clickhouse-driver" to ">=0.1.3" (#2089)
  • Fix load data stage for Linux CI (#2069)
  • Fix datamgr.py fail if IBIS_TEST_OMNISCIDB_DATABASE=omnisci (#2057)
  • Change pymapd connection parameter from "session_id" to "sessionid" (#2041)
  • Fix pandas backend to treat trailing_window preceding arg as window bound rather than window size (e.g. preceding=0 now indicates current row rather than window size 0) (#2009)
  • Fix handling of Array types in Postgres UDF (#2015)
  • Fix pydocstyle config (#2010)
  • Pinning clickhouse-driver<0.1.2 (#2006)
  • Fix CI log for database (#1984)
  • Fixes explain operation (#1933)
  • Fix incorrect assumptions about attached SQLite databases (#1937)
  • Upgrade to JDK11 (#1938)
  • sql method doesn't work when the query uses LIMIT clause (#1903)
  • Fix union implementation (#1910)
  • Fix failing com imports on master (#1912)
  • OmniSci/MapD - Fix reduction for bool (#1901)
  • Pass scope to grouping execution in the pandas backend (#1899)
  • Fix various Spark backend issues (#1888)
  • Make Nodes enforce the proper signature (#1891)
  • Fix according to bug in pd.to_datetime when passing the unit flag (#1893)
  • Fix small formatting buglet in PR merge tool (#1883)
  • Fix the case where we do not have an index when using preceding with intervals (#1876)
  • Fixed issues with geo data (#1872)
  • Remove -x from pytest call in linux CI (#1869)
  • Fix return type of Struct.from_tuples (#1867)
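
The trailing_window fix (#2009) changes how preceding is read: as a bound counted back from the current row, not as a window size. The sketch below shows that interpretation on a plain list. It is not the pandas backend's implementation, and trailing_window_rows is a hypothetical helper.

```python
def trailing_window_rows(values, preceding):
    """For each row, return the rows from `preceding` rows back
    up to and including the current row."""
    return [
        values[max(0, i - preceding): i + 1]
        for i in range(len(values))
    ]


# preceding=0 means "current row only", not an empty window.
print(trailing_window_rows([10, 20, 30], preceding=0))
# [[10], [20], [30]]

# preceding=1 means "previous row and current row".
print(trailing_window_rows([10, 20, 30], preceding=1))
# [[10], [10, 20], [20, 30]]
```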

Support

  • Add support for Python 3.8 (#2066)
  • Pin back version of isort (#2079)
  • Use user-defined port variables for Omnisci and PostgreSQL tests (#2082)
  • Change omniscidb image tag from v5.0.0 to v5.1.0 on docker-compose recipe (#2077)
  • [Omnisci] The same SRIDs for test_geo_spatial_binops (#2051)
  • Unpin rtree version (#2078)
  • Link pandas issues with xfail tests in pandas/tests/test_udf.py (#2074)
  • Disable Postgres tests on Windows CI. (#2075)
  • use conda for installation black and isort tools (#2068)
  • CI: Fix CI builds related to new pandas 1.0 compatibility (#2061)
  • Fix data map for int8 on OmniSciDB backend (#2056)
  • Add possibility to run tests for separate backend via make test BACKENDS=[YOUR BACKEND] (#2052)
  • Fix "cudf" import on OmniSciDB backend (#2055)
  • CI: Drop table only if it exists (OmniSciDB) (#2050)
  • Add initial documentation for OmniSciDB, MySQL, PySpark and SparkSQL backends, add initial documentation for geospatial methods and add links to Ibis wiki page (#2034)
  • Implement covariance for bigquery backend (#2044)
  • Add Spark to supported backends list (#2046)
  • Pin dependency of rtree to fix CI failure (#2043)
  • Drop support for Python 3.5 (#2037)
  • HTML escape column names and types in png repr. (#2023)
  • Add geospatial tutorial notebook (#1991)
  • Change omniscidb image tag from v4.7.0 to v5.0.0 on docker-compose recipe (#2031)
  • Pin "semantic_version" to "<2.7" in the docs build CI, fix "builddoc" and "doc" section inside "Makefile" and skip mysql tzinfo on CI to allow to run MySQL using docker container on a hard disk drive. (#2030)
  • Fixed impala start up issues (#2012)
  • cache all ops in translate() (#1999)
  • Add black step to CI (#1988)
  • Json UUID any (#1962)
  • Add log for database services (#1982)
  • Fix BigQuery backend fixture so batting and awards_players fixture re… (#1972)
  • Disable BigQuery explicitly in all/test_join.py (#1971)
  • Re-formatting all files using pre-commit hook (#1963)
  • Disable codecov report upload during CI builds (#1961)
  • Developer doc enhancements (#1960)
  • Missing geospatial ops for OmniSciDB (#1958)
  • Remove pandas deprecation warnings (#1950)
  • Add developer docs to get docker setup (#1948)
  • More informative IntegrityError on duplicate columns (#1949)
  • Improve geospatial literals and smoke tests (#1928)
  • PostGIS enhancements (#1925)
  • Rename mapd to omniscidb backend (#1866)
  • Fix failing BigQuery tests (#1926)
  • Added missing null literal op (#1917)
  • Update link to Presto website (#1895)
  • Removing linting from windows (#1896)
  • Fix link to NUMFOCUS CoC (#1884)
  • Added CoC section (#1882)
  • Remove pandas exception for rows_with_max_lookback (#1859)
  • Move CI pipelines to Azure (#1856)

1.2.0 (2019-06-24)

Features

  • Add new geospatial functions to OmniSciDB backend (#1836)
  • allow pandas timedelta in rows_with_max_lookback (#1838)
  • Accept rows-with-max-lookback as preceding parameter (#1825)
  • PostGIS support (#1787)

Bugs

  • Fix call to psql causing failing CI (#1855)
  • Fix nested array literal repr (#1851)
  • Fix repr of empty schema (#1850)
  • Add max_lookback to window replace and combine functions (#1843)
  • Partially revert #1758 (#1837)

Support

  • Skip SQLAlchemy backend tests in connect method in backends.py (#1847)
  • Validate order_by when using rows_with_max_lookback window (#1848)
  • Generate release notes from commits (#1845)
  • Raise exception on backends where rows_with_max_lookback can't be implemented (#1844)
  • Tighter version spec for pytest (#1840)
  • Allow passing a branch to ci/feedstock.py (#1826)

1.1.0 (2019-06-09)

Features

  • Consolidate trailing window functions (#1809)
  • Call to_interval when casting integers to intervals (#1766)
  • Add session feature to mapd client API (#1796)
  • Add min periods parameter to Window (#1792)
  • Allow strings for types in pandas UDFs (#1785)
  • Add missing date operations and struct field operation for the pandas backend (#1790)
  • Add window operations to the OmniSci backend (#1771)
  • Reimplement the pandas backend using topological sort (#1758)
  • Add marker for xfailing specific backends (#1778)
  • Enable window function tests where possible (#1777)
  • is_computable_arg dispatcher (#1743)
  • Added float32 and geospatial types for create table from schema (#1753)

Bugs

  • Fix group_concat test and implementations (#1819)
  • Fix failing strftime tests on Python 3.7 (#1818)
  • Remove unnecessary (and erroneous in some cases) frame clauses (#1757)
  • Chained mutate operations are buggy (#1799)
  • Allow projections from joins to attempt fusion (#1783)
  • Fix Python 3.5 dependency versions (#1798)
  • Fix compatibility and bugs associated with pandas toposort reimplementation (#1789)
  • Fix outer_join generating LEFT join instead of FULL OUTER (#1772)
  • NullIf should enforce that its arguments are castable to a common type (#1782)
  • Fix conda create command in documentation (#1775)
  • Fix preceding and following with None (#1765)
  • PostgreSQL interval type not recognized (#1661)

Support

  • Remove decorator hacks and add custom markers (#1820)
  • Add development deps to setup.py (#1814)
  • Fix design and developer docs (#1805)
  • Pin sphinx version to 2.0.1 (#1810)
  • Add pep8speaks integration (#1793)
  • Fix typo in UDF signature specification (#1821)
  • Clean up most xpassing tests (#1779)
  • Update omnisci container version (#1781)
  • Constrain PyMapD version to get passing builds (#1776)
  • Remove warnings and clean up some docstrings (#1763)
  • Add StringToTimestamp as unsupported (#1638)
  • Add isort pre-commit hooks (#1759)
  • Add Python 3.5 testing back to CI (#1750)
  • Re-enable CI for building step (#1700)
  • Update README reference to MapD to say OmniSci (#1749)

1.0.0 (2019-03-26)

Features

  • Add black as a pre-commit hook (#1735)
  • Add support for the arbitrary aggregate in the mapd backend (#1680)
  • Add SQL method for the MapD backend (#1731)
  • Clean up merge PR script and use the actual merge feature of GitHub (#1744)
  • Add cross join to the pandas backend (#1723)
  • Implement default handler for multiple client pre_execute (#1727)
  • Implement BigQuery auth using pydata_google_auth (#1728)
  • Timestamp literal accepts a timezone parameter (#1712)
  • Remove support for passing integers to ibis.timestamp (#1725)
  • Add find_nodes to lineage (#1704)
  • Remove a bunch of deprecated APIs and clean up warnings (#1714)
  • Implement table distinct for the pandas backend (#1716)
  • Implement geospatial functions for MapD (#1678)
  • Implement geospatial types for MapD (#1666)
  • Add pre commit hook (#1685)
  • Getting started with mapd, mysql and pandas (#1686)
  • Support column names with special characters in mapd (#1675)
  • Allow operations to hide arguments from display (#1669)
  • Remove implicit ordering requirements in the PostgreSQL backend (#1636)
  • Add cross join operator to MapD (#1655)
  • Fix UDF bugs and add support for non-aggregate analytic functions (#1637)
  • Support string slicing with other expressions (#1627)
  • Publish the ibis roadmap (#1618)
  • Implement approx_median in BigQuery (#1604)
  • Make ibis node instances hashable (#1611)
  • Add range_window and trailing_range_window to docs (#1608)

Bugs

  • Make dev/merge-pr.py script handle PR branches (#1745)
  • Fix NULLIF implementation for the pandas backend (#1742)
  • Fix casting to float in the MapD backend (#1737)
  • Fix testing for BigQuery after auth flow update (#1741)
  • Fix skipping for new BigQuery auth flow (#1738)
  • Fix bug in TableExpr.drop (#1732)
  • Filter the raw warning from newer pandas to support older pandas (#1729)
  • Fix BigQuery credentials link (#1706)
  • Add Union as an unsupported operation for MapD (#1639)
  • Fix visualizing an ibis expression when showing a selection after a table join (#1705)
  • Fix MapD exception for toDateTime (#1659)
  • Use == to compare strings (#1701)
  • Resolves joining with different column names (#1647)
  • Fix map get with compatible types (#1643)
  • Fixed where operator for MapD (#1653)
  • Remove parameters from mapd (#1648)
  • Make sure we cast when NULL is else in CASE expressions (#1651)
  • Fix equality (#1600)

Support

  • Do not build universal wheels (#1748)
  • Remove tag prefix from versioneer (#1747)
  • Use releases to manage documentation (#1746)
  • Use cudf instead of pygdf (#1694)
  • Fix multiple CI issues (#1696)
  • Update mapd ci to v4.4.1 (#1681)
  • Enabled mysql CI on azure pipelines (#1672)
  • Remove support for Python 2 (#1670)
  • Fix flake8 and many other warnings (#1667)
  • Update README.md for impala and kudu (#1664)
  • Remove defaults as a channel from azure pipelines (#1660)
  • Fix a typo in the pandas/core.py docstring (#1658)
  • Unpin clickhouse-driver version (#1657)
  • Add test for reduction returning lists (#1650)
  • Fix Azure VM image name (#1646)
  • Updated MapD server-CI (#1641)
  • Add TableExpr.drop to API documentation (#1645)
  • Fix Azure deployment step (#1642)
  • Set up CI with Azure Pipelines (#1640)
  • Fix conda builds (#1609)

v0.14.0 (2018-08-23)

This release brings refactored, more composable core components and a new rule system to ibis. We also focused quite heavily on the BigQuery backend in this release.

New Features

  • Allow keyword arguments in Node subclasses (#968)
  • Splat args into Node subclasses instead of requiring a list (#969)
  • Add support for UNION in the BigQuery backend (#1408, #1409)
  • Support for writing UDFs in BigQuery (#1377). See the BigQuery UDF docs for more details.
  • Support for cross-project expressions in the BigQuery backend. (#1427, #1428)
  • Add strftime and to_timestamp support for BigQuery (#1422, #1410)
  • Require google-cloud-bigquery >=1.0 (#1424)
  • Limited support for interval arithmetic in the pandas backend (#1407)
  • Support for subclassing TableExpr (#1439)
  • Fill out pandas backend operations (#1423)
  • Add common DDL APIs to the pandas backend (#1464)
  • Implement the sql method for BigQuery (#1463)
  • Add to_timestamp for BigQuery (#1455)
  • Add the mapd backend (#1419)
  • Implement range windows (#1349)
  • Support for map types in the pandas backend (#1498)
  • Add mean and sum for boolean types in BigQuery (#1516)
  • All recent versions of SQLAlchemy are now supported (#1384)
  • Add support for NUMERIC types in the BigQuery backend (#1534)
  • Speed up grouped and rolling operations in the pandas backend (#1549)
  • Implement TimestampNow for BigQuery and pandas (#1575)

Bug Fixes

  • Nullable property is now propagated through value types (#1289)
  • Implicit casting between signed and unsigned integers checks boundaries
  • Fix precedence of case statement (#1412)
  • Fix handling of large timestamps (#1440)
  • Fix identical_to precedence (#1458)
  • Pandas 0.23 compatibility (#1458)
  • Preserve timezones in timestamp-typed literals (#1459)
  • Fix incorrect topological ordering of UNION expressions (#1501)
  • Fix projection fusion bug when attempting to fuse columns of the same name (#1496)
  • Fix output type for some decimal operations (#1541)

API Changes

  • The previous, private rules API has been rewritten (#1366)
  • Defining input arguments for operations happens in a more readable fashion instead of the previous input_type list.
  • Removed support for async query execution (only Impala supported)
  • Remove support for Python 3.4 (#1326)
  • BigQuery division defaults to using IEEE_DIVIDE (#1390)
  • Add tolerance parameter to asof_join (#1443)

v0.13.0 (2018-03-30)

This release brings new backends, including support for executing against files and MySQL, and adds pandas user-defined scalar and aggregate functions, along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.

New Backends

  • File Support for CSV & HDF5 (#1165, #1194)
  • File Support for Parquet Format (#1175, #1194)
  • Experimental support for MySQL thanks to @kszucs (#1224)

New Features

  • Support for Unsigned Integer Types (#1194)
  • Support for Interval types and expressions with support for execution on the Impala and Clickhouse backends (#1243)
  • Isnan, isinf operations for float and double values (#1261)
  • Support for an interval with a quarter period (#1259)
  • ibis.pandas.from_dataframe convenience function (#1155)
  • Remove the restriction on ROW_NUMBER() requiring it to have an ORDER BY clause (#1371)
  • Add .get() operation on a Map type (#1376)
  • Allow visualization of custom defined expressions
  • Add experimental support for pandas UDFs/UDAFs (#1277)
  • Functions can be used as groupby keys (#1214, #1215)
  • Generalize the use of the where parameter to reduction operations (#1220)
  • Support for interval operations thanks to @kszucs (#1243, #1260, #1249)
  • Support for the PARTITIONTIME column in the BigQuery backend (#1322)
  • Add arbitrary() method for selecting the first non null value in a column (#1230, #1309)
  • Windowed MultiQuantile operation in the pandas backend thanks to @DiegoAlbertoTorres (#1343)
  • Rules for validating table expressions thanks to @DiegoAlbertoTorres (#1298)
  • Complete end-to-end testing framework for all supported backends (#1256)
  • contains/not contains now supported in the pandas backend (#1210, #1211)
  • CI builds are now reproducible locally thanks to @kszucs (#1121, #1237, #1255, #1311)
  • isnan/isinf operations thanks to @kszucs (#1261)
  • Framework for generalized dtype and schema inference, and implicit casting thanks to @kszucs (#1221, #1269)
  • Generic utilities for expression traversal thanks to @kszucs (#1336)
  • day_of_week API (#306, #1047)
  • Design documentation for ibis (#1351)
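
As a rough illustration of the arbitrary() semantics listed above (selecting the first non-null value in a column), the plain-Python sketch below works on a list rather than an Ibis column. The helper name first_non_null is hypothetical and is not part of the Ibis API; it only mirrors the behavior described in the note.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_non_null(values: Iterable[Optional[T]]) -> Optional[T]:
    """Hypothetical helper: return the first value that is not None.

    Returns None when every value is None or the input is empty.
    """
    return next((v for v in values if v is not None), None)


# A list standing in for a nullable column.
column = [None, None, 3.5, 1.0, None]
print(first_non_null(column))
```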

Bug Fixes

  • Unbound parameters were failing in the simple case of an ibis.expr.types.TableExpr.mutate call with no operation (#1378)
  • Fix parameterized subqueries (#1300, #1331, #1303, #1378)
  • Fix subquery extraction, which wasn't happening in topological order (#1342)
  • Fix parenthesization of isnull (#1307)
  • Calling drop after mutate did not work (#1296, #1299)
  • SQLAlchemy backends were missing an implementation of ibis.expr.operations.NotContains.
  • Support REGEX_EXTRACT in PostgreSQL 10 (#1276, #1278)

API Changes

  • Fixing #1378 required the removal of the name parameter to the ibis.param function. Use the ibis.expr.types.Expr.name method instead.

v0.12.0 (2017-10-28)

This release brings Clickhouse and BigQuery SQL support along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.

New Backends

  • BigQuery backend (#1170), thanks to @tsdlovell.
  • Clickhouse backend (#1127), thanks to @kszucs.

New Features

  • Add support for Binary data type (#1183)
  • Allow users of the BigQuery client to define their own API proxy classes (#1188)
  • Add support for HAVING in the pandas backend (#1182)
  • Add struct field tab completion (#1178)
  • Add expressions for Map/Struct types and columns (#1166)
  • Support Table.asof_join (#1162)
  • Allow right side of arithmetic operations to take over (#1150)
  • Add a data_preload step in pandas backend (#1142)
  • Support expressions in join predicates in the pandas backend (#1138)
  • Scalar parameters (#1075)
  • Limited window function support for pandas (#1083)
  • Implement Time datatype (#1105)
  • Implement array ops for pandas (#1100)
  • Support for passing multiple quantiles in .quantile() (#1094)
  • Support for clip and quantile ops on DoubleColumns (#1090)
  • Enable unary math operations for pandas, sqlite (#1071)
  • Enable casting from strings to temporal types (#1076)
  • Allow selection of whole tables in pandas joins (#1072)
  • Implement comparison for string vs date and timestamp types (#1065)
  • Implement isnull and notnull for pandas (#1066)
  • Allow like operation to accept a list of conditions to match (#1061)
  • Add a pre_execute step in pandas backend (#1189)

Bug Fixes

  • Remove global expression caching to ensure repeatable code generation (#1179, #1181)
  • Fix ORDER BY generation without a GROUP BY (#1180, #1181)
  • Ensure that ibis.expr.datatypes.DataType and subclasses hash properly (#1172)
  • Ensure that the pandas backend can deal with unary operations in groupby (#1182)
  • Incorrect impala code generated for NOT with complex argument (#1176)
  • BUG/CLN: Fix predicates on Selections on Joins (#1149)
  • Don't use SET LOCAL to allow redshift to work (#1163)
  • Allow empty arrays as arguments (#1154)
  • Fix column renaming in groupby keys (#1151)
  • Ensure that we only cast if timezone is not None (#1147)
  • Fix location of conftest.py (#1107)
  • TST/Make sure we drop tables during postgres testing (#1101)
  • Fix misleading join error message (#1086)
  • BUG/TST: Make hdfs an optional dependency (#1082)
  • Memoization should include expression name where available (#1080)

Performance Enhancements

  • Speed up imports (#1074)
  • Fix execution perf of groupby and selection (#1073)
  • Use normalize for casting to dates in pandas (#1070)
  • Speed up pandas groupby (#1067)

Contributors

The following people contributed to the 0.12.0 release:

$ git shortlog -sn --no-merges v0.11.2..v0.12.0
63  Phillip Cloud
 8  Jeff Reback
 2  Krisztián Szűcs
 2  Tory Haavik
 1  Anirudh
 1  Szucs Krisztian
 1  dlovell
 1  kwangin

0.11.0 (2017-06-28)

This release brings initial Pandas backend support along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.

New Features

  • Experimental pandas backend to allow execution of ibis expression against pandas DataFrames
  • Graphviz visualization of ibis expressions. Implements _repr_png_ for Jupyter Notebook functionality
  • Ability to create a partitioned table from an ibis expression
  • Support for previously missing operations in the SQLite backend (sqrt, power, variance, standard deviation, and regular expression functions), and for the previously missing power operation in PostgreSQL
  • Support for schemas inside databases with the PostgreSQL backend
  • Appveyor testing on core ibis across all supported Python versions
  • Add year/month/day methods to date types
  • Ability to sort, group by and project columns according to positional index rather than only by name
  • Added a type parameter to ibis.literal to allow user specification of literal types

Bug Fixes

  • Fix broken conda recipe
  • Fix incorrectly typed fillna operation
  • Fix postgres boolean summary operations
  • Fix kudu support to reflect client API changes
  • Fix equality of nested types and construction of nested types when the value type is specified as a string

API Changes

  • Deprecate passing integer values to the ibis.timestamp literal constructor; this will be removed in 0.12.0
  • Added the admin_timeout parameter to the kudu client connect function

Contributors

$ git shortlog --summary --numbered v0.10.0..v0.11.0

  58 Phillip Cloud
   1 Greg Rahn
   1 Marius van Niekerk
   1 Tarun Gogineni
   1 Wes McKinney

0.8 (2016-05-19)

This release brings initial PostgreSQL backend support along with a number of critical bug fixes and usability improvements. As several correctness bugs with the SQL compiler were fixed, we recommend that all users upgrade from earlier versions of Ibis.

New Features

  • Initial PostgreSQL backend contributed by Phillip Cloud.
  • Add groupby as an alias for group_by to table expressions

Bug Fixes

  • Fix an expression error when filtering based on a new field
  • Fix Impala's SQL compilation of using OR with compound filters
  • Various fixes with the having(...) function in grouped table expressions
  • Fix CTE (WITH) extraction inside UNION ALL expressions.
  • Fix ImportError on Python 2 when mock library not installed

API Changes

  • The deprecated ibis.impala_connect and ibis.make_client APIs have been removed

0.7 (2016-03-16)

This release brings initial Kudu-Impala integration and improved Impala and SQLite support, along with several critical bug fixes.

New Features

  • Apache Kudu (incubating) integration for Impala users. Will add some documentation here when possible.
  • Add use_https option to ibis.hdfs_connect for WebHDFS connections in secure (Kerberized) clusters without SSL enabled.
  • Correctly compile aggregate expressions involving multiple subqueries.

To explain this last point in more detail, suppose you had:

import ibis

table = ibis.table([('flag', 'string'),
                    ('value', 'double')],
                   'tbl')

flagged = table[table.flag == '1']
unflagged = table[table.flag == '0']

fv = flagged.value
uv = unflagged.value

expr = (fv.mean() / fv.sum()) - (uv.mean() / uv.sum())

The last expression now generates the correct Impala or SQLite SQL:

SELECT t0.`tmp` - t1.`tmp` AS `tmp`
FROM (
  SELECT avg(`value`) / sum(`value`) AS `tmp`
  FROM tbl
  WHERE `flag` = '1'
) t0
  CROSS JOIN (
    SELECT avg(`value`) / sum(`value`) AS `tmp`
    FROM tbl
    WHERE `flag` = '0'
  ) t1

Bug Fixes

  • CHAR(n) and VARCHAR(n) Impala types now correctly map to Ibis string expressions
  • Fix inappropriate projection-join-filter expression rewrites resulting in incorrect generated SQL.
  • ImpalaClient.create_table correctly passes STORED AS PARQUET for format='parquet'.
  • Fixed several issues with Ibis dependencies (impyla, thriftpy, sasl, thrift_sasl), especially for secure clusters. Upgrading will pull in these new dependencies.
  • Do not fail in ibis.impala.connect when trying to create the temporary Ibis database if no HDFS connection passed.
  • Fix join predicate evaluation bug when column names overlap with table attributes.
  • Fix handling of fully-materialized joins (aka select * joins) in SQLAlchemy / SQLite.

Contributors

Thank you to all who contributed patches to this release.

$ git log v0.6.0..v0.7.0 --pretty=format:%aN | sort | uniq -c | sort -rn
    21 Wes McKinney
     1 Uri Laserson
     1 Kristopher Overholt

0.6 (2015-12-01)

This release brings expanded pandas and Impala integration, including support for managing partitioned tables in Impala. See the new Ibis for Impala Users guide for more on using Ibis with Impala.

The Ibis for SQL Programmers guide has also been written since the 0.5 release.

This release also includes bug fixes affecting generated SQL correctness. All users should upgrade as soon as possible.

New Features

  • New integrated Impala functionality. See Ibis for Impala Users for more details on these things.
    • Improved Impala-pandas integration. Create tables or insert into existing tables from pandas DataFrame objects.
    • Partitioned table metadata management API. Add, drop, alter, and insert into table partitions.
    • Add is_partitioned property to ImpalaTable.
    • Added support for LOAD DATA DDL using the load_data function, also supporting partitioned tables.
    • Modify table metadata (location, format, SerDe properties etc.) using ImpalaTable.alter
    • Interrupting Impala expression execution with Control-C will attempt to cancel the running query with the server.
    • Set the compression codec (e.g. snappy) used with ImpalaClient.set_compression_codec.
    • Get and set query options for a client session with ImpalaClient.get_options and ImpalaClient.set_options.
    • Add ImpalaTable.metadata method that parses the output of the DESCRIBE FORMATTED DDL to simplify table metadata inspection.
    • Add ImpalaTable.stats and ImpalaTable.column_stats to see computed table and partition statistics.
    • Add CHAR and VARCHAR handling
    • Add refresh, invalidate_metadata DDL options and add incremental option to compute_stats for COMPUTE INCREMENTAL STATS.
  • Add substitute method for performing multiple value substitutions in an array or scalar expression.
  • Division is by default true division like Python 3 for all numeric data. This means for SQL systems that use C-style division semantics, the appropriate CAST will be automatically inserted in the generated SQL.
  • Easier joins on tables with overlapping column names. See Ibis for SQL Programmers.
  • Expressions like string_expr[:3] now work as expected.
  • Add coalesce instance method to all value expressions.
  • Passing limit=None to the execute method on expressions disables any default row limits.
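
To make the true-division note above concrete, here is a minimal plain-Python sketch of the difference between Python 3 division and C-style integer division, plus the kind of CAST a compiler might insert. Both c_style_divide and true_divide_sql are hypothetical helpers written for this illustration; they are not Ibis functions, and the SQL string is only an assumed shape, not the exact output of the Ibis compiler.

```python
def c_style_divide(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as C does for ints."""
    quotient = abs(a) // abs(b)
    same_sign = (a >= 0) == (b >= 0)
    return quotient if same_sign else -quotient


def true_divide_sql(left: str, right: str) -> str:
    """Hypothetical rendering: cast the left operand so the engine
    performs floating-point division even on integer columns."""
    return f"CAST({left} AS DOUBLE) / {right}"


print(7 / 2)                  # Python 3 true division
print(c_style_divide(7, 2))   # C-style: fractional part is dropped
print(c_style_divide(-7, 2))  # truncates toward zero, not toward -infinity
print(true_divide_sql("`a`", "`b`"))
```

The cast is only needed on engines whose integer division truncates; engines that already return a floating-point result would not need it.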

API Changes

  • ImpalaTable.rename no longer mutates the calling table expression.

Contributors

$ git log v0.5.0..v0.6.0 --pretty=format:%aN | sort | uniq -c | sort -rn
46 Wes McKinney
 3 Uri Laserson
 1 Phillip Cloud
 1 mariusvniekerk
 1 Kristopher Overholt

0.5 (2015-09-10)

Highlights in this release are SQLite support, Python 3 support, Impala UDA support, and an asynchronous execution API. There are also many usability improvements, bug fixes, and other new features.

New Features

  • SQLite client and built-in function support
  • Ibis now supports Python 3.4 as well as 2.6 and 2.7
  • Ibis can utilize Impala user-defined aggregate (UDA) functions
  • SQLAlchemy-based translation toolchain to enable more SQL engines having SQLAlchemy dialects to be supported
  • Many window function usability improvements (nested analytic functions and deferred binding conveniences)
  • More convenient aggregation with keyword arguments in aggregate functions
  • Built preliminary wrapper API for MADLib-on-Impala
  • Add var and std aggregation methods and support in Impala
  • Add nullifzero numeric method for all SQL engines
  • Add rename method to Impala tables (for renaming tables in the Hive metastore)
  • Add close method to ImpalaClient for session cleanup (#533)
  • Add relabel method to table expressions
  • Add insert method to Impala tables
  • Add compile and verify methods to all expressions to test compilation and ability to compile (since many operations are unavailable in SQLite, for example)

API Changes

  • Impala Ibis client creation now uses only ibis.impala.connect, and ibis.make_client has been deprecated

Contributors

$ git log v0.4.0..v0.5.0 --pretty=format:%aN | sort | uniq -c | sort -rn
      55 Wes McKinney
      9 Uri Laserson
      1 Kristopher Overholt

0.4 (2015-08-14)

New Features

  • Add tooling to use Impala C++ scalar UDFs within Ibis (#262, #195)
  • Support and testing for Kerberos-enabled secure HDFS clusters
  • Many table functions can now accept functions as parameters (invoked on the calling table) to enhance composability and emulate late-binding semantics of languages (like R) that have non-standard evaluation (#460)
  • Add any, all, notany, and notall reductions on boolean arrays, as well as cumany and cumall
  • Using topk now produces an analytic expression that is executable (as an aggregation) but can also be used as a filter as before (#392, #91)
  • Added experimental database object "usability layer", see ImpalaClient.database.
  • Add TableExpr.info
  • Add compute_stats API to table expressions referencing physical Impala tables
  • Add explain method to ImpalaClient to show query plan for an expression
  • Add chmod and chown APIs to HDFS interface for superusers
  • Add convert_base method to strings and integer types
  • Add option to ImpalaClient.create_table to create empty partitioned tables
  • ibis.cross_join can now join more than 2 tables at once
  • Add ImpalaClient.raw_sql method for running naked SQL queries
  • ImpalaClient.insert now validates schemas locally prior to sending query to cluster, for better usability.
  • Add conda installation recipes
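
The idea of table functions accepting functions as parameters (invoked on the calling table) can be sketched with a toy class. The Table class below is a hypothetical stand-in written for this illustration and is not the Ibis table expression; it only shows why deferring the argument until the method is called helps when chaining.

```python
class Table:
    """Toy stand-in for a table expression (not the Ibis class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def column(self, name):
        """Return the values of one column as a list."""
        return [row[name] for row in self.rows]

    def _resolve(self, arg):
        # A callable argument is invoked on the calling table,
        # so it always sees the table it is applied to.
        return arg(self) if callable(arg) else arg

    def filter(self, predicate):
        """Keep rows where the (possibly deferred) boolean mask is true."""
        mask = self._resolve(predicate)
        return Table(row for row, keep in zip(self.rows, mask) if keep)


t = Table([{"x": 1}, {"x": 5}, {"x": 9}])

# The second lambda is evaluated against the already-filtered table,
# which has no variable name in this chain.
result = (
    t.filter(lambda tbl: [v > 3 for v in tbl.column("x")])
     .filter(lambda tbl: [v < 9 for v in tbl.column("x")])
)
print(result.column("x"))
```

Without the callable form, the intermediate result would have to be assigned to a variable before the second filter could refer to its columns.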

Contributors

$ git log v0.3.0..v0.4.0 --pretty=format:%aN | sort | uniq -c | sort -rn
     38 Wes McKinney
      9 Uri Laserson
      2 Meghana Vuyyuru
      2 Kristopher Overholt
      1 Marius van Niekerk

0.3 (2015-07-20)

First public release. See https://ibis-project.org for more.

New Features

  • Implement window / analytic function support
  • Enable non-equijoins (join clauses with operations other than ==).
  • Add remaining string functions supported by Impala.
  • Add pipe method to tables (hat-tip to the pandas dev team).
  • Add mutate convenience method to tables.
  • Fleshed out WebHDFS implementations: get/put directories, move files, etc. See the full HDFS API.
  • Add truncate method for timestamp values
  • ImpalaClient can execute scalar expressions not involving any table.
  • Can also create internal Impala tables with a specific HDFS path.
  • Make Ibis's temporary Impala database and HDFS paths configurable (see ibis.options).
  • Add truncate_table function to client (if the user's Impala cluster supports it).
  • Python 2.6 compatibility
  • Enable Ibis to execute concurrent queries in multithreaded applications (earlier versions were not thread-safe).
  • Test data load script in scripts/load_test_data.py
  • Add an internal operation type signature API to enhance developer productivity.

Contributors

$ git log v0.2.0..v0.3.0 --pretty=format:%aN | sort | uniq -c | sort -rn
     59 Wes McKinney
     29 Uri Laserson
      4 Isaac Hodes
      2 Meghana Vuyyuru

0.2 (2015-06-16)

New Features

  • insert method on Ibis client for inserting data into existing tables.
  • parquet_file, delimited_file, and avro_file client methods for querying datasets not yet available in Impala
  • New ibis.hdfs_connect method and HDFS client API for WebHDFS for writing files and directories to HDFS
  • New timedelta API and improved timestamp data support
  • New bucket and histogram methods on numeric expressions
  • New category logical datatype for handling bucketed data, among other things
  • Add summary API to numeric expressions
  • Add value_counts convenience API to array expressions
  • New string methods like, rlike, and contains for fuzzy and regex searching
  • Add options.verbose option and configurable options.verbose_log callback function for improved query logging and visibility
  • Support for new SQL built-in functions
    • ibis.coalesce
    • ibis.greatest and ibis.least
    • ibis.where for conditional logic (see also ibis.case and ibis.cases)
    • nullif method on value expressions
    • ibis.now
  • New aggregate functions: approx_median, approx_nunique, and group_concat
  • where argument in aggregate functions
  • Add having method to group_by intermediate object
  • Added group-by convenience table.group_by(exprs).COLUMN_NAME.agg_function()
  • Add default expression names to most aggregate functions
  • New Impala database client helper methods
    • create_database
    • drop_database
    • exists_database
    • list_databases
    • set_database
  • Client list_tables searching / listing method
  • Add add, sub, and other explicit arithmetic methods to value expressions

API Changes

  • New Ibis client and Impala connection workflow. The client is now built from an Impala connection and an optional HDFS connection

Bug Fixes

  • Numerous expression API bug fixes and rough edges fixed

Contributors

$ git log v0.1.0..v0.2.0 --pretty=format:%aN | sort | uniq -c | sort -rn
     71 Wes McKinney
      1 Juliet Hougland
      1 Isaac Hodes

0.1 (2015-03-26)

First Ibis release.

  • Expression DSL design and type system
  • Expression to ImpalaSQL compiler toolchain
  • Impala built-in function wrappers

$ git log 84d0435..v0.1.0 --pretty=format:%aN | sort | uniq -c | sort -rn
     78 Wes McKinney
      1 srus
      1 Henry Robinson


Last update: April 28, 2022