Extending Ibis Part 2: Adding a New Reduction Expression

This notebook will show you how to add a new reduction operation last_date to the existing backend SQLite.

A reduction operation is a function that maps \(N\) rows to 1 row, for example the sum function.

Description

We’re going to add a ``last_date`` function to ibis. last_date simply returns the latest date of a list of dates.

Step 1: Define the Operation

Let’s define the last_date operation as a function that takes any date column as input and returns a date:

import datetime
import typing

def last_date(dates: typing.List[datetime.date]) -> datetime.date:
    """Latest date"""
[1]:
import ibis.expr.datatypes as dt
import ibis.expr.rules as rlz

from ibis.expr.operations import Reduction, Arg


class LastDate(Reduction):
    arg = Arg(rlz.column(rlz.date))
    where = Arg(rlz.boolean, default=None)
    output_type = rlz.scalar_like('arg')

We just defined a LastDate class that takes one date column as input, and returns a scalar output of the same type as the input. This matches both the requirements of a reduction and the spepcifics of the function that we want to implement.

Note: It is very important that you write the correct argument rules and output type here. The expression will not work otherwise.

Step 2: Define the API

Because every reduction in ibis has the ability to filter out values during aggregation (a typical feature in databases and analytics tools), to make an expression out of LastDate we need to pass an additional argument: where to our LastDate constructor.

[2]:
from ibis.expr.types import DateColumn  # not DateValue! reductions are only valid on columns


def last_date(date_column, where=None):
    return LastDate(date_column, where=where).to_expr()


DateColumn.last_date = last_date

Interlude: Create some expressions using last_date

[3]:
import ibis
[4]:
people = ibis.table([('name', 'string'), ('country', 'string'), ('date_of_birth', 'date')], name='people')
[5]:
people.date_of_birth.last_date()
[5]:
ref_0
UnboundTable[table]
  name: people
  schema:
    name : string
    country : string
    date_of_birth : date

LastDate[date]
  date_of_birth = Column[date*] 'date_of_birth' from table
    ref_0
  where:
    None
[6]:
people.date_of_birth.last_date(people.country == 'Indonesia')
[6]:
ref_0
UnboundTable[table]
  name: people
  schema:
    name : string
    country : string
    date_of_birth : date

LastDate[date]
  date_of_birth = Column[date*] 'date_of_birth' from table
    ref_0
  where:
    Equals[boolean*]
      left:
        country = Column[string*] 'country' from table
          ref_0
      right:
        Literal[string]
          Indonesia

Step 3: Turn the Expression into SQL

[7]:
import sqlalchemy as sa


@ibis.sqlite.add_operation(LastDate)
def _last_date(translator, expr):
    # pull out the arguments to the expression
    arg, where = expr.op().args

    # compile the argument
    compiled_arg = translator.translate(arg)

    # call the appropriate SQLite function (`max` for the latest/maximum date)
    agg = sa.func.max(compiled_arg)

    # handle a non-None filter clause
    if where is not None:
        return agg.filter(translator.translate(where))
    return agg

Step 4: Putting it all Together

[8]:
import pathlib
import ibis

db_fname = str(pathlib.Path().resolve().parent.parent / 'tutorial' / 'data' / 'geography.db')

con = ibis.sqlite.connect(db_fname)

Create and execute a bitwise_and expression

[9]:
independence = con.table('independence')
independence
[9]:
AlchemyTable[table]
  name: independence
  schema:
    country_code : string
    independence_date : date
    independence_from : string

Last country to gain independence in our database:

[10]:
expr = independence.independence_date.last_date()
expr
[10]:
ref_0
AlchemyTable[table]
  name: independence
  schema:
    country_code : string
    independence_date : date
    independence_from : string

LastDate[date]
  independence_date = Column[date*] 'independence_date' from table
    ref_0
  where:
    None
[11]:
sql_expr = expr.compile()
print(sql_expr)
SELECT max(t0.independence_date) AS tmp
FROM base.independence AS t0
[12]:
expr.execute()
[12]:
Timestamp('2011-07-09 00:00:00')

Last country to gain independence from the Spanish Empire, using the where parameter:

[13]:
expr = independence.independence_date.last_date(where=independence.independence_from == 'Spanish Empire')
expr
[13]:
ref_0
AlchemyTable[table]
  name: independence
  schema:
    country_code : string
    independence_date : date
    independence_from : string

LastDate[date]
  independence_date = Column[date*] 'independence_date' from table
    ref_0
  where:
    Equals[boolean*]
      left:
        independence_from = Column[string*] 'independence_from' from table
          ref_0
      right:
        Literal[string]
          Spanish Empire
[14]:
result = expr.execute()
result
[14]:
Timestamp('1898-06-12 00:00:00')