Extending Ibis Part 1: Adding a New Elementwise Expression

This notebook will show you how to add a new elementwise operation to an existing backend.

We are going to add julianday, a function supported by the SQLite database, to the SQLite Ibis backend.

The Julian day of a date, is the number of days since January 1st, 4713 BC. For more information check the Julian day wikipedia page.

Step 1: Define the Operation

Let’s define the julianday operation as a function that takes one string input argument and returns a float.

def julianday(date: str) -> float:
    """Julian date"""
[1]:
import ibis.expr.datatypes as dt
import ibis.expr.rules as rlz

from ibis.expr.operations import ValueOp
from ibis.expr.signature import Argument as Arg


class JulianDay(ValueOp):
    arg = Arg(rlz.string)
    output_type = rlz.shape_like('arg', 'float')

We just defined a JulianDay class that takes one argument of type string or binary, and returns a float.

Step 2: Define the API

Because we know the output type of the operation, to make an expression out of JulianDay we simply need to construct it and call its ibis.expr.types.Node.to_expr method.

We still need to add a method to StringValue and BinaryValue (this needs to work on both scalars and columns).

When you add a method to any of the expression classes whose name matches *Value both the scalar and column child classes will pick it up, making it easy to define operations for both scalars and columns in one place.

We can do this by defining a function and assigning it to the appropriate class of expressions.

[2]:
from ibis.expr.types import StringValue, BinaryValue


def julianday(string_value):
    return JulianDay(string_value).to_expr()


StringValue.julianday = julianday

Interlude: Create some expressions with sha1

[3]:
import ibis

t = ibis.table([('string_col', 'string')], name='t')

t.string_col.julianday()
[3]:
ref_0
UnboundTable[table]
  name: t
  schema:
    string_col : string

JulianDay[float32*]
  string_col = Column[string*] 'string_col' from table
    ref_0

Step 3: Turn the Expression into SQL

[4]:
import sqlalchemy as sa


@ibis.sqlite.add_operation(JulianDay)
def _julianday(translator, expr):
    # pull out the arguments to the expression
    arg, = expr.op().args

    # compile the argument
    compiled_arg = translator.translate(arg)

    # return a SQLAlchemy expression that calls into the SQLite julianday function
    return sa.func.julianday(compiled_arg)

Step 4: Putting it all Together

[5]:
import pathlib
import ibis

db_fname = str(pathlib.Path().resolve().parent.parent / 'tutorial' / 'data' / 'geography.db')

con = ibis.sqlite.connect(db_fname)

Create and execute a julianday expression

[6]:
independence = con.table('independence')
independence
[6]:
AlchemyTable[table]
  name: independence
  schema:
    country_code : string
    independence_date : date
    independence_from : string
[7]:
day = independence.independence_date.cast('string')
day
[7]:
ref_0
AlchemyTable[table]
  name: independence
  schema:
    country_code : string
    independence_date : date
    independence_from : string

cast(independence_date, string) = Cast[string*]
  independence_date = Column[date*] 'independence_date' from table
    ref_0
  to:
    string
[8]:
julianday_expr = day.julianday()
julianday_expr
[8]:
ref_0
AlchemyTable[table]
  name: independence
  schema:
    country_code : string
    independence_date : date
    independence_from : string

JulianDay[float32*]
  cast(independence_date, string) = Cast[string*]
    independence_date = Column[date*] 'independence_date' from table
      ref_0
    to:
      string
[9]:
sql_expr = julianday_expr.compile()
print(sql_expr)
SELECT julianday(CAST(t0.independence_date AS TEXT)) AS tmp
FROM base.independence AS t0
[10]:
result = julianday_expr.execute()
result.head()
[10]:
0    2422189.5
1    2419734.5
2    2437850.5
3    2442727.5
4    2444909.5
Name: tmp, dtype: float32

Because we’ve defined our operation on StringValue, and not just on StringColumn we get operations on both string scalars and string columns for free

[11]:
scalar = ibis.literal('2010-03-14')
scalar
[11]:
Literal[string]
  2010-03-14
[12]:
julianday_scalar = scalar.julianday()
[13]:
con.execute(julianday_scalar)
[13]:
2455269.5