Adding an Elementwise Operation¶
This notebook will show you how to add a new elementwise operation to an existing backend.
We are going to add julianday
, a function supported by the SQLite database, to the SQLite Ibis backend.
The Julian day of a date, is the number of days since January 1st, 4713 BC. For more information check the Julian day wikipedia page.
Step 1: Define the Operation¶
Let's define the julianday
operation as a function that takes one string input argument and returns a float.
def julianday(date: str) -> float:
"""Julian date"""
import ibis.expr.datatypes as dt
import ibis.expr.rules as rlz
from ibis.expr.operations import ValueOp
class JulianDay(ValueOp):
arg = rlz.string
output_type = rlz.shape_like('arg', 'float')
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Input In [1], in <cell line: 7>() 2 import ibis.expr.rules as rlz 4 from ibis.expr.operations import ValueOp ----> 7 class JulianDay(ValueOp): 8 arg = rlz.string 9 output_type = rlz.shape_like('arg', 'float') Input In [1], in JulianDay() 7 class JulianDay(ValueOp): 8 arg = rlz.string ----> 9 output_type = rlz.shape_like('arg', 'float') TypeError: shape_like() takes 1 positional argument but 2 were given
We just defined a JulianDay
class that takes one argument of type string or binary, and returns a float.
Step 2: Define the API¶
Because we know the output type of the operation, to make an expression out of JulianDay
we simply need to construct it and call its ibis.expr.types.Node.to_expr
method.
We still need to add a method to StringValue
and BinaryValue
(this needs to work on both scalars and columns).
When you add a method to any of the expression classes whose name matches *Value
both the scalar and column child classes will pick it up, making it easy to define operations for both scalars and columns in one place.
We can do this by defining a function and assigning it to the appropriate class of expressions.
from ibis.expr.types import StringValue, BinaryValue
def julianday(string_value):
return JulianDay(string_value).to_expr()
StringValue.julianday = julianday
Interlude: Create some expressions with sha1
¶
import ibis
t = ibis.table([('string_col', 'string')], name='t')
t.string_col.julianday()
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Input In [3], in <cell line: 5>() 1 import ibis 3 t = ibis.table([('string_col', 'string')], name='t') ----> 5 t.string_col.julianday() Input In [2], in julianday(string_value) 4 def julianday(string_value): ----> 5 return JulianDay(string_value).to_expr() NameError: name 'JulianDay' is not defined
Step 3: Turn the Expression into SQL¶
import sqlalchemy as sa
@ibis.sqlite.add_operation(JulianDay)
def _julianday(translator, expr):
# pull out the arguments to the expression
arg, = expr.op().args
# compile the argument
compiled_arg = translator.translate(arg)
# return a SQLAlchemy expression that calls into the SQLite julianday function
return sa.func.julianday(compiled_arg)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Input In [4], in <cell line: 4>() 1 import sqlalchemy as sa ----> 4 @ibis.sqlite.add_operation(JulianDay) 5 def _julianday(translator, expr): 6 # pull out the arguments to the expression 7 arg, = expr.op().args 9 # compile the argument NameError: name 'JulianDay' is not defined
Step 4: Putting it all Together¶
!curl -LsS -o $TEMPDIR/geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'
import os
import tempfile
import ibis
db_fname = os.path.join(tempfile.gettempdir(), 'geography.db')
con = ibis.sqlite.connect(db_fname)
Create and execute a julianday
expression¶
independence = con.table('independence')
independence
AlchemyTable: independence country_code string independence_date date independence_from string
day = independence.independence_date.cast('string')
day
r0 := AlchemyTable: independence country_code string independence_date date independence_from string cast(independence_date, string): Cast(r0.independence_date, to=string)
julianday_expr = day.julianday()
julianday_expr
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Input In [9], in <cell line: 1>() ----> 1 julianday_expr = day.julianday() 2 julianday_expr Input In [2], in julianday(string_value) 4 def julianday(string_value): ----> 5 return JulianDay(string_value).to_expr() NameError: name 'JulianDay' is not defined
sql_expr = julianday_expr.compile()
print(sql_expr)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Input In [10], in <cell line: 1>() ----> 1 sql_expr = julianday_expr.compile() 2 print(sql_expr) NameError: name 'julianday_expr' is not defined
result = julianday_expr.execute()
result.head()
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Input In [11], in <cell line: 1>() ----> 1 result = julianday_expr.execute() 2 result.head() NameError: name 'julianday_expr' is not defined
Because we've defined our operation on StringValue
, and not just on StringColumn
we get operations on both string scalars and string columns for free
scalar = ibis.literal('2010-03-14')
scalar
'2010-03-14'
julianday_scalar = scalar.julianday()
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Input In [13], in <cell line: 1>() ----> 1 julianday_scalar = scalar.julianday() Input In [2], in julianday(string_value) 4 def julianday(string_value): ----> 5 return JulianDay(string_value).to_expr() NameError: name 'JulianDay' is not defined
con.execute(julianday_scalar)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Input In [14], in <cell line: 1>() ----> 1 con.execute(julianday_scalar) NameError: name 'julianday_scalar' is not defined