API Reference

Top-level expression APIs

These methods are available directly in the ibis module namespace.

case()

Similar to the .case method on array expressions, create a case builder that accepts self-contained boolean expressions (as opposed to expressions that are equality-compared with a fixed value expression).
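The sketch below illustrates the row-wise semantics of a searched case in plain Python. It is a model, not the ibis API. Conditions are tried in order, the first true one supplies the result, and if none match the default is used. The helper `searched_case` and the sample rows are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]

def searched_case(
    row: Row,
    branches: Sequence[Tuple[Callable[[Row], Any], Any]],
    default: Any = None,
) -> Any:
    # Try each (condition, result) pair in order; the first condition that is
    # exactly True wins. A None (NULL) condition is treated as not matching.
    for condition, result in branches:
        if condition(row) is True:
            return result
    return default

rows: List[Row] = [{"x": -5}, {"x": 0}, {"x": 7}]
branches = [
    (lambda r: r["x"] < 0, "negative"),
    (lambda r: r["x"] == 0, "zero"),
]
labels = [searched_case(r, branches, default="positive") for r in rows]
print(labels)  # ['negative', 'zero', 'positive']
```

Treating a None condition as "no match" mirrors SQL CASE. That correspondence is an assumption of this sketch and is not stated by the reference.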

literal(value[, type])

Create a scalar expression from a Python value.

schema([pairs, names, types])

Validate and return an Ibis Schema object

table(schema[, name])

Create an unbound Ibis table for creating expressions.

timestamp(value[, timezone])

Returns a timestamp literal if value is likely coercible to a timestamp

where(boolean_expr, true_expr, false_null_expr)

Equivalent to the ternary expression: if boolean_expr then true_expr else false_null_expr.

ifelse(arg, true_expr, false_expr)

Shorthand for implementing ternary expressions

coalesce(*args)

Compute the first non-null value(s) from the passed arguments in left-to-right order.
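Here is a minimal plain-Python sketch of the left-to-right rule. It is applied row by row to parallel lists that stand in for columns, with None standing in for NULL. `coalesce_row` is a hypothetical name and is not part of ibis.

```python
from typing import Any, List, Optional

def coalesce_row(*values: Any) -> Optional[Any]:
    # Return the first argument that is not None (NULL), scanning left to right.
    for v in values:
        if v is not None:
            return v
    return None

# Parallel lists stand in for two columns; the literal 0 is a scalar fallback.
col_a = [None, 2, None]
col_b = [1, None, None]
result: List[Any] = [coalesce_row(a, b, 0) for a, b in zip(col_a, col_b)]
print(result)  # [1, 2, 0]
```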

greatest(*args)

Compute the largest value (row-wise, if any arrays are present) among the supplied arguments.

least(*args)

Compute the smallest value (row-wise, if any arrays are present) among the supplied arguments.
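The phrase "row-wise, if any arrays are present" can be modelled in plain Python. Each scalar argument is broadcast across every row, then max or min is taken per row. This sketch assumes there are no nulls, because backends differ in how GREATEST and LEAST treat NULL. The function names simply mirror the entries above; they are not the ibis implementations.

```python
from typing import List, Sequence, Union

Number = Union[int, float]
Arg = Union[Number, Sequence[Number]]

def _broadcast(args: Sequence[Arg]) -> List[List[Number]]:
    # Repeat each scalar once per row; all list/tuple arguments must share one length.
    lengths = {len(a) for a in args if isinstance(a, (list, tuple))}
    if len(lengths) > 1:
        raise ValueError("array arguments must have the same length")
    n = lengths.pop() if lengths else 1
    return [list(a) if isinstance(a, (list, tuple)) else [a] * n for a in args]

def greatest(*args: Arg) -> List[Number]:
    # Row-wise maximum across all (broadcast) arguments.
    return [max(row) for row in zip(*_broadcast(args))]

def least(*args: Arg) -> List[Number]:
    # Row-wise minimum across all (broadcast) arguments.
    return [min(row) for row in zip(*_broadcast(args))]

print(greatest([1, 5, 3], 4))          # [4, 5, 4]
print(least([1, 5, 3], [2, 0, 9], 4))  # [1, 0, 3]
```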

negate(arg)

Negate a numeric expression

desc(expr)

Create a descending sort key (when used in sort_by) from the passed array expression or column name.

now()

Compute the current timestamp

NA

null()

Create a NULL/NA scalar

expr_list(exprs)

row_number()

Analytic function for the current row number, starting at 0.

window([preceding, following, group_by, …])

Create a window clause for use with window functions.

range_window([preceding, following, …])

Create a range-based window clause for use with window functions.

trailing_window(preceding[, group_by, order_by])

Create a trailing window for use with aggregate window functions.

cumulative_window([group_by, order_by])

Create a cumulative window for use with aggregate window functions.
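This is a rough plain-Python model of the frames these helpers describe. It assumes a single partition that is already sorted by the order_by key. A trailing window with `preceding=2` is taken here to cover the two previous rows plus the current row; that frame definition is an assumption of this sketch. A cumulative window covers every row up to and including the current one. The helper names are hypothetical.

```python
from typing import List, Sequence

def trailing_sum(values: Sequence[float], preceding: int) -> List[float]:
    # Frame: up to `preceding` earlier rows plus the current row.
    return [sum(values[max(0, i - preceding): i + 1]) for i in range(len(values))]

def cumulative_sum(values: Sequence[float]) -> List[float]:
    # Frame: every row from the start of the partition through the current row.
    return [sum(values[: i + 1]) for i in range(len(values))]

vals = [1, 2, 3, 4, 5]  # assumed already sorted by the window's order_by key
print(trailing_sum(vals, preceding=2))  # [1, 3, 6, 9, 12]
print(cumulative_sum(vals))             # [1, 3, 6, 10, 15]
```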

trailing_range_window(preceding, order_by[, …])

Create a trailing time window for use with aggregate window functions.

random()

Return a random floating point number in the range [0.0, 1.0).

General expression methods

Expr.compile([limit, timecontext, params])

Compile the expression to whatever execution target it is bound to, for verification.

Expr.equals(other[, cache])

Expr.execute([limit, timecontext, params])

If this expression is based on physical tables in a database backend, execute it against that backend.

Expr.pipe(f, *args, **kwargs)

Generic composition function to enable expression pipelining.

Expr.verify()

Returns True if the expression can be compiled to its attached client.

Table methods

TableExpr.aggregate([metrics, by, having])

Aggregate a table with a given set of reductions, with grouping expressions, and post-aggregation filters.

TableExpr.count()

Returns the computed number of rows in the table expression

TableExpr.distinct()

Compute the set of unique rows/tuples occurring in this table.

TableExpr.drop(fields)

TableExpr.info([buf])

Similar to pandas DataFrame.info.

TableExpr.filter(predicates)

Select rows from the table based on boolean expressions.

TableExpr.get_column(name)

Get a reference to a single column from the table

TableExpr.get_columns(iterable)

Get multiple columns from the table

TableExpr.group_by([by])

Create an intermediate grouped table expression, pending some group operation to be applied with it.

TableExpr.groupby([by])

Create an intermediate grouped table expression, pending some group operation to be applied with it.

TableExpr.limit(n[, offset])

Select the first n rows at the beginning of the table (the result may not be deterministic, depending on the implementation and on whether a sort is present).

TableExpr.mutate([exprs])

Convenience function for table projections involving adding columns

TableExpr.projection(exprs)

Compute new table expression with the indicated column expressions from this table.

TableExpr.relabel(substitutions[, replacements])

Change table column names, otherwise leaving table unaltered

TableExpr.rowid()

An auto-numbered integer value representing the row number of the results.

TableExpr.schema()

Get the schema for this table (if one is known)

TableExpr.set_column(name, expr)

Replace an existing column with a new expression

TableExpr.sort_by(sort_exprs)

Sort the table by the indicated column expressions and sort orders (ascending/descending).

TableExpr.union(right[, distinct])

Form the table set union of two table expressions having identical schemas.

TableExpr.view()

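Create a new table expression that is semantically equivalent to the current one, but is considered a distinct relation for evaluation purposes (e.g. in SQL). Use it to create a second reference to the same table for self-referencing operations such as a self-join.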

TableExpr.join(right[, predicates, how])

Perform a relational join between two tables.

TableExpr.cross_join(**kwargs)

Perform a cross join (Cartesian product) among a list of tables, with an optional set of prefixes to apply to overlapping column names.

TableExpr.inner_join(other[, predicates])

Perform an inner join between two tables.

TableExpr.left_join(other[, predicates])

Perform a left outer join between two tables, keeping every row of the left table.

TableExpr.outer_join(other[, predicates])

Perform a full outer join between two tables, keeping unmatched rows from both sides.

TableExpr.semi_join(other[, predicates])

Perform a semi join: return the rows of the left table that have at least one match in other.

TableExpr.anti_join(other[, predicates])

Perform an anti join: return the rows of the left table that have no match in other.
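Semi and anti joins filter the left table rather than widen it. Below is a plain-Python sketch over lists of dicts, joining on a single hypothetical `key` column. It assumes there are no null keys: a Python set matches None to None, whereas SQL never matches NULL keys.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def semi_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    # Keep left rows with at least one match; no right columns are added and
    # each left row appears at most once, however many matches it has.
    right_keys = {r[key] for r in right}
    return [row for row in left if row[key] in right_keys]

def anti_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    # Keep left rows that have no match at all.
    right_keys = {r[key] for r in right}
    return [row for row in left if row[key] not in right_keys]

orders = [{"id": 1, "item": "pen"}, {"id": 2, "item": "ink"}, {"id": 3, "item": "pad"}]
shipments = [{"id": 1}, {"id": 1}, {"id": 3}]
print(semi_join(orders, shipments, "id"))  # rows with id 1 and 3, each once
print(anti_join(orders, shipments, "id"))  # the row with id 2
```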

Grouped table methods

GroupedTableExpr.aggregate([metrics])

GroupedTableExpr.count([metric_name])

Convenience function for computing the group sizes (number of rows per group) given a grouped table.

GroupedTableExpr.having(expr)

Add a post-aggregation result filter (like the having argument in aggregate), for composability with the group_by API
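The following plain-Python sketch shows the aggregate-then-filter order that having describes. Group sizes are computed first, and the filter is then applied to the aggregated rows rather than to the input rows. The data and the threshold are made up for illustration.

```python
from collections import Counter
from typing import Dict, List

rows: List[Dict[str, str]] = [
    {"city": "Oslo"}, {"city": "Lima"}, {"city": "Oslo"},
    {"city": "Pune"}, {"city": "Oslo"}, {"city": "Lima"},
]

# Step 1: aggregate, i.e. compute the group size for each key.
sizes = Counter(r["city"] for r in rows)

# Step 2: post-aggregation filter on the aggregated rows (the HAVING step).
having = {city: n for city, n in sizes.items() if n >= 2}
print(having)  # {'Oslo': 3, 'Lima': 2}
```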

GroupedTableExpr.mutate([exprs])

Returns a table projection with analytic / window functions applied.

GroupedTableExpr.order_by(expr)

Expressions to use for ordering data for a window function computation.

GroupedTableExpr.over(window)

Add a window clause to be applied to downstream analytic expressions

GroupedTableExpr.projection(exprs)

Like mutate, but do not include existing table columns

GroupedTableExpr.size([metric_name])

Convenience function for computing the group sizes (number of rows per group) given a grouped table.

Generic value methods

Scalar or column methods

ValueExpr.between(lower, upper)

Check whether the input expression falls between the passed lower and upper bounds.

ValueExpr.cast(target_type)

Cast value(s) to indicated data type.

ValueExpr.coalesce()

Compute the first non-null value(s) from the passed arguments in left-to-right order.

ValueExpr.fillna(fill_value)

Replace any null values with the indicated fill value

ValueExpr.isin(values)

Check whether the value expression is contained within the indicated list of values.

ValueExpr.notin(values)

Like isin, but checks whether this expression’s value(s) are not contained in the passed values.

ValueExpr.nullif(null_if_expr)

Set values to null if they match/equal a particular expression (scalar or array-valued).

ValueExpr.hash([how])

Compute an integer hash value for the indicated value expression.

ValueExpr.isnull()

Returns true if values are null

ValueExpr.notnull()

Returns true if values are not null

ValueExpr.over(window)

Turn an aggregation or full-sample analytic operation into a windowed operation.

ValueExpr.typeof()

Return the data type of the argument according to the current backend

ValueExpr.case()

Create a new SimpleCaseBuilder to chain multiple if-else statements.

ValueExpr.cases(case_result_pairs[, default])

Create a case expression in one shot.

ValueExpr.substitute(value[, replacement, else_])

Substitute (replace) one or more values in a value expression
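Here is a plain-Python model of mapping-driven substitution. It assumes that values missing from the mapping are kept unchanged unless `else_` is supplied. The sentinel `_KEEP` is a hypothetical detail of this sketch, used so that None can still be passed as an explicit `else_`.

```python
from typing import Any, Dict, List, Sequence

_KEEP = object()  # sentinel meaning "no else_ was given"

def substitute(values: Sequence[Any], mapping: Dict[Any, Any], else_: Any = _KEEP) -> List[Any]:
    out: List[Any] = []
    for v in values:
        if v in mapping:
            out.append(mapping[v])  # value found: use its replacement
        elif else_ is _KEEP:
            out.append(v)           # not found and no else_: keep the original
        else:
            out.append(else_)       # not found: use the else_ value
    return out

codes = ["a", "b", "z"]
print(substitute(codes, {"a": "alpha", "b": "beta"}))                 # ['alpha', 'beta', 'z']
print(substitute(codes, {"a": "alpha", "b": "beta"}, else_="other"))  # ['alpha', 'beta', 'other']
```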

Column methods

ColumnExpr.distinct()

Compute the set of unique values occurring in this array.

ColumnExpr.count([where])

Compute cardinality / sequence size of expression.

ColumnExpr.min([where])

ColumnExpr.max([where])

ColumnExpr.approx_median([where])

ColumnExpr.approx_nunique([where])

ColumnExpr.group_concat([sep, where])

Concatenate values using the indicated separator (comma by default) to produce a string

ColumnExpr.nunique([where])

ColumnExpr.summary([exact_nunique, prefix])

Compute a set of summary metrics from the input value expression

ColumnExpr.value_counts([metric_name])

Compute a frequency table for this value expression
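Conceptually, a frequency table is one count per distinct value, which in plain Python is `collections.Counter`. The sketch sorts the pairs only to make the output deterministic, because a query result carries no inherent row order.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "red", "blue"]
freq = Counter(colors)        # one count per distinct value
table = sorted(freq.items())  # sorted only so the output is deterministic
print(table)  # [('blue', 2), ('green', 1), ('red', 3)]
```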

ColumnExpr.first()

ColumnExpr.last()

ColumnExpr.dense_rank()

Compute the position of the first element within each equal-value group in sorted order, ignoring duplicate values.

ColumnExpr.rank()

Compute the position of the first element within each equal-value group in sorted order.
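The difference between rank and dense_rank is easiest to see on ties. This plain-Python sketch uses 0-based positions to match the row_number() entry above. Whether a given ibis version or backend numbers ranks from 0 or from 1 is an assumption you should check.

```python
from typing import Dict, List, Sequence

def rank(values: Sequence[float]) -> List[int]:
    # Position of the first element of each equal-value group in sorted order:
    # ties share a rank, and the next distinct value skips ahead (gaps).
    first_pos: Dict[float, int] = {}
    for pos, v in enumerate(sorted(values)):
        first_pos.setdefault(v, pos)
    return [first_pos[v] for v in values]

def dense_rank(values: Sequence[float]) -> List[int]:
    # Same ordering, but duplicates are ignored when counting, so there are no gaps.
    index = {v: i for i, v in enumerate(sorted(set(values)))}
    return [index[v] for v in values]

scores = [10, 20, 20, 30]
row_numbers = list(range(len(scores)))  # row_number() over already-sorted rows
print(row_numbers)         # [0, 1, 2, 3]
print(rank(scores))        # [0, 1, 1, 3]
print(dense_rank(scores))  # [0, 1, 1, 2]
```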

ColumnExpr.lag([offset, default])

ColumnExpr.lead([offset, default])

ColumnExpr.cummin()

Cumulative min.

ColumnExpr.cummax()

Cumulative max.

General numeric methods

Scalar or column methods

NumericValue.abs()

Absolute value

NumericValue.ceil()

Round up to the nearest integer value greater than or equal to this value

NumericValue.floor()

Round down to the nearest integer value less than or equal to this value

NumericValue.sign()

NumericValue.exp()

NumericValue.sqrt()

NumericValue.log([base])

Compute the logarithm using the specified base.

NumericValue.ln()

Natural logarithm

NumericValue.log2()

Logarithm base 2

NumericValue.log10()

Logarithm base 10

NumericValue.round([digits])

Round values either to integer or indicated number of decimal places.

NumericValue.nullifzero()

Set values to NULL if they are equal to zero.

NumericValue.zeroifnull()

NumericValue.add(other)

NumericValue.sub(other)

NumericValue.mul(other)

NumericValue.div(other)

NumericValue.pow(other)

NumericValue.rdiv(other)

NumericValue.rsub(other)

Column methods

NumericColumn.sum([where])

NumericColumn.mean([where])

NumericColumn.std([where, how])

Compute the standard deviation of a numeric array.

NumericColumn.var([where, how])

Compute the variance of a numeric array.

NumericColumn.cumsum()

Cumulative sum.

NumericColumn.cummean()

Cumulative mean.

NumericColumn.bottomk(k[, by])

NumericColumn.topk(k[, by])

Return a TopK filter expression.

NumericColumn.bucket(buckets[, closed, …])

Compute a discrete binning of a numeric array
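Below is a plain-Python sketch of discrete binning with `bisect`. It assumes bins closed on the left, a final bin that also includes its right edge, and None for values outside the outermost edges. These edge rules are chosen for illustration only; the real method exposes options (the `closed` argument and others elided in the signature) that may change them.

```python
import bisect
from typing import List, Optional, Sequence

def bucket(values: Sequence[float], edges: Sequence[float]) -> List[Optional[int]]:
    # Bin i covers [edges[i], edges[i+1]); the last bin also includes edges[-1].
    last = len(edges) - 2  # index of the final bin
    out: List[Optional[int]] = []
    for v in values:
        if v < edges[0] or v > edges[-1]:
            out.append(None)  # outside every bin
        elif v == edges[-1]:
            out.append(last)  # right edge of the final bin is closed
        else:
            out.append(bisect.bisect_right(edges, v) - 1)
    return out

print(bucket([0, 5, 10, 15, 20, 25], edges=[0, 10, 20]))  # [0, 0, 1, 1, 1, None]
```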

NumericColumn.histogram([nbins, binwidth, …])

Compute a histogram with fixed width bins
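Fixed-width binning reduces to one floor division per value. In this plain-Python sketch, `base` (the left edge of bin 0) is a hypothetical parameter chosen for illustration, and bins are assumed to be closed on the left.

```python
import math
from typing import List, Sequence

def histogram_bins(values: Sequence[float], binwidth: float, base: float) -> List[int]:
    # Bin k covers [base + k * binwidth, base + (k + 1) * binwidth).
    return [math.floor((v - base) / binwidth) for v in values]

print(histogram_bins([0.0, 2.5, 5.0, 9.9], binwidth=5.0, base=0.0))  # [0, 0, 1, 1]
```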

Integer methods

Scalar or column methods

IntegerValue.convert_base(from_base, to_base)

Convert a number (as integer or string) from one base to another.
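Here is a plain-Python sketch of the conversion. As the entry says, it accepts an integer or a string, and it returns the digits as an uppercase string; bases are limited to 2 through 36 by Python's `int()`. Returning a string is an assumption modelled on SQL `CONV`-style functions.

```python
from typing import Union

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def convert_base(value: Union[int, str], from_base: int, to_base: int) -> str:
    n = int(str(value), from_base)  # parse the input in from_base
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    digits = []
    while n:
        n, r = divmod(n, to_base)   # peel off the least significant digit
        digits.append(DIGITS[r])
    return sign + "".join(reversed(digits))

print(convert_base("ff", 16, 2))  # 11111111
print(convert_base(255, 10, 16))  # FF
```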

IntegerValue.to_timestamp([unit])

Convert an integer UNIX timestamp (at the resolution given by unit) to a timestamp type.

String methods

All string operations are valid on either scalar or array values.

StringValue.convert_base(from_base, to_base)

Convert a number (as integer or string) from one base to another.

StringValue.length()

Compute the length of strings.

StringValue.lower()

Convert string to all lowercase

StringValue.upper()

Convert string to all uppercase

StringValue.reverse()

Reverse string

StringValue.ascii_str()

StringValue.strip()

Remove whitespace from the left and right sides of the string.

StringValue.lstrip()

Remove whitespace from the left side of the string.

StringValue.rstrip()

Remove whitespace from the right side of the string.

StringValue.capitalize()

Return a capitalized version of the input string.

StringValue.contains(substr)

Determine whether the indicated string is exactly contained in the calling string.

StringValue.like(patterns)

Wildcard pattern matching, equivalent to the SQL LIKE operator.

StringValue.to_timestamp(format_str[, timezone])

Parses a string and returns a timestamp.

StringValue.parse_url(extract[, key])

Return the portion of a URL corresponding to the part specified by ‘extract’. If extract is ‘QUERY’, a key can optionally be specified to retrieve the associated query parameter value.
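The following plain-Python sketch is built on `urllib.parse`. Only `'QUERY'` is named by the entry above. The other part names (`'PROTOCOL'`, `'HOST'`, `'PATH'`, `'REF'`) are assumptions modelled on Hive/Impala-style `parse_url`, and unsupported parts simply return None here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

def parse_url(url: str, extract: str, key: Optional[str] = None) -> Optional[str]:
    parts = urlsplit(url)
    if extract == "QUERY":
        if key is None:
            return parts.query or None              # the whole query string
        values = parse_qs(parts.query).get(key)     # value(s) for one parameter
        return values[0] if values else None
    mapping = {                                     # assumed part names
        "PROTOCOL": parts.scheme,
        "HOST": parts.hostname,
        "PATH": parts.path,
        "REF": parts.fragment,
    }
    return mapping.get(extract) or None

url = "https://example.com/a/b?x=1&y=2#top"
print(parse_url(url, "HOST"))            # example.com
print(parse_url(url, "QUERY", key="y"))  # 2
```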

StringValue.substr(start[, length])

Pull substrings out of each string value by position and maximum length.

StringValue.left(nchars)

Return up to nchars left-most characters from each string.

StringValue.right(nchars)

Return up to nchars right-most characters from each string.

StringValue.repeat(n)

Returns the argument string repeated n times

StringValue.find(substr[, start, end])

Returns the position (0-indexed) of the first occurrence of the substring, optionally searching from a particular position (0-indexed).

StringValue.translate(from_str, to_str)

Returns the string with the set of ‘from’ characters replaced by the set of ‘to’ characters.
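Here is a plain-Python sketch of character-for-character translation using `str.translate`. It assumes PostgreSQL-style behaviour: the i-th character of `from_str` maps to the i-th character of `to_str`, and any `from_str` characters with no counterpart are deleted. Backends may differ on that last rule.

```python
def translate(s: str, from_str: str, to_str: str) -> str:
    table = {}
    for i, ch in enumerate(from_str):
        if ord(ch) in table:
            continue  # the first mapping for a repeated character wins
        # Map to the matching to_str character, or delete it (None) if there is none.
        table[ord(ch)] = to_str[i] if i < len(to_str) else None
    return s.translate(table)

print(translate("hello", "el", "ip"))  # hippo
print(translate("abc-def", "-", ""))   # abcdef
```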

StringValue.find_in_set(str_list)

Returns the position (0-indexed) of the first occurrence of the argument within a list of strings.
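This plain-Python sketch covers both 0-indexed lookups. It assumes -1 is returned when nothing is found, following Python's `str.find`; backends may use a different sentinel. The `end` argument of `find` is left out for brevity.

```python
from typing import Optional, Sequence

def find(s: str, substr: str, start: Optional[int] = None) -> int:
    # 0-indexed position of the first occurrence at or after `start`; -1 if absent.
    return s.find(substr, 0 if start is None else start)

def find_in_set(needle: str, str_list: Sequence[str]) -> int:
    # 0-indexed position of the first element equal to `needle`; -1 if absent.
    for i, item in enumerate(str_list):
        if item == needle:
            return i
    return -1

print(find("banana", "an"))               # 1
print(find("banana", "an", start=2))      # 3
print(find_in_set("b", ["a", "b", "c"]))  # 1
```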

StringValue.join(strings)

Joins a list of strings together, using the calling string as the separator.

StringValue.replace(pattern, replacement)

Replaces each exact occurrence of pattern with the given replacement string.

StringValue.lpad(length[, pad])

Returns a string of the given length by truncating (on the right) or padding (on the left) the original string.

StringValue.rpad(length[, pad])

Returns a string of the given length by truncating (on the right) or padding (on the right) the original string.
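Here is a plain-Python sketch of the truncate-or-pad rule described above. Strings already at least `length` long are cut on the right, and shorter ones are padded with `pad`, repeated as needed. The default pad of a single space is an assumption of this sketch.

```python
def _fill(s: str, length: int, pad: str) -> str:
    # Repeat the pad string and cut it to exactly the missing width.
    return (pad * length)[: length - len(s)]

def lpad(s: str, length: int, pad: str = " ") -> str:
    if len(s) >= length:
        return s[:length]  # too long: truncate on the right
    return _fill(s, length, pad) + s

def rpad(s: str, length: int, pad: str = " ") -> str:
    if len(s) >= length:
        return s[:length]  # too long: truncate on the right
    return s + _fill(s, length, pad)

print(lpad("42", 5, "0"))   # 00042
print(rpad("ab", 5, "xy"))  # abxyx
print(lpad("abcdef", 3))    # abc
```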

StringValue.rlike(pattern)

Search string values using a regular expression.

StringValue.re_search(pattern)

Search string values using a regular expression.

StringValue.re_extract(pattern, index)

Returns the match at the specified index (0-indexed), extracted from the string using the given regex pattern.

StringValue.re_replace(pattern, replacement)

Replaces matches found by the regex with the replacement string.

Timestamp methods

All timestamp operations are valid on either scalar or array values.

TimestampValue.strftime(format_str)

Format timestamp according to the passed format string.

TimestampValue.year()

TimestampValue.month()

TimestampValue.day()

TimestampValue.day_of_week

Namespace expression containing methods for extracting information about the day of the week of a TimestampValue or DateValue expression.

TimestampValue.epoch_seconds()

TimestampValue.hour()

TimestampValue.minute()

TimestampValue.second()

TimestampValue.millisecond()

TimestampValue.truncate(unit)

Zero out all units smaller than the indicated unit.
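This plain-Python sketch performs truncation with `datetime.replace`: every field smaller than the chosen unit is reset to its minimum. The unit codes (`'Y'`, `'M'`, `'D'`, `'h'`, `'m'`, `'s'`) are assumptions for this sketch; check the accepted codes for your version.

```python
from datetime import datetime

# Fields below the year, from largest to smallest, with their minimum values.
_FIELDS = ["month", "day", "hour", "minute", "second", "microsecond"]
_MINIMUM = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0}
# Hypothetical unit codes mapped to how many of _FIELDS are kept.
_KEEP = {"Y": 0, "M": 1, "D": 2, "h": 3, "m": 4, "s": 5}

def truncate(ts: datetime, unit: str) -> datetime:
    # Reset every field smaller than `unit` to its minimum value.
    smaller = _FIELDS[_KEEP[unit]:]
    return ts.replace(**{f: _MINIMUM[f] for f in smaller})

ts = datetime(2021, 7, 14, 15, 42, 9, 123456)
print(truncate(ts, "M"))  # 2021-07-01 00:00:00
print(truncate(ts, "h"))  # 2021-07-14 15:00:00
```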

TimestampValue.time()

Return a Time node for a Timestamp.

TimestampValue.date()

Return a Date for a Timestamp.

TimestampValue.add(other)

TimestampValue.radd(other)

TimestampValue.sub(right)

TimestampValue.rsub(right)

Date methods

DateValue.strftime(format_str)

Format a date according to the passed format string.

DateValue.year()

DateValue.month()

DateValue.day()

DateValue.day_of_week

Namespace expression containing methods for extracting information about the day of the week of a TimestampValue or DateValue expression.

DateValue.epoch_seconds()

DateValue.truncate(unit)

Zero out all units smaller than the indicated unit.

DateValue.add(other)

DateValue.radd(other)

DateValue.sub(right)

DateValue.rsub(right)

Day of week methods

DayOfWeek.index()

Get the index of the day of the week.

DayOfWeek.full_name()

Get the name of the day of the week.

Time methods

TimeValue.between(lower, upper[, timezone])

Check whether the input expression falls between the passed lower and upper bounds.

TimeValue.truncate(unit)

Zero out all units smaller than the indicated unit.

TimeValue.hour()

TimeValue.minute()

TimeValue.second()

TimeValue.millisecond()

TimeValue.add(other)

TimeValue.radd(other)

TimeValue.sub(right)

TimeValue.rsub(right)

Interval methods

IntervalValue.to_unit(target_unit)

IntervalValue.years

Extract the number of years from an IntervalValue expression.

IntervalValue.quarters

Extract the number of quarters from an IntervalValue expression.

IntervalValue.months

Extract the number of months from an IntervalValue expression.

IntervalValue.weeks

Extract the number of weeks from an IntervalValue expression.

IntervalValue.days

Extract the number of days from an IntervalValue expression.

IntervalValue.hours

Extract the number of hours from an IntervalValue expression.

IntervalValue.minutes

Extract the number of minutes from an IntervalValue expression.

IntervalValue.seconds

Extract the number of seconds from an IntervalValue expression.

IntervalValue.milliseconds

Extract the number of milliseconds from an IntervalValue expression.

IntervalValue.microseconds

Extract the number of microseconds from an IntervalValue expression.

IntervalValue.nanoseconds

Extract the number of nanoseconds from an IntervalValue expression.

IntervalValue.add(other)

IntervalValue.radd(other)

IntervalValue.sub(other)

IntervalValue.mul(other)

IntervalValue.rmul(other)

IntervalValue.floordiv(other)

IntervalValue.negate()

Negate an interval expression.

Boolean methods

BooleanValue.ifelse(true_expr, false_expr)

Shorthand for implementing ternary expressions

BooleanColumn.any()

BooleanColumn.all()

BooleanColumn.cumany()

BooleanColumn.cumall()

Category methods

Category is a logical type with either a known or unknown cardinality. Values are represented semantically as integers starting at 0.

CategoryValue.label(labels[, nulls])

Format a known number of categories as strings

Geospatial methods

Scalar or column methods

GeoSpatialValue.area()

Compute the area of a geospatial value.

GeoSpatialValue.as_binary()

Get the geometry as well-known binary (WKB) without the SRID data.

GeoSpatialValue.as_ewkb()

Get the geometry as extended well-known binary (EWKB), which includes the SRID data.

GeoSpatialValue.as_ewkt()

Get the geometry as extended well-known text (EWKT), which includes the SRID data.

GeoSpatialValue.as_text()

Get the geometry as well-known text (WKT) without the SRID data.

GeoSpatialValue.azimuth(right)

Return the azimuth of the segment from this point to the right point: the angle in radians, measured clockwise from north.

GeoSpatialValue.buffer(radius)

Returns a geometry that represents all points whose distance from this geometry is less than or equal to radius.

GeoSpatialValue.centroid()

Returns the centroid of the geometry.

GeoSpatialValue.contains(right)

Check if the first geometry contains the second one

GeoSpatialValue.contains_properly(right)

Check if the first geometry contains the second one, with no common border points.

GeoSpatialValue.covers(right)

Check if the first geometry covers the second one.

GeoSpatialValue.covered_by(right)

Check if the first geometry is covered by the second one.

GeoSpatialValue.crosses(right)

Check if the geometries have some, but not all, interior points in common.

GeoSpatialValue.d_fully_within(right, distance)

Check if the first geometry is fully within a specified distance from the second one.

GeoSpatialValue.d_within(right, distance)

Check if the first geometry is within a specified distance from the second one.

GeoSpatialValue.difference(right)

Return the difference of two geometries.

GeoSpatialValue.disjoint(right)

Check if the geometries have no points in common.

GeoSpatialValue.distance(right)

Compute the distance between two geospatial values.

GeoSpatialValue.end_point()

Returns the last point of a LINESTRING geometry as a POINT or NULL if the input parameter is not a LINESTRING

GeoSpatialValue.envelope()

Returns a geometry representing the bounding box of the geometry.

GeoSpatialValue.equals(right)

Check if the geometries are the same.

GeoSpatialValue.geometry_n(n)

Get the 1-based Nth geometry of a multi geometry.

GeoSpatialValue.geometry_type()

Get the type of a geometry.

GeoSpatialValue.intersection(right)

Return the intersection of two geometries.

GeoSpatialValue.intersects(right)

Check if the geometries share any points.

GeoSpatialValue.is_valid()

Check if the geometry is valid.

GeoSpatialValue.line_locate_point(right)

Locate the distance a point falls along the length of a line.

GeoSpatialValue.line_merge()

Merge a MultiLineString into a LineString.

GeoSpatialValue.line_substring(start, end)

Clip a substring from a LineString.

GeoSpatialValue.length()

Compute the length of a geospatial value.

GeoSpatialValue.max_distance(right)

Returns the 2-dimensional maximum distance between two geometries in projected units.

GeoSpatialValue.n_points()

Return the number of points in a geometry.

GeoSpatialValue.n_rings()

If the geometry is a polygon or multi-polygon, return the number of rings.

GeoSpatialValue.ordering_equals(right)

Check if two geometries are equal and have the same point ordering.

GeoSpatialValue.overlaps(right)

Check if the geometries share space, are of the same dimension, but are not completely contained by each other.

GeoSpatialValue.perimeter()

Compute the perimeter of a geospatial value.

GeoSpatialValue.point_n(n)

Return the Nth point in a single linestring in the geometry.

GeoSpatialValue.set_srid(srid)

Set the spatial reference identifier for the ST_Geometry

GeoSpatialValue.simplify(tolerance, …)

Simplify a given geometry.

GeoSpatialValue.srid()

Returns the spatial reference identifier for the ST_Geometry

GeoSpatialValue.start_point()

Returns the first point of a LINESTRING geometry as a POINT or NULL if the input parameter is not a LINESTRING

GeoSpatialValue.touches(right)

Check if the geometries have at least one point in common, but do not intersect.

GeoSpatialValue.transform(srid)

Transform a geometry into a new SRID.

GeoSpatialValue.union(right)

Merge two geometries into a union geometry.

GeoSpatialValue.within(right)

Check if the first geometry is completely inside of the second.

GeoSpatialValue.x()

Return the X coordinate of the point, or NULL if not available.

GeoSpatialValue.x_max()

Returns the X maximum of a geometry.

GeoSpatialValue.x_min()

Returns the X minimum of a geometry.

GeoSpatialValue.y()

Return the Y coordinate of the point, or NULL if not available.

GeoSpatialValue.y_max()

Returns the Y maximum of a geometry.

GeoSpatialValue.y_min()

Returns the Y minimum of a geometry.

Column methods

GeoSpatialColumn.unary_union()

Aggregate a set of geometries into a union.

HDFS

Client objects have an hdfs attribute you can use to interact directly with HDFS.

hdfs_connect([host, port, protocol, …])

Connect to HDFS.

HDFS.ls(hdfs_path[, status])

Return contents of directory.

HDFS.chmod(hdfs_path, permissions)

Change permissions of a file or directory.

HDFS.chown(hdfs_path[, owner, group])

Change owner (and/or group) of a file or directory.

HDFS.get(hdfs_path[, local_path, overwrite])

Download remote file or directory to the local filesystem.

HDFS.head(hdfs_path[, nbytes, offset])

Retrieve the requested number of bytes from a file.

HDFS.put(hdfs_path, resource[, overwrite, …])

Write file or directory to HDFS.

HDFS.put_tarfile(hdfs_path, local_path[, …])

Write contents of tar archive to HDFS.

HDFS.rm(path)

Delete a single file.

HDFS.rmdir(path)

Delete a directory and all its contents.

HDFS.size(hdfs_path)

Return total size of file or directory.

HDFS.status(path)

Check the status of the path.