Datatypes and Datashapes

Every value in Ibis has two important properties: a type and a shape.

The type is probably familiar to you. It is something like

  • Integer
  • Floating
  • String
  • Array

The shape is one of

  • Scalar (a single value)
  • Column (a collection of values)

Datatype Flavors

For some datatypes, there are further options that define them. For instance, Integer values can be signed or unsigned, and they have a precision, which gives flavors such as “uint8” and “int64”. These flavors don’t affect a value’s capabilities (e.g., both signed and unsigned ints have a .abs() method), but they do affect how the underlying backend performs the computation.
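To make the signed/unsigned distinction concrete, here is a small pure-Python sketch that computes the range of values each integer flavor can hold. It is not Ibis code: the helper name int_range is made up for illustration, and it assumes that the number in a flavor name is a bit width and that signed flavors use the usual two's-complement range.

```python
import re


def int_range(flavor: str) -> tuple[int, int]:
    """Return (min, max) for a flavor name such as "uint8" or "int64".

    Hypothetical helper for illustration; assumes the numeric suffix is a bit width.
    """
    match = re.fullmatch(r"(u?)int(8|16|32|64)", flavor)
    if match is None:
        raise ValueError(f"not an integer flavor: {flavor!r}")
    unsigned = match.group(1) == "u"
    bits = int(match.group(2))
    if unsigned:
        # Unsigned: every bit pattern is a non-negative number.
        return 0, 2**bits - 1
    # Signed (two's complement): half of the bit patterns are negative.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name in ("uint8", "int8", "uint16", "int64"):
    print(name, int_range(name))
```

Both “uint8” and “int8” support the same operations. What differs is the set of values the backend can store and compute with.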

Capabilities

Depending on the combination of datatype and datashape, a value has different capabilities. For example:

  • All String values (both StringScalars and StringColumns) have the method .upper(), which transforms the string to uppercase. Floating and Array values don’t have this method.
  • IntegerColumn and FloatingColumn values have .mean(), .max(), and similar methods, because they are collections of values that you can aggregate over. IntegerScalar and FloatingScalar values do not have these methods, because it doesn’t make sense to take the mean or max of a single value.
  • If you call .to_pandas() on these values, you get different results. Scalar shapes result in scalar objects:
    • IntegerScalar: NumPy int64 object (or whatever specific flavor).
    • FloatingScalar: NumPy float64 object (or whatever specific flavor).
    • StringScalar: plain Python str object.
    • ArrayScalar: plain Python list object.
  • On the other hand, Column shapes result in pandas.Series:
    • IntegerColumn: pd.Series of integers, with the same flavor. For example, if the IntegerColumn was specifically “uint16”, then the pandas Series will hold a NumPy array of type “uint16”.
    • FloatingColumn: pd.Series of NumPy floats with the same flavor.
    • etc.
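The idea that some capabilities come from the type and others from the shape can be sketched with a toy class hierarchy. This is only a pure-Python model: the class names echo the ones used above, but the bodies are made up for illustration and say nothing about how Ibis implements these classes.

```python
class Scalar:
    """Shape: a single value."""

    def __init__(self, value):
        self.value = value


class Column:
    """Shape: a collection of values."""

    def __init__(self, values):
        self.values = list(values)


class StringValue:
    """Capability that comes from the type, so both shapes get it."""

    def upper(self):
        if isinstance(self, Column):
            return type(self)(v.upper() for v in self.values)
        return type(self)(self.value.upper())


class StringScalar(StringValue, Scalar):
    pass


class StringColumn(StringValue, Column):
    pass


class FloatingScalar(Scalar):
    pass


class IntegerScalar(Scalar):
    pass


class IntegerColumn(Column):
    """Capability that comes from the shape: only a Column can aggregate."""

    def mean(self):
        return FloatingScalar(sum(self.values) / len(self.values))


print(StringScalar("ibis").upper().value)       # IBIS
print(StringColumn(["a", "b"]).upper().values)  # ['A', 'B']
print(IntegerColumn([1, 2, 3]).mean().value)    # 2.0
print(hasattr(IntegerScalar(1), "mean"))        # False
print(hasattr(IntegerColumn([1]), "upper"))     # False
```

In this model, aggregating a Column produces a Scalar, which matches the shapes described above.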

Broadcasting and Alignment

There are rules for how different datashapes are combined. They are similar to how SQL and NumPy handle combining datashapes, if you are familiar with them.

import ibis

ibis.options.interactive = True
t1 = ibis.examples.penguins.fetch().head(100)
t1
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

We can look at the datatype of the year Column:

t1.year.type()
Int64(nullable=True)

Combining two Scalars results in a Scalar:

t1.year.mean() + t1.year.std()

2008.0025189076296

Combining a Column and Scalar results in a Column:

t1.year + 1000
┏━━━━━━━━━━━━━━━━━┓
┃ Add(year, 1000) ┃
┡━━━━━━━━━━━━━━━━━┩
│ int64           │
├─────────────────┤
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│            3007 │
│               … │
└─────────────────┘

Combining two Columns results in a Column:

t1.year + t1.bill_length_mm
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Add(year, bill_length_mm) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64                   │
├───────────────────────────┤
│                    2046.1 │
│                    2046.5 │
│                    2047.3 │
│                      NULL │
│                    2043.7 │
│                    2046.3 │
│                    2045.9 │
│                    2046.2 │
│                    2041.1 │
│                    2049.0 │
│                         … │
└───────────────────────────┘
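The three cases above follow one rule: the result is a Column as soon as either operand is a Column, and a Scalar otherwise. A minimal pure-Python sketch of that rule follows. The names Shape and combined_shape are made up for illustration and are not part of the Ibis API.

```python
from enum import Enum


class Shape(Enum):
    SCALAR = "Scalar"
    COLUMN = "Column"


def combined_shape(left: Shape, right: Shape) -> Shape:
    """A Column on either side makes the result a Column."""
    if Shape.COLUMN in (left, right):
        return Shape.COLUMN
    return Shape.SCALAR


# Print the full table of combinations.
for left in Shape:
    for right in Shape:
        result = combined_shape(left, right)
        print(f"{left.value} + {right.value} -> {result.value}")
```

The Scalar is conceptually repeated for every row of the Column, which is why t1.year + 1000 adds 1000 to each year.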

One requirement that might surprise you if you are coming from NumPy is how Ibis aligns Columns. In NumPy, if you have two arbitrary arrays, each of length 100, you can add them together, and it works because the elements are “lined up” by position. Ibis is different. Because it is built around SQL, and SQL has no notion of inherent row ordering, you cannot “line up” two arbitrary Columns in Ibis: both have to be derived from the same Table expression. For example:

t2 = ibis.examples.population.fetch().head(100)
t2
┏━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ country     ┃ year  ┃ population ┃
┡━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ string      │ int64 │ int64      │
├─────────────┼───────┼────────────┤
│ Afghanistan │  1995 │   17586073 │
│ Afghanistan │  1996 │   18415307 │
│ Afghanistan │  1997 │   19021226 │
│ Afghanistan │  1998 │   19496836 │
│ Afghanistan │  1999 │   19987071 │
│ Afghanistan │  2000 │   20595360 │
│ Afghanistan │  2001 │   21347782 │
│ Afghanistan │  2002 │   22202806 │
│ Afghanistan │  2003 │   23116142 │
│ Afghanistan │  2004 │   24018682 │
│ …           │     … │          … │
└─────────────┴───────┴────────────┘
t1.bill_depth_mm + t2.population
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
 /home/runner/work/ibis/ibis/ibis/expr/types/core.py:96 in __rich_console__                       
                                                                                                  
    93 │   │                                                                                      
    94 │   │   try:                                                                               
    95 │   │   │   if opts.interactive:                                                           
  96 │   │   │   │   rich_object = to_rich(self, console_width=console_width)                   
    97 │   │   │   else:                                                                          
    98 │   │   │   │   rich_object = Text(self._noninteractive_repr())                            
    99 │   │   except Exception as e:                                                             
                                                                                                  
 /home/runner/work/ibis/ibis/ibis/expr/types/pretty.py:271 in to_rich                             
                                                                                                  
   268 │   │   │   expr, max_length=max_length, max_string=max_string, max_depth=max_depth        
   269 │   │   )                                                                                  
   270 else:                                                                                  
 271 │   │   return _to_rich_table(                                                             
   272 │   │   │   expr,                                                                          
   273 │   │   │   max_rows=max_rows,                                                             
   274 │   │   │   max_columns=max_columns,                                                       
                                                                                                  
 /home/runner/work/ibis/ibis/ibis/expr/types/pretty.py:313 in _to_rich_table                      
                                                                                                  
   310 max_string = max_string or ibis.options.repr.interactive.max_string                    
   311 show_types = ibis.options.repr.interactive.show_types                                  
   312                                                                                        
 313 table = tablish.as_table()                                                             
   314 orig_ncols = len(table.columns)                                                        
   315                                                                                        
   316 if console_width == float("inf"):                                                      
                                                                                                  
 /home/runner/work/ibis/ibis/ibis/expr/types/generic.py:1479 in as_table                          
                                                                                                  
   1476 │   │   │   (parent,) = parents                                                           
   1477 │   │   │   return parent.to_expr().select(self)                                          
   1478 │   │   else:                                                                             
 1479 │   │   │   raise com.RelationError(                                                      
   1480 │   │   │   │   f"Cannot convert {type(self)} expression involving multiple "             
   1481 │   │   │   │   "base table references to a projection"                                   
   1482 │   │   │   )                                                                             
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RelationError: Cannot convert <class 'ibis.expr.types.numeric.FloatingColumn'> expression involving multiple base 
table references to a projection

If you want to use these two columns together, you would need to join the tables together first:

j = ibis.join(t1, t2, "year")
j
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ country        ┃ population ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ string  │ string │ float64        │ float64       │ int64             │ int64       │ string │ int64 │ string         │ int64      │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼────────────────┼────────────┤
│ Adelie  │ Dream  │           42.3 │          21.2 │               191 │        4150 │ male   │  2007 │ Afghanistan    │   26349243 │
│ Adelie  │ Dream  │           43.2 │          18.5 │               192 │        4100 │ male   │  2008 │ Afghanistan    │   27032197 │
│ Adelie  │ Dream  │           42.3 │          21.2 │               191 │        4150 │ male   │  2007 │ Albania        │    3166222 │
│ Adelie  │ Dream  │           43.2 │          18.5 │               192 │        4100 │ male   │  2008 │ Albania        │    3156608 │
│ Adelie  │ Dream  │           42.3 │          21.2 │               191 │        4150 │ male   │  2007 │ Algeria        │   35097043 │
│ Adelie  │ Dream  │           43.2 │          18.5 │               192 │        4100 │ male   │  2008 │ Algeria        │   35725377 │
│ Adelie  │ Dream  │           42.3 │          21.2 │               191 │        4150 │ male   │  2007 │ American Samoa │      57919 │
│ Adelie  │ Dream  │           43.2 │          18.5 │               192 │        4100 │ male   │  2008 │ American Samoa │      57053 │
│ Adelie  │ Dream  │           42.3 │          21.2 │               191 │        4150 │ male   │  2007 │ Andorra        │      81292 │
│ Adelie  │ Dream  │           43.2 │          18.5 │               192 │        4100 │ male   │  2008 │ Andorra        │      79969 │
│ …       │ …      │              … │             … │                 … │           … │ …      │     … │ …              │          … │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴────────────────┴────────────┘
j.bill_depth_mm + j.population
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Add(bill_depth_mm, population) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64                        │
├────────────────────────────────┤
│                   2.634926e+07 │
│                   2.703222e+07 │
│                   3.166243e+06 │
│                   3.156626e+06 │
│                   3.509706e+07 │
│                   3.572540e+07 │
│                   5.794020e+04 │
│                   5.707150e+04 │
│                   8.131320e+04 │
│                   7.998750e+04 │
│                              … │
└────────────────────────────────┘
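The difference between positional alignment and a join can be sketched with plain Python lists and dicts. This is only an analogy, not how Ibis or any backend executes a join. The inner_join helper is hypothetical, the first two lists hold made-up values, and the row values are copied from the joined table above.

```python
# Positional alignment, as in NumPy: element i pairs with element i.
# These two lists hold made-up values purely for illustration.
a = [1.0, 2.0, 3.0]
b = [10, 20, 30]
positional = [x + y for x, y in zip(a, b)]
print(positional)  # [11.0, 22.0, 33.0]

# Key-based alignment, as in SQL: rows pair up only where the key matches.
penguins = [
    {"year": 2007, "bill_depth_mm": 21.2},
    {"year": 2008, "bill_depth_mm": 18.5},
]
populations = [
    {"country": "Afghanistan", "year": 2007, "population": 26349243},
    {"country": "Afghanistan", "year": 2008, "population": 27032197},
    {"country": "Albania", "year": 2007, "population": 3166222},
    {"country": "Albania", "year": 2008, "population": 3156608},
]


def inner_join(left, right, key):
    """Nested-loop inner join: keep every pair of rows whose keys are equal."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


joined = inner_join(penguins, populations, "year")
for row in joined:
    # Both values now live in the same row, so adding them is well defined.
    print(row["country"], row["year"], row["bill_depth_mm"] + row["population"])
```

After the join, each output row carries both bill_depth_mm and population, so the addition no longer depends on any row order.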