Taking a random cube for a walk and making it talk

blog
duckdb
udfs
Author

Cody Peterson

Published

September 26, 2024

Synthetic data with Ibis, DuckDB, Python UDFs, and Faker.

To follow along, install the required libraries:

pip install 'ibis-framework[duckdb]' faker plotly

A random cube

We’ll generate a random cube of data with Ibis (default DuckDB backend) and visualize it as a 3D line plot:

Show me the code!
import ibis
import ibis.selectors as s
import plotly.express as px

ibis.options.interactive = True
ibis.options.repr.interactive.max_rows = 5

con = ibis.connect("duckdb://synthetic.ddb")

if "source" in con.list_tables():
    t = con.table("source")
else:
    lookback = ibis.interval(days=1)
    step = ibis.interval(seconds=1)

    t = (
        (
            ibis.range(
                ibis.now() - lookback,
                ibis.now(),
                step=step,
            )
            .unnest()
            .name("timestamp")
            .as_table()
        )
        .mutate(
            index=(ibis.row_number().over(order_by="timestamp")),
            **{col: 2 * (ibis.random() - 0.5) for col in ["a", "b", "c"]},
        )
        .mutate(color=ibis._["index"].histogram(nbins=8))
        .drop("index")
        .relocate("timestamp", "color")
        .order_by("timestamp")
    )

    t = con.create_table("source", t.to_pyarrow())

c = px.line_3d(
    t,
    x="a",
    y="b",
    z="c",
    color="color",
    hover_data=["timestamp"],
)
c
1
Import the necessary libraries.
2
Enable interactive mode for Ibis.
3
Connect to an on-disk DuckDB database.
4
Load the table if it already exists.
5
Define the time range and step for the data.
6
Create the array of timestamps.
7
Unnest the array to a column.
8
Name the column “timestamp”.
9
Convert the column into a table.
10
Create a monotonically increasing index column.
11
Create three columns of random numbers.
12
Create a color column based on the index (help visualize the time series).
13
Drop the index column.
14
Rearrange the columns.
15
Order the table by timestamp.
16
Store the table in the on-disk database.
17
Create a 3D line plot of the data.