Synthetic data with Ibis, DuckDB, Python UDFs, and Faker.
To follow along, install the required libraries:
pip install 'ibis-framework[duckdb]' faker plotly
A random cube
We’ll generate a random cube of data with Ibis (default DuckDB backend) and visualize it as a 3D line plot:
Show me the code!
import ibis
import ibis.selectors as s
import plotly.express as px
ibis.options.interactive = True
ibis.options.repr.interactive.max_rows = 5
con = ibis.connect("duckdb://synthetic.ddb")
if "source" in con.list_tables():
t = con.table("source")
else:
lookback = ibis.interval(days=1)
step = ibis.interval(seconds=1)
t = (
(
ibis.range(
ibis.now() - lookback,
ibis.now(),
step=step,
)
.unnest()
.name("timestamp")
.as_table()
)
.mutate(
index=(ibis.row_number().over(order_by="timestamp")),
**{col: 2 * (ibis.random() - 0.5) for col in ["a", "b", "c"]},
)
.mutate(color=ibis._["index"].histogram(nbins=8))
.drop("index")
.relocate("timestamp", "color")
.order_by("timestamp")
)
t = con.create_table("source", t.to_pyarrow())
c = px.line_3d(
t,
x="a",
y="b",
z="c",
color="color",
hover_data=["timestamp"],
)
c
- 1
-
Import the necessary libraries.
- 2
-
Enable interactive mode for Ibis.
- 3
-
Connect to an on-disk DuckDB database.
- 4
-
Load the table if it already exists.
- 5
-
Define the time range and step for the data.
- 6
-
Create the array of timestamps.
- 7
-
Unnest the array to a column.
- 8
-
Name the column “timestamp”.
- 9
-
Convert the column into a table.
- 10
-
Create a monotonically increasing index column.
- 11
-
Create three columns of random numbers.
- 12
-
Create a color column based on the index (help visualize the time series).
- 13
-
Drop the index column.
- 14
-
Rearrange the columns.
- 15
-
Order the table by timestamp.
- 16
-
Store the table in the on-disk database.
- 17
-
Create a 3D line plot of the data.