Analysis of Ibis’s CI performance

blog
bigquery
continuous integration
data engineering
dogfood
Author

Phillip Cloud

Published

January 9, 2023

Summary

This notebook takes you through an analysis of Ibis’s CI data using ibis on top of Google BigQuery.

  • First, we load some data and poke around at it to see what’s what.
  • Second, we figure out some useful things to calculate based on our poking.
  • Third, we’ll visualize the results of calculations to showcase what changed and how.

Imports

Let’s start out by importing ibis and turning on interactive mode.

import ibis
from ibis import _

ibis.options.interactive = True

Connect to BigQuery

We connect to BigQuery using the ibis.connect API, which accepts a URL string indicating the backend and various bit of information needed to connect to the backend. Here we’re using BigQuery, so we need the project id (ibis-gbq) and the dataset id (workflows).

Datasets are analogous to schemas in other systems.

url = "bigquery://ibis-gbq/workflows"
con = ibis.connect(url)

Let’s see what tables are available.

con.list_tables()
['analysis', 'jobs', 'workflows']

Analysis

Here we’ve got our first bit of interesting information: the jobs and workflows tables.

Terminology

Before we jump in, it helps to lay down some terminology.

  • A workflow corresponds to an individual GitHub Actions YAML file in a GitHub repository under the .github/workflows directory.
  • A job is a named set of steps to run inside a workflow file.

What’s in the workflows table?

Each row in the workflows table corresponds to a workflow run.

  • A workflow run is an instance of a workflow that was triggered by some entity: a GitHub user, bot, or other entity. Each row of the workflows table is a workflow run.

What’s in the jobs table?

Similarly, each row in the jobs table is a job run. That is, for a given workflow run there are a set of jobs run with it.

  • A job run is an instance of a job in a workflow. It is associated with a single workflow run.

Rationale

The goal of this analysis is to try to understand ibis’s CI performance, and whether the amount of time we spent waiting on CI has decreased, stayed the same or increased. Ideally, we can understand the pieces that contribute to the change or lack thereof.

Metrics

To that end there are a few interesting metrics to look at:

  • job run duration: this is the amount of time it takes for a given job to complete
  • workflow run duration: the amount of time it takes for all job runs in a workflow run to complete.
  • queueing duration: the amount time time spent waiting for the first job run to commence.

Mitigating Factors

  • Around October 2021, we changed our CI infrastructure to use Poetry instead of Conda. The goal there was to see if we could cache dependencies using the lock file generated by poetry. We should see whether that had any effect.
  • At the end of November 2022, we switch to the Team Plan (a paid GitHub plan) for the Ibis organzation. This tripled the amount of job runs that could execute in parallel. We should see if that helped anything.

Alright, let’s jump into some data!

jobs = con.tables.jobs[_.started_at < "2023-01-09"]
jobs
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ url                                                                    steps                                                                             status     started_at                 runner_group_name  run_attempt  name                               labels         node_id                       id         runner_id  run_url                                                                run_id     check_run_url                                                        html_url                                                                    runner_name  runner_group_id  head_sha                                  conclusion  completed_at              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringarray<struct<status: string, conclusion: string, started_at: timestamp('UTC'), …stringtimestamp('UTC')stringint64stringarray<string>stringint64int64stringint64stringstringstringint64stringstringtimestamp('UTC')          │
├───────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┼───────────┼───────────────────────────┼───────────────────┼─────────────┼───────────────────────────────────┼───────────────┼──────────────────────────────┼───────────┼───────────┼───────────────────────────────────────────────────────────────────────┼───────────┼─────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────┼─────────────┼─────────────────┼──────────────────────────────────────────┼────────────┼───────────────────────────┤
│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950708339[{...}, {...}, ... +9]completed2020-08-05 19:01:16+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTA3MDgzMzk=950708339NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196667191196667191https://api.github.com/repos/ibis-project/ibis/check-runs/950708339https://github.com/ibis-project/ibis/runs/950708339?check_suite_focus=trueNULLNULL08855609f1e9ebdeb6197887cf64ecda015d99a8success   2020-08-05 19:51:29+00:00 │
│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950545071[{...}, {...}, ... +9]completed2020-08-05 18:12:43+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTA1NDUwNzE=950545071NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196617109196617109https://api.github.com/repos/ibis-project/ibis/check-runs/950545071https://github.com/ibis-project/ibis/runs/950545071?check_suite_focus=trueNULLNULL7472797f3e4da39d18e53c09566dba5e373094b0success   2020-08-05 18:59:32+00:00 │
│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950493219[{...}, {...}, ... +9]completed2020-08-05 17:59:27+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTA0OTMyMTk=950493219NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196598751196598751https://api.github.com/repos/ibis-project/ibis/check-runs/950493219https://github.com/ibis-project/ibis/runs/950493219?check_suite_focus=trueNULLNULL7452ea048908149a672f681ffd94e3fd0953ab2cfailure   2020-08-05 18:05:28+00:00 │
│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950486464[{...}, {...}, ... +9]completed2020-08-05 17:56:41+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTA0ODY0NjQ=950486464NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196596932196596932https://api.github.com/repos/ibis-project/ibis/check-runs/950486464https://github.com/ibis-project/ibis/runs/950486464?check_suite_focus=trueNULLNULL59daccd16de041b14fa48b9ba53e8aac6495a578failure   2020-08-05 18:23:47+00:00 │
│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950480141[{...}, {...}, ... +9]completed2020-08-05 17:54:44+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTA0ODAxNDE=950480141NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196595357196595357https://api.github.com/repos/ibis-project/ibis/check-runs/950480141https://github.com/ibis-project/ibis/runs/950480141?check_suite_focus=trueNULLNULL1d4f2db372834da7fb33b53c60b59d3f3e40cf7cfailure   2020-08-05 17:54:51+00:00 │
│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950177717[{...}, {...}, ... +10]completed2020-08-05 16:29:08+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTAxNzc3MTc=950177717NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196505866196505866https://api.github.com/repos/ibis-project/ibis/check-runs/950177717https://github.com/ibis-project/ibis/runs/950177717?check_suite_focus=trueNULLNULLc36dd6504d86d1994fb36d6a84fb3f302a57642cfailure   2020-08-05 17:17:42+00:00 │
│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950129587[{...}, {...}, ... +10]completed2020-08-05 16:15:54+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTAxMjk1ODc=950129587NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196491206196491206https://api.github.com/repos/ibis-project/ibis/check-runs/950129587https://github.com/ibis-project/ibis/runs/950129587?check_suite_focus=trueNULLNULL5ffc4dcb3857eae64b5b36f46b378149c0bb2d74failure   2020-08-05 16:47:02+00:00 │
│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/949724768[{...}, {...}, ... +14]completed2020-08-05 14:36:49+00:00NULL1Docs, lint, package and benckmark[]MDg6Q2hlY2tSdW45NDk3MjQ3Njg=949724768NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196367166196367166https://api.github.com/repos/ibis-project/ibis/check-runs/949724768https://github.com/ibis-project/ibis/runs/949724768?check_suite_focus=trueNULLNULLe88d621425c939857b3b9391794c5ddfd7615981failure   2020-08-05 15:10:08+00:00 │
│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/949565949[{...}, {...}, ... +13]completed2020-08-05 14:01:36+00:00NULL1Docs, lint, package and benckmark[]MDg6Q2hlY2tSdW45NDk1NjU5NDk=949565949NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196316939196316939https://api.github.com/repos/ibis-project/ibis/check-runs/949565949https://github.com/ibis-project/ibis/runs/949565949?check_suite_focus=trueNULLNULL702446a96a1b9e6b463084f2f09f2f2106fef8d4failure   2020-08-05 14:32:10+00:00 │
│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947307233[{...}, {...}, ... +13]completed2020-08-05 00:48:26+00:00NULL1Docs, lint, package and benckmark[]MDg6Q2hlY2tSdW45NDczMDcyMzM=947307233NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195537439195537439https://api.github.com/repos/ibis-project/ibis/check-runs/947307233https://github.com/ibis-project/ibis/runs/947307233?check_suite_focus=trueNULLNULL2ab26f385b87f39b66cf51783d7ab8904fdb4677failure   2020-08-05 01:19:56+00:00 │
│                          │
└───────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────┴───────────┴───────────────────────────┴───────────────────┴─────────────┴───────────────────────────────────┴───────────────┴──────────────────────────────┴───────────┴───────────┴───────────────────────────────────────────────────────────────────────┴───────────┴─────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────┴─────────────┴─────────────────┴──────────────────────────────────────────┴────────────┴───────────────────────────┘

These first few columns in the jobs table aren’t that interesting so we should look at what else is there

jobs.columns
['url',
 'steps',
 'status',
 'started_at',
 'runner_group_name',
 'run_attempt',
 'name',
 'labels',
 'node_id',
 'id',
 'runner_id',
 'run_url',
 'run_id',
 'check_run_url',
 'html_url',
 'runner_name',
 'runner_group_id',
 'head_sha',
 'conclusion',
 'completed_at']

A bunch of these aren’t that useful for our purposes. However, run_id, started_at, completed_at are useful for us. The GitHub documentation for job information provides useful detail about the meaning of these fields.

  • run_id: the workflow run associated with this job run
  • started_at: when the job started
  • completed_at: when the job completed

What we’re interested in to a first degree is the job duration, so let’s compute that.

We also need to compute when the last job for a given run_id started and when it completed. We’ll use the former to compute the queueing duration, and the latter to compute the total time it took for a given workflow run to complete.

run_id_win = ibis.window(group_by=_.run_id)
jobs = jobs.select(
    _.run_id,
    job_duration=_.completed_at.cast("int") - _.started_at.cast("int"),
    last_job_started_at=_.started_at.max().over(run_id_win),
    last_job_completed_at=_.completed_at.max().over(run_id_win),
)
jobs
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ run_id     job_duration  last_job_started_at        last_job_completed_at     ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ int64int64timestamp('UTC')timestamp('UTC')          │
├───────────┼──────────────┼───────────────────────────┼───────────────────────────┤
│ 20596172613530000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │
│ 2059617264540000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │
│ 2059617264470000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │
│ 2059617267100000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │
│ 2059617263640000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │
│ 2059617263220000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │
│ 20596172602020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │
│ 2128049343530000002020-08-18 00:36:24+00:002020-08-18 00:36:24+00:00 │
│ 2128049344000000002020-08-18 00:36:24+00:002020-08-18 00:36:24+00:00 │
│ 21280493410270000002020-08-18 00:36:24+00:002020-08-18 00:36:24+00:00 │
│                                  │
└───────────┴──────────────┴───────────────────────────┴───────────────────────────┘

Let’s take a look at workflows

workflows = con.tables.workflows
workflows
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ workflow_url                                                              workflow_id  triggering_actor  run_number  run_attempt  updated_at                 cancel_url                                                                    rerun_url                                                                    check_suite_node_id               pull_requests                                                                     id         node_id                           status     repository                                                                                                                                                     jobs_url                                                                    previous_attempt_url  artifacts_url                                                                    html_url                                                     head_sha                                  head_repository                                                                                                                                                    run_started_at             head_branch    url                                                                    event         name    actor  created_at                 check_suite_url                                                         check_suite_id  conclusion  head_commit                                                                                                                            logs_url                                                                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stringint64struct<subscrip…int64int64timestamp('UTC')stringstringstringarray<struct<number: int64, url: string, id: int64, head: struct<sha: string, r…int64stringstringstruct<trees_url: string, teams_url: string, statuses_url: string, subscribers_…stringstringstringstringstringstruct<trees_url: string, teams_url: string, statuses_url: string, subscribers_…timestamp('UTC')stringstringstringstringstru…timestamp('UTC')stringint64stringstruct<tree_id: string, timestamp: timestamp('UTC'), message: string, id: strin…string                                                                     │
├──────────────────────────────────────────────────────────────────────────┼─────────────┼──────────────────┼────────────┼─────────────┼───────────────────────────┼──────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┼───────────┼──────────────────────────────────┼───────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────┼──────────────────────┼─────────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────┼──────────────┼────────┼───────┼───────────────────────────┼────────────────────────────────────────────────────────────────────────┼────────────────┼────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────┤
│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2812020-08-05 20:01:17+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/rerunMDEwOkNoZWNrU3VpdGUxMDEzMDI5NDkw[]196667191MDExOldvcmtmbG93UnVuMTk2NjY3MTkxcompleted{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/artifactshttps://github.com/ibis-project/ibis/actions/runs/19666719108855609f1e9ebdeb6197887cf64ecda015d99a8{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 19:01:08+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191pull_requestMain  NULL2020-08-05 19:01:08+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10130294901013029490success   {'tree_id': 'c4277198178ae73c3d9611af464ee75eadbceedc', 'timestamp': datetime.datetime(2020, 8, 5, 19, 0, 57, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/logs │
│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2712020-08-05 18:59:37+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/rerunMDEwOkNoZWNrU3VpdGUxMDEyODM0OTQ3[]196617109MDExOldvcmtmbG93UnVuMTk2NjE3MTA5completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/artifactshttps://github.com/ibis-project/ibis/actions/runs/1966171097472797f3e4da39d18e53c09566dba5e373094b0{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 18:12:34+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109pull_requestMain  NULL2020-08-05 18:12:34+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10128349471012834947success   {'tree_id': '451472455efc6f20b81f6e1762ac712ec75e77b3', 'timestamp': datetime.datetime(2020, 8, 5, 18, 12, 25, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/logs │
│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2612020-08-05 18:05:32+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/rerunMDEwOkNoZWNrU3VpdGUxMDEyNzc0OTQ4[]196598751MDExOldvcmtmbG93UnVuMTk2NTk4NzUxcompleted{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/artifactshttps://github.com/ibis-project/ibis/actions/runs/1965987517452ea048908149a672f681ffd94e3fd0953ab2c{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 17:58:48+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751pull_requestMain  NULL2020-08-05 17:58:48+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10127749481012774948failure   {'tree_id': 'e753c3d693a15eeb99a0d2bd074414ab90dbc85d', 'timestamp': datetime.datetime(2020, 8, 5, 17, 58, 39, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/logs │
│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2512020-08-05 18:23:52+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/rerunMDEwOkNoZWNrU3VpdGUxMDEyNzY2NTk2[]196596932MDExOldvcmtmbG93UnVuMTk2NTk2OTMycompleted{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/artifactshttps://github.com/ibis-project/ibis/actions/runs/19659693259daccd16de041b14fa48b9ba53e8aac6495a578{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 17:56:34+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932pull_requestMain  NULL2020-08-05 17:56:34+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10127665961012766596failure   {'tree_id': '342312b7ce508d4d7c91259dd7919cee06508f19', 'timestamp': datetime.datetime(2020, 8, 5, 17, 56, 25, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/logs │
│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2412020-08-05 17:54:55+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196595357/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196595357/rerunMDEwOkNoZWNrU3VpdGUxMDEyNzU5MjQ1[]196595357MDExOldvcmtmbG93UnVuMTk2NTk1MzU3completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196595357/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196595357/artifactshttps://github.com/ibis-project/ibis/actions/runs/1965953571d4f2db372834da7fb33b53c60b59d3f3e40cf7c{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 17:54:35+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196595357pull_requestMain  NULL2020-08-05 17:54:35+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10127592451012759245failure   {'tree_id': '82f0fdad2916ec10d33f5b6c589dbfe8e4decccd', 'timestamp': datetime.datetime(2020, 8, 5, 17, 54, 26, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196595357/logs │
│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2312020-08-05 17:17:46+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196505866/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196505866/rerunMDEwOkNoZWNrU3VpdGUxMDEyNDExMDA0[]196505866MDExOldvcmtmbG93UnVuMTk2NTA1ODY2completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196505866/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196505866/artifactshttps://github.com/ibis-project/ibis/actions/runs/196505866c36dd6504d86d1994fb36d6a84fb3f302a57642c{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 16:28:57+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196505866pull_requestMain  NULL2020-08-05 16:28:57+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10124110041012411004failure   {'tree_id': 'c3bf70f2809e48fc5c1dd5b0e7e2321bae4879ea', 'timestamp': datetime.datetime(2020, 8, 5, 16, 28, 48, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196505866/logs │
│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2212020-08-05 16:47:06+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196491206/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196491206/rerunMDEwOkNoZWNrU3VpdGUxMDEyMzQ4MjUz[]196491206MDExOldvcmtmbG93UnVuMTk2NDkxMjA2completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196491206/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196491206/artifactshttps://github.com/ibis-project/ibis/actions/runs/1964912065ffc4dcb3857eae64b5b36f46b378149c0bb2d74{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 16:15:44+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196491206pull_requestMain  NULL2020-08-05 16:15:44+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10123482531012348253failure   {'tree_id': '3781a799e538f99a2e05399fea4237e5d06d5df2', 'timestamp': datetime.datetime(2020, 8, 5, 16, 12, 4, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196491206/logs │
│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2112020-08-05 15:10:13+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196367166/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196367166/rerunMDEwOkNoZWNrU3VpdGUxMDExODM5NTE5[]196367166MDExOldvcmtmbG93UnVuMTk2MzY3MTY2completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196367166/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196367166/artifactshttps://github.com/ibis-project/ibis/actions/runs/196367166e88d621425c939857b3b9391794c5ddfd7615981{'trees_url': 'https://api.github.com/repos/datapythonista/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/datapythonista/ibis/teams', ... +44}2020-08-05 14:36:39+00:00conda-windowshttps://api.github.com/repos/ibis-project/ibis/actions/runs/196367166pull_requestMain  NULL2020-08-05 14:36:39+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10118395191011839519failure   {'tree_id': 'e093ce6398be8a2fb5331d944d07ef0c5518cc84', 'timestamp': datetime.datetime(2020, 8, 5, 14, 36, 30, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196367166/logs │
│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2012020-08-05 14:32:14+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196316939/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196316939/rerunMDEwOkNoZWNrU3VpdGUxMDExNjUzNDY5[]196316939MDExOldvcmtmbG93UnVuMTk2MzE2OTM5completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196316939/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196316939/artifactshttps://github.com/ibis-project/ibis/actions/runs/196316939702446a96a1b9e6b463084f2f09f2f2106fef8d4{'trees_url': 'https://api.github.com/repos/datapythonista/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/datapythonista/ibis/teams', ... +44}2020-08-05 14:01:24+00:00conda-windowshttps://api.github.com/repos/ibis-project/ibis/actions/runs/196316939pull_requestMain  NULL2020-08-05 14:01:24+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10116534691011653469failure   {'tree_id': 'cdac62bce1914add5faceafd210f32965aa00fe7', 'timestamp': datetime.datetime(2020, 8, 5, 14, 1, 11, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196316939/logs │
│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL1912020-08-05 01:20:00+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/195537439/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195537439/rerunMDEwOkNoZWNrU3VpdGUxMDA5MDQ3OTk5[]195537439MDExOldvcmtmbG93UnVuMTk1NTM3NDM5completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/195537439/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195537439/artifactshttps://github.com/ibis-project/ibis/actions/runs/1955374392ab26f385b87f39b66cf51783d7ab8904fdb4677{'trees_url': 'https://api.github.com/repos/datapythonista/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/datapythonista/ibis/teams', ... +44}2020-08-05 00:48:17+00:00conda-windowshttps://api.github.com/repos/ibis-project/ibis/actions/runs/195537439pull_requestMain  NULL2020-08-05 00:48:17+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10090479991009047999failure   {'tree_id': '33ca23ad93f84344f03894d952d7ffeaf8fb5990', 'timestamp': datetime.datetime(2020, 8, 5, 0, 48, 8, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/195537439/logs │
│                                                                           │
└──────────────────────────────────────────────────────────────────────────┴─────────────┴──────────────────┴────────────┴─────────────┴───────────────────────────┴──────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────┴───────────┴──────────────────────────────────┴───────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────┴──────────────────────┴─────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────┴──────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────┴───────────────┴───────────────────────────────────────────────────────────────────────┴──────────────┴────────┴───────┴───────────────────────────┴────────────────────────────────────────────────────────────────────────┴────────────────┴────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────┘

Again we have a bunch of columns that aren’t so useful to us, so let’s see what else is there.

workflows.columns
['workflow_url',
 'workflow_id',
 'triggering_actor',
 'run_number',
 'run_attempt',
 'updated_at',
 'cancel_url',
 'rerun_url',
 'check_suite_node_id',
 'pull_requests',
 'id',
 'node_id',
 'status',
 'repository',
 'jobs_url',
 'previous_attempt_url',
 'artifacts_url',
 'html_url',
 'head_sha',
 'head_repository',
 'run_started_at',
 'head_branch',
 'url',
 'event',
 'name',
 'actor',
 'created_at',
 'check_suite_url',
 'check_suite_id',
 'conclusion',
 'head_commit',
 'logs_url']

We don’t care about many of these for the purposes of this analysis, however we need the id and a few values derived from the run_started_at column.

  • id: the unique identifier of the workflow run
  • run_started_at: the time the workflow run started

We compute the date the run started at so we can later compare it to the dates where we added poetry and switched to the team plan.

workflows = workflows.select(
    _.id, _.run_started_at, started_date=_.run_started_at.date()
)
workflows
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ id         run_started_at             started_date ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ int64timestamp('UTC')date         │
├───────────┼───────────────────────────┼──────────────┤
│ 1954783822020-08-04 23:54:29+00:002020-08-04   │
│ 1954765172020-08-04 23:51:44+00:002020-08-04   │
│ 1954755252020-08-04 23:50:11+00:002020-08-04   │
│ 1954686772020-08-04 23:39:51+00:002020-08-04   │
│ 1954653432020-08-04 23:34:11+00:002020-08-04   │
│ 1954606112020-08-04 23:29:07+00:002020-08-04   │
│ 1954525052020-08-04 23:17:29+00:002020-08-04   │
│ 1954478862020-08-04 23:11:35+00:002020-08-04   │
│ 1954355212020-08-04 23:02:34+00:002020-08-04   │
│ 1954333852020-08-04 23:01:00+00:002020-08-04   │
│                     │
└───────────┴───────────────────────────┴──────────────┘

We need to associate jobs and workflows somehow, so let’s join them on the relevant key fields.

joined = jobs.join(workflows, jobs.run_id == workflows.id)
joined
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ run_id      job_duration  last_job_started_at        last_job_completed_at      id          run_started_at             started_date ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ int64int64timestamp('UTC')timestamp('UTC')int64timestamp('UTC')date         │
├────────────┼──────────────┼───────────────────────────┼───────────────────────────┼────────────┼───────────────────────────┼──────────────┤
│  28727181802020-10-04 01:42:03+00:002020-10-04 01:42:03+00:002872718182020-10-04 01:41:55+00:002020-10-04   │
│ 1405828076270000002021-10-31 23:36:44+00:002021-10-31 23:37:11+00:0014058280762021-10-31 23:36:37+00:002021-10-31   │
│ 1405808044340000002021-10-31 23:26:54+00:002021-10-31 23:27:28+00:0014058080442021-10-31 23:24:52+00:002021-10-31   │
│ 140579786930000002021-10-31 23:26:44+00:002021-10-31 23:26:47+00:0014057978692021-10-31 23:19:08+00:002021-10-31   │
│ 140579787002021-10-31 23:26:41+00:002021-10-31 23:26:41+00:0014057978702021-10-31 23:19:08+00:002021-10-31   │
│ 14057960702800000002021-10-31 23:37:03+00:002021-10-31 23:37:03+00:0014057960702021-10-31 23:18:07+00:002021-10-31   │
│ 14057960701280000002021-10-31 23:37:03+00:002021-10-31 23:37:03+00:0014057960702021-10-31 23:18:07+00:002021-10-31   │
│ 14057960704770000002021-10-31 23:37:03+00:002021-10-31 23:37:03+00:0014057960702021-10-31 23:18:07+00:002021-10-31   │
│ 14057960706070000002021-10-31 23:37:03+00:002021-10-31 23:37:03+00:0014057960702021-10-31 23:18:07+00:002021-10-31   │
│ 14057960701540000002021-10-31 23:37:03+00:002021-10-31 23:37:03+00:0014057960702021-10-31 23:18:07+00:002021-10-31   │
│                      │
└────────────┴──────────────┴───────────────────────────┴───────────────────────────┴────────────┴───────────────────────────┴──────────────┘

Sweet! Now we have workflow runs and job runs together in the same table, let’s start exploring summarization.

Let’s encode our knowledge about when the poetry move happened and also when we moved to the team plan.

from datetime import date

POETRY_MERGED_DATE = date(2021, 10, 15)
TEAMIZATION_DATE = date(2022, 11, 28)

Let’s compute some indicator variables indicating whether a given row contains data after poetry changes occurred, and do the same for the team plan.

Let’s also compute queueing time and workflow duration.

stats = joined.select(
    _.started_date,
    _.job_duration,
    has_poetry=_.started_date > POETRY_MERGED_DATE,
    has_team=_.started_date > TEAMIZATION_DATE,
    queueing_time=_.last_job_started_at.cast("int")
    - _.run_started_at.cast("int"),
    workflow_duration=_.last_job_completed_at.cast("int")
    - _.run_started_at.cast("int"),
)
stats
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ started_date  job_duration  has_poetry  has_team  queueing_time  workflow_duration ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ dateint64booleanbooleanint64int64             │
├──────────────┼──────────────┼────────────┼──────────┼───────────────┼───────────────────┤
│ 2022-08-1584000000 │ True       │ False    │      1100000095000000 │
│ 2022-08-1524000000 │ True       │ False    │      1100000035000000 │
│ 2022-08-150 │ True       │ False    │       10000001000000 │
│ 2022-08-151000000 │ True       │ False    │      1800000020000000 │
│ 2022-08-150 │ True       │ False    │      1800000020000000 │
│ 2022-08-152000000 │ True       │ False    │      1800000020000000 │
│ 2022-08-152000000 │ True       │ False    │      1800000020000000 │
│ 2022-08-152000000 │ True       │ False    │      1800000020000000 │
│ 2022-08-152000000 │ True       │ False    │      1800000020000000 │
│ 2022-08-15411000000 │ True       │ False    │     712000000712000000 │
│  │
└──────────────┴──────────────┴────────────┴──────────┴───────────────┴───────────────────┘

Let’s create a column ranging from 0 to 2 inclusive where:

  • 0: no improvements
  • 1: just poetry
  • 2: poetry and the team plan

Let’s also give them some names that’ll look nice on our plots.

stats = stats.mutate(
    raw_improvements=_.has_poetry.cast("int") + _.has_team.cast("int")
).mutate(
    improvements=(
        _.raw_improvements.case()
        .when(0, "None")
        .when(1, "Poetry")
        .when(2, "Poetry + Team Plan")
        .else_("NA")
        .end()
    ),
    team_plan=ibis.where(_.raw_improvements > 1, "Poetry + Team Plan", "None"),
)
stats
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ started_date  job_duration  has_poetry  has_team  queueing_time  workflow_duration  raw_improvements  improvements  team_plan ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ dateint64booleanbooleanint64int64int64stringstring    │
├──────────────┼──────────────┼────────────┼──────────┼───────────────┼───────────────────┼──────────────────┼──────────────┼───────────┤
│ 2021-03-29377000000 │ False      │ False    │      130000009780000000None        None      │
│ 2021-03-29414000000 │ False      │ False    │      130000009780000000None        None      │
│ 2021-03-29519000000 │ False      │ False    │      130000009780000000None        None      │
│ 2021-03-29642000000 │ False      │ False    │      130000009780000000None        None      │
│ 2021-03-29861000000 │ False      │ False    │      130000009780000000None        None      │
│ 2021-03-29873000000 │ False      │ False    │      130000009780000000None        None      │
│ 2021-03-29455000000 │ False      │ False    │      130000009780000000None        None      │
│ 2021-03-29637000000 │ False      │ False    │      130000009780000000None        None      │
│ 2021-03-29798000000 │ False      │ False    │      130000009780000000None        None      │
│ 2021-03-29822000000 │ False      │ False    │      130000009780000000None        None      │
│          │
└──────────────┴──────────────┴────────────┴──────────┴───────────────┴───────────────────┴──────────────────┴──────────────┴───────────┘

Finally, we can summarize by averaging the different durations, grouping on the variables of interest.

USECS_PER_MIN = 60_000_000

agged = stats.group_by([_.started_date, _.improvements, _.team_plan]).agg(
    job=_.job_duration.div(USECS_PER_MIN).mean(),
    workflow=_.workflow_duration.div(USECS_PER_MIN).mean(),
    queueing_time=_.queueing_time.div(USECS_PER_MIN).mean(),
)
agged
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ started_date  improvements        team_plan           job        workflow   queueing_time ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ datestringstringfloat64float64float64       │
├──────────────┼────────────────────┼────────────────────┼───────────┼───────────┼───────────────┤
│ 2020-09-02None              None              14.15476257.42083357.148929 │
│ 2022-09-12Poetry            None              3.12556913.84556012.839561 │
│ 2022-09-25Poetry            None              3.35744615.09534314.073260 │
│ 2021-09-24None              None              7.91091520.35839220.336150 │
│ 2020-11-25None              None              13.75061259.67240159.493807 │
│ 2020-08-07None              None              48.22916748.3916670.162500 │
│ 2021-04-09None              None              0.0000000.1666670.166667 │
│ 2022-02-22Poetry            None              2.34196611.37756410.467503 │
│ 2022-10-13Poetry            None              6.67685335.34404628.530969 │
│ 2022-12-19Poetry + Team PlanPoetry + Team Plan4.11339110.2731967.780035 │
│  │
└──────────────┴────────────────────┴────────────────────┴───────────┴───────────┴───────────────┘

If at any point you want to inspect the SQL you’ll be running, ibis has you covered with ibis.to_sql.

ibis.to_sql(agged)
WITH t0 AS (
  SELECT
    t6.*
  FROM `ibis-gbq`.workflows.jobs AS t6
  WHERE
    t6.`started_at` < '2023-01-09'
), t1 AS (
  SELECT
    t6.`id`,
    t6.`run_started_at`,
    DATE(t6.`run_started_at`) AS `started_date`
  FROM `ibis-gbq`.workflows.workflows AS t6
), t2 AS (
  SELECT
    t0.`run_id`,
    UNIX_MICROS(t0.`completed_at`) - UNIX_MICROS(t0.`started_at`) AS `job_duration`,
    MAX(t0.`started_at`) OVER (PARTITION BY t0.`run_id`) AS `last_job_started_at`,
    MAX(t0.`completed_at`) OVER (PARTITION BY t0.`run_id`) AS `last_job_completed_at`
  FROM t0
), t3 AS (
  SELECT
    `started_date`,
    `job_duration`,
    `started_date` > CAST('2021-10-15' AS DATE) AS `has_poetry`,
    `started_date` > CAST('2022-11-28' AS DATE) AS `has_team`,
    UNIX_MICROS(`last_job_started_at`) - UNIX_MICROS(`run_started_at`) AS `queueing_time`,
    UNIX_MICROS(`last_job_completed_at`) - UNIX_MICROS(`run_started_at`) AS `workflow_duration`
  FROM t2
  INNER JOIN t1
    ON t2.`run_id` = t1.`id`
), t4 AS (
  SELECT
    t3.*,
    CAST(t3.`has_poetry` AS INT64) + CAST(t3.`has_team` AS INT64) AS `raw_improvements`
  FROM t3
)
SELECT
  t5.`started_date`,
  t5.`improvements`,
  t5.`team_plan`,
  avg(IEEE_DIVIDE(t5.`job_duration`, 60000000)) AS `job`,
  avg(IEEE_DIVIDE(t5.`workflow_duration`, 60000000)) AS `workflow`,
  avg(IEEE_DIVIDE(t5.`queueing_time`, 60000000)) AS `queueing_time`
FROM (
  SELECT
    t4.*,
    CASE t4.`raw_improvements`
      WHEN 0
      THEN 'None'
      WHEN 1
      THEN 'Poetry'
      WHEN 2
      THEN 'Poetry + Team Plan'
      ELSE 'NA'
    END AS `improvements`,
    CASE WHEN t4.`raw_improvements` > 1 THEN 'Poetry + Team Plan' ELSE 'None' END AS `team_plan`
  FROM t4
) AS t5
GROUP BY
  1,
  2,
  3

Plot the Results

Ibis doesn’t have builtin plotting support, so we need to pull our results into pandas.

Here I’m using plotnine (a Python port of ggplot2), which has great integration with pandas DataFrames.

raw_df = agged.execute()
raw_df
started_date improvements team_plan job workflow queueing_time
0 2022-05-21 Poetry None 3.315453 12.056705 10.856003
1 2021-06-23 None None 8.804329 18.838528 0.799567
2 2022-05-12 Poetry None 4.912492 17.443804 13.617164
3 2022-09-11 Poetry None 3.318782 12.665244 11.561670
4 2021-04-08 None None 8.366981 13.957233 0.276730
... ... ... ... ... ... ...
779 2022-10-09 Poetry None 3.472283 12.489749 9.092648
780 2021-03-24 None None 9.499082 16.419903 1.801063
781 2022-03-06 Poetry None 2.727943 11.757324 10.942026
782 2021-11-22 Poetry None 2.608860 10.306637 7.481462
783 2022-06-08 Poetry None 3.214470 12.617276 11.713546

784 rows × 6 columns

Generally, plotnine works with long, tidy data so let’s use pandas.melt to get there.

import pandas as pd

df = pd.melt(
    raw_df,
    id_vars=["started_date", "improvements", "team_plan"],
    var_name="entity",
    value_name="duration",
)
df.head()
started_date improvements team_plan entity duration
0 2022-05-21 Poetry None job 3.315453
1 2021-06-23 None None job 8.804329
2 2022-05-12 Poetry None job 4.912492
3 2022-09-11 Poetry None job 3.318782
4 2021-04-08 None None job 8.366981

Let’s make our theme lighthearted by using xkcd-style plots.

from plotnine import *

theme_set(theme_xkcd())

Create a few labels for our plot.

poetry_label = f"Poetry\n{POETRY_MERGED_DATE}"
team_label = f"Team Plan\n{TEAMIZATION_DATE}"

Without the following line you may see large amount of inconsequential warnings that make the notebook unusable.

import logging

# without this, findfont logging spams the notebook making it unusable
logging.getLogger('matplotlib.font_manager').disabled = True

Here we show job durations, coloring the points differently depending on whether they have no improvements, poetry, or poetry + team plan.

(
    ggplot(
        df.loc[df.entity == "job"].reset_index(drop=True),
        aes(x="started_date", y="duration", color="factor(improvements)"),
    )
    + geom_point()
    + geom_vline(
        xintercept=[TEAMIZATION_DATE, POETRY_MERGED_DATE],
        colour=["blue", "green"],
        linetype="dashed",
    )
    + scale_color_brewer(
        palette=7,
        type='qual',
        limits=["None", "Poetry", "Poetry + Team Plan"],
    )
    + geom_text(x=POETRY_MERGED_DATE, label=poetry_label, y=15, color="blue")
    + geom_text(x=TEAMIZATION_DATE, label=team_label, y=10, color="blue")
    + stat_smooth(method="lm")
    + labs(x="Date", y="Duration (minutes)")
    + ggtitle("Job Duration")
    + theme(
        figure_size=(22, 6),
        legend_position=(0.67, 0.65),
        legend_direction="vertical",
    )
)

<Figure Size: (2200 x 600)>

Result #1: Job Duration

This result is pretty interesting.

A few things pop out to me right away:

  • The move to poetry decreased the average job run duration by quite a bit. No, I’m not going to do any statistical tests.
  • The variability of job run durations also decreased by quite a bit after introducing poetry.
  • Moving to the team plan had little to no effect on job run duration.
(
    ggplot(
        df.loc[df.entity != "job"].reset_index(drop=True),
        aes(x="started_date", y="duration", color="factor(improvements)"),
    )
    + facet_wrap("entity", ncol=1)
    + geom_point()
    + geom_vline(
        xintercept=[TEAMIZATION_DATE, POETRY_MERGED_DATE],
        linetype="dashed",
    )
    + scale_color_brewer(
        palette=7,
        type='qual',
        limits=["None", "Poetry", "Poetry + Team Plan"],
    )
    + geom_text(x=POETRY_MERGED_DATE, label=poetry_label, y=75, color="blue")
    + geom_text(x=TEAMIZATION_DATE, label=team_label, y=50, color="blue")
    + stat_smooth(method="lm")
    + labs(x="Date", y="Duration (minutes)")
    + ggtitle("Workflow Duration")
    + theme(
        figure_size=(22, 13),
        legend_position=(0.68, 0.75),
        legend_direction="vertical",
    )
)

<Figure Size: (2200 x 1300)>

Result #2: Workflow Duration and Queueing Time

Another interesting result.

Queueing Time

  • It almost looks like moving to poetry made average queueing time worse. This is probably due to our perception that faster jobs means faster ci. As we see here that isn’t the case
  • Moving to the team plan cut down the queueing time by quite a bit

Workflow Duration

  • Overall workflow duration appears to be strongly influenced by moving to the team plan, which is almost certainly due to the drop in queueing time since we are no longer limited by slow job durations.
  • Perhaps it’s obvious, but queueing time and workflow duration appear to be highly correlated.

In the next plot we’ll look at that correlation.

(
    ggplot(raw_df, aes(x="workflow", y="queueing_time"))
    + geom_point()
    + geom_rug()
    + facet_grid(". ~ team_plan")
    + labs(x="Workflow Duration (minutes)", y="Queueing Time (minutes)")
    + ggtitle("Workflow Duration vs. Queueing Time")
    + theme(figure_size=(22, 6))
)

<Figure Size: (2200 x 600)>

Result #3: Workflow Duration and Queueing Duration are correlated

It also seems that moving to the team plan (though also the move to poetry might be related here) reduced the variability of both metrics.

We’re lacking data compared to the past so we should wait for more to come in.

Conclusions

It appears that you need both a short queue time and fast individual jobs to minimize time spent in CI.

If you have a short queue time, but long job runs then you’ll be bottlenecked on individual jobs, and if you have more jobs than queue slots then you’ll be blocked on queueing time.

I think we can sum this up nicely:

  • slow jobs, slow queue: 🤷 blocked by jobs or queue
  • slow jobs, fast queue: ❓ blocked by jobs, if jobs are slow enough
  • fast jobs, slow queue: ❗ blocked by queue, with enough jobs
  • fast jobs, fast queue: ✅
Back to top