I grew up playing games, and with the recent re-release of World of Warcraft Classic, it seems like a perfect time to analyze some in-game data!
This dataset is the product of a Horde player’s diligent recording throughout 2008, capturing the transitional phase between the Burning Crusade and Wrath of the Lich King expansions. Notably, starting November 13, 2008, the data showcases numerous characters venturing into new territories and advancing beyond the former level cap of 70.
Analysis
We’ll determine who logged in the most, who leveled from 70 to 80 the fastest, and what activities these players engaged with based on zones. Let’s get to work.
Getting started
Ibis ships with an examples module, which includes this specific data. We’ll use DuckDB here, but this is possible with other backends, and we encourage you to experiment. DuckDB is the default Ibis backend, so it’ll be easy to use with this example.
You can execute pip install ibis-framework[duckdb,examples] to work with Ibis and the example data.
from ibis.interactive import*wowah_data = ex.wowah_data_raw.fetch()wowah_data
I suspect this data was captured by a single player spamming “/who” in the game, most likely using an AddOn, about every ten minutes. Some players could have been captured twice, depending on how the command was being filtered.
We can go ahead and remove these duplicates.
wowah_data = wowah_data.distinct()
Which player logged in the most?
We mentioned that there was a single player likely capturing these results. Let’s find out who that is.
That Troll Hunter that never exceeded level 1 is likely our person with 42,770 sessions.
Who leveled the fastest from 70–80?
At the end of the year, there were 884 level 80s. Who leveled the fastest?
Finding this answer will involve filtering, grouping, and aggregating to compute each character’s time taken to level from 70 to 80.
Let’s start by creating an expression to filter to only the level 80 characters, then join it to filter and identify only where they were level 70 or 80. We’re only concerned with three columns so that we will select only those.
The data is filtered and grouped by character, and two new columns are created to represent timestamps for levels 70 and 80. Then we drop what we no longer need, get the distinct values, and calculate the time taken to level from 70 to 80. Then we sort it!
This isn’t perfect, as I found a case where there was a player who seemed to have quit in March and then returned for the new expansion. They hit 71 before it looks like their login at 70 was captured later. If you’re curious, take a look at char=21951 for yourself.
How did they level?
Let’s grab all the details from the previous result and join it back to get the timestamp and zone data.
This code summarizes how often those characters appear in different zones while leveling up from level 70 to 80. It combines two sets of data based on character names, selects records within the leveling timeframe, groups data by character and zone, and counts the number of times each character was found in each zone.
There is another example table we can join to figure out the Zone information. I’m only interested in two columns, so I’ll filter this further and rename the columns.
zones = ex.wowah_zones_raw.fetch()zones = zones.select(zone=_.Zone, zone_type=_.Type)zones
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ zone ┃ zone_type ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ string │
├──────────────────────┼───────────┤
│ Durotar │ Zone │
│ The Barrens │ Zone │
│ Silverpine Forest │ Zone │
│ Stonetalon Mountains │ Zone │
│ Thunder Bluff │ City │
│ Dustwallow Marsh │ Zone │
│ Durotar │ City │
│ Tirisfal Glades │ City │
│ Ashenvale │ Zone │
│ Stranglethorn Vale │ Zone │
│ … │ … │
└──────────────────────┴───────────┘
Making use of pivot_wider and joining back to our leveler_zones expression will make this a breeze!