You’re a data analyst, and a new ticket landed in your queue.
Subject: Urgent: Data Discovery Needed for Critical Analysis
Hi Data Team,
I hope this message finds you well. I’m reaching out with an urgent request that directly impacts the company’s most critical project. We need to locate a specific value within our database but do not know which column it’s in. Unfortunately, we don’t have documentation for this particular table. We are looking for the value “NEEDLE” in the table.
We think it is in the X database, Y schema, and Z table. We appreciate your help with this urgent matter!
Whelp, let’s give this a try.
The table
To set up this particular problem, we can use pandas to create a table with 5 columns and 100 rows. We can use the at method to update a row with the value “NEEDLE” to simulate what we need to find.
Code
import pandas as pdimport randomimport stringfrom ibis.interactive import*def random_string(length=10):return"".join( random.choice(string.ascii_letters + string.digits) for _ inrange(length) )data = [[random_string() for _ inrange(5)] for _ inrange(100)]column_names = [f"col{i+1}"for i inrange(5)]df = pd.DataFrame(data, columns=column_names)df.at[42, 'col4'] ="NEEDLE"t = ibis.memtable(df, name="Z")
We could always spell it out with SQL, including each column that we want to check in the WHERE clause. In this scenario, we know each column is a varchar, so we can check each one for the value “NEEDLE”.
s.of_type("string") was used to select string columns, then s.if_any() builds up the ORs. The _ == "NEEDLE" part is the condition itself, checking each column for the value.
Here’s the SQL that was generated to help us find it, which is quite similar to what we would have had to write if we had gone with Option 1.
Now that we’ve found the "NEEDLE" value, we can provide the information to the requester. Urgent requests like this require quick and precise responses.
Our use of Ibis demonstrates how easy it is to simplify navigating large datasets, and in this case, undocumented ones.
Please get in touch with us on GitHub or Zulip. We’d love to hear from you!