Snow IO: loading data from other DBs into Snowflake

blog
snowflake
io
productivity
Author

Phillip Cloud

Published

March 6, 2024

Recap

We’ve blogged about Snowflake IO before, in the context of getting local files into Snowflake as fast as possible.

In this post, we’ll show how to insert query results from another system into Snowflake, using Ibis.

Setup

Connect to your non-Snowflake system

We’ll connect to a postgres database running locally in a container. You should be able to swap in your own connection details as needed.

from ibis.interactive import *

pg_con = ibis.connect("postgres://postgres:postgres@localhost/postgres")
1
Import Ibis for maximum productivity in interactive analysis.

We’ll use a test dataset that contains some baseball batting statistics.

Ibis provides that example data, so we can dump that into postgres.

pg_batting = pg_con.create_table(
    "batting",
    ibis.examples.Batting.fetch().to_pandas(),
    temp=True,
)
1
Yep, I’m using pandas here!
2
Use a temporary table to avoid cluttering up the database.

Connect to Snowflake

import os

# snowflake://user:pass@account/database/schema?warehouse=my_warehouse
snow_con = ibis.connect(os.environ["SNOWFLAKE_URL"])
1
Set the SNOWFLAKE_URL environment variable to your Snowflake connection string.

Profit

Construct an Ibis expression from the postgres data

Let’s build an Ibis expression based on the batting table in our postgres database.

pg_batting
┏━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ player_id  year_id  stint  team_id  lg_id   g      ab     r      h      x2b    x3b    hr     rbi      sb       cs       bb     so       ibb      hbp      sh       sf       gidp    ┃
┡━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ stringint64int64stringstringint64int64int64int64int64int64int64float64float64float64int64float64float64float64float64float64float64 │
├───────────┼─────────┼───────┼─────────┼────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼─────────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ abercda0118711TRO    NA    14000000.00.00.000.0nannannannan0.0 │
│ addybo01 18711RC1    NA    25118303260013.08.01.040.0nannannannan0.0 │
│ allisar0118711CL1    NA    29137284045019.03.01.025.0nannannannan1.0 │
│ allisdo0118711WS3    NA    271332844102227.01.01.002.0nannannannan0.0 │
│ ansonca0118711RC1    NA    251202939113016.06.02.021.0nannannannan0.0 │
│ armstbo0118711FW1    NA    12499112105.00.01.001.0nannannannan0.0 │
│ barkeal0118711RC1    NA    14010002.00.00.010.0nannannannan0.0 │
│ barnero0118711BS1    NA    311576663109034.011.06.0131.0nannannannan1.0 │
│ barrebi0118711FW1    NA    15111001.00.00.000.0nannannannan0.0 │
│ barrofr0118711BS1    NA    1886131321011.01.00.000.0nannannannan0.0 │
│  │
└───────────┴─────────┴───────┴─────────┴────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴─────────┴─────────┴─────────┴───────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

We can compute the average RBI per year per team.

pg_expr = pg_batting.group_by(("year_id", "team_id")).agg(avg_rbi=_.rbi.mean())
pg_expr
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┓
┃ year_id  team_id  avg_rbi   ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━┩
│ int64stringfloat64   │
├─────────┼─────────┼───────────┤
│    1891PIT    22.782609 │
│    1895BSN    34.363636 │
│    1940SLA    22.343750 │
│    1981HOU    9.972973 │
│    1913CLE    13.512821 │
│    1971MON    17.181818 │
│    2008PIT    15.000000 │
│    1895WAS    23.096774 │
│    2011KCA    16.785714 │
│    2007MIL    19.350000 │
│        │
└─────────┴─────────┴───────────┘

We can also rename columns to be more consistent with typical Snowflake usage.

pg_expr = pg_expr.rename("ALL_CAPS")
pg_expr
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┓
┃ YEAR_ID  TEAM_ID  AVG_RBI   ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━┩
│ int64stringfloat64   │
├─────────┼─────────┼───────────┤
│    1891PIT    22.782609 │
│    1895BSN    34.363636 │
│    1940SLA    22.343750 │
│    1981HOU    9.972973 │
│    1913CLE    13.512821 │
│    1971MON    17.181818 │
│    2008PIT    15.000000 │
│    1895WAS    23.096774 │
│    2011KCA    16.785714 │
│    2007MIL    19.350000 │
│        │
└─────────┴─────────┴───────────┘

Let’s show how many rows we have in the result.

pg_expr.count()

3015

Insert the computed results into Snowflake

Because all Ibis backends implement the to_pyarrow() method, we can get data out of another system and into Snowflake with a few lines of code.

First we’ll create a table in Snowflake to hold the data.

Ibis helps here by providing an API to access the schema from the postgres-based expression, and automatically translates postgres types into Snowflake types.

snow_table = snow_con.create_table("pg_batting", schema=pg_expr.schema(), temp=True)
1
By default the table will be created in the database and schema of the current connection.

We’ll show that the table is empty to sanity check ourselves.

snow_table
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ YEAR_ID  TEAM_ID  AVG_RBI ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64stringfloat64 │
└─────────┴─────────┴─────────┘

Insert the expression’s result table into Snowflake.

snow_con.insert("pg_batting", pg_expr.to_pyarrow())

To sanity check what we’ve done let’s peek at the table.

snow_table
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┓
┃ YEAR_ID  TEAM_ID  AVG_RBI   ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━┩
│ int64stringfloat64   │
├─────────┼─────────┼───────────┤
│    1891PIT    22.782609 │
│    1895BSN    34.363636 │
│    1940SLA    22.343750 │
│    1981HOU    9.972973 │
│    1913CLE    13.512821 │
│    1971MON    17.181818 │
│    2008PIT    15.000000 │
│    1895WAS    23.096774 │
│    2011KCA    16.785714 │
│    2007MIL    19.350000 │
│        │
└─────────┴─────────┴───────────┘

We’ll count them too, to be extra sure.

snow_table.count()

3015

Conclusion

In this post we show how easy it is to move data from one backend into Snowflake using Ibis.

Please try it out and get in touch on Zulip or GitHub, we’d love to hear from you!

Back to top