Icy IO: loading local files with Snowflake

blog
snowflake
io
productivity
Author

Phillip Cloud

Published

August 31, 2023

Loading files without Ibis

It can be challenging to load local files into Snowflake from Python.

Here’s how to load a CSV file into Snowflake without Ibis.

CREATE TEMP STAGE load_csv_stage;

CREATE TEMP FILE FORMAT load_csv_format
TYPE = CSV PARSE_HEADER = TRUE;

PUT 'file:///path/to/my.csv' @load_csv_stage;

CREATE TEMP TABLE my_table
USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
        INFER_SCHEMA(
            LOCATION => '@load_csv_stage',
            FILE_FORMAT => 'load_csv_format'
        )
    )
);

COPY INTO my_table
FROM @load_csv_stage
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
1
Creates a temporary stage in Snowflake. Stages are locations in Snowflake that hold files. They can be used to store raw files to load into tables. TEMP stages are only accessible to the current session and will be dropped when the session ends.
2
A file format is a set of instructions for how to interpret a file. File formats are where you specify parsing and some loading options for your files.
3
PUT copies a file or glob pattern matching one or more files to a stage.
4
Creates a temporary table with schema inferred using Snowflake’s INFER_SCHEMA table function.
5
USING TEMPLATE is a Snowflake-specific syntax that allows you to specify a set of column definitions computed from staged files.
6
INFER_SCHEMA is a powerful feature of Snowflake that allows you to load files without having to compute the schema in client code.
7
COPY INTO loads the staged data into the created temporary table.

Snowflake provides the full set of primitives required to achieve this, but composing them together can be challenging. Some users struggle to remember the sequence of steps.

Loading files with Ibis

Let’s take a look at how ibis turns the above process into a single line of Python.

First, we connect to snowflake:

import os
from ibis.interactive import *

con = ibis.connect(os.environ["SNOWFLAKE_URL"])

read_csv

Loading CSV files is now a single line of familiar Python code:

diamonds = con.read_csv("diamonds.csv")
diamonds
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ carat          cut        color   clarity  depth          table          price  x              y              z             ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ decimal(3, 2)stringstringstringdecimal(3, 1)decimal(3, 1)int64decimal(4, 2)decimal(4, 2)decimal(4, 2) │
├───────────────┼───────────┼────────┼─────────┼───────────────┼───────────────┼───────┼───────────────┼───────────────┼───────────────┤
│          0.23Ideal    E     SI2    61.555.03263.953.982.43 │
│          0.21Premium  E     SI1    59.861.03263.893.842.31 │
│          0.23Good     E     VS1    56.965.03274.054.072.31 │
│          0.29Premium  I     VS2    62.458.03344.204.232.63 │
│          0.31Good     J     SI2    63.358.03354.344.352.75 │
│          0.24Very GoodJ     VVS2   62.857.03363.943.962.48 │
│          0.24Very GoodI     VVS1   62.357.03363.953.982.47 │
│          0.26Very GoodH     SI1    61.955.03374.074.112.53 │
│          0.22Fair     E     VS2    65.161.03373.873.782.49 │
│          0.23Very GoodH     VS1    59.461.03384.004.052.39 │
│              │
└───────────────┴───────────┴────────┴─────────┴───────────────┴───────────────┴───────┴───────────────┴───────────────┴───────────────┘

read_parquet

Similarly, loading Parquet files is now a single line of code:

diamonds = con.read_parquet("diamonds.parquet")
diamonds
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ carat    cut        color   clarity  depth    table    price  x        y        z       ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ float64stringstringstringfloat64float64int64float64float64float64 │
├─────────┼───────────┼────────┼─────────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┤
│    0.23Ideal    E     SI2    61.555.03263.953.982.43 │
│    0.21Premium  E     SI1    59.861.03263.893.842.31 │
│    0.23Good     E     VS1    56.965.03274.054.072.31 │
│    0.29Premium  I     VS2    62.458.03344.204.232.63 │
│    0.31Good     J     SI2    63.358.03354.344.352.75 │
│    0.24Very GoodJ     VVS2   62.857.03363.943.962.48 │
│    0.24Very GoodI     VVS1   62.357.03363.953.982.47 │
│    0.26Very GoodH     SI1    61.955.03374.074.112.53 │
│    0.22Fair     E     VS2    65.161.03373.873.782.49 │
│    0.23Very GoodH     VS1    59.461.03384.004.052.39 │
│        │
└─────────┴───────────┴────────┴─────────┴─────────┴─────────┴───────┴─────────┴─────────┴─────────┘

read_json

Lastly, loading JSON files is now – surprise 🥳 – a single line of code!

Line delimited JSON is supported:

diamonds = con.read_json("diamonds.ndjson")
diamonds
┏━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ carat          clarity  color   cut        depth          price  table          x              y              z             ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ decimal(3, 2)stringstringstringdecimal(3, 1)int64decimal(3, 1)decimal(4, 2)decimal(4, 2)decimal(4, 2) │
├───────────────┼─────────┼────────┼───────────┼───────────────┼───────┼───────────────┼───────────────┼───────────────┼───────────────┤
│          0.23SI2    E     Ideal    61.532655.03.953.982.43 │
│          0.21SI1    E     Premium  59.832661.03.893.842.31 │
│          0.23VS1    E     Good     56.932765.04.054.072.31 │
│          0.29VS2    I     Premium  62.433458.04.204.232.63 │
│          0.31SI2    J     Good     63.333558.04.344.352.75 │
│          0.24VVS2   J     Very Good62.833657.03.943.962.48 │
│          0.24VVS1   I     Very Good62.333657.03.953.982.47 │
│          0.26SI1    H     Very Good61.933755.04.074.112.53 │
│          0.22VS2    E     Fair     65.133761.03.873.782.49 │
│          0.23VS1    H     Very Good59.433861.04.004.052.39 │
│              │
└───────────────┴─────────┴────────┴───────────┴───────────────┴───────┴───────────────┴───────────────┴───────────────┴───────────────┘

As well as strict JSON arrays of objects:

diamonds = con.read_json("diamonds.json")
diamonds
┏━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ carat          clarity  color   cut        depth          price  table          x              y              z             ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ decimal(3, 2)stringstringstringdecimal(3, 1)int64decimal(3, 1)decimal(4, 2)decimal(4, 2)decimal(4, 2) │
├───────────────┼─────────┼────────┼───────────┼───────────────┼───────┼───────────────┼───────────────┼───────────────┼───────────────┤
│          0.23SI2    E     Ideal    61.532655.03.953.982.43 │
│          0.21SI1    E     Premium  59.832661.03.893.842.31 │
│          0.23VS1    E     Good     56.932765.04.054.072.31 │
│          0.29VS2    I     Premium  62.433458.04.204.232.63 │
│          0.31SI2    J     Good     63.333558.04.344.352.75 │
│          0.24VVS2   J     Very Good62.833657.03.943.962.48 │
│          0.24VVS1   I     Very Good62.333657.03.953.982.47 │
│          0.26SI1    H     Very Good61.933755.04.074.112.53 │
│          0.22VS2    E     Fair     65.133761.03.873.782.49 │
│          0.23VS1    H     Very Good59.433861.04.004.052.39 │
│              │
└───────────────┴─────────┴────────┴───────────┴───────────────┴───────┴───────────────┴───────────────┴───────────────┴───────────────┘

Conclusion

Ibis 7.0.0 adds support for read_csv, read_parquet and read_json to the Snowflake backend.

We think you’ll enjoy the increase in productivity these new features bring to the Snowflake backend and we’d love to hear your feedback!

Back to top