BigQuery
To use the BigQuery client, you will need a Google Cloud Platform account. Use the BigQuery sandbox to try the service for free.
BigQuery Quickstart
Install dependencies for Ibis’s BigQuery dialect:
pip install ibis-framework[bigquery]
Create a client by passing in the project ID and dataset ID you want to work with:
>>> import ibis
>>> con = ibis.bigquery.connect(project_id='ibis-gbq', dataset_id='testing')
By default, Ibis assumes that the BigQuery project billed for queries is also the project where the data lives.
However, you can also query data that lives outside the billing project.
Note
When you run queries against data from other projects, the billing project is still billed for every query.
If you want to query data that lives in a project other than the billing project, use the ibis.bigquery.client.BigQueryClient.database() method of ibis.bigquery.client.BigQueryClient objects:
>>> db = con.database('other-data-project.other-dataset')
>>> t = db.my_awesome_table
>>> t.sweet_column.sum().execute() # runs against the billing project
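The name passed to database() above has the form project.dataset. As a minimal sketch of that convention (not Ibis's own implementation), the helper below splits such a name and falls back to the billing project when no project is given. The function name split_dataset_name and its default_project parameter are hypothetical and exist only for this illustration.

```python
from typing import Tuple


def split_dataset_name(name: str, default_project: str) -> Tuple[str, str]:
    """Split 'project.dataset' into (project, dataset).

    Hypothetical helper for illustration only. A bare 'dataset' is assumed
    to live in ``default_project`` (the billing project).
    """
    # Split on the last dot so the dataset is always the final component.
    project, sep, dataset = name.rpartition(".")
    if not sep:
        # No dot: assume the data lives in the billing project.
        return default_project, dataset
    if not project or not dataset:
        raise ValueError(f"malformed dataset name: {name!r}")
    return project, dataset


print(split_dataset_name("other-data-project.other-dataset", "ibis-gbq"))
print(split_dataset_name("testing", "ibis-gbq"))
```

In both cases the queries are still billed to the project given when the client was created; only the location of the data differs.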
API
The BigQuery client is accessible through the ibis.bigquery namespace.
See BigQuery for a tutorial on using this backend.
Use the ibis.bigquery.connect function to create a BigQuery client. If no credentials are provided, the pydata_google_auth.default() function fetches default credentials.
| ibis.bigquery.connect | Create a BigQueryClient for use with Ibis. |
| BigQueryClient.database | Create a database object. |
| | Create a table expression. |
The BigQuery client object
To use Ibis with BigQuery, you must first connect to BigQuery using the ibis.bigquery.connect() function, optionally supplying Google API credentials:
import ibis

# YOUR_PROJECT_ID is a placeholder for your own billing project ID.
client = ibis.bigquery.connect(
    project_id=YOUR_PROJECT_ID,
    dataset_id='bigquery-public-data.stackoverflow',
)
User-defined functions (UDFs)
Note
BigQuery only supports element-wise UDFs at this time.
BigQuery supports UDFs through JavaScript. Ibis provides support for this by turning Python code into JavaScript.
The interface is very similar to the pandas UDF API:
import ibis.expr.datatypes as dt
from ibis.bigquery import udf

@udf([dt.double], dt.double)
def my_bigquery_add_one(x):
    return x + 1.0
Ibis will parse the source of the function and turn the resulting Python AST into JavaScript source code (technically, ECMAScript 2015). Most of the Python language is supported including classes, functions and generators.
When you want to use this function, call it like any other Python function, except that it must be called on an Ibis expression:
import ibis

t = ibis.table([('a', 'double')])
expr = my_bigquery_add_one(t.a)
print(ibis.bigquery.compile(expr))
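To give a feel for what turning a Python AST into JavaScript involves, here is a toy sketch built on the standard library's ast module (Python 3.8 or later). It is not Ibis's compiler: the class name ToyJavaScriptEmitter is hypothetical, and it handles only a single function whose body returns an arithmetic expression over names and numeric constants.

```python
import ast


class ToyJavaScriptEmitter(ast.NodeVisitor):
    """Translate a tiny subset of Python to JavaScript source text."""

    _operators = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def visit_Module(self, node):
        return "\n".join(self.visit(stmt) for stmt in node.body)

    def visit_FunctionDef(self, node):
        params = ", ".join(arg.arg for arg in node.args.args)
        body = "\n".join("    " + self.visit(stmt) for stmt in node.body)
        return f"function {node.name}({params}) {{\n{body}\n}}"

    def visit_Return(self, node):
        return f"return {self.visit(node.value)};"

    def visit_BinOp(self, node):
        op = self._operators[type(node.op)]
        return f"({self.visit(node.left)} {op} {self.visit(node.right)})"

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        # Only plain numbers are supported in this sketch.
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise NotImplementedError(f"constant {node.value!r}")
        return repr(node.value)

    def generic_visit(self, node):
        # Anything outside the supported subset is rejected explicitly.
        raise NotImplementedError(type(node).__name__)


source = """
def my_bigquery_add_one(x):
    return x + 1.0
"""
js = ToyJavaScriptEmitter().visit(ast.parse(source))
print(js)
```

The real translation covers far more of the language, as described above; the sketch only shows the general shape of a visitor that walks the tree and emits one JavaScript fragment per node.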
Privacy
This package is subject to the NumFOCUS privacy policy. Your use of Google APIs with this module is subject to each API’s respective terms of service.
Google account and user data
Accessing user data
The connect() function provides access to data stored in Google BigQuery and other sources such as Google Sheets or Cloud Storage, via the federated query feature. Your machine communicates directly with the Google APIs.
Storing user data
By default, your credentials are stored in a local file, such as ~/.config/pydata/ibis.json. All user data is stored on your local machine. Use caution when using this library on a shared machine.
Sharing user data
The BigQuery client only communicates with Google APIs. No user data is shared with PyData, NumFOCUS, or any other servers.
Policies for application authors
Do not use the default client ID when using Ibis from an application, library, or tool. Per the Google User Data Policy, your application must accurately represent itself when authenticating to Google API services.
Extending the BigQuery backend
Create a Google Cloud project.
Set the GOOGLE_BIGQUERY_PROJECT_ID environment variable.
Populate test data:
python ci/datamgr.py bigquery
Run the test suite:
pytest ibis/bigquery/tests
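Both commands above depend on the GOOGLE_BIGQUERY_PROJECT_ID environment variable. As a small sketch, a script could check for it up front and fail with a clear message. The helper name require_project_id is hypothetical and is not part of Ibis or its test tooling.

```python
import os


def require_project_id(environ=os.environ) -> str:
    """Return GOOGLE_BIGQUERY_PROJECT_ID or raise a descriptive error.

    Hypothetical helper for illustration; ``environ`` can be any mapping,
    which makes the check easy to exercise without touching the real
    environment.
    """
    project_id = environ.get("GOOGLE_BIGQUERY_PROJECT_ID", "").strip()
    if not project_id:
        raise RuntimeError(
            "Set GOOGLE_BIGQUERY_PROJECT_ID before loading test data "
            "or running the BigQuery tests."
        )
    return project_id


# Example with an explicit mapping so the sketch runs anywhere.
print(require_project_id({"GOOGLE_BIGQUERY_PROJECT_ID": "my-test-project"}))
```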