Malloy provides built-in IPython magic commands that let you define Malloy models and run Malloy queries inside Jupyter notebooks. A simple example of this is available as a Colab notebook.
Installation
In a Jupyter notebook, simply run the following commands to install Malloy from PyPI, and load the extension into your kernel:
!pip install malloy
%reload_ext malloy
Usage
There are two IPython cell magic functions available: %%malloy_model and %%malloy_query.
%%malloy_model
This magic function is used to declare or import a Malloy model. It takes a single parameter: the variable name under which the Malloy model is stored. The contents of the cell can either be an import statement that points to a Malloy file containing sources and/or queries, or Malloy text that defines sources and/or queries directly.
Examples:
The following statement looks for a file called flights.malloy, attempts to read it, and loads any model contents (sources or queries) into a model called flights.
%%malloy_model flights
import 'flights.malloy'
The flights model can then be referenced in future Malloy magic cells.
The following statement defines a model called flights using the cell's input text as the model definition:
%%malloy_model flights
source: airports is duckdb.table('../data/airports.parquet') extend {
  primary_key: code
  measure: airport_count is count()
}

source: flights is duckdb.table('../data/flights.parquet') extend {
  rename:
    origin_code is origin
    destination_code is destination
  join_one: origin is airports with origin_code
  join_one: destination is airports with destination_code
  measure:
    flight_count is count()
    destination_count is destination.count()
    origin_count is origin.count()
}
%%malloy_query
This magic function is used to execute a Malloy query. It has one required parameter: the name of the model to use. It optionally takes a second parameter: the variable name of the pandas DataFrame in which to store the result.
Examples:
The following statement loads the flights model and executes a query using its definitions.
%%malloy_query flights
run: flights -> {
  group_by: origin.full_name
  aggregate:
    flight_count
    destination_count
}
The following statement executes a query and stores the results in a pandas DataFrame called result_df. This DataFrame can then be used in any downstream process.
%%malloy_query flights result_df
run: flights -> {
  group_by: origin.full_name
  aggregate:
    flight_count
    destination_count
}
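Because the stored result is an ordinary pandas DataFrame, it can feed any downstream Python workflow. A minimal sketch for a later cell (assuming the grouped column comes back as full_name, matching the group_by above):

# result_df was populated by the %%malloy_query cell above.
# Sort by flight volume and look at the busiest origin airports.
top_origins = result_df.sort_values("flight_count", ascending=False).head(10)
print(top_origins[["full_name", "flight_count", "destination_count"]])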
Connecting to BigQuery
The examples above all query local files with DuckDB, but it's relatively easy to work with data in BigQuery inside your Jupyter notebook. To do this, you'll need to authorize your notebook environment to talk to your GCP project. Do this with the gcloud command, as follows:
!gcloud auth application-default login
This command will prompt you to authenticate. Enter 'Y' in the text box that asks "Do you want to continue (Y/n)?" (note: this text box can be hidden or hard to find in some notebook renderers). You will then be given a link that opens a separate window asking you to authenticate with your Google account. Select the correct account, copy the authentication code, and paste it into the input text field.
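If your environment already has a service-account key file, an alternative sketch (not part of the interactive flow above) is to point Application Default Credentials at that key instead; the path below is a placeholder:

import os

# Placeholder path to a service-account key file; substitute your own.
# Application Default Credentials will pick this up for BigQuery access.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"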
Once you've authenticated to GCP, you should be able to issue queries to BigQuery. You can define a source that points to a table in BigQuery and write a query against that source. Try running something against one of the BigQuery public datasets to confirm things are working.
First, create a cell that defines a source on the GA4 sample data:
%%malloy_model ga4
source: events_table is bigquery.table('bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*')
Then create a cell that queries this source:
%%malloy_query ga4
run: events_table -> {
  group_by: event_name
  aggregate: row_count is count()
}
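As with the DuckDB examples, passing a second argument to %%malloy_query stores the BigQuery result in a pandas DataFrame. For instance, a cell like the following (the name ga4_events_df is just an illustrative choice) makes the event counts available to later Python cells:

%%malloy_query ga4 ga4_events_df
run: events_table -> {
  group_by: event_name
  aggregate: row_count is count()
}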