SQL sources, introduced in Malloy version 0.56, replace the previous method of including SQL queries in a Malloy model, SQL blocks.
Sometimes it can be useful to base Malloy models off of SQL queries. You can do so by using the .sql()
connection method.
source: my_sql_source is duckdb.sql(""" SELECT first_name, last_name, gender FROM '../data/users.parquet' LIMIT 10 """)
These SQL sources can be used any place a table source can be used:
run: my_sql_source -> { group_by: first_name aggregate: user_count is count() }
[ { "first_name": "FRANK", "user_count": 1 }, { "first_name": "HAROLD", "user_count": 1 }, { "first_name": "RICHARD", "user_count": 1 }, { "first_name": "CLARENCE", "user_count": 1 }, { "first_name": "JAMES", "user_count": 1 } ]
SELECT base."first_name" as "first_name", COUNT(1) as "user_count" FROM ( SELECT first_name, last_name, gender FROM '../data/users.parquet' LIMIT 10 ) as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
Unlike other kinds of sources, SQL sources can be used like a query in some cases.
They can be run directly:
And they can also be defined as a query:
query: my_sql_query is duckdb.sql("select 1 as one") run: my_sql_query
[ { "one": 1 } ]
select 1 as one
Note: you can only run a SQL source as a query when it is defined as a query:
or included directly in a run:
statement.
Extending SQL Sources
Like other kinds of source, SQL sources can be extended to add reusable computations.
source: limited_users is duckdb.sql(""" SELECT first_name, last_name, gender FROM '../data/users.parquet' LIMIT 10 """) extend { measure: user_count is count() } run: limited_users -> { aggregate: user_count }
[ { "user_count": 10 } ]
SELECT COUNT(1) as "user_count" FROM ( SELECT first_name, last_name, gender FROM '../data/users.parquet' LIMIT 10 ) as base
Embedding Malloy Queries in an SQL Block (A.K.A. "Turducken")
Malloy queries can be embedded in SQL blocks as well. When %{
and }
appear inside a """
quoted (but not a "
or '
quoted) string of an SQL source, the Malloy query between the brackets is compiled and replaced with the SELECT
statement generated from the query.
source: users is duckdb.table('../data/users.parquet') source: malloy_in_sql_query is duckdb.sql(""" SELECT * FROM (%{ // Malloy query starts with the %{ users -> { limit: 10 group_by: first_name, last_name, gender aggregate: n_with_this_name is count() } }) -- Malloy query ends after the } WHERE n_with_this_name > 10 """) run: malloy_in_sql_query -> { select: * }
[ { "first_name": "MARY", "gender": "Female", "last_name": "SMITH", "n_with_this_name": 45 }, { "first_name": "JAMES", "gender": "Male", "last_name": "SMITH", "n_with_this_name": 40 }, { "first_name": "JAMES", "gender": "Male", "last_name": "WILLIAMS", "n_with_this_name": 32 }, { "first_name": "ROBERT", "gender": "Male", "last_name": "SMITH", "n_with_this_name": 32 }, { "first_name": "MICHAEL", "gender": "Male", "last_name": "SMITH", "n_with_this_name": 32 } ]
SELECT base."first_name" as "first_name", base."gender" as "gender", base."last_name" as "last_name", base."n_with_this_name" as "n_with_this_name" FROM ( SELECT * FROM (SELECT base."first_name" as "first_name", base."last_name" as "last_name", base."gender" as "gender", COUNT(1) as "n_with_this_name" FROM '../data/users.parquet' as base GROUP BY 1,2,3 ORDER BY 4 desc NULLS LAST LIMIT 10 ) -- Malloy query ends after the } WHERE n_with_this_name > 10 ) as base
We have referred to this feature as "Turducken" because you then take the SQL block and wrap it in an SQL source. It isn't the perfect name for infinite nesting, but it is amusing