This document will assumes a working knowledge of SQL and will rapidly take you through some of Malloy's key language features.
Using this Guide
For every Malloy Query you can see the formatted result, or raw result as JSON, or the SQL used to produce the result.
Click tab to to see the HTML, JSON or SQL result: 👈👈
SQL SELECT vs Malloy's run:
The statement to run a query in Malloy is run:
. There are two types of queries in Malloy, reductions which have group_by:
or aggregate:
statements, and projections which have select:
statements and do not group or aggregate results.
Projection: SELECT with no GROUP BY
In SQL
SELECT code, full_name, state, faa_region, fac_type, elevation FROM `malloy-data.faa.airports` ORDER BY code
Equivalent in Malloy
run: duckdb.table('../data/airports.parquet') -> { select: code, full_name, state, faa_region, fac_type, elevation order_by: code }
[ { "code": "00A", "full_name": "TOTAL RF", "state": "PA", "faa_region": "AEA", "fac_type": "HELIPORT", "elevation": 11 }, { "code": "00C", "full_name": "ANIMAS AIR PARK", "state": "CO", "faa_region": "ANM", "fac_type": "AIRPORT", "elevation": 6684 }, { "code": "00CA", "full_name": "GOLDSTONE /GTS/", "state": "CA", "faa_region": "AWP", "fac_type": "AIRPORT", "elevation": 3038 }, { "code": "00E", "full_name": "AT&T - APACHE JUNCTION", "state": "AZ", "faa_region": "AWP", "fac_type": "HELIPORT", "elevation": 2527 }, { "code": "00F", "full_name": "TCJC-NORTHEAST CAMPUS", "state": "TX", "faa_region": "ASW", "fac_type": "HELIPORT", "elevation": 600 } ]
SELECT base."code" as "code", base."full_name" as "full_name", base."state" as "state", base."faa_region" as "faa_region", base."fac_type" as "fac_type", base."elevation" as "elevation" FROM '../data/airports.parquet' as base ORDER BY 1 ASC NULLS LAST
Reduction: SELECT with GROUP BY and/or aggregation
In SQL
SELECT base.fac_type as fac_type, COUNT( 1) as airport_count FROM `malloy-data.faa.airports` as base WHERE base.state='CA' GROUP BY 1 ORDER BY 2 desc
Equivalent in Malloy
run: duckdb.table('../data/airports.parquet') -> { group_by: fac_type aggregate: airport_count is count() where: state = 'CA' order_by: airport_count desc }
[ { "fac_type": "AIRPORT", "airport_count": 569 }, { "fac_type": "HELIPORT", "airport_count": 396 }, { "fac_type": "SEAPLANE BASE", "airport_count": 12 }, { "fac_type": "GLIDERPORT", "airport_count": 3 }, { "fac_type": "STOLPORT", "airport_count": 2 } ]
SELECT base."fac_type" as "fac_type", COUNT(1) as "airport_count" FROM '../data/airports.parquet' as base WHERE base."state"='CA' GROUP BY 1 ORDER BY 2 desc NULLS LAST
Source: A data source for queries
Malloy separates a query's view from the source of the data. A source can be thought of as a table and a collection of computations and relationships which are relevant to that table. (Source Documentation).
Fields can be defined as part of a source.
A
measure:
is a declared aggregate calculation (think function that operates across the table) which can be used inaggregate:
elements in a query stageA
dimension:
is a declared scalar calculation which that can be used ingroup_by:
orselect:
elements of a query stage
source: airports is duckdb.table('../data/airports.parquet') extend { dimension: elevation_in_meters is elevation * 0.3048 dimension: state_and_county is concat(state,' - ', county) measure: airport_count is count() measure: avg_elevation_in_meters is elevation_in_meters.avg() }
Querying Against a Source
Queries can be run against source:
objects and can utilize the modeled fields from that source, as well as introduce new ones. (Query Documentation)
using the above declared airports
source
run: airports -> { limit: 10 where: fac_type = 'HELIPORT' group_by: state aggregate: airport_count // <-- declared in source avg_elevation_in_meters // <-- declared in source }
[ { "state": "TX", "airport_count": 435, "avg_elevation_in_meters": 138.68680275862064 }, { "state": "CA", "airport_count": 396, "avg_elevation_in_meters": 276.40203030303024 }, { "state": "PA", "airport_count": 307, "avg_elevation_in_meters": 216.25510358306178 }, { "state": "FL", "airport_count": 280, "avg_elevation_in_meters": 18.100765714285675 }, { "state": "NJ", "airport_count": 247, "avg_elevation_in_meters": 48.149761943319795 } ]
SELECT base."state" as "state", COUNT(1) as "airport_count", AVG((base."elevation"*0.3048::DOUBLE)) as "avg_elevation_in_meters" FROM '../data/airports.parquet' as base WHERE base."fac_type"='HELIPORT' GROUP BY 1 ORDER BY 2 desc NULLS LAST LIMIT 10
Dimensional calculations are no different from columns
using the above declared airports
source
run: airports -> { group_by: state_and_county // <-- declared in source aggregate: airport_count order_by: 1 desc }
[ { "state_and_county": "WY - WESTON", "airport_count": 3 }, { "state_and_county": "WY - WASHAKIE", "airport_count": 4 }, { "state_and_county": "WY - UINTA", "airport_count": 3 }, { "state_and_county": "WY - TETON", "airport_count": 7 }, { "state_and_county": "WY - SWEETWATER", "airport_count": 4 } ]
SELECT CONCAT(base."state",' - ',base."county") as "state_and_county", COUNT(1) as "airport_count" FROM '../data/airports.parquet' as base GROUP BY 1 ORDER BY 1 desc NULLS LAST
Defining Views in a Source
A source can also contain a set of useful views relating to that source, which can be run in queries.
using the above declared airports
source
source: airports2 is duckdb.table('../data/airports.parquet') extend { measure: airport_count is count() view: by_state is { // <-- can be called by name group_by: state aggregate: airport_count } }
Executing Views
The simplest form of a query in Malloy is the name of a source, the query operator ->
, and the name of one of its contained views.
using the above declared airports
source
run: airports2 -> by_state
[ { "state": "TX", "airport_count": 1845 }, { "state": "CA", "airport_count": 984 }, { "state": "IL", "airport_count": 890 }, { "state": "FL", "airport_count": 856 }, { "state": "PA", "airport_count": 804 } ]
SELECT base."state" as "state", COUNT(1) as "airport_count" FROM '../data/airports.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
Filtering a Source
You can filter a source by adding a filter expression using the where:
keyword in an extension. In this example, we filter an existing airports
source and then use this extended version to execute the by_state
view in a query. For more information on filtering, see the Filters section.
run: airports2 extend { where: fac_type = 'SEAPLANE BASE' // <- run the query with an added filter } -> by_state
[ { "state": "AK", "airport_count": 104 }, { "state": "MN", "airport_count": 72 }, { "state": "FL", "airport_count": 43 }, { "state": "ME", "airport_count": 38 }, { "state": "NY", "airport_count": 23 } ]
SELECT base."state" as "state", COUNT(1) as "airport_count" FROM '../data/airports.parquet' as base WHERE base."fac_type"='SEAPLANE BASE' GROUP BY 1 ORDER BY 2 desc NULLS LAST
Filtering Measures
The input to an aggregate computation can be filtered.
using the above declared airports
source
run: airports -> { group_by: state aggregate: airport_count aggregate: heliport_count is airport_count { where: fac_type = 'HELIPORT' } // <-- add a filter }
[ { "state": "TX", "airport_count": 1845, "heliport_count": 435 }, { "state": "CA", "airport_count": 984, "heliport_count": 396 }, { "state": "IL", "airport_count": 890, "heliport_count": 245 }, { "state": "FL", "airport_count": 856, "heliport_count": 280 }, { "state": "PA", "airport_count": 804, "heliport_count": 307 } ]
SELECT base."state" as "state", COUNT(1) as "airport_count", (COUNT(CASE WHEN base."fac_type"='HELIPORT' THEN 1 END)) as "heliport_count" FROM '../data/airports.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
Composing with Views
For the next section assume the following source declaration.
source: airports3 is duckdb.table('../data/airports.parquet') extend { measure: airport_count is count() measure: avg_elevation is elevation.avg() view: top_5_states is { group_by: state aggregate: airport_count limit: 5 } view: by_facility_type is { group_by: fac_type aggregate: airport_count } }
The nest:
property embeds one view in another
Malloy allows you to create nested subtables easily in a query. In the case below, the top level view groups by state and the nested view groups by facility type. This mechanism is really useful for understanding data and creating complex data structures. (Nesting Documentation)
using the above declared airports
source
run: airports3 -> { group_by: state aggregate: airport_count limit: 5 nest: by_facility_type is { group_by: fac_type aggregate: airport_count } }
[ { "state": "TX", "airport_count": 1845, "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 1389 }, { "fac_type": "HELIPORT", "airport_count": 435 }, { "fac_type": "ULTRALIGHT", "airport_count": 8 }, { "fac_type": "STOLPORT", "airport_count": 8 }, { "fac_type": "GLIDERPORT", "airport_count": 5 } ] }, { "state": "CA", "airport_count": 984, "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 569 }, { "fac_type": "HELIPORT", "airport_count": 396 }, { "fac_type": "SEAPLANE BASE", "airport_count": 12 }, { "fac_type": "GLIDERPORT", "airport_count": 3 }, { "fac_type": "ULTRALIGHT", "airport_count": 2 }, { "fac_type": "STOLPORT", "airport_count": 2 } ] }, { "state": "IL", "airport_count": 890, "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 625 }, { "fac_type": "HELIPORT", "airport_count": 245 }, { "fac_type": "SEAPLANE BASE", "airport_count": 8 }, { "fac_type": "ULTRALIGHT", "airport_count": 6 }, { "fac_type": "BALLOONPORT", "airport_count": 2 }, { "fac_type": "GLIDERPORT", "airport_count": 2 }, { "fac_type": "STOLPORT", "airport_count": 2 } ] }, { "state": "FL", "airport_count": 856, "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 511 }, { "fac_type": "HELIPORT", "airport_count": 280 }, { "fac_type": "SEAPLANE BASE", "airport_count": 43 }, { "fac_type": "STOLPORT", "airport_count": 13 }, { "fac_type": "ULTRALIGHT", "airport_count": 5 }, { "fac_type": "GLIDERPORT", "airport_count": 4 } ] }, { "state": "PA", "airport_count": 804, "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 468 }, { "fac_type": "HELIPORT", "airport_count": 307 }, { "fac_type": "ULTRALIGHT", "airport_count": 13 }, { "fac_type": "SEAPLANE BASE", "airport_count": 10 }, { "fac_type": "STOLPORT", "airport_count": 3 }, { "fac_type": "GLIDERPORT", "airport_count": 3 } ] } ]
WITH __stage0 AS ( SELECT group_set, base."state" as "state__0", CASE WHEN group_set=0 THEN COUNT(1) END as "airport_count__0", CASE WHEN group_set=1 THEN base."fac_type" END as "fac_type__1", CASE WHEN group_set=1 THEN COUNT(1) END as "airport_count__1" FROM '../data/airports.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set ) as group_set GROUP BY 1,2,4 ) SELECT "state__0" as "state", MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count", COALESCE(LIST({ "fac_type": "fac_type__1", "airport_count": "airport_count__1"} ORDER BY "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_facility_type" FROM __stage0 GROUP BY 1 ORDER BY 2 desc NULLS LAST LIMIT 5
Queries can contain multiple nested views.
using the above declared airports
source
run: airports3 -> { group_by: faa_region aggregate: airport_count nest: top_5_states nest: by_facility_type }
[ { "faa_region": "AGL", "airport_count": 4437, "top_5_states": [ { "state": "IL", "airport_count": 890 }, { "state": "OH", "airport_count": 749 }, { "state": "IN", "airport_count": 643 }, { "state": "WI", "airport_count": 543 }, { "state": "MN", "airport_count": 507 } ], "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 3443 }, { "fac_type": "HELIPORT", "airport_count": 826 }, { "fac_type": "SEAPLANE BASE", "airport_count": 119 }, { "fac_type": "ULTRALIGHT", "airport_count": 30 }, { "fac_type": "STOLPORT", "airport_count": 11 }, { "fac_type": "BALLOONPORT", "airport_count": 4 }, { "fac_type": "GLIDERPORT", "airport_count": 4 } ] }, { "faa_region": "ASW", "airport_count": 3268, "top_5_states": [ { "state": "TX", "airport_count": 1845 }, { "state": "LA", "airport_count": 500 }, { "state": "OK", "airport_count": 443 }, { "state": "AR", "airport_count": 299 }, { "state": "NM", "airport_count": 181 } ], "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 2341 }, { "fac_type": "HELIPORT", "airport_count": 861 }, { "fac_type": "ULTRALIGHT", "airport_count": 32 }, { "fac_type": "SEAPLANE BASE", "airport_count": 19 }, { "fac_type": "STOLPORT", "airport_count": 9 }, { "fac_type": "GLIDERPORT", "airport_count": 6 } ] }, { "faa_region": "ASO", "airport_count": 2924, "top_5_states": [ { "state": "FL", "airport_count": 856 }, { "state": "GA", "airport_count": 440 }, { "state": "NC", "airport_count": 400 }, { "state": "TN", "airport_count": 285 }, { "state": "AL", "airport_count": 260 } ], "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 2038 }, { "fac_type": "HELIPORT", "airport_count": 770 }, { "fac_type": "SEAPLANE BASE", "airport_count": 57 }, { "fac_type": "STOLPORT", "airport_count": 33 }, { "fac_type": "ULTRALIGHT", "airport_count": 17 }, { "fac_type": "GLIDERPORT", "airport_count": 8 }, { "fac_type": "BALLOONPORT", "airport_count": 1 } ] }, { "faa_region": "AEA", "airport_count": 2586, "top_5_states": [ { "state": "PA", "airport_count": 804 }, { "state": "NY", "airport_count": 576 }, { "state": "VA", "airport_count": 421 }, { "state": "NJ", "airport_count": 378 }, { "state": "MD", "airport_count": 229 } ], "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 1525 }, { "fac_type": "HELIPORT", "airport_count": 964 }, { "fac_type": "SEAPLANE BASE", "airport_count": 61 }, { "fac_type": "ULTRALIGHT", "airport_count": 18 }, { "fac_type": "STOLPORT", "airport_count": 8 }, { "fac_type": "GLIDERPORT", "airport_count": 7 }, { "fac_type": "BALLOONPORT", "airport_count": 3 } ] }, { "faa_region": "ANM", "airport_count": 2102, "top_5_states": [ { "state": "WA", "airport_count": 484 }, { "state": "OR", "airport_count": 441 }, { "state": "CO", "airport_count": 425 }, { "state": "MT", "airport_count": 259 }, { "state": "ID", "airport_count": 238 } ], "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 1524 }, { "fac_type": "HELIPORT", "airport_count": 527 }, { "fac_type": "SEAPLANE BASE", "airport_count": 25 }, { "fac_type": "STOLPORT", "airport_count": 13 }, { "fac_type": "ULTRALIGHT", "airport_count": 8 }, { "fac_type": "GLIDERPORT", "airport_count": 4 }, { "fac_type": "BALLOONPORT", "airport_count": 1 } ] } ]
WITH __stage0 AS ( SELECT group_set, base."faa_region" as "faa_region__0", CASE WHEN group_set=0 THEN COUNT(1) END as "airport_count__0", CASE WHEN group_set=1 THEN base."state" END as "state__1", CASE WHEN group_set=1 THEN COUNT(1) END as "airport_count__1", CASE WHEN group_set=2 THEN base."fac_type" END as "fac_type__2", CASE WHEN group_set=2 THEN COUNT(1) END as "airport_count__2" FROM '../data/airports.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,2,1)) as group_set ) as group_set GROUP BY 1,2,4,6 ) SELECT "faa_region__0" as "faa_region", MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count", COALESCE(LIST({ "state": "state__1", "airport_count": "airport_count__1"} ORDER BY "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:5],[]) as "top_5_states", COALESCE(LIST({ "fac_type": "fac_type__2", "airport_count": "airport_count__2"} ORDER BY "airport_count__2" desc NULLS LAST) FILTER (WHERE group_set=2),[]) as "by_facility_type" FROM __stage0 GROUP BY 1 ORDER BY 2 desc NULLS LAST
Views can be nested to any level of depth.
using the above declared airports
source
run: airports3 -> { group_by: faa_region aggregate: airport_count nest: by_state_and_county is { group_by: state aggregate: airport_count nest: by_county is { group_by: county aggregate: airport_count limit: 4 } } nest: by_facility_type }
[ { "faa_region": "AGL", "airport_count": 4437, "by_state_and_county": [ { "state": "IL", "airport_count": 890, "by_county": [ { "county": "COOK", "airport_count": 51 }, { "county": "LA SALLE", "airport_count": 39 }, { "county": "MC HENRY", "airport_count": 29 }, { "county": "DE KALB", "airport_count": 27 } ] }, { "state": "OH", "airport_count": 749, "by_county": [ { "county": "FRANKLIN", "airport_count": 27 }, { "county": "CUYAHOGA", "airport_count": 27 }, { "county": "STARK", "airport_count": 23 }, { "county": "MONTGOMERY", "airport_count": 22 } ] }, { "state": "IN", "airport_count": 643, "by_county": [ { "county": "MARION", "airport_count": 27 }, { "county": "ALLEN", "airport_count": 24 }, { "county": "HAMILTON", "airport_count": 20 }, { "county": "MARSHALL", "airport_count": 18 } ] }, { "state": "WI", "airport_count": 543, "by_county": [ { "county": "DANE", "airport_count": 30 }, { "county": "WALWORTH", "airport_count": 22 }, { "county": "WINNEBAGO", "airport_count": 17 }, { "county": "KENOSHA", "airport_count": 16 } ] }, { "state": "MN", "airport_count": 507, "by_county": [ { "county": "ST LOUIS", "airport_count": 28 }, { "county": "HENNEPIN", "airport_count": 23 }, { "county": "DAKOTA", "airport_count": 17 }, { "county": "CROW WING", "airport_count": 17 } ] }, { "state": "MI", "airport_count": 489, "by_county": [ { "county": "OAKLAND", "airport_count": 25 }, { "county": "KENT", "airport_count": 24 }, { "county": "WAYNE", "airport_count": 17 }, { "county": "LIVINGSTON", "airport_count": 15 } ] }, { "state": "ND", "airport_count": 436, "by_county": [ { "county": "CASS", "airport_count": 39 }, { "county": "MC LEAN", "airport_count": 20 }, { "county": "RICHLAND", "airport_count": 20 }, { "county": "WARD", "airport_count": 19 } ] }, { "state": "SD", "airport_count": 180, "by_county": [ { "county": "MEADE", "airport_count": 7 }, { "county": "PENNINGTON", "airport_count": 7 }, { "county": "MINNEHAHA", "airport_count": 7 }, { "county": "BRULE", "airport_count": 6 } ] } ], "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 3443 }, { "fac_type": "HELIPORT", "airport_count": 826 }, { "fac_type": "SEAPLANE BASE", "airport_count": 119 }, { "fac_type": "ULTRALIGHT", "airport_count": 30 }, { "fac_type": "STOLPORT", "airport_count": 11 }, { "fac_type": "GLIDERPORT", "airport_count": 4 }, { "fac_type": "BALLOONPORT", "airport_count": 4 } ] }, { "faa_region": "ASW", "airport_count": 3268, "by_state_and_county": [ { "state": "TX", "airport_count": 1845, "by_county": [ { "county": "HARRIS", "airport_count": 135 }, { "county": "TARRANT", "airport_count": 63 }, { "county": "DENTON", "airport_count": 53 }, { "county": "DALLAS", "airport_count": 42 } ] }, { "state": "LA", "airport_count": 500, "by_county": [ { "county": "PLAQUEMINES", "airport_count": 31 }, { "county": "VERMILION", "airport_count": 29 }, { "county": "CALCASIEU", "airport_count": 23 }, { "county": "LAFOURCHE", "airport_count": 21 } ] }, { "state": "OK", "airport_count": 443, "by_county": [ { "county": "OKLAHOMA", "airport_count": 31 }, { "county": "TULSA", "airport_count": 25 }, { "county": "ROGERS", "airport_count": 16 }, { "county": "DELAWARE", "airport_count": 13 } ] }, { "state": "AR", "airport_count": 299, "by_county": [ { "county": "PULASKI", "airport_count": 20 }, { "county": "BENTON", "airport_count": 19 }, { "county": "LONOKE", "airport_count": 13 }, { "county": "MISSISSIPPI", "airport_count": 10 } ] }, { "state": "NM", "airport_count": 181, "by_county": [ { "county": "CATRON", "airport_count": 13 }, { "county": "LINCOLN", "airport_count": 10 }, { "county": "DONA ANA", "airport_count": 9 }, { "county": "SANTA FE", "airport_count": 9 } ] } ], "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 2341 }, { "fac_type": "HELIPORT", "airport_count": 861 }, { "fac_type": "ULTRALIGHT", "airport_count": 32 }, { "fac_type": "SEAPLANE BASE", "airport_count": 19 }, { "fac_type": "STOLPORT", "airport_count": 9 }, { "fac_type": "GLIDERPORT", "airport_count": 6 } ] }, { "faa_region": "ASO", "airport_count": 2924, "by_state_and_county": [ { "state": "FL", "airport_count": 856, "by_county": [ { "county": "PALM BEACH", "airport_count": 45 }, { "county": "DADE", "airport_count": 44 }, { "county": "POLK", "airport_count": 43 }, { "county": "MARION", "airport_count": 37 } ] }, { "state": "GA", "airport_count": 440, "by_county": [ { "county": "FULTON", "airport_count": 22 }, { "county": "PIKE", "airport_count": 17 }, { "county": "CARROLL", "airport_count": 14 }, { "county": "FAYETTE", "airport_count": 12 } ] }, { "state": "NC", "airport_count": 400, "by_county": [ { "county": "WAKE", "airport_count": 15 }, { "county": "UNION", "airport_count": 14 }, { "county": "MECKLENBURG", "airport_count": 14 }, { "county": "ROWAN", "airport_count": 14 } ] }, { "state": "TN", "airport_count": 285, "by_county": [ { "county": "SHELBY", "airport_count": 24 }, { "county": "DAVIDSON", "airport_count": 17 }, { "county": "KNOX", "airport_count": 14 }, { "county": "RUTHERFORD", "airport_count": 10 } ] }, { "state": "AL", "airport_count": 260, "by_county": [ { "county": "BALDWIN", "airport_count": 25 }, { "county": "JEFFERSON", "airport_count": 19 }, { "county": "MOBILE", "airport_count": 17 }, { "county": "MADISON", "airport_count": 14 } ] }, { "state": "MS", "airport_count": 243, "by_county": [ { "county": "WASHINGTON", "airport_count": 15 }, { "county": "HARRISON", "airport_count": 12 }, { "county": "HINDS", "airport_count": 11 }, { "county": "BOLIVAR", "airport_count": 9 } ] }, { "state": "KY", "airport_count": 202, "by_county": [ { "county": "JEFFERSON", "airport_count": 13 }, { "county": "FAYETTE", "airport_count": 7 }, { "county": "BOONE", "airport_count": 6 }, { "county": "MC LEAN", "airport_count": 6 } ] }, { "state": "SC", "airport_count": 189, "by_county": [ { "county": "GREENVILLE", "airport_count": 13 }, { "county": "LEXINGTON", "airport_count": 9 }, { "county": "CHARLESTON", "airport_count": 9 }, { "county": "BEAUFORT", "airport_count": 9 } ] }, { "state": "PR", "airport_count": 40, "by_county": [ { "county": "--PUERTO RICO", "airport_count": 40 } ] }, { "state": "VI", "airport_count": 9, "by_county": [ { "county": "-VIRGIN ISLANDS-", "airport_count": 9 } ] } ], "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 2038 }, { "fac_type": "HELIPORT", "airport_count": 770 }, { "fac_type": "SEAPLANE BASE", "airport_count": 57 }, { "fac_type": "STOLPORT", "airport_count": 33 }, { "fac_type": "ULTRALIGHT", "airport_count": 17 }, { "fac_type": "GLIDERPORT", "airport_count": 8 }, { "fac_type": "BALLOONPORT", "airport_count": 1 } ] }, { "faa_region": "AEA", "airport_count": 2586, "by_state_and_county": [ { "state": "PA", "airport_count": 804, "by_county": [ { "county": "BUCKS", "airport_count": 55 }, { "county": "MONTGOMERY", "airport_count": 44 }, { "county": "ALLEGHENY", "airport_count": 31 }, { "county": "CHESTER", "airport_count": 27 } ] }, { "state": "NY", "airport_count": 576, "by_county": [ { "county": "SUFFOLK", "airport_count": 34 }, { "county": "ERIE", "airport_count": 26 }, { "county": "NIAGARA", "airport_count": 20 }, { "county": "DUTCHESS", "airport_count": 20 } ] }, { "state": "VA", "airport_count": 421, "by_county": [ { "county": "FAUQUIER", "airport_count": 23 }, { "county": "SHENANDOAH", "airport_count": 13 }, { "county": "FAIRFAX", "airport_count": 12 }, { "county": "ACCOMACK", "airport_count": 12 } ] }, { "state": "NJ", "airport_count": 378, "by_county": [ { "county": "MONMOUTH", "airport_count": 31 }, { "county": "MIDDLESEX", "airport_count": 29 }, { "county": "BURLINGTON", "airport_count": 29 }, { "county": "HUNTERDON", "airport_count": 27 } ] }, { "state": "MD", "airport_count": 229, "by_county": [ { "county": "BALTIMORE", "airport_count": 24 }, { "county": "PRINCE GEORGES", "airport_count": 14 }, { "county": "CARROLL", "airport_count": 14 }, { "county": "ANNE ARUNDEL", "airport_count": 14 } ] }, { "state": "WV", "airport_count": 116, "by_county": [ { "county": "KANAWHA", "airport_count": 8 }, { "county": "MASON", "airport_count": 7 }, { "county": "PRESTON", "airport_count": 6 }, { "county": "RALEIGH", "airport_count": 6 } ] }, { "state": "DE", "airport_count": 42, "by_county": [ { "county": "KENT", "airport_count": 17 }, { "county": "SUSSEX", "airport_count": 14 }, { "county": "NEW CASTLE", "airport_count": 11 } ] }, { "state": "DC", "airport_count": 20, "by_county": [ { "county": "WASHINGTON", "airport_count": 18 }, { "county": "ARLINGTON", "airport_count": 1 }, { "county": "LOUDOUN", "airport_count": 1 } ] } ], "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 1525 }, { "fac_type": "HELIPORT", "airport_count": 964 }, { "fac_type": "SEAPLANE BASE", "airport_count": 61 }, { "fac_type": "ULTRALIGHT", "airport_count": 18 }, { "fac_type": "STOLPORT", "airport_count": 8 }, { "fac_type": "GLIDERPORT", "airport_count": 7 }, { "fac_type": "BALLOONPORT", "airport_count": 3 } ] }, { "faa_region": "ANM", "airport_count": 2102, "by_state_and_county": [ { "state": "WA", "airport_count": 484, "by_county": [ { "county": "KING", "airport_count": 61 }, { "county": "PIERCE", "airport_count": 27 }, { "county": "SPOKANE", "airport_count": 26 }, { "county": "SNOHOMISH", "airport_count": 25 } ] }, { "state": "OR", "airport_count": 441, "by_county": [ { "county": "CLACKAMAS", "airport_count": 34 }, { "county": "LINN", "airport_count": 26 }, { "county": "WASHINGTON", "airport_count": 24 }, { "county": "LANE", "airport_count": 24 } ] }, { "state": "CO", "airport_count": 425, "by_county": [ { "county": "WELD", "airport_count": 40 }, { "county": "EL PASO", "airport_count": 26 }, { "county": "ADAMS", "airport_count": 23 }, { "county": "JEFFERSON", "airport_count": 21 } ] }, { "state": "MT", "airport_count": 259, "by_county": [ { "county": "FLATHEAD", "airport_count": 20 }, { "county": "LEWIS AND CLARK", "airport_count": 16 }, { "county": "MISSOULA", "airport_count": 14 }, { "county": "GALLATIN", "airport_count": 11 } ] }, { "state": "ID", "airport_count": 238, "by_county": [ { "county": "VALLEY", "airport_count": 27 }, { "county": "KOOTENAI", "airport_count": 21 }, { "county": "BONNER", "airport_count": 18 }, { "county": "IDAHO", "airport_count": 18 } ] }, { "state": "UT", "airport_count": 140, "by_county": [ { "county": "SALT LAKE", "airport_count": 21 }, { "county": "SAN JUAN", "airport_count": 13 }, { "county": "UTAH", "airport_count": 10 }, { "county": "GRAND", "airport_count": 9 } ] }, { "state": "WY", "airport_count": 115, "by_county": [ { "county": "LARAMIE", "airport_count": 12 }, { "county": "PARK", "airport_count": 9 }, { "county": "CAMPBELL", "airport_count": 9 }, { "county": "CARBON", "airport_count": 9 } ] } ], "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 1524 }, { "fac_type": "HELIPORT", "airport_count": 527 }, { "fac_type": "SEAPLANE BASE", "airport_count": 25 }, { "fac_type": "STOLPORT", "airport_count": 13 }, { "fac_type": "ULTRALIGHT", "airport_count": 8 }, { "fac_type": "GLIDERPORT", "airport_count": 4 }, { "fac_type": "BALLOONPORT", "airport_count": 1 } ] } ]
WITH __stage0 AS ( SELECT group_set, base."faa_region" as "faa_region__0", CASE WHEN group_set=0 THEN COUNT(1) END as "airport_count__0", CASE WHEN group_set IN (1,2) THEN base."state" END as "state__1", CASE WHEN group_set=1 THEN COUNT(1) END as "airport_count__1", CASE WHEN group_set=2 THEN base."county" END as "county__2", CASE WHEN group_set=2 THEN COUNT(1) END as "airport_count__2", CASE WHEN group_set=3 THEN base."fac_type" END as "fac_type__3", CASE WHEN group_set=3 THEN COUNT(1) END as "airport_count__3" FROM '../data/airports.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,3,1)) as group_set ) as group_set GROUP BY 1,2,4,6,8 ) , __stage1 AS ( SELECT CASE WHEN group_set=2 THEN 1 ELSE group_set END as group_set, "faa_region__0" as "faa_region__0", FIRST("airport_count__0") FILTER (WHERE "airport_count__0" IS NOT NULL) as "airport_count__0", CASE WHEN group_set IN (1,2) THEN "state__1" END as "state__1", FIRST("airport_count__1") FILTER (WHERE "airport_count__1" IS NOT NULL) as "airport_count__1", COALESCE(LIST({ "county": "county__2", "airport_count": "airport_count__2"} ORDER BY "airport_count__2" desc NULLS LAST) FILTER (WHERE group_set=2)[1:4],[]) as "by_county__1", CASE WHEN group_set=3 THEN "fac_type__3" END as "fac_type__3", FIRST("airport_count__3") FILTER (WHERE "airport_count__3" IS NOT NULL) as "airport_count__3" FROM __stage0 GROUP BY 1,2,4,7 ) SELECT "faa_region__0" as "faa_region", MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count", COALESCE(LIST({ "state": "state__1", "airport_count": "airport_count__1", "by_county": "by_county__1"} ORDER BY "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_state_and_county", COALESCE(LIST({ "fac_type": "fac_type__3", "airport_count": "airport_count__3"} ORDER BY "airport_count__3" desc NULLS LAST) FILTER (WHERE group_set=3),[]) as "by_facility_type" FROM __stage1 GROUP BY 1 ORDER BY 2 desc NULLS LAST
Refining a View
The gesture + { refinements }
allows you to base a new view on an existing view while adding new refinements to the query terms.
For example we can add a limit and an order by to by_state
run: airports2 -> by_state + { order_by: state desc // <-- add order by to query limit: 2 }
[ { "state": "WY", "airport_count": 115 }, { "state": "WV", "airport_count": 116 } ]
SELECT base."state" as "state", COUNT(1) as "airport_count" FROM '../data/airports.parquet' as base GROUP BY 1 ORDER BY 1 desc NULLS LAST LIMIT 2
is the same as
run: airports -> { group_by: state aggregate: airport_count order_by: state desc limit: 2 }
[ { "state": "WY", "airport_count": 115 }, { "state": "WV", "airport_count": 116 } ]
SELECT base."state" as "state", COUNT(1) as "airport_count" FROM '../data/airports.parquet' as base GROUP BY 1 ORDER BY 1 desc NULLS LAST LIMIT 2
You can add a measure or dimension
run: airports3 -> by_facility_type + { aggregate: avg_elevation }
[ { "fac_type": "AIRPORT", "airport_count": 13925, "avg_elevation": 1237.0441651705567 }, { "fac_type": "HELIPORT", "airport_count": 5135, "avg_elevation": 950.5125608568646 }, { "fac_type": "SEAPLANE BASE", "airport_count": 473, "avg_elevation": 488.82241014799155 }, { "fac_type": "ULTRALIGHT", "airport_count": 125, "avg_elevation": 806.144 }, { "fac_type": "STOLPORT", "airport_count": 86, "avg_elevation": 1375.046511627907 } ]
SELECT base."fac_type" as "fac_type", COUNT(1) as "airport_count", AVG(base."elevation") as "avg_elevation" FROM '../data/airports.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
You can nest another view
run: airports3 -> top_5_states + { nest: by_facility_type }
[ { "state": "TX", "airport_count": 1845, "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 1389 }, { "fac_type": "HELIPORT", "airport_count": 435 }, { "fac_type": "ULTRALIGHT", "airport_count": 8 }, { "fac_type": "STOLPORT", "airport_count": 8 }, { "fac_type": "GLIDERPORT", "airport_count": 5 } ] }, { "state": "CA", "airport_count": 984, "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 569 }, { "fac_type": "HELIPORT", "airport_count": 396 }, { "fac_type": "SEAPLANE BASE", "airport_count": 12 }, { "fac_type": "GLIDERPORT", "airport_count": 3 }, { "fac_type": "STOLPORT", "airport_count": 2 }, { "fac_type": "ULTRALIGHT", "airport_count": 2 } ] }, { "state": "IL", "airport_count": 890, "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 625 }, { "fac_type": "HELIPORT", "airport_count": 245 }, { "fac_type": "SEAPLANE BASE", "airport_count": 8 }, { "fac_type": "ULTRALIGHT", "airport_count": 6 }, { "fac_type": "BALLOONPORT", "airport_count": 2 }, { "fac_type": "GLIDERPORT", "airport_count": 2 }, { "fac_type": "STOLPORT", "airport_count": 2 } ] }, { "state": "FL", "airport_count": 856, "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 511 }, { "fac_type": "HELIPORT", "airport_count": 280 }, { "fac_type": "SEAPLANE BASE", "airport_count": 43 }, { "fac_type": "STOLPORT", "airport_count": 13 }, { "fac_type": "ULTRALIGHT", "airport_count": 5 }, { "fac_type": "GLIDERPORT", "airport_count": 4 } ] }, { "state": "PA", "airport_count": 804, "by_facility_type": [ { "fac_type": "AIRPORT", "airport_count": 468 }, { "fac_type": "HELIPORT", "airport_count": 307 }, { "fac_type": "ULTRALIGHT", "airport_count": 13 }, { "fac_type": "SEAPLANE BASE", "airport_count": 10 }, { "fac_type": "STOLPORT", "airport_count": 3 }, { "fac_type": "GLIDERPORT", "airport_count": 3 } ] } ]
WITH __stage0 AS ( SELECT group_set, base."state" as "state__0", CASE WHEN group_set=0 THEN COUNT(1) END as "airport_count__0", CASE WHEN group_set=1 THEN base."fac_type" END as "fac_type__1", CASE WHEN group_set=1 THEN COUNT(1) END as "airport_count__1" FROM '../data/airports.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set ) as group_set GROUP BY 1,2,4 ) SELECT "state__0" as "state", MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count", COALESCE(LIST({ "fac_type": "fac_type__1", "airport_count": "airport_count__1"} ORDER BY "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_facility_type" FROM __stage0 GROUP BY 1 ORDER BY 2 desc NULLS LAST LIMIT 5
Changing the inner and outer query in the example above reveals very different information.
run: airports3 -> by_facility_type + { nest: top_5_states }
[ { "fac_type": "AIRPORT", "airport_count": 13925, "top_5_states": [ { "state": "TX", "airport_count": 1389 }, { "state": "IL", "airport_count": 625 }, { "state": "CA", "airport_count": 569 }, { "state": "OH", "airport_count": 537 }, { "state": "FL", "airport_count": 511 } ] }, { "fac_type": "HELIPORT", "airport_count": 5135, "top_5_states": [ { "state": "TX", "airport_count": 435 }, { "state": "CA", "airport_count": 396 }, { "state": "PA", "airport_count": 307 }, { "state": "FL", "airport_count": 280 }, { "state": "NJ", "airport_count": 247 } ] }, { "fac_type": "SEAPLANE BASE", "airport_count": 473, "top_5_states": [ { "state": "AK", "airport_count": 104 }, { "state": "MN", "airport_count": 72 }, { "state": "FL", "airport_count": 43 }, { "state": "ME", "airport_count": 38 }, { "state": "NY", "airport_count": 23 } ] }, { "fac_type": "ULTRALIGHT", "airport_count": 125, "top_5_states": [ { "state": "LA", "airport_count": 18 }, { "state": "IN", "airport_count": 17 }, { "state": "PA", "airport_count": 13 }, { "state": "TX", "airport_count": 8 }, { "state": "AZ", "airport_count": 7 } ] }, { "fac_type": "STOLPORT", "airport_count": 86, "top_5_states": [ { "state": "FL", "airport_count": 13 }, { "state": "TN", "airport_count": 9 }, { "state": "TX", "airport_count": 8 }, { "state": "CO", "airport_count": 6 }, { "state": "NC", "airport_count": 4 } ] } ]
WITH __stage0 AS ( SELECT group_set, base."fac_type" as "fac_type__0", CASE WHEN group_set=0 THEN COUNT(1) END as "airport_count__0", CASE WHEN group_set=1 THEN base."state" END as "state__1", CASE WHEN group_set=1 THEN COUNT(1) END as "airport_count__1" FROM '../data/airports.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set ) as group_set GROUP BY 1,2,4 ) SELECT "fac_type__0" as "fac_type", MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count", COALESCE(LIST({ "state": "state__1", "airport_count": "airport_count__1"} ORDER BY "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:5],[]) as "top_5_states" FROM __stage0 GROUP BY 1 ORDER BY 2 desc NULLS LAST
Joining
First let's model some simple tables... (Join Documentation)
Carrier table
simple source declaration used in example below
source: carriers is duckdb.table('../data/carriers.parquet') extend { measure: carrier_count is count() } run: carriers -> { select: * }
[ { "code": "EV", "name": "Atlantic Southeast Airlines", "nickname": "Atlantic Southeast" }, { "code": "NW", "name": "Northwest Airlines", "nickname": "Northwest" }, { "code": "AA", "name": "American Airlines", "nickname": "American" }, { "code": "FL", "name": "Airtran Airways Corporation", "nickname": "Airtran" }, { "code": "B6", "name": "Jetblue Airways", "nickname": "Jetblue" } ]
SELECT base."code" as "code", base."name" as "name", base."nickname" as "nickname" FROM '../data/carriers.parquet' as base
Flights table
simple source declaration used in example below
source: flights is duckdb.table('../data/flights.parquet') extend { measure: flight_count is count() } run: flights -> { select: id2, tail_num, dep_time, carrier, origin, destination, distance, dep_delay limit: 10 }
[ { "id2": 30272525, "tail_num": "N806MD", "dep_time": "2004-11-18T22:32:00.000Z", "carrier": "US", "origin": "PHL", "destination": "ABE", "distance": 55, "dep_delay": -3 }, { "id2": 29742442, "tail_num": "N806MD", "dep_time": "2004-10-12T20:46:00.000Z", "carrier": "US", "origin": "PHL", "destination": "ABE", "distance": 55, "dep_delay": 6 }, { "id2": 30270885, "tail_num": "N816MA", "dep_time": "2004-11-24T10:20:00.000Z", "carrier": "US", "origin": "PHL", "destination": "ABE", "distance": 55, "dep_delay": 0 }, { "id2": 28344746, "tail_num": "N806MD", "dep_time": "2004-08-31T20:30:00.000Z", "carrier": "US", "origin": "PHL", "destination": "ABE", "distance": 55, "dep_delay": 0 }, { "id2": 27898410, "tail_num": "N806MD", "dep_time": "2004-07-27T10:21:00.000Z", "carrier": "US", "origin": "PHL", "destination": "ABE", "distance": 55, "dep_delay": -4 } ]
SELECT base."id2" as "id2", base."tail_num" as "tail_num", base."dep_time" as "dep_time", base."carrier" as "carrier", base."origin" as "origin", base."destination" as "destination", base."distance" as "distance", base."dep_delay" as "dep_delay" FROM '../data/flights.parquet' as base LIMIT 10
Declare a Join
Join carriers to flights. Each flight has one carrier so we use join_one:
.
(Join Documentation)
source: carriers2 is duckdb.table('../data/carriers.parquet') extend { measure: carrier_count is count() } source: flights2 is duckdb.table('../data/flights.parquet') extend { join_one: carriers2 on carrier = carriers2.code measure: flight_count is count() total_distance is distance.sum() avg_distance is distance.avg() }
Query the joined tables
using the above declared flights
source
run: flights2 -> { group_by: carriers2.nickname aggregate: flight_count total_distance avg_distance }
[ { "nickname": "Southwest", "flight_count": 88751, "total_distance": 54619152, "avg_distance": 615.4201304774031 }, { "nickname": "USAir", "flight_count": 37683, "total_distance": 23721642, "avg_distance": 629.5051349414855 }, { "nickname": "American", "flight_count": 34577, "total_distance": 37684885, "avg_distance": 1089.8830147207682 }, { "nickname": "Northwest", "flight_count": 33580, "total_distance": 33376503, "avg_distance": 993.9399344848124 }, { "nickname": "United", "flight_count": 32757, "total_distance": 38882934, "avg_distance": 1187.0114479347926 } ]
SELECT carriers2_0."nickname" as "nickname", COUNT(1) as "flight_count", COALESCE(SUM(base."distance"),0) as "total_distance", AVG(base."distance") as "avg_distance" FROM '../data/flights.parquet' as base LEFT JOIN '../data/carriers.parquet' AS carriers2_0 ON base."carrier"=carriers2_0."code" GROUP BY 1 ORDER BY 2 desc NULLS LAST
Aggregates can be computed from anywhere in the Join Tree
using the above declared flights
source