Malloy provides a way to compute percent of total through level of detail (ungrouped aggregates) functions. The functions all()
and exclude()
escape grouping in aggregate calculations. These functions are different than window functions as they operate inline with the query and can produce correct results even when the data hits a limit
or is fanned out. Use cases below.
source: flights is duckdb.table('../data/flights.parquet') extend { join_one: carriers is duckdb.table('../data/carriers.parquet') on carrier = carriers.code measure: flight_count is count() }
Totals
Using all()
, you can easily produce an aggregate calculation that includes all the data, not just the data on the current row. Southwest + USAir = 126,434 flights. Notice that all_flights
is the total of all the flights accessible in the query.
run: flights -> { group_by: carriers.nickname aggregate: flight_count all_flights is all(flight_count) limit: 2 }
[ { "nickname": "Southwest", "flight_count": 88751, "all_flights": 344827 }, { "nickname": "USAir", "flight_count": 37683, "all_flights": 344827 } ]
WITH __stage0 AS ( SELECT group_set, CASE WHEN group_set=1 THEN carriers_0."nickname" END as "nickname__1", CASE WHEN group_set=1 THEN COUNT(1) END as "flight_count__1", MAX((CASE WHEN group_set=0 THEN COUNT(1) END)) OVER () as "all_flights__1" FROM '../data/flights.parquet' as base LEFT JOIN '../data/carriers.parquet' AS carriers_0 ON base."carrier"=carriers_0."code" CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set ) as group_set GROUP BY 1,2 ) SELECT "nickname__1" as "nickname", MAX(CASE WHEN group_set=1 THEN "flight_count__1" END) as "flight_count", MAX(CASE WHEN group_set=1 THEN "all_flights__1" END) as "all_flights" FROM __stage0 WHERE group_set NOT IN (0) GROUP BY 1 ORDER BY 2 desc NULLS LAST LIMIT 2
Percent of Total
The all()
function is useful for percent of total calculations. The # percent
tags the result so it is displayed as a percentage.
run: flights -> { group_by: carriers.nickname aggregate: flight_count # percent percent_of_flights is flight_count / all(flight_count) limit: 5 }
[ { "nickname": "Southwest", "flight_count": 88751, "percent_of_flights": 0.2573783375431738 }, { "nickname": "USAir", "flight_count": 37683, "percent_of_flights": 0.10928088577750582 }, { "nickname": "American", "flight_count": 34577, "percent_of_flights": 0.10027347046489979 }, { "nickname": "Northwest", "flight_count": 33580, "percent_of_flights": 0.09738216554968143 }, { "nickname": "United", "flight_count": 32757, "percent_of_flights": 0.09499546149228454 } ]
WITH __stage0 AS ( SELECT group_set, CASE WHEN group_set=1 THEN carriers_0."nickname" END as "nickname__1", CASE WHEN group_set=1 THEN COUNT(1) END as "flight_count__1", (CASE WHEN group_set=1 THEN COUNT(1) END)*1.0/MAX((CASE WHEN group_set=0 THEN COUNT(1) END)) OVER () as "percent_of_flights__1" FROM '../data/flights.parquet' as base LEFT JOIN '../data/carriers.parquet' AS carriers_0 ON base."carrier"=carriers_0."code" CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set ) as group_set GROUP BY 1,2 ) SELECT "nickname__1" as "nickname", MAX(CASE WHEN group_set=1 THEN "flight_count__1" END) as "flight_count", MAX(CASE WHEN group_set=1 THEN "percent_of_flights__1" END) as "percent_of_flights" FROM __stage0 WHERE group_set NOT IN (0) GROUP BY 1 ORDER BY 2 desc NULLS LAST LIMIT 5
All of a particular grouping
The all()
function can optionally take the names of output columns to show all of a particular value. You can see that all of Southwests fights is still 88,751. The output column name for carriers.nickname
is nickname
so we use that in the calculation. The exclude()
function lets you eliminate a dimension from grouping.
run: flights -> { group_by: carriers.nickname destination origin aggregate: flight_count flights_by_this_carrier is all(flight_count, nickname) flights_to_this_destination is all(flight_count, destination) flights_by_this_origin is all(flight_count, origin) flights_on_this_route is exclude(flight_count, nickname) limit: 20 }
[ { "nickname": "Delta", "destination": "LGA", "origin": "DCA", "flight_count": 1903, "flights_by_this_carrier": 32130, "flights_to_this_destination": 7625, "flights_by_this_origin": 6678, "flights_on_this_route": 2143 }, { "nickname": "Delta", "destination": "DCA", "origin": "LGA", "flight_count": 1901, "flights_by_this_carrier": 32130, "flights_to_this_destination": 6695, "flights_by_this_origin": 7623, "flights_on_this_route": 2123 }, { "nickname": "Delta", "destination": "BOS", "origin": "LGA", "flight_count": 1033, "flights_by_this_carrier": 32130, "flights_to_this_destination": 5799, "flights_by_this_origin": 7623, "flights_on_this_route": 1214 }, { "nickname": "Delta", "destination": "LGA", "origin": "BOS", "flight_count": 1031, "flights_by_this_carrier": 32130, "flights_to_this_destination": 7625, "flights_by_this_origin": 5797, "flights_on_this_route": 1201 }, { "nickname": "Southwest", "destination": "OAK", "origin": "LAX", "flight_count": 821, "flights_by_this_carrier": 88751, "flights_to_this_destination": 5082, "flights_by_this_origin": 11077, "flights_on_this_route": 868 }, { "nickname": "Atlantic Southeast", "destination": "PFN", "origin": "ATL", "flight_count": 787, "flights_by_this_carrier": 15769, "flights_to_this_destination": 789, "flights_by_this_origin": 17875, "flights_on_this_route": 787 }, { "nickname": "Atlantic Southeast", "destination": "ATL", "origin": "PFN", "flight_count": 776, "flights_by_this_carrier": 15769, "flights_to_this_destination": 17832, "flights_by_this_origin": 778, "flights_on_this_route": 776 }, { "nickname": "Southwest", "destination": "LAX", "origin": "OAK", "flight_count": 745, "flights_by_this_carrier": 88751, "flights_to_this_destination": 11074, "flights_by_this_origin": 5076, "flights_on_this_route": 792 }, { "nickname": "Atlantic Southeast", "destination": "ATL", "origin": "AGS", "flight_count": 716, "flights_by_this_carrier": 15769, "flights_to_this_destination": 17832, "flights_by_this_origin": 756, "flights_on_this_route": 717 }, { "nickname": "Atlantic Southeast", "destination": "AGS", "origin": "ATL", "flight_count": 696, "flights_by_this_carrier": 15769, "flights_to_this_destination": 737, "flights_by_this_origin": 17875, "flights_on_this_route": 697 }, { "nickname": "Atlantic Southeast", "destination": "CHA", "origin": "ATL", "flight_count": 670, "flights_by_this_carrier": 15769, "flights_to_this_destination": 696, "flights_by_this_origin": 17875, "flights_on_this_route": 670 }, { "nickname": "Atlantic Southeast", "destination": "ATL", "origin": "CHA", "flight_count": 670, "flights_by_this_carrier": 15769, "flights_to_this_destination": 17832, "flights_by_this_origin": 697, "flights_on_this_route": 670 }, { "nickname": "Southwest", "destination": "LAX", "origin": "PHX", "flight_count": 661, "flights_by_this_carrier": 88751, "flights_to_this_destination": 11074, "flights_by_this_origin": 12476, "flights_on_this_route": 819 }, { "nickname": "Atlantic Southeast", "destination": "AVL", "origin": "ATL", "flight_count": 657, "flights_by_this_carrier": 15769, "flights_to_this_destination": 741, "flights_by_this_origin": 17875, "flights_on_this_route": 657 }, { "nickname": "Atlantic Southeast", "destination": "ATL", "origin": "CSG", "flight_count": 654, "flights_by_this_carrier": 15769, "flights_to_this_destination": 17832, "flights_by_this_origin": 654, "flights_on_this_route": 654 }, { "nickname": "Atlantic Southeast", "destination": "CSG", "origin": "ATL", "flight_count": 651, "flights_by_this_carrier": 15769, "flights_to_this_destination": 651, "flights_by_this_origin": 17875, "flights_on_this_route": 651 }, { "nickname": "Atlantic Southeast", "destination": "ATL", "origin": "AVL", "flight_count": 643, "flights_by_this_carrier": 15769, "flights_to_this_destination": 17832, "flights_by_this_origin": 727, "flights_on_this_route": 643 }, { "nickname": "Southwest", "destination": "PHX", "origin": "LAS", "flight_count": 641, "flights_by_this_carrier": 88751, "flights_to_this_destination": 12477, "flights_by_this_origin": 11096, "flights_on_this_route": 777 }, { "nickname": "Southwest", "destination": "LAS", "origin": "PHX", "flight_count": 637, "flights_by_this_carrier": 88751, "flights_to_this_destination": 11092, "flights_by_this_origin": 12476, "flights_on_this_route": 788 }, { "nickname": "Southwest", "destination": "LAS", "origin": "LAX", "flight_count": 556, "flights_by_this_carrier": 88751, "flights_to_this_destination": 11092, "flights_by_this_origin": 11077, "flights_on_this_route": 1073 } ]
WITH __stage0 AS ( SELECT group_set, CASE WHEN group_set IN (5,1) THEN carriers_0."nickname" END as "nickname__5", CASE WHEN group_set IN (5,2,4) THEN base."destination" END as "destination__5", CASE WHEN group_set IN (5,3,4) THEN base."origin" END as "origin__5", CASE WHEN group_set=5 THEN COUNT(1) END as "flight_count__5", MAX((CASE WHEN group_set=1 THEN COUNT(1) END)) OVER (PARTITION BY CASE WHEN group_set IN (5,1) THEN carriers_0."nickname" END) as "flights_by_this_carrier__5", MAX((CASE WHEN group_set=2 THEN COUNT(1) END)) OVER (PARTITION BY CASE WHEN group_set IN (5,2,4) THEN base."destination" END) as "flights_to_this_destination__5", MAX((CASE WHEN group_set=3 THEN COUNT(1) END)) OVER (PARTITION BY CASE WHEN group_set IN (5,3,4) THEN base."origin" END) as "flights_by_this_origin__5", MAX((CASE WHEN group_set=4 THEN COUNT(1) END)) OVER (PARTITION BY CASE WHEN group_set IN (5,2,4) THEN base."destination" END, CASE WHEN group_set IN (5,3,4) THEN base."origin" END) as "flights_on_this_route__5" FROM '../data/flights.parquet' as base LEFT JOIN '../data/carriers.parquet' AS carriers_0 ON base."carrier"=carriers_0."code" CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,5,1)) as group_set ) as group_set GROUP BY 1,2,3,4 ) SELECT "nickname__5" as "nickname", "destination__5" as "destination", "origin__5" as "origin", MAX(CASE WHEN group_set=5 THEN "flight_count__5" END) as "flight_count", MAX(CASE WHEN group_set=5 THEN "flights_by_this_carrier__5" END) as "flights_by_this_carrier", MAX(CASE WHEN group_set=5 THEN "flights_to_this_destination__5" END) as "flights_to_this_destination", MAX(CASE WHEN group_set=5 THEN "flights_by_this_origin__5" END) as "flights_by_this_origin", MAX(CASE WHEN group_set=5 THEN "flights_on_this_route__5" END) as "flights_on_this_route" FROM __stage0 WHERE group_set NOT IN (0,1,2,3,4) GROUP BY 1,2,3 ORDER BY 4 desc NULLS LAST LIMIT 20
As Percentages
Displaying results as percentages is often gives clues as to how numbers relate. Is this number a large or small percentage of the group? Level of detail calculations are great for this. In Malloy, identifiers enclosed in back-ticks can have spaces.
run: flights -> { group_by: carriers.nickname destination origin aggregate: flight_count # percent `carrier as a percent of all flights` is all(flight_count, nickname) / all(flight_count) # percent `destination as a percent of all flights` is all(flight_count, destination) / all(flight_count) # percent `origin as a percent of all flights` is all(flight_count, origin) / all(flight_count) # percent `carriers as a percentage of route` is flight_count / exclude(flight_count, nickname) }