Malloy Documentation
search

Malloy provides a way to compute percent of total through level of detail (ungrouped aggregates) functions. The functions all() and exclude() escape grouping in aggregate calculations. These functions are different than window functions as they operate inline with the query and can produce correct results even when the data hits a limit or is fanned out. Use cases below.

document
source: flights is duckdb.table('../data/flights.parquet') extend {
  join_one: carriers is duckdb.table('../data/carriers.parquet') on carrier = carriers.code
  measure: flight_count is count()
}

Totals

Using all(), you can easily produce an aggregate calculation that includes all the data, not just the data on the current row. Southwest + USAir = 126,434 flights. Notice that all_flights is the total of all the flights accessible in the query.

document
run: flights -> {
  group_by: carriers.nickname
  aggregate: 
    flight_count
    all_flights is all(flight_count)
    limit: 2
}
QUERY RESULTS
[
  {
    "nickname": "Southwest",
    "flight_count": 88751,
    "all_flights": 344827
  },
  {
    "nickname": "USAir",
    "flight_count": 37683,
    "all_flights": 344827
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    CASE WHEN group_set=1 THEN
      carriers_0."nickname"
      END as "nickname__1",
    CASE WHEN group_set=1 THEN
      COUNT(1)
      END as "flight_count__1",
    MAX((CASE WHEN group_set=0 THEN
      COUNT(1)
      END)) OVER () as "all_flights__1"
  FROM '../data/flights.parquet' as base
   LEFT JOIN '../data/carriers.parquet' AS carriers_0
    ON base."carrier"=carriers_0."code"
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2
)
SELECT
  "nickname__1" as "nickname",
  MAX(CASE WHEN group_set=1 THEN "flight_count__1" END) as "flight_count",
  MAX(CASE WHEN group_set=1 THEN "all_flights__1" END) as "all_flights"
FROM __stage0
WHERE group_set NOT IN (0)
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 2

Percent of Total

The all() function is useful for percent of total calculations. The # percent tags the result so it is displayed as a percentage.

document
run: flights -> {
  group_by: carriers.nickname
  aggregate: 
    flight_count
    # percent
    percent_of_flights is flight_count / all(flight_count)
    limit: 5
}
QUERY RESULTS
[
  {
    "nickname": "Southwest",
    "flight_count": 88751,
    "percent_of_flights": 0.2573783375431738
  },
  {
    "nickname": "USAir",
    "flight_count": 37683,
    "percent_of_flights": 0.10928088577750582
  },
  {
    "nickname": "American",
    "flight_count": 34577,
    "percent_of_flights": 0.10027347046489979
  },
  {
    "nickname": "Northwest",
    "flight_count": 33580,
    "percent_of_flights": 0.09738216554968143
  },
  {
    "nickname": "United",
    "flight_count": 32757,
    "percent_of_flights": 0.09499546149228454
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    CASE WHEN group_set=1 THEN
      carriers_0."nickname"
      END as "nickname__1",
    CASE WHEN group_set=1 THEN
      COUNT(1)
      END as "flight_count__1",
    (CASE WHEN group_set=1 THEN
      COUNT(1)
      END)*1.0/MAX((CASE WHEN group_set=0 THEN
      COUNT(1)
      END)) OVER () as "percent_of_flights__1"
  FROM '../data/flights.parquet' as base
   LEFT JOIN '../data/carriers.parquet' AS carriers_0
    ON base."carrier"=carriers_0."code"
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2
)
SELECT
  "nickname__1" as "nickname",
  MAX(CASE WHEN group_set=1 THEN "flight_count__1" END) as "flight_count",
  MAX(CASE WHEN group_set=1 THEN "percent_of_flights__1" END) as "percent_of_flights"
FROM __stage0
WHERE group_set NOT IN (0)
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 5

All of a particular grouping

The all() function can optionally take the names of output columns to show all of a particular value. You can see that all of Southwests fights is still 88,751. The output column name for carriers.nickname is nickname so we use that in the calculation. The exclude() function lets you eliminate a dimension from grouping.

document
run: flights -> {
  group_by:
    carriers.nickname
    destination
    origin
  aggregate: 
    flight_count
    flights_by_this_carrier is all(flight_count, nickname)
    flights_to_this_destination is all(flight_count, destination)
    flights_by_this_origin is all(flight_count, origin)
    flights_on_this_route is exclude(flight_count, nickname)
  limit: 20
}
QUERY RESULTS
[
  {
    "nickname": "Delta",
    "destination": "LGA",
    "origin": "DCA",
    "flight_count": 1903,
    "flights_by_this_carrier": 32130,
    "flights_to_this_destination": 7625,
    "flights_by_this_origin": 6678,
    "flights_on_this_route": 2143
  },
  {
    "nickname": "Delta",
    "destination": "DCA",
    "origin": "LGA",
    "flight_count": 1901,
    "flights_by_this_carrier": 32130,
    "flights_to_this_destination": 6695,
    "flights_by_this_origin": 7623,
    "flights_on_this_route": 2123
  },
  {
    "nickname": "Delta",
    "destination": "BOS",
    "origin": "LGA",
    "flight_count": 1033,
    "flights_by_this_carrier": 32130,
    "flights_to_this_destination": 5799,
    "flights_by_this_origin": 7623,
    "flights_on_this_route": 1214
  },
  {
    "nickname": "Delta",
    "destination": "LGA",
    "origin": "BOS",
    "flight_count": 1031,
    "flights_by_this_carrier": 32130,
    "flights_to_this_destination": 7625,
    "flights_by_this_origin": 5797,
    "flights_on_this_route": 1201
  },
  {
    "nickname": "Southwest",
    "destination": "OAK",
    "origin": "LAX",
    "flight_count": 821,
    "flights_by_this_carrier": 88751,
    "flights_to_this_destination": 5082,
    "flights_by_this_origin": 11077,
    "flights_on_this_route": 868
  },
  {
    "nickname": "Atlantic Southeast",
    "destination": "PFN",
    "origin": "ATL",
    "flight_count": 787,
    "flights_by_this_carrier": 15769,
    "flights_to_this_destination": 789,
    "flights_by_this_origin": 17875,
    "flights_on_this_route": 787
  },
  {
    "nickname": "Atlantic Southeast",
    "destination": "ATL",
    "origin": "PFN",
    "flight_count": 776,
    "flights_by_this_carrier": 15769,
    "flights_to_this_destination": 17832,
    "flights_by_this_origin": 778,
    "flights_on_this_route": 776
  },
  {
    "nickname": "Southwest",
    "destination": "LAX",
    "origin": "OAK",
    "flight_count": 745,
    "flights_by_this_carrier": 88751,
    "flights_to_this_destination": 11074,
    "flights_by_this_origin": 5076,
    "flights_on_this_route": 792
  },
  {
    "nickname": "Atlantic Southeast",
    "destination": "ATL",
    "origin": "AGS",
    "flight_count": 716,
    "flights_by_this_carrier": 15769,
    "flights_to_this_destination": 17832,
    "flights_by_this_origin": 756,
    "flights_on_this_route": 717
  },
  {
    "nickname": "Atlantic Southeast",
    "destination": "AGS",
    "origin": "ATL",
    "flight_count": 696,
    "flights_by_this_carrier": 15769,
    "flights_to_this_destination": 737,
    "flights_by_this_origin": 17875,
    "flights_on_this_route": 697
  },
  {
    "nickname": "Atlantic Southeast",
    "destination": "CHA",
    "origin": "ATL",
    "flight_count": 670,
    "flights_by_this_carrier": 15769,
    "flights_to_this_destination": 696,
    "flights_by_this_origin": 17875,
    "flights_on_this_route": 670
  },
  {
    "nickname": "Atlantic Southeast",
    "destination": "ATL",
    "origin": "CHA",
    "flight_count": 670,
    "flights_by_this_carrier": 15769,
    "flights_to_this_destination": 17832,
    "flights_by_this_origin": 697,
    "flights_on_this_route": 670
  },
  {
    "nickname": "Southwest",
    "destination": "LAX",
    "origin": "PHX",
    "flight_count": 661,
    "flights_by_this_carrier": 88751,
    "flights_to_this_destination": 11074,
    "flights_by_this_origin": 12476,
    "flights_on_this_route": 819
  },
  {
    "nickname": "Atlantic Southeast",
    "destination": "AVL",
    "origin": "ATL",
    "flight_count": 657,
    "flights_by_this_carrier": 15769,
    "flights_to_this_destination": 741,
    "flights_by_this_origin": 17875,
    "flights_on_this_route": 657
  },
  {
    "nickname": "Atlantic Southeast",
    "destination": "ATL",
    "origin": "CSG",
    "flight_count": 654,
    "flights_by_this_carrier": 15769,
    "flights_to_this_destination": 17832,
    "flights_by_this_origin": 654,
    "flights_on_this_route": 654
  },
  {
    "nickname": "Atlantic Southeast",
    "destination": "CSG",
    "origin": "ATL",
    "flight_count": 651,
    "flights_by_this_carrier": 15769,
    "flights_to_this_destination": 651,
    "flights_by_this_origin": 17875,
    "flights_on_this_route": 651
  },
  {
    "nickname": "Atlantic Southeast",
    "destination": "ATL",
    "origin": "AVL",
    "flight_count": 643,
    "flights_by_this_carrier": 15769,
    "flights_to_this_destination": 17832,
    "flights_by_this_origin": 727,
    "flights_on_this_route": 643
  },
  {
    "nickname": "Southwest",
    "destination": "PHX",
    "origin": "LAS",
    "flight_count": 641,
    "flights_by_this_carrier": 88751,
    "flights_to_this_destination": 12477,
    "flights_by_this_origin": 11096,
    "flights_on_this_route": 777
  },
  {
    "nickname": "Southwest",
    "destination": "LAS",
    "origin": "PHX",
    "flight_count": 637,
    "flights_by_this_carrier": 88751,
    "flights_to_this_destination": 11092,
    "flights_by_this_origin": 12476,
    "flights_on_this_route": 788
  },
  {
    "nickname": "Southwest",
    "destination": "LAS",
    "origin": "LAX",
    "flight_count": 556,
    "flights_by_this_carrier": 88751,
    "flights_to_this_destination": 11092,
    "flights_by_this_origin": 11077,
    "flights_on_this_route": 1073
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    CASE WHEN group_set IN (5,1) THEN
      carriers_0."nickname"
      END as "nickname__5",
    CASE WHEN group_set IN (5,2,4) THEN
      base."destination"
      END as "destination__5",
    CASE WHEN group_set IN (5,3,4) THEN
      base."origin"
      END as "origin__5",
    CASE WHEN group_set=5 THEN
      COUNT(1)
      END as "flight_count__5",
    MAX((CASE WHEN group_set=1 THEN
      COUNT(1)
      END)) OVER (PARTITION BY CASE WHEN group_set IN (5,1) THEN
      carriers_0."nickname"
      END) as "flights_by_this_carrier__5",
    MAX((CASE WHEN group_set=2 THEN
      COUNT(1)
      END)) OVER (PARTITION BY CASE WHEN group_set IN (5,2,4) THEN
      base."destination"
      END) as "flights_to_this_destination__5",
    MAX((CASE WHEN group_set=3 THEN
      COUNT(1)
      END)) OVER (PARTITION BY CASE WHEN group_set IN (5,3,4) THEN
      base."origin"
      END) as "flights_by_this_origin__5",
    MAX((CASE WHEN group_set=4 THEN
      COUNT(1)
      END)) OVER (PARTITION BY CASE WHEN group_set IN (5,2,4) THEN
      base."destination"
      END, CASE WHEN group_set IN (5,3,4) THEN
      base."origin"
      END) as "flights_on_this_route__5"
  FROM '../data/flights.parquet' as base
   LEFT JOIN '../data/carriers.parquet' AS carriers_0
    ON base."carrier"=carriers_0."code"
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,5,1)) as group_set  ) as group_set
  GROUP BY 1,2,3,4
)
SELECT
  "nickname__5" as "nickname",
  "destination__5" as "destination",
  "origin__5" as "origin",
  MAX(CASE WHEN group_set=5 THEN "flight_count__5" END) as "flight_count",
  MAX(CASE WHEN group_set=5 THEN "flights_by_this_carrier__5" END) as "flights_by_this_carrier",
  MAX(CASE WHEN group_set=5 THEN "flights_to_this_destination__5" END) as "flights_to_this_destination",
  MAX(CASE WHEN group_set=5 THEN "flights_by_this_origin__5" END) as "flights_by_this_origin",
  MAX(CASE WHEN group_set=5 THEN "flights_on_this_route__5" END) as "flights_on_this_route"
FROM __stage0
WHERE group_set NOT IN (0,1,2,3,4)
GROUP BY 1,2,3
ORDER BY 4 desc NULLS LAST
LIMIT 20

As Percentages

Displaying results as percentages is often gives clues as to how numbers relate. Is this number a large or small percentage of the group? Level of detail calculations are great for this. In Malloy, identifiers enclosed in back-ticks can have spaces.

document
run: flights -> {
  group_by:
    carriers.nickname
    destination
    origin
  aggregate: 
    flight_count
    # percent
    `carrier as a percent of all flights` is all(flight_count, nickname) / all(flight_count)
    # percent
    `destination as a percent of all flights` is all(flight_count, destination) / all(flight_count)
    # percent
    `origin as a percent of all flights` is all(flight_count, origin) / all(flight_count)
    # percent
    `carriers as a percentage of route` is flight_count / exclude(flight_count, nickname)
}
QUERY RESULTS