Malloy Documentation
search

Window functions in Malloy are expressed in calculation fields, using the calculate: keyword. Calculation fields operate on the results of a aggregation or selection, while still operating within the same query stage. Logically these calculation operations occur "after" the other operations, so their exact semantics can be challenging to understand. For a full list of the window functions that Malloy supports, visit our function reference documentation.

Here is a simple example:

document
run: flights -> {
  where: carrier = 'WN'
  group_by: dep_year is dep_time.year
  aggregate: flight_count
  order_by: dep_year asc
  calculate: year_change is flight_count - lag(flight_count)
}
QUERY RESULTS
[
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "flight_count": 13133,
    "year_change": null
  },
  {
    "dep_year": "2001-01-01T00:00:00.000Z",
    "flight_count": 14421,
    "year_change": 1288
  },
  {
    "dep_year": "2002-01-01T00:00:00.000Z",
    "flight_count": 14708,
    "year_change": 287
  },
  {
    "dep_year": "2003-01-01T00:00:00.000Z",
    "flight_count": 14300,
    "year_change": -408
  },
  {
    "dep_year": "2004-01-01T00:00:00.000Z",
    "flight_count": 14640,
    "year_change": 340
  }
]
SELECT 
   DATE_TRUNC('year', base."dep_time") as "dep_year",
   COUNT(1) as "flight_count",
   (COUNT(1))-(LAG((COUNT(1))) OVER(  ORDER BY  DATE_TRUNC('year', base."dep_time") asc NULLS LAST )) as "year_change"
FROM '../data/flights.parquet' as base
WHERE base."carrier"='WN'
GROUP BY 1
ORDER BY 1 asc NULLS LAST

In this query, we compute flight_count as an aggregate, and then compute the year-over-year change, year_change in the calculate: clause, using the lag window function.

Ordering

Some window functions in SQL, such as lag, require an ORDER BY clause to determine behavior. In Malloy, such analytic functions use the ordering of the query itself. For example:

document
run: flights -> {
  where: carrier = 'WN'
  group_by: dep_year is dep_time.year
  aggregate: flight_count
  order_by: dep_year asc
  calculate: year_change is flight_count - lag(flight_count)
}
QUERY RESULTS
[
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "flight_count": 13133,
    "year_change": null
  },
  {
    "dep_year": "2001-01-01T00:00:00.000Z",
    "flight_count": 14421,
    "year_change": 1288
  },
  {
    "dep_year": "2002-01-01T00:00:00.000Z",
    "flight_count": 14708,
    "year_change": 287
  },
  {
    "dep_year": "2003-01-01T00:00:00.000Z",
    "flight_count": 14300,
    "year_change": -408
  },
  {
    "dep_year": "2004-01-01T00:00:00.000Z",
    "flight_count": 14640,
    "year_change": 340
  }
]
SELECT 
   DATE_TRUNC('year', base."dep_time") as "dep_year",
   COUNT(1) as "flight_count",
   (COUNT(1))-(LAG((COUNT(1))) OVER(  ORDER BY  DATE_TRUNC('year', base."dep_time") asc NULLS LAST )) as "year_change"
FROM '../data/flights.parquet' as base
WHERE base."carrier"='WN'
GROUP BY 1
ORDER BY 1 asc NULLS LAST

lag(flight_count) for each row is calculated with respect to dep_year asc. The value of lag(flight_count) for the year 2001 is the value of flight_count for the year 2000.

Note that Malloy queries use a default ordering if none is explicitly specified:

  1. If a time dimension is specified in the query, default ordering is by "descending time", showing newest rows first

  2. If no time is specified, but there is a numeric measure, default ordering is by "measure values descending", showing largest values first

Ordering can also be changed by adding an order_by property on the calculation. This allows you to control the ordering of the input to the window function independently from the ordering of the final results.

document
##! experimental
run: flights -> {
  where: carrier = 'WN'
  group_by: dep_year is dep_time.year
  aggregate: flight_count
  calculate: year_change is flight_count - lag(flight_count) {
    order_by: dep_year asc
  }
  order_by: dep_year desc
}
QUERY RESULTS
[
  {
    "dep_year": "2005-01-01T00:00:00.000Z",
    "flight_count": 17549,
    "year_change": 2909
  },
  {
    "dep_year": "2004-01-01T00:00:00.000Z",
    "flight_count": 14640,
    "year_change": 340
  },
  {
    "dep_year": "2003-01-01T00:00:00.000Z",
    "flight_count": 14300,
    "year_change": -408
  },
  {
    "dep_year": "2002-01-01T00:00:00.000Z",
    "flight_count": 14708,
    "year_change": 287
  },
  {
    "dep_year": "2001-01-01T00:00:00.000Z",
    "flight_count": 14421,
    "year_change": 1288
  }
]
SELECT 
   DATE_TRUNC('year', base."dep_time") as "dep_year",
   COUNT(1) as "flight_count",
   (COUNT(1))-(LAG((COUNT(1))) OVER( ORDER BY (DATE_TRUNC('year', base."dep_time")) ASC )) as "year_change"
FROM '../data/flights.parquet' as base
WHERE base."carrier"='WN'
GROUP BY 1
ORDER BY 1 desc NULLS LAST

In this example, the window function is calculated in ascending order with respect to dep_year, but the results are output in descending order (most recent year first).

Partitioning

Some window function operations require the specification of partition_by clauses in SQL. For example, calculating year-over-year changes for a specific time span, such as "number of flights in March of this year versus March of last year". In Malloy, these can be specified by adding a partition_by property to the calculation.

document
##! experimental
run: flights -> {
  where: carrier = 'WN'
  group_by:
    dep_year is dep_time.year
    dep_month is month(dep_time)
  aggregate: flight_count

  calculate: year_change is flight_count - lag(flight_count) {
    partition_by: dep_month
    order_by: dep_year
  }

  order_by: dep_year, dep_month
}
QUERY RESULTS
[
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "dep_month": 1,
    "flight_count": 1114,
    "year_change": null
  },
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "dep_month": 2,
    "flight_count": 1088,
    "year_change": null
  },
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "dep_month": 3,
    "flight_count": 1117,
    "year_change": null
  },
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "dep_month": 4,
    "flight_count": 1097,
    "year_change": null
  },
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "dep_month": 5,
    "flight_count": 1151,
    "year_change": null
  }
]
SELECT 
   DATE_TRUNC('year', base."dep_time") as "dep_year",
   EXTRACT(month FROM base."dep_time") as "dep_month",
   COUNT(1) as "flight_count",
   (COUNT(1))-(LAG((COUNT(1))) OVER(PARTITION BY (EXTRACT(month FROM base."dep_time")) ORDER BY (DATE_TRUNC('year', base."dep_time")) )) as "year_change"
FROM '../data/flights.parquet' as base
WHERE base."carrier"='WN'
GROUP BY 1,2
ORDER BY 1 ASC NULLS LAST,2 ASC NULLS LAST

This query modifies the window by adding a partition_by clause on the month number, and an order_by on the year. This means each window contains only data for a given month number, and the lag function grabs the value for flight_count of the prior year within that window.

Field References in Calculations

Because calculations operate logically on the output of the grouping/projecting/aggregating operations, field references behave differently inside a calculate: block. In particular, field references refer by default to output names from those operations.

document
run: flights -> {
  group_by: lower_carrier is lower(carrier)
  calculate: prev_carrier is lag(lower_carrier)
}
QUERY RESULTS
[
  {
    "lower_carrier": "aa",
    "prev_carrier": null
  },
  {
    "lower_carrier": "as",
    "prev_carrier": "aa"
  },
  {
    "lower_carrier": "b6",
    "prev_carrier": "as"
  },
  {
    "lower_carrier": "co",
    "prev_carrier": "b6"
  },
  {
    "lower_carrier": "dl",
    "prev_carrier": "co"
  }
]
SELECT 
   LOWER(base."carrier") as "lower_carrier",
   LAG((LOWER(base."carrier"))) OVER(  ORDER BY  LOWER(base."carrier") asc NULLS LAST ) as "prev_carrier"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 1 asc NULLS LAST

In a group_by or aggregate, you cannot reference lower_carrier because it is not a field defined inside of flights, but in calculate, you can.

For a detailed explanation of the exact semantics, see the evaluation space documentation.