Malloy Documentation
search

Window functions in Malloy are expressed in calculation fields, using the calculate: keyword. Calculation fields operate on the results of a aggregation or selection, while still operating within the same query stage. Logically these calculation operations occur "after" the other operations, so their exact semantics can be challenging to understand. For a full list of the window functions that Malloy supports, visit our function reference documentation.

Here is a simple example:

document
run: flights -> {
  where: carrier = 'WN'
  group_by: dep_year is dep_time.year
  aggregate: flight_count
  order_by: dep_year asc
  calculate: year_change is flight_count - lag(flight_count)
}
QUERY RESULTS
[
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "flight_count": 13133,
    "year_change": null
  },
  {
    "dep_year": "2001-01-01T00:00:00.000Z",
    "flight_count": 14421,
    "year_change": 1288
  },
  {
    "dep_year": "2002-01-01T00:00:00.000Z",
    "flight_count": 14708,
    "year_change": 287
  },
  {
    "dep_year": "2003-01-01T00:00:00.000Z",
    "flight_count": 14300,
    "year_change": -408
  },
  {
    "dep_year": "2004-01-01T00:00:00.000Z",
    "flight_count": 14640,
    "year_change": 340
  }
]
SELECT 
   DATE_TRUNC('year', base."dep_time") as "dep_year",
   COUNT(1) as "flight_count",
   (COUNT(1))-(LAG((COUNT(1))) OVER(  ORDER BY  DATE_TRUNC('year', base."dep_time") asc NULLS LAST )) as "year_change"
FROM '../data/flights.parquet' as base
WHERE base."carrier"='WN'
GROUP BY 1
ORDER BY 1 asc NULLS LAST

In this query, we compute flight_count as an aggregate, and then compute the year-over-year change, year_change in the calculate: clause, using the lag window function.

Ordering

Some window functions in SQL, such as lag, require an ORDER BY clause to determine behavior. In Malloy, such analytic functions use the ordering of the query itself. For example:

document
run: flights -> {
  where: carrier = 'WN'
  group_by: dep_year is dep_time.year
  aggregate: flight_count
  order_by: dep_year asc
  calculate: year_change is flight_count - lag(flight_count)
}
QUERY RESULTS
[
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "flight_count": 13133,
    "year_change": null
  },
  {
    "dep_year": "2001-01-01T00:00:00.000Z",
    "flight_count": 14421,
    "year_change": 1288
  },
  {
    "dep_year": "2002-01-01T00:00:00.000Z",
    "flight_count": 14708,
    "year_change": 287
  },
  {
    "dep_year": "2003-01-01T00:00:00.000Z",
    "flight_count": 14300,
    "year_change": -408
  },
  {
    "dep_year": "2004-01-01T00:00:00.000Z",
    "flight_count": 14640,
    "year_change": 340
  }
]
SELECT 
   DATE_TRUNC('year', base."dep_time") as "dep_year",
   COUNT(1) as "flight_count",
   (COUNT(1))-(LAG((COUNT(1))) OVER(  ORDER BY  DATE_TRUNC('year', base."dep_time") asc NULLS LAST )) as "year_change"
FROM '../data/flights.parquet' as base
WHERE base."carrier"='WN'
GROUP BY 1
ORDER BY 1 asc NULLS LAST

lag(flight_count) for each row is calculated with respect to dep_year asc. The value of lag(flight_count) for the year 2001 is the value of flight_count for the year 2000.

Note that Malloy queries use a default ordering if none is explicitly specified:

  1. If a time dimension is specified in the query, default ordering is by "descending time", showing newest rows first

  2. If no time is specified, but there is a numeric measure, default ordering is by "measure values descending", showing largest values first

Ordering can also be changed by adding an order_by property on the calculation. This allows you to control the ordering of the input to the window function independently from the ordering of the final results.

document
##! experimental
run: flights -> {
  where: carrier = 'WN'
  group_by: dep_year is dep_time.year
  aggregate: flight_count
  calculate: year_change is flight_count - lag(flight_count) {
    order_by: dep_year asc
  }
  order_by: dep_year desc
}
QUERY RESULTS