All expressions in Malloy have an "evaluation space" which can be one of four values: literal, constant, input, and output.
Functions may have constraints on what evaluation space particular arguments can be. For example:
In
avg_moving(expr, preceding)
,preceding
must be a literalnumber
.In
lag(expr)
,expr
must be an output value.In
avg(expr)
,expr
must be an input value.In
lag(expr, offset, default)
,default
must be a constant value.
Literals
Literal expressions are any literal value that you can write in Malloy. These include literals of type number
, boolean
, string
, date
, timestamp
, and even regular expressions. The following are all literals: 12
, true
, 'hello world'
, @2003
, @2011-11-11 11:11:11
, r'.*'
.
Some functions require that particular arguments be literals. For example, avg_moving(expr, preceding)
required that preceding
be a literal number
.
run: flights -> { group_by: carrier aggregate: flight_count // Second argument must be a literal calculate: rolling_avg is avg_moving(flight_count, 3) }
[ { "carrier": "WN", "flight_count": 88751, "rolling_avg": 88751 }, { "carrier": "US", "flight_count": 37683, "rolling_avg": 63217 }, { "carrier": "AA", "flight_count": 34577, "rolling_avg": 53670.333333333336 }, { "carrier": "NW", "flight_count": 33580, "rolling_avg": 48647.75 }, { "carrier": "UA", "flight_count": 32757, "rolling_avg": 34649.25 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count", AVG((COUNT(1))) OVER( ORDER BY COUNT(1) desc NULLS LAST ROWS BETWEEN 3 PRECEDING AND 0 FOLLOWING) as "rolling_avg" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
Constants
Constant expressions are produced by operating on literals, e.g. 1 + 1
etc.
Some functions require that particular arguments be constants. For example, the default
argument to lag
must be constant:
run: flights -> { group_by: carrier aggregate: flight_count // Third argument must be a constant (or literal) calculate: prev_carrier1 is lag(carrier, 1, concat('NO', ' ', 'VALUE')) calculate: prev_carrier2 is lag(carrier, 1, 'NO VALUE') }
[ { "carrier": "WN", "flight_count": 88751, "prev_carrier1": "NO VALUE", "prev_carrier2": "NO VALUE" }, { "carrier": "US", "flight_count": 37683, "prev_carrier1": "WN", "prev_carrier2": "WN" }, { "carrier": "AA", "flight_count": 34577, "prev_carrier1": "US", "prev_carrier2": "US" }, { "carrier": "NW", "flight_count": 33580, "prev_carrier1": "AA", "prev_carrier2": "AA" }, { "carrier": "UA", "flight_count": 32757, "prev_carrier1": "NW", "prev_carrier2": "NW" } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count", LAG((base."carrier"),1,CONCAT('NO',' ','VALUE')) OVER( ORDER BY COUNT(1) desc NULLS LAST ) as "prev_carrier1", LAG((base."carrier"),1,'NO VALUE') OVER( ORDER BY COUNT(1) desc NULLS LAST ) as "prev_carrier2" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
Currently, when constants are saved as dimensions, they become input fields, even though they are known to be constants. This may change in the future.
Inputs
Input expressions are those that reference columns in a table or dimensions defined in a source. They represent data that exists inside the source table or that can be computed directly from a particular row of data.
run: flights -> { group_by: carrier }
[ { "carrier": "AA" }, { "carrier": "AS" }, { "carrier": "B6" }, { "carrier": "CO" }, { "carrier": "DL" } ]
SELECT base."carrier" as "carrier" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
In the above query, carrier
is an input expression (in particular, an input field).
Outputs
Output expressions are those which reference columns in the output of a query or aggregate values. They represent values which can be produced from the source table. Dimensions are only output values when they are included in the query. Aggregate values are always output values, because they don't need to be included in the result table to be known.
Dimensional Outputs
Any time a field is included in a query, it creates an output field with the same name:
run: flights -> { group_by: carrier new_field is 1 aggregate: flight_count }
[ { "carrier": "WN", "new_field": 1, "flight_count": 88751 }, { "carrier": "US", "new_field": 1, "flight_count": 37683 }, { "carrier": "AA", "new_field": 1, "flight_count": 34577 }, { "carrier": "NW", "new_field": 1, "flight_count": 33580 }, { "carrier": "UA", "new_field": 1, "flight_count": 32757 } ]
SELECT base."carrier" as "carrier", 1 as "new_field", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1,2 ORDER BY 3 desc NULLS LAST
The above query defines carrier
, new_field
, and flight_count
as output fields of the query.
When inside a calculate:
statement, field references will by default use the output field with that name. There is often an input field with the same name, and in that case the output field will take precedence. In the following query, group_by: carrier
creates an output field called carrier
. In lag(carrier)
, carrier
refers to that output field (rather than the input field carrier
defined in flights
) because it appears inside a calculate:
statement.
run: flights -> { group_by: carrier calculate: prev_carrier is lag(carrier) }
[ { "carrier": "AA", "prev_carrier": null }, { "carrier": "AS", "prev_carrier": "AA" }, { "carrier": "B6", "prev_carrier": "AS" }, { "carrier": "CO", "prev_carrier": "B6" }, { "carrier": "DL", "prev_carrier": "CO" } ]
SELECT base."carrier" as "carrier", LAG((base."carrier")) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "prev_carrier" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
In this next query, group_by: output_carrier is carrier
creates an output field called output_carrier
, which is referenced in the subsequent line.
run: flights -> { group_by: output_carrier is carrier calculate: prev_carrier is lag(output_carrier) }
[ { "output_carrier": "AA", "prev_carrier": null }, { "output_carrier": "AS", "prev_carrier": "AA" }, { "output_carrier": "B6", "prev_carrier": "AS" }, { "output_carrier": "CO", "prev_carrier": "B6" }, { "output_carrier": "DL", "prev_carrier": "CO" } ]
SELECT base."carrier" as "output_carrier", LAG((base."carrier")) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "prev_carrier" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
Here, calculate: prev_carrier is lag(carrier)
would result in an error, because carrier
is an input expression, and lag
requires that its first argument be an output expression.
Aggregate Outputs
An aggregate value is always considered to be an output expression, even when it is a direct reference to a measure
.
run: flights -> { group_by: carrier calculate: prev_carrier_flight_count is lag(flight_count) }
[ { "carrier": "AA", "prev_carrier_flight_count": null }, { "carrier": "AS", "prev_carrier_flight_count": 34577 }, { "carrier": "B6", "prev_carrier_flight_count": 8453 }, { "carrier": "CO", "prev_carrier_flight_count": 4842 }, { "carrier": "DL", "prev_carrier_flight_count": 7139 } ]
SELECT base."carrier" as "carrier", LAG((COUNT(1))) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "prev_carrier_flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST