Malloy Documentation
search

All expressions in Malloy have an "evaluation space" which can be one of four values: literal, constant, input, and output.

Functions may have constraints on what evaluation space particular arguments can be. For example:

  • In avg_moving(expr, preceding), preceding must be a literal number.

  • In lag(expr), expr must be an output value.

  • In avg(expr), expr must be an input value.

  • In lag(expr, offset, default), default must be a constant value.

Literals

Literal expressions are any literal value that you can write in Malloy. These include literals of type number, boolean, string, date, timestamp, and even regular expressions. The following are all literals: 12, true, 'hello world', @2003, @2011-11-11 11:11:11, r'.*'.

Some functions require that particular arguments be literals. For example, avg_moving(expr, preceding) required that preceding be a literal number.

document
run: flights -> {
  group_by: carrier
  aggregate: flight_count
  // Second argument must be a literal
  calculate: rolling_avg is avg_moving(flight_count, 3)
}
QUERY RESULTS
[
  {
    "carrier": "WN",
    "flight_count": 88751,
    "rolling_avg": 88751
  },
  {
    "carrier": "US",
    "flight_count": 37683,
    "rolling_avg": 63217
  },
  {
    "carrier": "AA",
    "flight_count": 34577,
    "rolling_avg": 53670.333333333336
  },
  {
    "carrier": "NW",
    "flight_count": 33580,
    "rolling_avg": 48647.75
  },
  {
    "carrier": "UA",
    "flight_count": 32757,
    "rolling_avg": 34649.25
  }
]
SELECT 
   base."carrier" as "carrier",
   COUNT(1) as "flight_count",
   AVG((COUNT(1))) OVER(  ORDER BY  COUNT(1) desc NULLS LAST ROWS BETWEEN 3 PRECEDING AND 0 FOLLOWING) as "rolling_avg"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Constants

Constant expressions are produced by operating on literals, e.g. 1 + 1 etc.

Some functions require that particular arguments be constants. For example, the default argument to lag must be constant:

document
run: flights -> {
  group_by: carrier
  aggregate: flight_count
  // Third argument must be a constant (or literal)
  calculate: prev_carrier1 is lag(carrier, 1, concat('NO', ' ', 'VALUE'))
  calculate: prev_carrier2 is lag(carrier, 1, 'NO VALUE')
}
QUERY RESULTS
[
  {
    "carrier": "WN",
    "flight_count": 88751,
    "prev_carrier1": "NO VALUE",
    "prev_carrier2": "NO VALUE"
  },
  {
    "carrier": "US",
    "flight_count": 37683,
    "prev_carrier1": "WN",
    "prev_carrier2": "WN"
  },
  {
    "carrier": "AA",
    "flight_count": 34577,
    "prev_carrier1": "US",
    "prev_carrier2": "US"
  },
  {
    "carrier": "NW",
    "flight_count": 33580,
    "prev_carrier1": "AA",
    "prev_carrier2": "AA"
  },
  {
    "carrier": "UA",
    "flight_count": 32757,
    "prev_carrier1": "NW",
    "prev_carrier2": "NW"
  }
]
SELECT 
   base."carrier" as "carrier",
   COUNT(1) as "flight_count",
   LAG((base."carrier"),1,CONCAT('NO',' ','VALUE')) OVER(  ORDER BY  COUNT(1) desc NULLS LAST ) as "prev_carrier1",
   LAG((base."carrier"),1,'NO VALUE') OVER(  ORDER BY  COUNT(1) desc NULLS LAST ) as "prev_carrier2"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Currently, when constants are saved as dimensions, they become input fields, even though they are known to be constants. This may change in the future.

Inputs

Input expressions are those that reference columns in a table or dimensions defined in a source. They represent data that exists inside the source table or that can be computed directly from a particular row of data.

document
run: flights -> {
  group_by: carrier
}
QUERY RESULTS
[
  {
    "carrier": "AA"
  },
  {
    "carrier": "AS"
  },
  {
    "carrier": "B6"
  },
  {
    "carrier": "CO"
  },
  {
    "carrier": "DL"
  }
]
SELECT 
   base."carrier" as "carrier"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 1 asc NULLS LAST

In the above query, carrier is an input expression (in particular, an input field).

Outputs

Output expressions are those which reference columns in the output of a query or aggregate values. They represent values which can be produced from the source table. Dimensions are only output values when they are included in the query. Aggregate values are always output values, because they don't need to be included in the result table to be known.

Dimensional Outputs

Any time a field is included in a query, it creates an output field with the same name:

document
run: flights -> {
  group_by: 
    carrier
    new_field is 1
  aggregate: flight_count
}
QUERY RESULTS
[
  {
    "carrier": "WN",
    "new_field": 1,
    "flight_count": 88751
  },
  {
    "carrier": "US",
    "new_field": 1,
    "flight_count": 37683
  },
  {
    "carrier": "AA",
    "new_field": 1,
    "flight_count": 34577
  },
  {
    "carrier": "NW",
    "new_field": 1,
    "flight_count": 33580
  },
  {
    "carrier": "UA",
    "new_field": 1,
    "flight_count": 32757
  }
]
SELECT 
   base."carrier" as "carrier",
   1 as "new_field",
   COUNT(1) as "flight_count"
FROM '../data/flights.parquet' as base
GROUP BY 1,2
ORDER BY 3 desc NULLS LAST

The above query defines carrier, new_field, and flight_count as output fields of the query.

When inside a calculate: statement, field references will by default use the output field with that name. There is often an input field with the same name, and in that case the output field will take precedence. In the following query, group_by: carrier creates an output field called carrier. In lag(carrier), carrier refers to that output field (rather than the input field carrier defined in flights) because it appears inside a calculate: statement.

document
run: flights -> {
  group_by: carrier
  calculate: prev_carrier is lag(carrier)
}
QUERY RESULTS
[
  {
    "carrier": "AA",
    "prev_carrier": null
  },
  {
    "carrier": "AS",
    "prev_carrier": "AA"
  },
  {
    "carrier": "B6",
    "prev_carrier": "AS"
  },
  {
    "carrier": "CO",
    "prev_carrier": "B6"
  },
  {
    "carrier": "DL",
    "prev_carrier": "CO"
  }
]
SELECT 
   base."carrier" as "carrier",
   LAG((base."carrier")) OVER(  ORDER BY  base."carrier" asc NULLS LAST ) as "prev_carrier"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 1 asc NULLS LAST

In this next query, group_by: output_carrier is carrier creates an output field called output_carrier, which is referenced in the subsequent line.

document
run: flights -> {
  group_by: output_carrier is carrier
  calculate: prev_carrier is lag(output_carrier)
}
QUERY RESULTS
[
  {
    "output_carrier": "AA",
    "prev_carrier": null
  },
  {
    "output_carrier": "AS",
    "prev_carrier": "AA"
  },
  {
    "output_carrier": "B6",
    "prev_carrier": "AS"
  },
  {
    "output_carrier": "CO",
    "prev_carrier": "B6"
  },
  {
    "output_carrier": "DL",
    "prev_carrier": "CO"
  }
]
SELECT 
   base."carrier" as "output_carrier",
   LAG((base."carrier")) OVER(  ORDER BY  base."carrier" asc NULLS LAST ) as "prev_carrier"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 1 asc NULLS LAST

Here, calculate: prev_carrier is lag(carrier) would result in an error, because carrier is an input expression, and lag requires that its first argument be an output expression.

Aggregate Outputs

An aggregate value is always considered to be an output expression, even when it is a direct reference to a measure.

document
run: flights -> {
  group_by: carrier
  calculate: prev_carrier_flight_count is lag(flight_count)
}
QUERY RESULTS
[
  {
    "carrier": "AA",
    "prev_carrier_flight_count": null
  },
  {
    "carrier": "AS",
    "prev_carrier_flight_count": 34577
  },
  {
    "carrier": "B6",
    "prev_carrier_flight_count": 8453
  },
  {
    "carrier": "CO",
    "prev_carrier_flight_count": 4842
  },
  {
    "carrier": "DL",
    "prev_carrier_flight_count": 7139
  }
]
SELECT 
   base."carrier" as "carrier",
   LAG((COUNT(1))) OVER(  ORDER BY  base."carrier" asc NULLS LAST ) as "prev_carrier_flight_count"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 1 asc NULLS LAST