Malloy Documentation
search

Expressions in Malloy are much like expressions in any other language; they can have variables and operators and function calls in the same syntax users are familiar with. However, Malloy also introduces several other kinds of expressions useful for the task of data analysis and transformation.

Section Examples
Identifiers
Fields referenced by name
origin.city
`year`
Mathematical operations x * 100
-cost
(a + b) / c
Comparison operators x > 200
state != null
Boolean operators
Logically combine booleans
height > 10 and height < 100
is_cancelled or is_delayed
not is_commercial_flight
Functions floor(10.35)
concat(first_name, ' ', last_name)
cbrt!(x)
timestamp_seconds!timestamp(val)
Aggregation sum(distance)
aircraft.count()
aircraft_models.seats.avg()
Aggregation Control / Subtotals all(sum(distance))
all(aircraft.count(), destination_code)
exclude(aircraft_models.seats.avg(), carrier.nickname)
Filtered expressions avg(age) { where: state = 'CA' }
flight_count { where: origin.county != null }
Type cast
Safe type cast birth_year::number
start_time_string::timestamp
Pick expressions
Malloy's take on CASE statements
pick 'S' when size < 3 else 'L'
kind ? pick 'other' when null
Time ranges
Ranges with start and end or duration
start_time for 3 hours
@2003 for 10 months
@2003 to @2005
Numeric ranges
Numeric ranges with start and end
10 to 20
Time truncation event_time.quarter
now.year
Time extraction
Extract one part of a time value
day_of_year(event_time)
minute(now)
Interval extraction
Extract the interval between two times
days(created_at to shipped_at)
Time literals @2003-04-19
@2020-Q4
@2021-10-24 10:00:00
Partial comparison
Reusable conditions
> 42
!= null
~ r'C.*'
Alternation
Logically combine conditions
> 5 & < 10
'red' | 'blue'
Application
Apply conditions to values
state ? 'CA'
weight ? > 100 & < 1000
Null Operations
Expressions relating to NULL values
eventStatus != null
eventStatus ?? 'unknown'
Operator Precedence a + b * c and d ?? e > f

Identifiers

Fields may be referenced by name, and fields in joins or nested structures can be accessed using a period (.) as in item.item_property

Escaping reserved keywords

If a column name in a table conflicts with a keyword in Malloy, use backquotes to quote the keyword. Back-quoted identifiers can also contain spaces and any special characters the database supports in field names; .s in back-quoted identifiers are not currently supported.

dimension: 
  year_plus_one is `year` + 1
  `name with spaces` is some value
QUERY RESULTS
[
  {
    "county": "WAYNE"
  },
  {
    "county": "KENT"
  },
  {
    "county": "GENESEE"
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    origin_0."state" as "state__0",
    CASE WHEN group_set=1 THEN
      origin_0."county"
      END as "county__1",
    CASE WHEN group_set=1 THEN
      COUNT(1)
      END as "flight_count__1"
  FROM '../data/flights.parquet' as base
   LEFT JOIN '../data/airports.parquet' AS origin_0
    ON origin_0."code"=base."origin"
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  WHERE origin_0."county" IS NOT NULL
  GROUP BY 1,2,3
)
, __stage1 AS (
  SELECT
    "state__0" as "state",
    COALESCE(LIST({
      "county": "county__1", 
      "flight_count": "flight_count__1"}  ORDER BY  "flight_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_county"
  FROM __stage0
  GROUP BY 1
)
SELECT 
   by_county_0."county" as "county"
FROM __stage1 as base
LEFT JOIN LATERAL (SELECT UNNEST(base."by_county"), 1 as ignoreme) as by_county_0_outer(by_county_0,ignoreme) ON by_county_0_outer.ignoreme=1
LIMIT 3

Identifiers that share a name with a keyword in Malloy must be enclosed in back ticks `, e.g. `year`.

Mathematical Operators

Typical mathematical operators +, -, *, /, and % (modulo) work as expected, and parentheses may be used to override precedence, e.g. six is 10 * (3 - 2) / 2 + 1.

The unary minus / negation operator is also allowed, e.g. value is -cost.

Logical Operators

Comparison Operators

Standard comparison operators >, <, >=, <=, and = are available in Malloy. "Not equals" is expressed using the != operator.

Boolean Operators

Malloy includes the basic binary boolean operators and and or, as well as the unary not operator.

Functions

Many functions available in SQL are available unchanged in Malloy. Malloy provides some compatibility transformations for known functions. Unknown functions may be called like cbrt!(x) or timestamp_seconds!timestap(value) to specify a return type. For detailed information, as well as a full list of all available functions, see the Functions section.

Aggregation

Aggregations may included in an expression to create measures, e.g. count(), sum(distance), or aircraft_models.seats.avg(). For detailed information, see the Aggregates section.

Filtered Expressions

Aggregate expressions may be filtered, using the usual filter syntax.

document
run: flights -> {
  aggregate:
    distance_2003 is sum(distance) { where: dep_time ? @2003 }
    ca_flights is count() { where: origin.state ? 'CA' }
}
QUERY RESULTS
[
  {
    "distance_2003": 42545813,
    "ca_flights": 40670
  }
]
SELECT 
   COALESCE(SUM(CASE WHEN (base."dep_time">=TIMESTAMP '2003-01-01 00:00:00') and (base."dep_time"<TIMESTAMP '2004-01-01 00:00:00') THEN base."distance" END),0) as "distance_2003",
   COUNT(CASE WHEN origin_0."state"='CA' THEN 1 END) as "ca_flights"
FROM '../data/flights.parquet' as base
 LEFT JOIN '../data/airports.parquet' AS origin_0
  ON origin_0."code"=base."origin"

Type Cast

Type cast may be accomplished with the :: operator.

Type casting may be accomplished with the ::type syntax.

document
run: flights -> {
  aggregate: distance_summary is concat(total_distance::string, ' miles')
}
QUERY RESULTS
[
  {
    "distance_summary": "255337195 miles"
  }
]
SELECT 
   CONCAT(CAST((COALESCE(SUM(base."distance"),0)) AS varchar),' miles') as "distance_summary"
FROM '../data/flights.parquet' as base

Safe Type Cast

Safe Type cast may be accomplished with the ::: operator. A safe type cast will return null if an error occurs during the cast operation. Safe Type Cast is particularly useful in loading data. Not all databases support safe type cast.

Type casting may be accomplished with the :::type syntax.

document
run: flights -> {
  aggregate: distance_summary is concat(total_distance:::string, ' miles')
}
QUERY RESULTS
[
  {
    "distance_summary": "255337195 miles"
  }
]
SELECT 
   CONCAT(TRY_CAST((COALESCE(SUM(base."distance"),0)) AS varchar),' miles') as "distance_summary"
FROM '../data/flights.parquet' as base

Casting to specific database types

Malloy has its own type system. A database engine needs to know for how each piece of data is stored, but Malloy only needs to worry about the underlying type (like string or numeric) of a piece of data. Sometimes, when writing a data transformation, it is important torefelct back to the SQL engine a particular datatype. Quoting the destination type will tell Malloy to use that specific type for the result.

Note: If you cast an expression to an unsupported data type, there is a very limited set of operations in Malloy which can use that value

document
run: flights -> {
    select:
        dest is destination_code::"VARCHAR(3)"
        dist is distance::"smallint"
    limit: 1
}
QUERY RESULTS
[
  {
    "dest": "ABE",
    "dist": 55
  }
]
SELECT 
   CAST(base."destination" AS VARCHAR(3)) as "dest",
   CAST(base."distance" AS smallint) as "dist"
FROM '../data/flights.parquet' as base
LIMIT 1

Pick Expressions

The pick construction in Malloy is similar to CASE statements in SQL.

pick 'small'  when size < 10
pick 'medium' when size < 20
else 'large'

Pick expressions are also compatible with the apply operator and partial comparisons.

size ?
  pick 'small' when < 10
  pick 'medium' when < 20
  else 'large'

Pick can be used to "clean" data, combining similar dirty values into one clean value. In the following example, the pick statement collects all the "this actually shipped" statuses, and because there is no else, leaves the other status values alone.

shipping_status ?
  pick 'shipped' when 'will call' | 'shipped'
  pick 'ignore' when 'bad1' | 'bad2' | 'testing'

Another common kind of cleaning is to have a small set you want to group by and all other values are compressed into null. A pick clause with no value picks an applied value when the condition is met.

status ?
  pick when 'good' | 'ok' | 'fine' // leave these alone
  else null                        // ignore the rest

If you need to match on multiple fields and conditions, use a dimension.

dimension: Facility_Type_State is
  pick 'Cali Heli' when state = 'CA' and fac_type = 'HELIPORT'
  pick 'Seaplane at SeaTac' when state = 'WA' and city = 'SEATTLE' and fac_type = 'SEAPLANE BASE'
  pick 'Phoenix Ultra' when state = 'AZ' and fac_type = 'ULTRALIGHT' and city = 'PHOENIX'
  else 'Not Needed'

Time Expressions

Malloy has built in constructs to simplify many time-related operations, which are described here.

Time Ranges

A time value can be compared to a range. If you apply a time to a range, (for example, event_time ? @2003 to @2004) that will check if the value is within the range. You can also use = to see if a time is with a range, or < to test for before the range it will be <, or > for after the range.

See Time Ranges for more details.

Time Truncation

To truncate a time value to a given timeframe, use the . operator followed by the timeframe, e.g. event_time.quarter or now.year. See Timestamp Truncation for details.

A truncation made this way can also function as a range. The range begins at the moment of truncation and the duration is the timeframe unit used to specify the truncation, so for example time.year would be a range covering the entire year which contains time.

This is extremely useful with the apply operator, ?. To see if two events happen in the same calendar year, for example, the boolean expression in Malloy is one_time ? other_time.year.

Time Extraction

Another very common grouping for time related data is by particular components, and this extraction of a single component as an integer. In Malloy this gesture looks like hour(event_time) or minute(event_time). See Timestamp extraction for more details.

Interval extraction

To measure the difference between two times, pass a range expression to one of the extraction functions.

For more details see Interval Measurement

Time Literals

Time literals are specified in Malloy with the @ character. A literal specified this way has an implied duration which means a literal can act like a range. See Timestamp literals and Date literals for more details.

In addition the @ based literal syntax, Malloy also has one built in time constant, now, representing the current time.

Numeric Ranges

A numeric value can be compared to a range. If you apply a number to a range, (for example, distance ? 10 to 20) that will check if the value is within the range (left inclusive, right exclusive).

Special Filter Expression Syntax

As filtering is an incredibly common operation in data analysis, Malloy has special syntax to make filter expressions succinct and powerful. In addition to regular comparison and boolean operators, Malloy includes partial comparisons, alternation, and application, as described below.

Partial Comparison

Partial comparisons, or "partials" are written with a binary comparison operator followed by a value, e.g. > 42 or != null. These can be thought of as conditions-as-values, or as functions that return a boolean.

Alternation

Conditions can be logically combined with the two alternation operators, & and |. These are different from and and or in that they operate on conditions which return boolean values, rather than boolean values directly.

The union alternation operator | represents the logical union of two conditions. An expression like x | y can be read "if either x or y." For example = 'CA' | = 'NY' represents the condition "is either CA or NY".

The conjunction alternation operator & represents the logical conjunction of two conditions. An expression like x & y can be read "if both x and y." For example, > 5 & < 10 represents the condition "is greater than 5 and less than 10".

Values can be used directly with the alternation operators, in which case the operator is assumed to be =. For example, 'CA' | 'NY' is equivalent to = 'CA' | = 'NY'.

Application

The apply operator ? "applies" a value to another value, condition, or computation. This is most often used with partial comparisons or alternations.

Applying a value to a condition is like filling in the condition with the given value. For example, height ? > 5 & < 10 is equivalent to height > 5 and height < 10.

Applying a value to another value applies a default comparison on the two values:

Left Right Example Meaning
number number size ? 10 size = 10
string string state ? 'CA' state = 'CA'
string regular expression name ? r'Z$' name ~ r'Z$'
boolean boolean is_cool ? true is_cool = true
number numeric range x ? 10 to 20 x >= 10 and x < 20
date or timestamp date or timestamp time ? @2003 time is during 2003

Values can be applied to pick expressions to make them more succinct.

size ?
  pick 'small' when < 10
  pick 'medium' when < 20
  else 'large'

Null Operations

Malloy has the keyword null to represent the null value.

To check to see if a value is null, simply compare it to null, there is no IS NULL or IS NOT NULL operator in Malloy.

dimension: name_not_present is name = null
dimension: name_present is name != null

Malloy also has the operator ??, which in other languages is sometimes called the "Nullish Coalescing" operator, for providing a value to use in case an expression is null.

dimension: name_cleaned is name ?? '(No Name Provided)'

Operator Precedence

From highest to lowest precedence, here are the Malloy operators

Operator Function
{ where: } Aggregate filtering
:: Type cast
- Unary negation
* / % Multiply, Divide, Modulo
+ - Addition, Subtraction
to Range with endpoints
for N XXX Range with duration
& Alternation with and
| Alternation with or
? Apply
not Unary boolean not
and Bolean and
or Bolean or
?? Null coalesce