Expressions in Malloy are much like expressions in any other language; they can have variables and operators and function calls in the same syntax users are familiar with. However, Malloy also introduces several other kinds of expressions useful for the task of data analysis and transformation.
Section | Examples |
---|---|
Identifiers Fields referenced by name |
origin.city `year` |
Mathematical operations | x * 100 -cost (a + b) / c |
Comparison operators | x > 200 state != null |
Boolean operators Logically combine booleans |
height > 10 and height < 100 is_cancelled or is_delayed not is_commercial_flight |
Functions | floor(10.35) concat(first_name, ' ', last_name) cbrt!(x) timestamp_seconds!timestamp(val) |
Aggregation | sum(distance) aircraft.count() aircraft_models.seats.avg() |
Aggregation Control / Subtotals | all(sum(distance)) all(aircraft.count(), destination_code) exclude(aircraft_models.seats.avg(), carrier.nickname) |
Filtered expressions | avg(age) { where: state = 'CA' } flight_count { where: origin.county != null } |
Type cast |
|
Safe type cast | birth_year::number start_time_string::timestamp |
Pick expressions Malloy's take on CASE statements |
pick 'S' when size < 3 else 'L' kind ? pick 'other' when null |
Time ranges Ranges with start and end or duration |
start_time for 3 hours @2003 for 10 months @2003 to @2005 |
Numeric ranges Numeric ranges with start and end |
10 to 20 |
Time truncation | event_time.quarter now.year |
Time extraction Extract one part of a time value |
day_of_year(event_time) minute(now) |
Interval extraction Extract the interval between two times |
days(created_at to shipped_at) |
Time literals | @2003-04-19 @2020-Q4 @2021-10-24 10:00:00 |
Partial comparison Reusable conditions |
> 42 != null ~ r'C.*' |
Alternation Logically combine conditions |
> 5 & < 10 'red' | 'blue' |
Application Apply conditions to values |
state ? 'CA' weight ? > 100 & < 1000 |
Null Operations Expressions relating to NULL values |
eventStatus != null eventStatus ?? 'unknown' |
Operator Precedence | a + b * c and d ?? e > f |
Identifiers
Fields may be referenced by name, and fields in joins or nested structures can be accessed using a period (.
) as in item.item_property
Escaping reserved keywords
If a column name in a table conflicts with a keyword in Malloy, use backquotes to quote the keyword. Back-quoted identifiers can also
contain spaces and any special characters the database supports in field names; .
s in back-quoted identifiers are not currently supported.
dimension: year_plus_one is `year` + 1 `name with spaces` is some value
[ { "county": "WAYNE" }, { "county": "KENT" }, { "county": "GENESEE" } ]
WITH __stage0 AS ( SELECT group_set, origin_0."state" as "state__0", CASE WHEN group_set=1 THEN origin_0."county" END as "county__1", CASE WHEN group_set=1 THEN COUNT(1) END as "flight_count__1" FROM '../data/flights.parquet' as base LEFT JOIN '../data/airports.parquet' AS origin_0 ON origin_0."code"=base."origin" CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set ) as group_set WHERE origin_0."county" IS NOT NULL GROUP BY 1,2,3 ) , __stage1 AS ( SELECT "state__0" as "state", COALESCE(LIST({ "county": "county__1", "flight_count": "flight_count__1"} ORDER BY "flight_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_county" FROM __stage0 GROUP BY 1 ) SELECT by_county_0."county" as "county" FROM __stage1 as base LEFT JOIN LATERAL (SELECT UNNEST(base."by_county"), 1 as ignoreme) as by_county_0_outer(by_county_0,ignoreme) ON by_county_0_outer.ignoreme=1 LIMIT 3
Identifiers that share a name with a keyword in Malloy must be enclosed in back ticks `
, e.g. `year`
.
Mathematical Operators
Typical mathematical operators +
, -
, *
, /
, and %
(modulo) work as expected, and parentheses may be used to override precedence, e.g. six is 10 * (3 - 2) / 2 + 1
.
The unary minus / negation operator is also allowed, e.g. value is -cost
.
Logical Operators
Comparison Operators
Standard comparison operators >
, <
, >=
, <=
, and =
are available in Malloy. "Not equals" is expressed using the !=
operator.
Boolean Operators
Malloy includes the basic binary boolean operators and
and or
, as well as the unary not
operator.
Functions
Many functions available in SQL are available unchanged in Malloy. Malloy provides some compatibility transformations for known functions. Unknown functions may be called like cbrt!(x)
or timestamp_seconds!timestap(value)
to specify a return type. For detailed information, as well as a full list of all available functions, see the Functions section.
Aggregation
Aggregations may included in an expression to create measures, e.g. count()
, sum(distance)
, or aircraft_models.seats.avg()
. For detailed information, see the Aggregates section.
Filtered Expressions
Aggregate expressions may be filtered, using the usual filter syntax.
run: flights -> { aggregate: distance_2003 is sum(distance) { where: dep_time ? @2003 } ca_flights is count() { where: origin.state ? 'CA' } }
[ { "distance_2003": 42545813, "ca_flights": 40670 } ]
SELECT COALESCE(SUM(CASE WHEN (base."dep_time">=TIMESTAMP '2003-01-01 00:00:00') and (base."dep_time"<TIMESTAMP '2004-01-01 00:00:00') THEN base."distance" END),0) as "distance_2003", COUNT(CASE WHEN origin_0."state"='CA' THEN 1 END) as "ca_flights" FROM '../data/flights.parquet' as base LEFT JOIN '../data/airports.parquet' AS origin_0 ON origin_0."code"=base."origin"
Type Cast
Type cast may be accomplished with the ::
operator.
Type casting may be accomplished with the ::type
syntax.
run: flights -> { aggregate: distance_summary is concat(total_distance::string, ' miles') }
[ { "distance_summary": "255337195 miles" } ]
SELECT CONCAT(CAST((COALESCE(SUM(base."distance"),0)) AS varchar),' miles') as "distance_summary" FROM '../data/flights.parquet' as base
Safe Type Cast
Safe Type cast may be accomplished with the :::
operator. A safe type cast will return null
if an error occurs during the cast operation. Safe Type Cast is particularly useful in loading data. Not all databases support safe type cast.
Type casting may be accomplished with the :::type
syntax.
run: flights -> { aggregate: distance_summary is concat(total_distance:::string, ' miles') }
[ { "distance_summary": "255337195 miles" } ]
SELECT CONCAT(TRY_CAST((COALESCE(SUM(base."distance"),0)) AS varchar),' miles') as "distance_summary" FROM '../data/flights.parquet' as base
Casting to specific database types
Malloy has its own type system. A database engine needs to know for how each piece of data is stored, but Malloy only needs to worry about the underlying type (like string or numeric) of a piece of data. Sometimes, when writing a data transformation, it is important torefelct back to the SQL engine a particular datatype. Quoting the destination type will tell Malloy to use that specific type for the result.
Note: If you cast an expression to an unsupported data type, there is a very limited set of operations in Malloy which can use that value
run: flights -> { select: dest is destination_code::"VARCHAR(3)" dist is distance::"smallint" limit: 1 }
[ { "dest": "ABE", "dist": 55 } ]
SELECT CAST(base."destination" AS VARCHAR(3)) as "dest", CAST(base."distance" AS smallint) as "dist" FROM '../data/flights.parquet' as base LIMIT 1
Pick Expressions
The pick
construction in Malloy is similar to CASE
statements in SQL.
pick 'small' when size < 10 pick 'medium' when size < 20 else 'large'
Pick expressions are also compatible with the apply operator and partial comparisons.
size ? pick 'small' when < 10 pick 'medium' when < 20 else 'large'
Pick can be used to "clean" data, combining similar dirty values into one clean value. In the following example, the pick
statement collects all the "this actually
shipped" statuses, and because there is no else
, leaves the other
status values alone.
shipping_status ? pick 'shipped' when 'will call' | 'shipped' pick 'ignore' when 'bad1' | 'bad2' | 'testing'
Another common kind of cleaning is to have a small set you want to group
by and all other values are compressed into null
. A pick
clause with no value
picks an applied value when the condition is met.
status ? pick when 'good' | 'ok' | 'fine' // leave these alone else null // ignore the rest
If you need to match on multiple fields and conditions, use a dimension.
dimension: Facility_Type_State is pick 'Cali Heli' when state = 'CA' and fac_type = 'HELIPORT' pick 'Seaplane at SeaTac' when state = 'WA' and city = 'SEATTLE' and fac_type = 'SEAPLANE BASE' pick 'Phoenix Ultra' when state = 'AZ' and fac_type = 'ULTRALIGHT' and city = 'PHOENIX' else 'Not Needed'
Time Expressions
Malloy has built in constructs to simplify many time-related operations, which are described here.
Time Ranges
A time value can be compared to a range. If you apply a time to a range, (for example, event_time ? @2003 to @2004
) that will check if the value is within the range. You can also use =
to see if a time is with a range, or <
to test for before the range it will be <
, or >
for after the range.
See Time Ranges for more details.
Time Truncation
To truncate a time value to a given timeframe, use the .
operator followed by the timeframe, e.g. event_time.quarter
or now.year
. See Timestamp Truncation for details.
A truncation made this way can also function as a range. The range begins
at the moment of truncation and the duration is the timeframe unit
used to specify the truncation, so for example time.year
would be a range covering the entire year which contains time
.
This is extremely useful with the apply operator, ?
. To see if two events happen in the same calendar year, for example, the boolean expression in Malloy is one_time ? other_time.year
.
Time Extraction
Another very common grouping for time related data is by particular components, and this extraction of a single component as an integer. In Malloy this gesture looks like hour(event_time)
or minute(event_time)
. See Timestamp extraction for more details.
Interval extraction
To measure the difference between two times, pass a range expression to one of the extraction functions.
For more details see Interval Measurement
Time Literals
Time literals are specified in Malloy with the @
character. A literal
specified this way has an implied duration which means a literal
can act like a range. See Timestamp literals and Date literals for more details.
In addition the @
based literal syntax, Malloy also has one built in time constant, now
, representing the current time.
Numeric Ranges
A numeric value can be compared to a range. If you apply a number to a range, (for example, distance ? 10 to 20
) that will check if the value is within the range (left inclusive, right exclusive).
Special Filter Expression Syntax
As filtering is an incredibly common operation in data analysis, Malloy has special syntax to make filter expressions succinct and powerful. In addition to regular comparison and boolean operators, Malloy includes partial comparisons, alternation, and application, as described below.
Partial Comparison
Partial comparisons, or "partials" are written with a binary comparison operator followed by a value, e.g. > 42
or != null
. These can be thought of as conditions-as-values, or as functions that return a boolean.
Alternation
Conditions can be logically combined with the two alternation operators, &
and |
. These are different from and
and or
in that they operate on conditions which return boolean values, rather than boolean values directly.
The union alternation operator |
represents the logical union of two conditions. An expression like x | y
can be read "if either x
or y
." For example = 'CA' | = 'NY'
represents the condition "is either CA or NY".
The conjunction alternation operator &
represents the logical conjunction of two conditions. An expression like x & y
can be read "if both x
and y
." For example, > 5 & < 10
represents the condition "is greater than 5 and less than 10".
Values can be used directly with the alternation operators, in which case the operator is assumed to be =
. For example, 'CA' | 'NY'
is equivalent to = 'CA' | = 'NY'
.
Application
The apply operator ?
"applies" a value to another value, condition, or computation. This is most often used with partial comparisons or alternations.
Applying a value to a condition is like filling in the condition with the given value. For example, height ? > 5 & < 10
is equivalent to height > 5 and height < 10
.
Applying a value to another value applies a default comparison on the two values:
Left | Right | Example | Meaning |
---|---|---|---|
number |
number |
size ? 10 |
size = 10 |
string |
string |
state ? 'CA' |
state = 'CA' |
string |
regular expression | name ? r'Z$' |
name ~ r'Z$' |
boolean |
boolean |
is_cool ? true |
is_cool = true |
number |
numeric range | x ? 10 to 20 |
x >= 10 and x < 20 |
date or timestamp |
date or timestamp |
time ? @2003 |
time is during 2003 |
Values can be applied to pick expressions to make them more succinct.
size ? pick 'small' when < 10 pick 'medium' when < 20 else 'large'
Null Operations
Malloy has the keyword null
to represent the null value.
To check to see if a value is null, simply compare it to null
, there is no IS NULL
or IS NOT NULL
operator in Malloy.
dimension: name_not_present is name = null dimension: name_present is name != null
Malloy also has the operator ??
, which in other languages is sometimes called the "Nullish Coalescing" operator, for providing a value to use in case an expression is null.
dimension: name_cleaned is name ?? '(No Name Provided)'
Operator Precedence
From highest to lowest precedence, here are the Malloy operators
Operator | Function |
---|---|
{ where: } |
Aggregate filtering |
:: |
Type cast |
- |
Unary negation |
* / % |
Multiply, Divide, Modulo |
+ - |
Addition, Subtraction |
to |
Range with endpoints |
for N XXX |
Range with duration |
& |
Alternation with and |
| |
Alternation with or |
? |
Apply |
not |
Unary boolean not |
and |
Bolean and |
or |
Bolean or |
?? |
Null coalesce |