Malloy has a standard set of functions available at all times. These include scalar functions (e.g. concat
), aggregate functions (e.g. stddev
), and analytic (or "window") functions (e.g. lag
).
These functions are defined in Malloy in order to provide basic typechecking of arguments and return types, as well as some compatibility transformations. Perfect compatibility across all dialects for all functions is not guaranteed, and specific differences are noted on a case-by-case basis.
For quick function lookup, see the alphebetized table of all available functions.
Note: the behavior of functions changed in v0.0.39. For more information, see a description of that change.
Syntax
Built-In Functions
Functions that are "built-in" to Malloy can be called the usual way with the name of the function, an open parenthesis, a comma separated list of Malloy expressions, and a close parenthesis, e.g. concat(upper(first), ' ', upper(last))
.
Raw SQL Functions
Functions that are not built in may be called in the same way, but with an !
in between the function name and argument list: sinh!(x)
. In this case, Malloy assumes the return type of the function is the same as that of the first argument (or number
if there are no arguments). When this is not correct, the return type can be specified after the !
, e.g. hash!number(username)
.
Function Documentation
Section |
---|
String Functions |
Numeric Functions |
Interval Functions |
Date and Timestamp Functions |
Other Functions |
Aggregate Functions |
Ungrouped Aggregate Functions |
Analytic Functions |
String Functions
ascii | byte_length | chr | concat | ends_with |
length | lower | ltrim | regexp_extract | repeat |
replace | repeat | rtrim | starts_with | strpos |
substr | trim | unicode | upper |
concat
concat(value, ...)
Concatenates multiple values together, casting non-string
values to string
. The exact behavior of string casting may depend on the dialect.
If no values are given, concat
returns the empty string.
Behavior for null
depends on dialect: in BigQuery, if any argument is null
, the result is null
; in DuckDB and Postgres, null
is treated as an empty string.
run: empty -> { group_by: e1 is concat('foo', 'bar') e2 is concat(1, null) e4 is concat('Date: ', @2021-01-23) e3 is concat() }
[ { "e1": "foobar", "e2": "1", "e4": "Date: 2021-01-23", "e3": "" } ]
SELECT CONCAT('foo','bar') as "e1", CONCAT(1,NULL) as "e2", CONCAT('Date: ',DATE '2021-01-23') as "e4", '' as "e3" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
lower
lower(value)
Returns a string like value
but with all alphabetic characters in lowercase.
run: empty -> { group_by: e1 is lower('FOO') }
[ { "e1": "foo" } ]
SELECT LOWER('FOO') as "e1" FROM ( SELECT 1 ) as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
upper
upper(value)
Returns a string like value
but with all alphabetic characters in uppercase.
run: empty -> { group_by: e1 is upper('foo') }
[ { "e1": "FOO" } ]
SELECT UPPER('foo') as "e1" FROM ( SELECT 1 ) as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
strpos
strpos(test_string, search_string)
Returns the 1-based position of the first occurrence of search_string
in test_string
, or 0
if search_string
is not found.
run: empty -> { group_by: e1 is strpos('abc', 'b') e2 is strpos('abc', 'd') e3 is strpos('abc', null) e4 is strpos(null, 'a') }
[ { "e1": 2, "e2": 0, "e3": null, "e4": null } ]
SELECT STRPOS('abc','b') as "e1", STRPOS('abc','d') as "e2", STRPOS('abc',NULL) as "e3", STRPOS(NULL,'a') as "e4" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
starts_with
starts_with(value, prefix)
Returns true
if value
begins with prefix
and false
otherwise. If either value
or prefix
is null
, the result is false
(unlike in SQL, where it would be null
).
run: empty -> { group_by: e1 is starts_with('abc', 'a') e2 is starts_with('abc', 'b') e3 is starts_with('abc', null) e4 is starts_with(null, 'a') }
[ { "e1": true, "e2": false, "e3": false, "e4": false } ]
SELECT COALESCE(STARTS_WITH('abc', 'a'), false) as "e1", COALESCE(STARTS_WITH('abc', 'b'), false) as "e2", COALESCE(STARTS_WITH('abc', NULL), false) as "e3", COALESCE(STARTS_WITH(NULL, 'a'), false) as "e4" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
ends_with
ends_with(value, suffix)
Returns true
if value
ends with prefix
and false
otherwise. If either value
or suffix
is null
, the result is false
(unlike in SQL, where it would be null
).
run: empty -> { group_by: e1 is ends_with('abc', 'c') e2 is ends_with('abc', 'b') e3 is ends_with('abc', null) e4 is ends_with(null, 'a') }
[ { "e1": true, "e2": false, "e3": false, "e4": false } ]
SELECT COALESCE(SUFFIX('abc', 'c'), false) as "e1", COALESCE(SUFFIX('abc', 'b'), false) as "e2", COALESCE(SUFFIX('abc', NULL), false) as "e3", COALESCE(SUFFIX(NULL, 'a'), false) as "e4" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
trim
trim(value) trim(value, trim_characters)
Returns a string with leading and trailing characters in trim_characters
(or whitespace, if trim_characters
is unspecified) removed.
run: empty -> { group_by: e1 is trim(' keep this ') e2 is trim('_ _keep_this_ _', '_ ') e3 is trim(' keep everything ', '') e4 is trim('null example', null) e5 is trim(null, ' _') }
[ { "e1": "keep this", "e2": "keep_this", "e3": " keep everything ", "e4": null, "e5": null } ]
SELECT TRIM(' keep this ') as "e1", TRIM('_ _keep_this_ _','_ ') as "e2", TRIM(' keep everything ','') as "e3", TRIM('null example',NULL) as "e4", TRIM(NULL,' _') as "e5" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5 ORDER BY 1 asc NULLS LAST
ltrim
ltrim(value) ltrim(value, trim_characters)
Like trim(value, trim_characters)
but only removes leading characters.
run: empty -> { group_by: e1 is ltrim(' keep this -> ') e2 is ltrim('_ _keep_this -> _ _', '_ ') e3 is ltrim(' keep everything ', '') e4 is ltrim('null example', null) e5 is ltrim(null, ' _') }
[ { "e1": "keep this -> ", "e2": "keep_this -> _ _", "e3": " keep everything ", "e4": null, "e5": null } ]
SELECT LTRIM(' keep this -> ') as "e1", LTRIM('_ _keep_this -> _ _','_ ') as "e2", LTRIM(' keep everything ','') as "e3", LTRIM('null example',NULL) as "e4", LTRIM(NULL,' _') as "e5" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5 ORDER BY 1 asc NULLS LAST
rtrim
rtrim(value) rtrim(value, trim_characters)
Like trim(value, trim_characters)
but only removes trailing characters.
run: empty -> { group_by: e1 is rtrim(' <- keep this ') e2 is rtrim('_ _ <- keep_this _ _', '_ ') e3 is rtrim(' keep everything ', '') e4 is rtrim('null example', null) e5 is rtrim(null, ' _') }
[ { "e1": " <- keep this", "e2": "_ _ <- keep_this", "e3": " keep everything ", "e4": null, "e5": null } ]
SELECT RTRIM(' <- keep this ') as "e1", RTRIM('_ _ <- keep_this _ _','_ ') as "e2", RTRIM(' keep everything ','') as "e3", RTRIM('null example',NULL) as "e4", RTRIM(NULL,' _') as "e5" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5 ORDER BY 1 asc NULLS LAST
substr
substr(value, position) substr(value, position, length)
Returns a substring of value
starting at the 1-based index position
. If length
is specified, the returned string will be at most length
characters long; otherwise the returned string will extend to the end of value
. Negative values of position
index the starting value from the end of the string (with -1
for the last character of the string). A position
of 0
is equivalent to a position
of 1
.
run: empty -> { group_by: e1 is substr('123456789', 1) e2 is substr('123456789', 0) e3 is substr('123456789', 5) e4 is substr('123456789', -4) e5 is substr('123456789', 1, 3) e6 is substr('123456789', -5, 3) e7 is substr('123456789', null, 1) e8 is substr('123456789', 1, null) e9 is substr(null, 1, 1) }
[ { "e1": "123456789", "e2": "123456789", "e3": "56789", "e4": "6789", "e5": "123", "e6": "567", "e7": null, "e8": null, "e9": null } ]
SELECT SUBSTR('123456789',1) as "e1", SUBSTR('123456789',0) as "e2", SUBSTR('123456789',5) as "e3", SUBSTR('123456789',-4) as "e4", SUBSTR('123456789',1,3) as "e5", SUBSTR('123456789',-5,3) as "e6", SUBSTR('123456789',NULL,1) as "e7", SUBSTR('123456789',1,NULL) as "e8", SUBSTR(NULL,1,1) as "e9" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5,6,7,8,9 ORDER BY 1 asc NULLS LAST
regexp_extract
regexp_extract(value, pattern)
Returns the first substring of value
which matches pattern
. Returns null
if there is no matching substring.
run: empty -> { group_by: e1 is regexp_extract('the Malloy language', r'M....y') e2 is regexp_extract('the SQL language', r'M....y') e3 is regexp_extract('the null language', null) e4 is regexp_extract(null, r'nothing') }
[ { "e1": "Malloy", "e2": "", "e3": null, "e4": null } ]
SELECT REGEXP_EXTRACT('the Malloy language','M....y') as "e1", REGEXP_EXTRACT('the SQL language','M....y') as "e2", REGEXP_EXTRACT('the null language',NULL) as "e3", REGEXP_EXTRACT(NULL,'nothing') as "e4" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
replace
replace(value, pattern, replacement)
Returns a copy of value
with all occurrences of pattern
replaced with replacement
.
If pattern
is empty, no replacement occurs and value
is returned unchanged.
If pattern
is a regular expression, parenthesized capturing groups can be included in the replacement
string with '\\1'
to '\\9'
. In BigQuery and DuckDB, the full matched string can be referenced with '\\0'
.
run: empty -> { group_by: e1 is replace('the SQL language', 'SQL', 'Malloy') e2 is replace('the SQL language', r'S.L', 'Malloy') e3 is replace('SQL SQL SQL', 'SQL', 'Malloy') e4 is replace('SQL SQL SQL', r'S.L', 'Malloy') e5 is replace(null, 'SQL', 'Malloy') e6 is replace('the null language', null, 'Malloy') e7 is replace('the null language', 'SQL', null) e8 is replace('the language', '', 'Malloy') e9 is replace('axbxc', r'(a).(b).(c)', '\\0 - \\1 - \\2 - \\3') }
[ { "e1": "the Malloy language", "e2": "the Malloy language", "e3": "Malloy Malloy Malloy", "e4": "Malloy Malloy Malloy", "e5": null, "e6": null, "e7": null, "e8": "the language", "e9": "axbxc - a - b - c" } ]
SELECT REPLACE('the SQL language','SQL','Malloy') as "e1", REGEXP_REPLACE('the SQL language', 'S.L', 'Malloy', 'g') as "e2", REPLACE('SQL SQL SQL','SQL','Malloy') as "e3", REGEXP_REPLACE('SQL SQL SQL', 'S.L', 'Malloy', 'g') as "e4", REPLACE(NULL,'SQL','Malloy') as "e5", REPLACE('the null language',NULL,'Malloy') as "e6", REPLACE('the null language','SQL',NULL) as "e7", REPLACE('the language','','Malloy') as "e8", REGEXP_REPLACE('axbxc', '(a).(b).(c)', '\0 - \1 - \2 - \3', 'g') as "e9" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5,6,7,8,9 ORDER BY 1 asc NULLS LAST
length
length(value)
Returns the number of characters in value
.
run: empty -> { group_by: e1 is length('Malloy') e2 is length('') e3 is length('𝔐𝔞𝔩𝔩𝔬𝔶') e4 is length(null) }
[ { "e1": 6, "e2": 0, "e3": 6, "e4": null } ]
SELECT LENGTH('Malloy') as "e1", LENGTH('') as "e2", LENGTH('𝔐𝔞𝔩𝔩𝔬𝔶') as "e3", LENGTH(NULL) as "e4" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
byte_length
byte_length(value)
Returns the number of bytes in the unicode encoding of value
.
run: empty -> { group_by: e1 is byte_length('Malloy') e2 is byte_length('') e3 is byte_length('𝔐𝔞𝔩𝔩𝔬𝔶') e4 is byte_length(null) }
[ { "e1": 6, "e2": 0, "e3": 24, "e4": null } ]
SELECT (BIT_LENGTH('Malloy') / 8) as "e1", (BIT_LENGTH('') / 8) as "e2", (BIT_LENGTH('𝔐𝔞𝔩𝔩𝔬𝔶') / 8) as "e3", (BIT_LENGTH(NULL) / 8) as "e4" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
chr
chr(value)
Returns a unicode code point value
and returns a string containing the character with the matching code point. If value
is 0, returns an empty string not a string containing the null character.
run: empty -> { group_by: e1 is chr(65) e2 is chr(255) e3 is chr(0) e4 is chr(null) }
[ { "e1": "A", "e2": "ÿ", "e3": "", "e4": null } ]
SELECT CASE WHEN 65 = 0 THEN '' ELSE CHR(65) END as "e1", CASE WHEN 255 = 0 THEN '' ELSE CHR(255) END as "e2", CASE WHEN 0 = 0 THEN '' ELSE CHR(0) END as "e3", CASE WHEN NULL = 0 THEN '' ELSE CHR(NULL) END as "e4" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
Returns an error if value
is not a valid unicode code point.
ascii
ascii(value)
Returns the ASCII code point of the first character of value
. If value
is empty, returns 0.
In BigQuery, ascii
errors if the first character of value
is not representable in ASCII. In Postgres and DuckDB, ascii
returns the Unicode code point of the first character.
run: empty -> { group_by: e1 is ascii('Malloy') e2 is ascii('') e3 is ascii(null) }
[ { "e1": 77, "e2": 0, "e3": null } ]
SELECT ASCII('Malloy') as "e1", ASCII('') as "e2", ASCII(NULL) as "e3" FROM ( SELECT 1 ) as base GROUP BY 1,2,3 ORDER BY 1 asc NULLS LAST
unicode
unicode(value)
Returns the Unicode code point of the first character of value
. If value
is empty, returns 0.
run: empty -> { group_by: e1 is unicode('Malloy') e2 is unicode('𝔐𝔞𝔩𝔩𝔬𝔶') e3 is unicode('') e4 is unicode(null) }
[ { "e1": 77, "e2": 120080, "e3": 0, "e4": null } ]
SELECT CASE WHEN 'Malloy' = '' THEN 0 ELSE UNICODE('Malloy') END as "e1", CASE WHEN '𝔐𝔞𝔩𝔩𝔬𝔶' = '' THEN 0 ELSE UNICODE('𝔐𝔞𝔩𝔩𝔬𝔶') END as "e2", CASE WHEN '' = '' THEN 0 ELSE UNICODE('') END as "e3", CASE WHEN NULL = '' THEN 0 ELSE UNICODE(NULL) END as "e4" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
repeat
repeat(value, num_repetitions)
Return a string consisting of value
repeated num_repetitions
times.
Undefined behavior if num_repetitions
is negative or a non-integer.
run: empty -> { group_by: e1 is repeat('(A)', 3) e2 is repeat('(A)', 1) e3 is repeat('(A)', 0) e4 is repeat('', 3) e5 is repeat(null, 1) e6 is repeat('(A)', null) }
[ { "e1": "(A)(A)(A)", "e2": "(A)", "e3": "", "e4": "", "e5": null, "e6": null } ]
SELECT REPEAT('(A)',3) as "e1", REPEAT('(A)',1) as "e2", REPEAT('(A)',0) as "e3", REPEAT('',3) as "e4", REPEAT(NULL,1) as "e5", REPEAT('(A)',NULL) as "e6" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5,6 ORDER BY 1 asc NULLS LAST
reverse
reverse(value)
Return a copy of value
with the characters in reverse order.
run: empty -> { group_by: e1 is reverse('Malloy') e2 is reverse('𝔶𝔬𝔩𝔩𝔞𝔐') e3 is reverse('racecar') e4 is reverse('') e5 is reverse(null) }
[ { "e1": "yollaM", "e2": "𝔐𝔞𝔩𝔩𝔬𝔶", "e3": "racecar", "e4": "", "e5": null } ]
SELECT REVERSE('Malloy') as "e1", REVERSE('𝔶𝔬𝔩𝔩𝔞𝔐') as "e2", REVERSE('racecar') as "e3", REVERSE('') as "e4", REVERSE(NULL) as "e5" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5 ORDER BY 1 asc NULLS LAST
Numeric Functions
abs | acos | asin | atan | atan2 | ceil |
cos | div | exp | floor | is_inf | is_nan |
log | ln | pi | pow | rand | round |
sign | sin | sqrt | tan | trunc |
round
round(value) round(value, precision)
Round value
to the nearest integer, or if precision
is specified, round to precision
decimal places right of the decimal. If precision
is negative, round to a precision that many places to the left of the decimal. Halfway cases are rounded away from zero.
run: empty -> { group_by: e1 is round(1.49) e2 is round(1.51) e3 is round(1.5) e4 is round(-1.5) e5 is round(1.5, 0) e6 is round(1.551, 1) e7 is round(14.12, -1) e8 is round(1.4, null) e9 is round(null, 1) }
[ { "e1": 1, "e2": 2, "e3": 2, "e4": -2, "e5": 2, "e6": 1.6, "e7": 10, "e8": null, "e9": null } ]
SELECT ROUND(1.49::DOUBLE) as "e1", ROUND(1.51::DOUBLE) as "e2", ROUND(1.5::DOUBLE) as "e3", ROUND(-1.5::DOUBLE) as "e4", ROUND(1.5::DOUBLE,0) as "e5", ROUND(1.551::DOUBLE,1) as "e6", ROUND(14.12::DOUBLE,-1) as "e7", ROUND(1.4::DOUBLE,NULL) as "e8", ROUND(NULL,1) as "e9" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5,6,7,8,9 ORDER BY 1 asc NULLS LAST
trunc
trunc(value) trunc(value, precision)
Truncate value
to an integer, or if precision
is specified, truncate after that many decimal places. If precision
is negative, truncate to that many decimal places to the left of the decimal point. Similar to round(value, precision)
but always rounds toward zero.
run: empty -> { group_by: e1 is trunc(1.49) e2 is trunc(1.51) e3 is trunc(1.5) e4 is trunc(-1.5) e5 is trunc(1.5, 0) e6 is trunc(1.551, 1) e7 is trunc(19.999, -1) e8 is trunc(1.4, null) e9 is trunc(null, 1) }
[ { "e1": 1, "e2": 1, "e3": 1, "e4": -1, "e5": 1, "e6": 1.5, "e7": 10, "e8": null, "e9": null } ]
SELECT CASE WHEN 1.49::DOUBLE < 0 THEN CEIL(1.49::DOUBLE) ELSE FLOOR(1.49::DOUBLE) END as "e1", CASE WHEN 1.51::DOUBLE < 0 THEN CEIL(1.51::DOUBLE) ELSE FLOOR(1.51::DOUBLE) END as "e2", CASE WHEN 1.5::DOUBLE < 0 THEN CEIL(1.5::DOUBLE) ELSE FLOOR(1.5::DOUBLE) END as "e3", CASE WHEN -1.5::DOUBLE < 0 THEN CEIL(-1.5::DOUBLE) ELSE FLOOR(-1.5::DOUBLE) END as "e4", CASE WHEN 1.5::DOUBLE < 0 THEN CEIL(1.5::DOUBLE * POW(10, 0)) / POW(10, 0) ELSE FLOOR(1.5::DOUBLE * POW(10, 0)) / POW(10, 0) END as "e5", CASE WHEN 1.551::DOUBLE < 0 THEN CEIL(1.551::DOUBLE * POW(10, 1)) / POW(10, 1) ELSE FLOOR(1.551::DOUBLE * POW(10, 1)) / POW(10, 1) END as "e6", CASE WHEN 19.999::DOUBLE < 0 THEN CEIL(19.999::DOUBLE * POW(10, -1)) / POW(10, -1) ELSE FLOOR(19.999::DOUBLE * POW(10, -1)) / POW(10, -1) END as "e7", CASE WHEN 1.4::DOUBLE < 0 THEN CEIL(1.4::DOUBLE * POW(10, NULL)) / POW(10, NULL) ELSE FLOOR(1.4::DOUBLE * POW(10, NULL)) / POW(10, NULL) END as "e8", CASE WHEN NULL < 0 THEN CEIL(NULL * POW(10, 1)) / POW(10, 1) ELSE FLOOR(NULL * POW(10, 1)) / POW(10, 1) END as "e9" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5,6,7,8,9 ORDER BY 1 asc NULLS LAST
floor
floor(value)
Round value
down to the greatest integer not larger than value
.
run: empty -> { group_by: a is floor(1.0) b is floor(1.9) c is floor(-1.1) d is floor(null) }
[ { "a": 1, "b": 1, "c": -2, "d": null } ]
SELECT FLOOR(1.0::DOUBLE) as "a", FLOOR(1.9::DOUBLE) as "b", FLOOR(-1.1::DOUBLE) as "c", FLOOR(NULL) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
ceil
ceil(value)
Round value
up to the smallest integer not less than value
.
run: empty -> { group_by: a is ceil(1.0) b is ceil(1.1) c is ceil(-1.9) d is ceil(null) }
[ { "a": 1, "b": 2, "c": -1, "d": null } ]
SELECT CEIL(1.0::DOUBLE) as "a", CEIL(1.1::DOUBLE) as "b", CEIL(-1.9::DOUBLE) as "c", CEIL(NULL) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
cos
cos(angle)
Computes the cosine of angle
where angle
is specified in radians.
run: empty -> { group_by: a is cos(0) b is cos(pi()) c is cos(pi() / 2) d is cos(null) }
[ { "a": 1, "b": -1, "c": 6.123233995736766e-17, "d": null } ]
SELECT COS(0) as "a", COS(PI()) as "b", COS((PI())*1.0/2) as "c", COS(NULL) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
acos
acos(value)
Computes the principal value of the inverse cosine of value
. The return value is in the range [0,π]. Generates an error if value
is a value outside of the range [-1, 1].
run: empty -> { group_by: a is acos(0) b is acos(1) c is acos(-1) d is acos(null) }
[ { "a": 1.5707963267948966, "b": 0, "c": 3.141592653589793, "d": null } ]
SELECT ACOS(0) as "a", ACOS(1) as "b", ACOS(-1) as "c", ACOS(NULL) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
sin
sin(angle)
Computes the sine of angle
where angle
is specified in radians.
run: empty -> { group_by: a is sin(0) b is sin(pi()) c is sin(pi() / 2) d is sin(null) }
[ { "a": 0, "b": 1.2246467991473532e-16, "c": 1, "d": null } ]
SELECT SIN(0) as "a", SIN(PI()) as "b", SIN((PI())*1.0/2) as "c", SIN(NULL) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
asin
asin(value)
Computes the principal value of the inverse sine of value
. The return value is in the range [-π/2,π/2]. Generates an error if value
is outside of the range [-1, 1].
run: empty -> { group_by: a is asin(0) b is asin(1) c is asin(-1) d is asin(null) }
[ { "a": 0, "b": 1.5707963267948966, "c": -1.5707963267948966, "d": null } ]
SELECT ASIN(0) as "a", ASIN(1) as "b", ASIN(-1) as "c", ASIN(NULL) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
tan
tan(angle)
Computes the tangent of angle
where angle
is specified in radians.
run: empty -> { group_by: a is tan(0) b is tan(pi()) c is tan(pi() / 4) d is tan(3 * pi() / 4) e is tan(null) }
[ { "a": 0, "b": -1.2246467991473532e-16, "c": 0.9999999999999999, "d": -1.0000000000000002, "e": null } ]
SELECT TAN(0) as "a", TAN(PI()) as "b", TAN((PI())*1.0/4) as "c", TAN((3*(PI()))*1.0/4) as "d", TAN(NULL) as "e" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5 ORDER BY 1 asc NULLS LAST
atan
atan(value)
Computes the principal value of the inverse tangent of value
. The return value is in the range [-π/2,π/2].
run: empty -> { group_by: a is atan(0) b is atan(1) c is atan(-1) d is atan(null) }
[ { "a": 0, "b": 0.7853981633974483, "c": -0.7853981633974483, "d": null } ]
SELECT ATAN(0) as "a", ATAN(1) as "b", ATAN(-1) as "c", ATAN(NULL) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
atan2
atan2(y, x)
Calculates the principal value of the inverse tangent of x / y
using the signs of the two arguments to determine the quadrant. The return value is in the range [-π,π].
run: empty -> { group_by: a is atan2(1, 0) b is atan2(0, 1) c is atan2(-1, 0) d is atan2(0, -1) e is atan2(null, 1) f is atan2(1, null) }
[ { "a": 1.5707963267948966, "b": 0, "c": -1.5707963267948966, "d": 3.141592653589793, "e": null, "f": null } ]
SELECT ATAN2(1,0) as "a", ATAN2(0,1) as "b", ATAN2(-1,0) as "c", ATAN2(0,-1) as "d", ATAN2(NULL,1) as "e", ATAN2(1,NULL) as "f" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5,6 ORDER BY 1 asc NULLS LAST
sqrt
sqrt(value)
Computes the square root of value
. Generates an error if value < 0
.
run: empty -> { group_by: a is sqrt(9) b is sqrt(0) c is sqrt(null) }
[ { "a": 3, "b": 0, "c": null } ]
SELECT SQRT(9) as "a", SQRT(0) as "b", SQRT(NULL) as "c" FROM ( SELECT 1 ) as base GROUP BY 1,2,3 ORDER BY 1 asc NULLS LAST
pow
pow(base, exponent)
Returns base
raised to the power of exponent
.
run: empty -> { group_by: a is pow(2, 3) b is pow(100, 0) c is pow(10, -1) d is pow(0, 10) e is pow(null, 1) f is pow(1, null) }
[ { "a": 8, "b": 1, "c": 0.1, "d": 0, "e": null, "f": null } ]
SELECT POW(2,3) as "a", POW(100,0) as "b", POW(10,-1) as "c", POW(0,10) as "d", POW(NULL,1) as "e", POW(1,NULL) as "f" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5,6 ORDER BY 1 asc NULLS LAST
Generates an error if base
is 0 and exponent < 0
.
abs
abs(value)
Returns the absolute value of value
.
run: empty -> { group_by: a is abs(100) b is abs(-100) c is abs(0) d is abs(null) }
[ { "a": 100, "b": 100, "c": 0, "d": null } ]
SELECT ABS(100) as "a", ABS(-100) as "b", ABS(0) as "c", ABS(NULL) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
sign
sign(value)
Returns -1
, 0
, or 1
for negative, zero and positive arguments respectively. For floating point arguments, this function does not distinguish between positive and negative zero.
run: empty -> { group_by: a is sign(100) b is sign(-100) c is sign(0) d is sign(null) }
[ { "a": 1, "b": -1, "c": 0, "d": null } ]
SELECT SIGN(100) as "a", SIGN(-100) as "b", SIGN(0) as "c", SIGN(NULL) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
is_inf
is_inf(value)
Returns true
if value
is infinite (positive or negative), false
otherwise. Unlike in SQL, is_inf(null) = false
(not null
).
run: empty -> { group_by: a is is_inf(100) b is is_inf('inf'::number) c is is_inf('-inf'::number) d is is_inf('NaN'::number) e is is_inf(null) }
[ { "a": false, "b": true, "c": true, "d": false, "e": false } ]
SELECT COALESCE(ISINF(100), false) as "a", COALESCE(ISINF(CAST('inf' AS double precision)), false) as "b", COALESCE(ISINF(CAST('-inf' AS double precision)), false) as "c", COALESCE(ISINF(CAST('NaN' AS double precision)), false) as "d", COALESCE(ISINF(NULL), false) as "e" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5 ORDER BY 1 asc NULLS LAST
is_nan
is_nan(value)
Returns true
if value
is NaN
, false
otherwise. Unlike in SQL, is_nan(null) = false
(not null
).
run: empty -> { group_by: a is is_nan(100) b is is_nan('NaN'::number) c is is_nan('inf'::number) d is is_nan(null) }
[ { "a": false, "b": true, "c": false, "d": false } ]
SELECT COALESCE(ISNAN(100), false) as "a", COALESCE(ISNAN(CAST('NaN' AS double precision)), false) as "b", COALESCE(ISNAN(CAST('inf' AS double precision)), false) as "c", COALESCE(ISNAN(NULL), false) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
div
div(dividend, divisor)
Returns the (truncated) integer part of the division of dividend
by divisor
. Division by zero returns an error.
run: empty -> { group_by: a is div(9, 2) a2 is trunc(9 / 2) b is div(-9, 2) c is div(10, 1) d is div(null, 1) e is div(1, null) }
[ { "a": 4, "a2": 4, "b": -4, "c": 10, "d": null, "e": null } ]
SELECT CASE WHEN 9 / 2 < 0 THEN CEIL(9 / 2) ELSE FLOOR(9 / 2) END as "a", CASE WHEN 9*1.0/2 < 0 THEN CEIL(9*1.0/2) ELSE FLOOR(9*1.0/2) END as "a2", CASE WHEN -9 / 2 < 0 THEN CEIL(-9 / 2) ELSE FLOOR(-9 / 2) END as "b", CASE WHEN 10 / 1 < 0 THEN CEIL(10 / 1) ELSE FLOOR(10 / 1) END as "c", CASE WHEN NULL / 1 < 0 THEN CEIL(NULL / 1) ELSE FLOOR(NULL / 1) END as "d", CASE WHEN 1 / NULL < 0 THEN CEIL(1 / NULL) ELSE FLOOR(1 / NULL) END as "e" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5,6 ORDER BY 1 asc NULLS LAST
rand
rand()
Returns a random number in the range [0, 1).
run: empty -> { group_by: a is rand() b is rand() c is rand() }
[ { "a": 0.2799579987768084, "b": 0.20149066345766187, "c": 0.17836001049727201 } ]
SELECT RANDOM() as "a", RANDOM() as "b", RANDOM() as "c" FROM ( SELECT 1 ) as base GROUP BY 1,2,3 ORDER BY 1 asc NULLS LAST
pi
pi()
Returns the value of π.
run: empty -> { group_by: pi is pi() }
[ { "pi": 3.141592653589793 } ]
SELECT PI() as "pi" FROM ( SELECT 1 ) as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
log
log(value, base)
Returns the logarithm of value
with the specified base
. Note that base
is required. Errors if value <= 0
, base <= 0
, or if base = 1
.
run: empty -> { group_by: a is log(10, 10) b is log(100, 10) c is log(32, 2) d is log(null, 2) e is log(10, null) }
[ { "a": 1, "b": 2, "c": 5, "d": null, "e": null } ]
SELECT (LN(10) / LN(10)) as "a", (LN(100) / LN(10)) as "b", (LN(32) / LN(2)) as "c", (LN(NULL) / LN(2)) as "d", (LN(10) / LN(NULL)) as "e" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5 ORDER BY 1 asc NULLS LAST
ln
ln(value)
Returns the natural log (log base e) of value
. Equivalent to log(value, exp(1))
. Errors if value <= 0
.
run: empty -> { group_by: a is ln(exp(1)) b is ln(exp(2)) c is ln(100) d is ln(null) }
[ { "a": 1, "b": 2, "c": 4.605170185988092, "d": null } ]
SELECT LN(EXP(1)) as "a", LN(EXP(2)) as "b", LN(100) as "c", LN(NULL) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
exp
exp(power)
Returns e (Euler's number) raised to the given power
.
run: empty -> { group_by: a is exp(1) b is exp(null) }
[ { "a": 2.718281828459045, "b": null } ]
SELECT EXP(1) as "a", EXP(NULL) as "b" FROM ( SELECT 1 ) as base GROUP BY 1,2 ORDER BY 1 asc NULLS LAST
Interval Functions
days | hours |
minutes | months |
seconds | quarters |
weeks | years |
seconds
seconds(interval)
Calculate the number of seconds in interval
.
minutes
minutes(interval)
Calculate the number of minutes in interval
.
hours
hours(interval)
Calculate the number of hours in interval
.
days
days(interval)
Calculate the number of days in interval
.
weeks
weeks(interval)
Calculate the number of weeks in interval
. Note: this function does not currently work.
months
months(interval)
Calculate the number of months in interval
. Note: this function does not currently work.
quarters
quarters(interval)
Calculate the number of quarters in interval
. Note: this function does not currently work.
years
years(interval)
Calculate the number of years in interval
. Note: this function does not currently work.
run: empty -> { group_by: a is seconds(now to now + 1 second) b is minutes(now to now + 1 minute) c is hours(now to now + 1 hour) d is days(now to now + 1 day) // e is weeks(now::date to now::date + 1 week) // f is months(now::date to now::date + 1 month) // g is quarters(now::date to now::date + 1 quarter) // h is years(now::date to now::date + 1 year) }
[ { "a": 1, "b": 1, "c": 1, "d": 1 } ]
SELECT DATE_SUB('second', LOCALTIMESTAMP, (LOCALTIMESTAMP + INTERVAL (1) second)) as "a", DATE_SUB('minute', LOCALTIMESTAMP, (LOCALTIMESTAMP + INTERVAL (1) minute)) as "b", DATE_SUB('hour', LOCALTIMESTAMP, (LOCALTIMESTAMP + INTERVAL (1) hour)) as "c", DATE_SUB('day', LOCALTIMESTAMP, (LOCALTIMESTAMP + INTERVAL (1) day)) as "d" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4 ORDER BY 1 asc NULLS LAST
Date and Timestamp Functions
day | day_of_week | day_of_year |
hour | minute | month |
quarter | second | week |
year |
day_of_year
day_of_year(moment)
Returns the day of the year (from 1 to 365) of moment
.
day
day(moment)
Returns the day of the month (from 1 to 31) of moment
.
day_of_week
day_of_week(moment)
Returns the day of the week (from 1 to 7) of moment
, where 1 represents Sunday.
week
week(moment)
Returns the week of the year (from 1 to 53) of moment
.
month
month(moment)
Returns the month of the year (from 1 to 12) of moment
.
quarter
quarter(moment)
Returns the quarter of the year (from 1 to 53) of moment
.
year
year(moment)
Returns the year of moment
.
hour
hour(time)
Returns the hour of the day (from 0 to 23) of time
.
minute
minute(time)
Returns the minute of the hour (from 0 to 59) of time
.
second
second(time)
Returns the second of the minute (from 0 to 59) of time
.
run: empty extend { dimension: t is @2023-12-31 23:59:59 } -> { group_by: doy is day_of_year(t) dom is day(t) dow is day_of_week(t) woy is week(t) moy is month(t) qoy is quarter(t) yyy is year(t) hod is hour(t) moh is minute(t) som is second(t) }
[ { "doy": 365, "dom": 31, "dow": 1, "woy": 52, "moy": 12, "qoy": 4, "yyy": 2023, "hod": 23, "moh": 59, "som": 59 } ]
SELECT EXTRACT(doy FROM (TIMESTAMP '2023-12-31 23:59:59')) as "doy", EXTRACT(day FROM (TIMESTAMP '2023-12-31 23:59:59')) as "dom", mod(EXTRACT(dow FROM (TIMESTAMP '2023-12-31 23:59:59'))+1,7) as "dow", EXTRACT(week FROM (TIMESTAMP '2023-12-31 23:59:59')) as "woy", EXTRACT(month FROM (TIMESTAMP '2023-12-31 23:59:59')) as "moy", EXTRACT(quarter FROM (TIMESTAMP '2023-12-31 23:59:59')) as "qoy", EXTRACT(year FROM (TIMESTAMP '2023-12-31 23:59:59')) as "yyy", EXTRACT(hour FROM (TIMESTAMP '2023-12-31 23:59:59')) as "hod", EXTRACT(minute FROM (TIMESTAMP '2023-12-31 23:59:59')) as "moh", EXTRACT(second FROM (TIMESTAMP '2023-12-31 23:59:59')) as "som" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5,6,7,8,9,10 ORDER BY 1 asc NULLS LAST
Other Functions
greatest | least | coalesce |
ifnull | nullif | |
hll_accumulate | hll_estimate | hll_combine |
hll_export | hll_import |
greatest
greatest(value, ...)
Returns the greatest value of all arguments, supporting, number
, string
, date
, and timestamp
. Returns null
if any argument is null
.
run: empty -> { group_by: a is greatest(3, 1, 8) b is greatest('z', 'b', 'f') c is greatest(@2003-11-03, @2001-10-21) d is greatest(@2003-11-03 11:25, @2003-11-03 11:24) e is greatest(100, 99, null) }
[ { "a": 8, "b": "z", "c": "2003-11-03T00:00:00.000Z", "d": "2003-11-03T11:25:00.000Z", "e": null } ]
SELECT CASE WHEN LEN(LIST_FILTER([3,1,8], x -> x is null)) > 0 THEN NULL ELSE GREATEST(3,1,8) END as "a", CASE WHEN LEN(LIST_FILTER(['z','b','f'], x -> x is null)) > 0 THEN NULL ELSE GREATEST('z','b','f') END as "b", CASE WHEN LEN(LIST_FILTER([DATE '2003-11-03',DATE '2001-10-21'], x -> x is null)) > 0 THEN NULL ELSE GREATEST(DATE '2003-11-03',DATE '2001-10-21') END as "c", CASE WHEN LEN(LIST_FILTER([TIMESTAMP '2003-11-03 11:25:00',TIMESTAMP '2003-11-03 11:24:00'], x -> x is null)) > 0 THEN NULL ELSE GREATEST(TIMESTAMP '2003-11-03 11:25:00',TIMESTAMP '2003-11-03 11:24:00') END as "d", CASE WHEN LEN(LIST_FILTER([100,99,NULL], x -> x is null)) > 0 THEN NULL ELSE GREATEST(100,99,NULL) END as "e" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5 ORDER BY 3 desc NULLS LAST
least
least(value, ...)
Returns the least value of all arguments, supporting, number
, string
, date
, and timestamp
. Returns null
if any argument is null
.
run: empty -> { group_by: a is least(3, 1, 8) b is least('z', 'b', 'f') c is least(@2003-11-03, @2001-10-21) d is least(@2003-11-03 11:25, @2003-11-03 11:24) e is least(100, 99, null) }
[ { "a": 1, "b": "b", "c": "2001-10-21T00:00:00.000Z", "d": "2003-11-03T11:24:00.000Z", "e": null } ]
SELECT CASE WHEN LEN(LIST_FILTER([3,1,8], x -> x is null)) > 0 THEN NULL ELSE LEAST(3,1,8) END as "a", CASE WHEN LEN(LIST_FILTER(['z','b','f'], x -> x is null)) > 0 THEN NULL ELSE LEAST('z','b','f') END as "b", CASE WHEN LEN(LIST_FILTER([DATE '2003-11-03',DATE '2001-10-21'], x -> x is null)) > 0 THEN NULL ELSE LEAST(DATE '2003-11-03',DATE '2001-10-21') END as "c", CASE WHEN LEN(LIST_FILTER([TIMESTAMP '2003-11-03 11:25:00',TIMESTAMP '2003-11-03 11:24:00'], x -> x is null)) > 0 THEN NULL ELSE LEAST(TIMESTAMP '2003-11-03 11:25:00',TIMESTAMP '2003-11-03 11:24:00') END as "d", CASE WHEN LEN(LIST_FILTER([100,99,NULL], x -> x is null)) > 0 THEN NULL ELSE LEAST(100,99,NULL) END as "e" FROM ( SELECT 1 ) as base GROUP BY 1,2,3,4,5 ORDER BY 3 desc NULLS LAST
ifnull
ifnull(value, default)
Return value
unless it is null
, or default
otherwise.
Note: it is more idiomatic in Malloy to use ??
, the null-coalescing operator.
run: empty -> { group_by: a_sql is ifnull(null, 1) a_malloy is null ?? 1 }
[ { "a_sql": 1, "a_malloy": 1 } ]
SELECT IFNULL(NULL,1) as "a_sql", 1 as "a_malloy" FROM ( SELECT 1 ) as base GROUP BY 1,2 ORDER BY 1 asc NULLS LAST
coalesce
ifnull(value, ...)
Return the first value
which is not null
, or null
if all value
s are null
. Essentially the same as ifnull
but allowing more than two arguments.
Note: it is more idiomatic in Malloy to use ??
, the null-coalescing operator.
run: empty -> { group_by: a_sql is coalesce(null, 1, 2) a_malloy is null ?? 1 ?? 2 }
[ { "a_sql": 1, "a_malloy": 1 } ]
SELECT COALESCE(NULL,1,2) as "a_sql", COALESCE(1,2) as "a_malloy" FROM ( SELECT 1 ) as base GROUP BY 1,2 ORDER BY 1 asc NULLS LAST
nullif
nullif(value, condition)
Equivalent to the SQL NULLIF
function: returns value
unless value = condition
, in which case it returns null
.
Note: the use of nullif
is not idiomatic to Malloy; use pick
statements instead, as they are more flexible.
run: empty extend { dimension: value is 1 } -> { group_by: a_sql is nullif(value, 2) a_malloy is value ? pick null when 2 b_malloy is value ? pick null when < 2 }
[ { "a_sql": 1, "a_malloy": 1, "b_malloy": null } ]
SELECT NULLIF((1),2) as "a_sql", CASE WHEN ((1)=2) THEN NULL ELSE (1) END as "a_malloy", CASE WHEN ((1)<2) THEN NULL ELSE (1) END as "b_malloy" FROM ( SELECT 1 ) as base GROUP BY 1,2,3 ORDER BY 1 asc NULLS LAST
There are many functions which are useful in SQL which are not part of Malloy for a variety of reasons including a) there is a different way to do it in Malloy, b) it is not generalizable to other dialects, c) it just hasn't been added yet. Some of the more common ones are listed here.
IFNULL, COALESCE
Neither IFNULL
nor COALESCE
is available in Malloy, because Malloy has a specific coalescing operator: ??
.
run: empty -> { group_by: a is null ?? 1 // same as IFNULL(NULL, 1) b is null ?? 1 ?? 2 // same as COALESCE(NULL, 1, 2) }
[ { "a": 1, "b": 1 } ]
SELECT 1 as "a", COALESCE(1,2) as "b" FROM ( SELECT 1 ) as base GROUP BY 1,2 ORDER BY 1 asc NULLS LAST
NULLIF
The NULLIF
SQL function is not available in Malloy, because the same thing is expressible in a more Malloy-like way, using a pick
statement. This is also more flexible, as the pick statement allows for partial comparisons.
run: empty extend { dimension: value is 1 } -> { group_by: a is value ? pick null when 2 // same as NULLIF(value, 2) b is value ? pick null when < 2 // same as CASE NULL WHEN value < 2 ELSE value END }
[ { "a": 1, "b": null } ]
SELECT CASE WHEN ((1)=2) THEN NULL ELSE (1) END as "a", CASE WHEN ((1)<2) THEN NULL ELSE (1) END as "b" FROM ( SELECT 1 ) as base GROUP BY 1,2 ORDER BY 1 asc NULLS LAST
Aggregate Functions
Malloy has several "native" aggregate functions, sum
, avg
, max
, min
, and count
, as well as "non-native" aggregate functions, which currently only includes stddev
. All of these support aggregate locality and symmetric aggregate handling.
Note: Aggregate locality is currently not supported in BigQuery for stddev
.
avg | count |
max | min |
stddev | sum |
stddev
stddev(value)
Returns the standard deviation of values of value
across all rows.
run: flights -> { group_by: carrier aggregate: dep_delay_avg is avg(dep_delay) aggregate: dep_delay_stddev is stddev(dep_delay) order_by: dep_delay_avg asc }
[ { "carrier": "TZ", "dep_delay_avg": 4.366963402571711, "dep_delay_stddev": 31.293795536719024 }, { "carrier": "RU", "dep_delay_avg": 5.342229687694413, "dep_delay_stddev": 30.35020135821442 }, { "carrier": "MQ", "dep_delay_avg": 5.438716995399836, "dep_delay_stddev": 28.64584040820042 }, { "carrier": "CO", "dep_delay_avg": 6.0128869589578375, "dep_delay_stddev": 30.242701868501392 }, { "carrier": "NW", "dep_delay_avg": 6.116289458010721, "dep_delay_stddev": 30.586134538485435 } ]
SELECT base."carrier" as "carrier", AVG(base."dep_delay") as "dep_delay_avg", STDDEV(base."dep_delay") as "dep_delay_stddev" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 asc NULLS LAST
Ungrouped Aggregate Functions
Malloy has two ungrouped aggregate functions, all()
, and exclude()
, which allow you to control which dimensional values are included in the grouping while calculating a particular aggregate expression.
all |
exclude |
Window Functions
Syntax
Window functions can only appear in a calculate:
statement:
run: flights -> { group_by: carrier calculate: prev_carrier is lag(carrier) }
[ { "carrier": "AA", "prev_carrier": null }, { "carrier": "AS", "prev_carrier": "AA" }, { "carrier": "B6", "prev_carrier": "AS" }, { "carrier": "CO", "prev_carrier": "B6" }, { "carrier": "DL", "prev_carrier": "CO" } ]
SELECT base."carrier" as "carrier", LAG((base."carrier")) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "prev_carrier" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
For more detailed information, see the Calculations and Window Functions section.
run: flights -> { group_by: carrier is 'CA' carrier2 is carrier calculate: prev_carrier is lag(carrier) }
[ { "carrier": "CA", "carrier2": "DL", "prev_carrier": null }, { "carrier": "CA", "carrier2": "WN", "prev_carrier": "CA" }, { "carrier": "CA", "carrier2": "AA", "prev_carrier": "CA" }, { "carrier": "CA", "carrier2": "NW", "prev_carrier": "CA" }, { "carrier": "CA", "carrier2": "EV", "prev_carrier": "CA" } ]
SELECT 'CA' as "carrier", base."carrier" as "carrier2", LAG(('CA')) OVER( ORDER BY 'CA' asc NULLS LAST ) as "prev_carrier" FROM '../data/flights.parquet' as base GROUP BY 1,2 ORDER BY 1 asc NULLS LAST
In the above example, carrier
in the output refers to the carrier is 'CA'
, whereas carrier
in the input refers to the column in flights
. Inside a group_by
it is not possible to reference an output field, so carrier2 is carrier
refers to the column carrier
. In the calculate: prev_carrier is lag(carrier)
, however, carrier
refers to the output field, i.e. carrier is 'CA'
.
Most arguments to analytic functions are required to be output fields or aggregate fields (see specific requirements below).
run: flights -> { // ERROR: Parameter 1 ('value') of lag must be literal, constant, or output, but received input calculate: prev_carrier is lag(carrier) }
avg_moving | first_value | lag | last_value |
lead | max_cumulative | max_window | min_cumulative |
min_window | rank | row_number | sum_cumulative |
sum_window |
row_number
row_number()
Returns the row number of the current result row after grouping and aggregating.
run: flights -> { group_by: carrier calculate: row is row_number() }
[ { "carrier": "AA", "row": 1 }, { "carrier": "AS", "row": 2 }, { "carrier": "B6", "row": 3 }, { "carrier": "CO", "row": 4 }, { "carrier": "DL", "row": 5 } ]
SELECT base."carrier" as "carrier", ROW_NUMBER() OVER( ORDER BY base."carrier" asc NULLS LAST ) as "row" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
rank
rank()
Returns the rank according to the query ordering, with values having equal ordering value getting equal rank.
run: flights -> { group_by: carrier aggregate: flight_count_bucket is round(flight_count, -6) order_by: flight_count_bucket desc calculate: flight_count_rank is rank() }
[ { "carrier": "DL", "flight_count_bucket": 0, "flight_count_rank": 1 }, { "carrier": "EV", "flight_count_bucket": 0, "flight_count_rank": 1 }, { "carrier": "RU", "flight_count_bucket": 0, "flight_count_rank": 1 }, { "carrier": "OH", "flight_count_bucket": 0, "flight_count_rank": 1 }, { "carrier": "CO", "flight_count_bucket": 0, "flight_count_rank": 1 } ]
SELECT base."carrier" as "carrier", ROUND((COUNT(1)),-6) as "flight_count_bucket", RANK() OVER( ORDER BY ROUND((COUNT(1)),-6) desc NULLS LAST ) as "flight_count_rank" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
lag
lag(expr) lag(expr, offset) lag(expr, offset, default)
Returns the value of expr
for the previous row. If offset
is specified, returns the value of expr
for the row offset
rows before the current row. Returns default
(or null
if unspecified) when the referenced row doesn't exist.
Generates an error if offset
is null
, negative, or not an integer.
The value of offset
must be a constant.
run: flights -> { group_by: carrier calculate: prev_carrier is lag(carrier) calculate: prev_prev_carrier is lag(carrier, 2) calculate: prev_carrier_or_none is lag(carrier, 1, 'NONE') }
[ { "carrier": "AA", "prev_carrier": null, "prev_prev_carrier": null, "prev_carrier_or_none": "NONE" }, { "carrier": "AS", "prev_carrier": "AA", "prev_prev_carrier": null, "prev_carrier_or_none": "AA" }, { "carrier": "B6", "prev_carrier": "AS", "prev_prev_carrier": "AA", "prev_carrier_or_none": "AS" }, { "carrier": "CO", "prev_carrier": "B6", "prev_prev_carrier": "AS", "prev_carrier_or_none": "B6" }, { "carrier": "DL", "prev_carrier": "CO", "prev_prev_carrier": "B6", "prev_carrier_or_none": "CO" } ]
SELECT base."carrier" as "carrier", LAG((base."carrier")) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "prev_carrier", LAG((base."carrier"),2) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "prev_prev_carrier", LAG((base."carrier"),1,'NONE') OVER( ORDER BY base."carrier" asc NULLS LAST ) as "prev_carrier_or_none" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
lead
lead(expr) lead(expr, offset) lead(expr, offset, default)
Returns the value of expr
for the next row. If offset
is specified, returns the value of expr
for the row offset
rows after the current row. Returns default
(or null
if unspecified) when the referenced row doesn't exist.
Generates an error if offset
is null
, negative, or not an integer.
The value of offset
must be a constant.
run: flights -> { group_by: carrier limit: 5 } -> { group_by: carrier calculate: next_carrier is lead(carrier) calculate: next_next_carrier is lead(carrier, 2) calculate: next_carrier_or_none is lead(carrier, 1, 'NONE') }
[ { "carrier": "AA", "next_carrier": "AS", "next_next_carrier": "B6", "next_carrier_or_none": "AS" }, { "carrier": "AS", "next_carrier": "B6", "next_next_carrier": "CO", "next_carrier_or_none": "B6" }, { "carrier": "B6", "next_carrier": "CO", "next_next_carrier": "DL", "next_carrier_or_none": "CO" }, { "carrier": "CO", "next_carrier": "DL", "next_next_carrier": null, "next_carrier_or_none": "DL" }, { "carrier": "DL", "next_carrier": null, "next_next_carrier": null, "next_carrier_or_none": "NONE" } ]
WITH __stage0 AS ( SELECT base."carrier" as "carrier" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST LIMIT 5 ) SELECT base."carrier" as "carrier", LEAD((base."carrier")) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "next_carrier", LEAD((base."carrier"),2) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "next_next_carrier", LEAD((base."carrier"),1,'NONE') OVER( ORDER BY base."carrier" asc NULLS LAST ) as "next_carrier_or_none" FROM __stage0 as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
first_value
first_value(expr)
Returns the first value of expr
across all rows (i.e. the value for the first row).
run: flights -> { group_by: carrier calculate: first_carrier is first_value(carrier) }
[ { "carrier": "AA", "first_carrier": "AA" }, { "carrier": "AS", "first_carrier": "AA" }, { "carrier": "B6", "first_carrier": "AA" }, { "carrier": "CO", "first_carrier": "AA" }, { "carrier": "DL", "first_carrier": "AA" } ]
SELECT base."carrier" as "carrier", FIRST_VALUE((base."carrier")) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "first_carrier" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
last_value
last_value(expr)
Returns the last value of expr
across all rows (i.e. the value for the last row).
run: flights -> { group_by: carrier limit: 5 } -> { group_by: carrier calculate: last_carrier is last_value(carrier) }
[ { "carrier": "AA", "last_carrier": "DL" }, { "carrier": "AS", "last_carrier": "DL" }, { "carrier": "B6", "last_carrier": "DL" }, { "carrier": "CO", "last_carrier": "DL" }, { "carrier": "DL", "last_carrier": "DL" } ]
WITH __stage0 AS ( SELECT base."carrier" as "carrier" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST LIMIT 5 ) SELECT base."carrier" as "carrier", LAST_VALUE((base."carrier")) OVER( ORDER BY base."carrier" asc NULLS LAST ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as "last_carrier" FROM __stage0 as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
min_cumulative
min_cumulative(expr)
Returns the minimum value of expr
among rows from the first row to the current row.
run: flights -> { group_by: carrier order_by: carrier asc aggregate: flight_count calculate: min_cumulative_flight_count is min_cumulative(flight_count) }
[ { "carrier": "AA", "flight_count": 34577, "min_cumulative_flight_count": 34577 }, { "carrier": "AS", "flight_count": 8453, "min_cumulative_flight_count": 8453 }, { "carrier": "B6", "flight_count": 4842, "min_cumulative_flight_count": 4842 }, { "carrier": "CO", "flight_count": 7139, "min_cumulative_flight_count": 4842 }, { "carrier": "DL", "flight_count": 32130, "min_cumulative_flight_count": 4842 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count", MIN((COUNT(1))) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "min_cumulative_flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
max_cumulative
max_cumulative(expr)
Returns the maximum value of expr
among rows from the first row to the current row.
run: flights -> { group_by: carrier order_by: carrier asc aggregate: flight_count calculate: max_cumulative_flight_count is max_cumulative(flight_count) }
[ { "carrier": "AA", "flight_count": 34577, "max_cumulative_flight_count": 34577 }, { "carrier": "AS", "flight_count": 8453, "max_cumulative_flight_count": 34577 }, { "carrier": "B6", "flight_count": 4842, "max_cumulative_flight_count": 34577 }, { "carrier": "CO", "flight_count": 7139, "max_cumulative_flight_count": 34577 }, { "carrier": "DL", "flight_count": 32130, "max_cumulative_flight_count": 34577 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count", MAX((COUNT(1))) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "max_cumulative_flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
sum_cumulative
sum_cumulative(expr)
Returns the cumulative sum of values of expr
from the first row to the current row.
run: flights -> { group_by: carrier order_by: carrier asc aggregate: flight_count calculate: sum_cumulative_flight_count is sum_cumulative(flight_count) }
[ { "carrier": "AA", "flight_count": 34577, "sum_cumulative_flight_count": 34577 }, { "carrier": "AS", "flight_count": 8453, "sum_cumulative_flight_count": 43030 }, { "carrier": "B6", "flight_count": 4842, "sum_cumulative_flight_count": 47872 }, { "carrier": "CO", "flight_count": 7139, "sum_cumulative_flight_count": 55011 }, { "carrier": "DL", "flight_count": 32130, "sum_cumulative_flight_count": 87141 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count", SUM((COUNT(1))) OVER( ORDER BY base."carrier" asc NULLS LAST ) as "sum_cumulative_flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
min_window
min_window(expr)
Returns the minimum of all values of expr
across all rows.
run: flights -> { group_by: carrier aggregate: flight_count calculate: min_flight_count is min_window(flight_count) }
[ { "carrier": "WN", "flight_count": 88751, "min_flight_count": 3033 }, { "carrier": "US", "flight_count": 37683, "min_flight_count": 3033 }, { "carrier": "AA", "flight_count": 34577, "min_flight_count": 3033 }, { "carrier": "NW", "flight_count": 33580, "min_flight_count": 3033 }, { "carrier": "UA", "flight_count": 32757, "min_flight_count": 3033 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count", MIN((COUNT(1))) OVER( ) as "min_flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
max_window
max_window(expr)
Returns the maximum of all values of expr
across all rows.
run: flights -> { group_by: carrier aggregate: flight_count calculate: max_flight_count is max_window(flight_count) }
[ { "carrier": "WN", "flight_count": 88751, "max_flight_count": 88751 }, { "carrier": "US", "flight_count": 37683, "max_flight_count": 88751 }, { "carrier": "AA", "flight_count": 34577, "max_flight_count": 88751 }, { "carrier": "NW", "flight_count": 33580, "max_flight_count": 88751 }, { "carrier": "UA", "flight_count": 32757, "max_flight_count": 88751 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count", MAX((COUNT(1))) OVER( ) as "max_flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
sum_window
sum_window(expr)
Returns the sum of all values of expr
across all rows.
run: flights -> { group_by: carrier aggregate: flight_count calculate: total_flight_count1 is sum_window(flight_count) aggregate: total_flight_count2 is all(flight_count) }
[ { "carrier": "WN", "flight_count": 88751, "total_flight_count1": 344827, "total_flight_count2": 344827 }, { "carrier": "US", "flight_count": 37683, "total_flight_count1": 344827, "total_flight_count2": 344827 }, { "carrier": "AA", "flight_count": 34577, "total_flight_count1": 344827, "total_flight_count2": 344827 }, { "carrier": "NW", "flight_count": 33580, "total_flight_count1": 344827, "total_flight_count2": 344827 }, { "carrier": "UA", "flight_count": 32757, "total_flight_count1": 344827, "total_flight_count2": 344827 } ]
WITH __stage0 AS ( SELECT group_set, CASE WHEN group_set=1 THEN base."carrier" END as "carrier__1", CASE WHEN group_set=1 THEN COUNT(1) END as "flight_count__1", CASE WHEN group_set=1 THEN SUM((CASE WHEN group_set=1 THEN COUNT(1) END)) OVER(PARTITION BY group_set ) END as "total_flight_count1__1", MAX((CASE WHEN group_set=0 THEN COUNT(1) END)) OVER () as "total_flight_count2__1" FROM '../data/flights.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set ) as group_set GROUP BY 1,2 ) SELECT "carrier__1" as "carrier", MAX(CASE WHEN group_set=1 THEN "flight_count__1" END) as "flight_count", MAX(CASE WHEN group_set=1 THEN "total_flight_count1__1" END) as "total_flight_count1", MAX(CASE WHEN group_set=1 THEN "total_flight_count2__1" END) as "total_flight_count2" FROM __stage0 WHERE group_set NOT IN (0) GROUP BY 1 ORDER BY 2 desc NULLS LAST
avg_moving
avg_moving(expr, preceding) avg_moving(expr, preceding, following)
Produces a moving average (or 'rolling average') of values of expr
among rows between preceding
rows before the current row and following
rows after the current row (or the current row if following
is not specified).
Both preceding
and following
must be literals.
Note: avg_moving(value, 3)
means that the average is computed over 4 rows (the current row and 3 preceding).
run: flights -> { group_by: carrier aggregate: flight_count nest: carrier_analysis is { group_by: yr is dep_time.year aggregate: flight_count order_by: yr asc calculate: three_year_avg is avg_moving(flight_count, 2) calculate: three_year_avg_centered is avg_moving(flight_count, 1, 1) } }
[ { "carrier": "WN", "flight_count": 88751, "carrier_analysis": [ { "yr": "2000-01-01T00:00:00.000Z", "flight_count": 13133, "three_year_avg": 13133, "three_year_avg_centered": 13777 }, { "yr": "2001-01-01T00:00:00.000Z", "flight_count": 14421, "three_year_avg": 13777, "three_year_avg_centered": 14087.333333333334 }, { "yr": "2002-01-01T00:00:00.000Z", "flight_count": 14708, "three_year_avg": 14087.333333333334, "three_year_avg_centered": 14476.333333333334 }, { "yr": "2003-01-01T00:00:00.000Z", "flight_count": 14300, "three_year_avg": 14476.333333333334, "three_year_avg_centered": 14549.333333333334 }, { "yr": "2004-01-01T00:00:00.000Z", "flight_count": 14640, "three_year_avg": 14549.333333333334, "three_year_avg_centered": 15496.333333333334 }, { "yr": "2005-01-01T00:00:00.000Z", "flight_count": 17549, "three_year_avg": 15496.333333333334, "three_year_avg_centered": 16094.5 } ] }, { "carrier": "US", "flight_count": 37683, "carrier_analysis": [ { "yr": "2000-01-01T00:00:00.000Z", "flight_count": 7161, "three_year_avg": 7161, "three_year_avg_centered": 7013.5 }, { "yr": "2001-01-01T00:00:00.000Z", "flight_count": 6866, "three_year_avg": 7013.5, "three_year_avg_centered": 6842.666666666667 }, { "yr": "2002-01-01T00:00:00.000Z", "flight_count": 6501, "three_year_avg": 6842.666666666667, "three_year_avg_centered": 5896 }, { "yr": "2003-01-01T00:00:00.000Z", "flight_count": 4321, "three_year_avg": 5896, "three_year_avg_centered": 5403.333333333333 }, { "yr": "2004-01-01T00:00:00.000Z", "flight_count": 5388, "three_year_avg": 5403.333333333333, "three_year_avg_centered": 5718.333333333333 }, { "yr": "2005-01-01T00:00:00.000Z", "flight_count": 7446, "three_year_avg": 5718.333333333333, "three_year_avg_centered": 6417 } ] }, { "carrier": "AA", "flight_count": 34577, "carrier_analysis": [ { "yr": "2000-01-01T00:00:00.000Z", "flight_count": 6050, "three_year_avg": 6050, "three_year_avg_centered": 5950.5 }, { "yr": "2001-01-01T00:00:00.000Z", "flight_count": 5851, "three_year_avg": 5950.5, "three_year_avg_centered": 5954.666666666667 }, { "yr": "2002-01-01T00:00:00.000Z", "flight_count": 5963, "three_year_avg": 5954.666666666667, "three_year_avg_centered": 5793.333333333333 }, { "yr": "2003-01-01T00:00:00.000Z", "flight_count": 5566, "three_year_avg": 5793.333333333333, "three_year_avg_centered": 5711 }, { "yr": "2004-01-01T00:00:00.000Z", "flight_count": 5604, "three_year_avg": 5711, "three_year_avg_centered": 5571 }, { "yr": "2005-01-01T00:00:00.000Z", "flight_count": 5543, "three_year_avg": 5571, "three_year_avg_centered": 5573.5 } ] }, { "carrier": "NW", "flight_count": 33580, "carrier_analysis": [ { "yr": "2000-01-01T00:00:00.000Z", "flight_count": 5756, "three_year_avg": 5756, "three_year_avg_centered": 5568.5 }, { "yr": "2001-01-01T00:00:00.000Z", "flight_count": 5381, "three_year_avg": 5568.5, "three_year_avg_centered": 5406 }, { "yr": "2002-01-01T00:00:00.000Z", "flight_count": 5081, "three_year_avg": 5406, "three_year_avg_centered": 5216.333333333333 }, { "yr": "2003-01-01T00:00:00.000Z", "flight_count": 5187, "three_year_avg": 5216.333333333333, "three_year_avg_centered": 5524.666666666667 }, { "yr": "2004-01-01T00:00:00.000Z", "flight_count": 6306, "three_year_avg": 5524.666666666667, "three_year_avg_centered": 5787.333333333333 }, { "yr": "2005-01-01T00:00:00.000Z", "flight_count": 5869, "three_year_avg": 5787.333333333333, "three_year_avg_centered": 6087.5 } ] }, { "carrier": "UA", "flight_count": 32757, "carrier_analysis": [ { "yr": "2000-01-01T00:00:00.000Z", "flight_count": 3621, "three_year_avg": 3621, "three_year_avg_centered": 3703 }, { "yr": "2001-01-01T00:00:00.000Z", "flight_count": 3785, "three_year_avg": 3703, "three_year_avg_centered": 4332 }, { "yr": "2002-01-01T00:00:00.000Z", "flight_count": 5590, "three_year_avg": 4332, "three_year_avg_centered": 5147.666666666667 }, { "yr": "2003-01-01T00:00:00.000Z", "flight_count": 6068, "three_year_avg": 5147.666666666667, "three_year_avg_centered": 6158.333333333333 }, { "yr": "2004-01-01T00:00:00.000Z", "flight_count": 6817, "three_year_avg": 6158.333333333333, "three_year_avg_centered": 6587 }, { "yr": "2005-01-01T00:00:00.000Z", "flight_count": 6876, "three_year_avg": 6587, "three_year_avg_centered": 6846.5 } ] } ]
WITH __stage0 AS ( SELECT group_set, base."carrier" as "carrier__0", CASE WHEN group_set=0 THEN COUNT(1) END as "flight_count__0", CASE WHEN group_set=1 THEN DATE_TRUNC('year', base."dep_time") END as "yr__1", CASE WHEN group_set=1 THEN COUNT(1) END as "flight_count__1", CASE WHEN group_set=1 THEN AVG((CASE WHEN group_set=1 THEN COUNT(1) END)) OVER(PARTITION BY group_set, base."carrier" ORDER BY CASE WHEN group_set=1 THEN DATE_TRUNC('year', base."dep_time") END asc NULLS LAST ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING) END as "three_year_avg__1", CASE WHEN group_set=1 THEN AVG((CASE WHEN group_set=1 THEN COUNT(1) END)) OVER(PARTITION BY group_set, base."carrier" ORDER BY CASE WHEN group_set=1 THEN DATE_TRUNC('year', base."dep_time") END asc NULLS LAST ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) END as "three_year_avg_centered__1" FROM '../data/flights.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set ) as group_set GROUP BY 1,2,4 ) SELECT "carrier__0" as "carrier", MAX(CASE WHEN group_set=0 THEN "flight_count__0" END) as "flight_count", COALESCE(LIST({ "yr": "yr__1", "flight_count": "flight_count__1", "three_year_avg": "three_year_avg__1", "three_year_avg_centered": "three_year_avg_centered__1"} ORDER BY "yr__1" asc NULLS LAST) FILTER (WHERE group_set=1),[]) as "carrier_analysis" FROM __stage0 GROUP BY 1 ORDER BY 2 desc NULLS LAST