Fields constitute all kinds of data in Malloy. They can represent dimensional attributes sourced directly from tables in a database, constant values to be used in later analysis, computed metrics derived from other fields, or even nested structures created from aggregating subqueries.
Defining Fields
Fields defined in sources are reusable. A field is a dimension, measure or view, or calculation. When these are used in a query, these fields are invoked with select:
, group_by:
, aggregate:
, nest:
, or calculate:
. Their definitions are syntactically identical whether defined in a source or a view (with the exception of calculations, which can only be defined in a view, and not in a source). In either case, they are defined using the is
keyword.
In a source
source: users is duckdb.table('../data/users.parquet') extend { dimension: age_in_dog_years is age * 7 }
In a query
run: users -> { group_by: age_in_dog_years is age * 7 }
[ { "age_in_dog_years": 84 }, { "age_in_dog_years": 98 }, { "age_in_dog_years": 112 }, { "age_in_dog_years": 126 }, { "age_in_dog_years": 133 } ]
SELECT base."age"*7 as "age_in_dog_years" FROM '../data/users.parquet' as base GROUP BY 1 ORDER BY 1 asc NULLS LAST
The right hand side of this kind of definition can be any field expression. See the Expressions section for more information.
Like dimensions and measures, views can also be defined as part of a source or in a query's view. When a view is used or defined in another view, it is known as a "nested view" and produces an "aggregating subquery." See the Nesting section for a detailed discussion of nested views.
run: flights -> { group_by: carrier nest: by_month is { group_by: departure_month is dep_time.month aggregate: flight_count is count() limit: 3 } }
[ { "carrier": "AA", "by_month": [ { "departure_month": "2005-12-01T00:00:00.000Z", "flight_count": 428 }, { "departure_month": "2005-11-01T00:00:00.000Z", "flight_count": 401 }, { "departure_month": "2005-10-01T00:00:00.000Z", "flight_count": 406 } ] }, { "carrier": "AS", "by_month": [ { "departure_month": "2005-12-01T00:00:00.000Z", "flight_count": 104 }, { "departure_month": "2005-11-01T00:00:00.000Z", "flight_count": 94 }, { "departure_month": "2005-10-01T00:00:00.000Z", "flight_count": 97 } ] }, { "carrier": "B6", "by_month": [ { "departure_month": "2005-12-01T00:00:00.000Z", "flight_count": 246 }, { "departure_month": "2005-11-01T00:00:00.000Z", "flight_count": 222 }, { "departure_month": "2005-10-01T00:00:00.000Z", "flight_count": 255 } ] }, { "carrier": "CO", "by_month": [ { "departure_month": "2005-12-01T00:00:00.000Z", "flight_count": 93 }, { "departure_month": "2005-11-01T00:00:00.000Z", "flight_count": 82 }, { "departure_month": "2005-10-01T00:00:00.000Z", "flight_count": 102 } ] }, { "carrier": "DL", "by_month": [ { "departure_month": "2005-12-01T00:00:00.000Z", "flight_count": 139 }, { "departure_month": "2005-11-01T00:00:00.000Z", "flight_count": 185 }, { "departure_month": "2005-10-01T00:00:00.000Z", "flight_count": 352 } ] } ]
WITH __stage0 AS ( SELECT group_set, base."carrier" as "carrier__0", CASE WHEN group_set=1 THEN DATE_TRUNC('month', base."dep_time") END as "departure_month__1", CASE WHEN group_set=1 THEN COUNT(1) END as "flight_count__1" FROM '../data/flights.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set ) as group_set GROUP BY 1,2,3 ) SELECT "carrier__0" as "carrier", COALESCE(LIST({ "departure_month": "departure_month__1", "flight_count": "flight_count__1"} ORDER BY "departure_month__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:3],[]) as "by_month" FROM __stage0 GROUP BY 1 ORDER BY 1 asc NULLS LAST
Field Names
Field names generally must start with a letter or underscore, and can only contain letters, numbers, and underscores. Field names which don't follow these rules, or which conflict with a Malloy keyword, must be enclosed in back ticks, e.g. `year` is dep_time.year
.
Kinds of Fields
Malloy includes three different kinds of fields: dimensions, measures, views, and calculations.
Dimensions
Dimensions are fields representing scalar values. All fields inherited directly from a table are dimensions.
Dimensions are defined using expressions that contain no aggregate functions.
source: users2 is duckdb.table('../data/users.parquet') extend { dimension: full_name is concat(first_name, ' ', last_name) }
Dimensions may be used in both reductions and projections.
// Show the top 10 full names by number of occurrences run: users2 -> { limit: 10 group_by: full_name aggregate: occurrences is count() } // Show 10 users' full names run: users2 -> { select: full_name limit: 10 }
[ { "full_name": "FRANK HUGHES" }, { "full_name": "HAROLD GATEWOOD" }, { "full_name": "SCOTT JOACHIM" }, { "full_name": "JAMES ALLEN" }, { "full_name": "JEREMY CASAS" } ]
SELECT CONCAT(base."first_name",' ',base."last_name") as "full_name" FROM '../data/users.parquet' as base LIMIT 10
Measures
Measures are fields representing aggregated data over multiple records.
Measures may not be used in projections (select:
views). However, any measures that appear in a reduction are "dimensionalized" as part of the query, and are therefore usable as dimensions in subsequent stages.
run: flights -> { group_by: carrier aggregate: flight_count is count() } -> { select: flight_count }
[ { "flight_count": 16074 }, { "flight_count": 4420 }, { "flight_count": 7139 }, { "flight_count": 32130 }, { "flight_count": 88751 } ]
WITH __stage0 AS ( SELECT base."carrier" as "carrier", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ) SELECT base."flight_count" as "flight_count" FROM __stage0 as base
Views
A view represents a pipelined data transformation of one or more stages.
source: flights is duckdb.table('../data/flights.parquet') extend { view: by_carrier is { group_by: carrier aggregate: flight_count is count() } }
A view can always begin with another view from the same source.
source: flights is duckdb.table('../data/flights.parquet') extend { ... view: top_carriers is by_carrier -> { select: carrier limit: 5 } }
See the Nesting section for more details about nested views.
Calculations (Window Functions)
Calculations are fields based off of groupings and aggregate values in a view, and therefore can only be created in a view with calculate:
and can not be predefined in a source. See the Calculations and Window Functions section for details.
run: flights -> { group_by: carrier aggregate: flight_count calculate: flight_count_rank is rank() }
[ { "carrier": "WN", "flight_count": 88751, "flight_count_rank": 1 }, { "carrier": "US", "flight_count": 37683, "flight_count_rank": 2 }, { "carrier": "AA", "flight_count": 34577, "flight_count_rank": 3 }, { "carrier": "NW", "flight_count": 33580, "flight_count_rank": 4 }, { "carrier": "UA", "flight_count": 32757, "flight_count_rank": 5 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count", RANK() OVER( ORDER BY COUNT(1) desc NULLS LAST ) as "flight_count_rank" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST