Often when querying data the amount of data returned to look at is much smaller than the full result set, so the ordering of the data makes a big difference in what you actually see. To make things easier, Malloy has some smart defaults in the way it presents data. For the most part, you don't have to think too much about it, but in order to understand it, this document will show you how Malloy makes decisions about what to show you.
Implicit Ordering
Rule 1: Newest first
If a query stage has a dimensional column that represents a point in time, it is usually the most important concept in the query. Because the most recent data is usually the most relevant, Malloy sorts the newest data first.
run: flights -> { group_by: dep_month is dep_time.month aggregate: flight_count is count() }
[ { "dep_month": "2005-12-01T00:00:00.000Z", "flight_count": 5770 }, { "dep_month": "2005-11-01T00:00:00.000Z", "flight_count": 5533 }, { "dep_month": "2005-10-01T00:00:00.000Z", "flight_count": 6083 }, { "dep_month": "2005-09-01T00:00:00.000Z", "flight_count": 6100 }, { "dep_month": "2005-08-01T00:00:00.000Z", "flight_count": 6415 } ]
SELECT DATE_TRUNC('month', base."dep_time") as "dep_month", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 desc NULLS LAST
Rule 2: Largest first
If there is a measure involved, Malloy sorts larger values first.
In the following example, Rule 1 doesn't apply, so the default behavior is to sort by first aggregate, flight_count
with the largest values first.
run: flights -> { group_by: carrier aggregate: flight_count is count() }
[ { "carrier": "WN", "flight_count": 88751 }, { "carrier": "US", "flight_count": 37683 }, { "carrier": "AA", "flight_count": 34577 }, { "carrier": "NW", "flight_count": 33580 }, { "carrier": "UA", "flight_count": 32757 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
Explicit Ordering
You can be explicit about result ordering by using the order_by
clause.
In the following example, the results are ordered by carrier
in reverse alphabetical order.
run: flights -> { order_by: carrier desc group_by: carrier aggregate: flight_count is count() }
[ { "carrier": "WN", "flight_count": 88751 }, { "carrier": "US", "flight_count": 37683 }, { "carrier": "UA", "flight_count": 32757 }, { "carrier": "TZ", "flight_count": 3033 }, { "carrier": "RU", "flight_count": 16074 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 desc NULLS LAST
Like in SQL, Malloy's order_by
always defaults to ascending order when desc
is omitted. This is true for any column of any type. In the example below,
the results are ordered by carrier
in alphabetical order.
run: flights -> { order_by: carrier group_by: carrier aggregate: flight_count is count() }
[ { "carrier": "AA", "flight_count": 34577 }, { "carrier": "AS", "flight_count": 8453 }, { "carrier": "B6", "flight_count": 4842 }, { "carrier": "CO", "flight_count": 7139 }, { "carrier": "DL", "flight_count": 32130 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 ASC NULLS LAST
Limiting
In Malloy, you can limit the number of results returned using limit: integer literal
or top: integer literal
. Both are provided for readability.
In the example below, the results are limited to 2 rows, which are sorted by dep_month
with newest results first (due to Rule 1).
run: flights -> { limit: 2 group_by: dep_month is dep_time.month aggregate: flight_count is count() }
[ { "dep_month": "2005-12-01T00:00:00.000Z", "flight_count": 5770 }, { "dep_month": "2005-11-01T00:00:00.000Z", "flight_count": 5533 } ]
SELECT DATE_TRUNC('month', base."dep_time") as "dep_month", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 1 desc NULLS LAST LIMIT 2