Ordering and Limiting

Often when querying data, only a small part of the full result set is actually displayed, so the ordering of the rows makes a big difference in what you see. To make this easier, Malloy applies some smart defaults to the way it presents data. For the most part you don't have to think about them, but this document explains how Malloy decides what to show you.

Implicit Ordering

Rule 1: Newest first

If a query stage has a dimensional column that represents a point in time, it is usually the most important concept in the query. Because the most recent data is usually the most relevant, Malloy sorts the newest data first.

run: flights -> {
  group_by: dep_month is dep_time.month
  aggregate: flight_count is count()
}

QUERY RESULTS

[
  {
    "dep_month": "2005-12-01T00:00:00.000Z",
    "flight_count": 5770
  },
  {
    "dep_month": "2005-11-01T00:00:00.000Z",
    "flight_count": 5533
  },
  {
    "dep_month": "2005-10-01T00:00:00.000Z",
    "flight_count": 6083
  },
  {
    "dep_month": "2005-09-01T00:00:00.000Z",
    "flight_count": 6100
  },
  {
    "dep_month": "2005-08-01T00:00:00.000Z",
    "flight_count": 6415
  }
]

SELECT 
   DATE_TRUNC('month', base."dep_time") as "dep_month",
   COUNT(1) as "flight_count"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 1 desc NULLS LAST

Rule 2: Largest first

If there is a measure involved, Malloy sorts larger values first.

In the following example, Rule 1 doesn't apply, so the default behavior is to sort by the first aggregate, flight_count, with the largest values first.

run: flights -> {
  group_by: carrier
  aggregate: flight_count is count()
}

QUERY RESULTS

[
  {
    "carrier": "WN",
    "flight_count": 88751
  },
  {
    "carrier": "US",
    "flight_count": 37683
  },
  {
    "carrier": "AA",
    "flight_count": 34577
  },
  {
    "carrier": "NW",
    "flight_count": 33580
  },
  {
    "carrier": "UA",
    "flight_count": 32757
  }
]

SELECT 
   base."carrier" as "carrier",
   COUNT(1) as "flight_count"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Explicit Ordering

You can be explicit about result ordering by using the order_by clause.

In the following example, the results are ordered by carrier in reverse alphabetical order.

run: flights -> {
  order_by: carrier desc
  group_by: carrier
  aggregate: flight_count is count()
}

QUERY RESULTS

[
  {
    "carrier": "WN",
    "flight_count": 88751
  },
  {
    "carrier": "US",
    "flight_count": 37683
  },
  {
    "carrier": "UA",
    "flight_count": 32757
  },
  {
    "carrier": "TZ",
    "flight_count": 3033
  },
  {
    "carrier": "RU",
    "flight_count": 16074
  }
]

SELECT 
   base."carrier" as "carrier",
   COUNT(1) as "flight_count"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 1 desc NULLS LAST

As in SQL, Malloy's order_by defaults to ascending order when desc is omitted. This is true for columns of any type. In the example below, the results are ordered by carrier in alphabetical order.

run: flights -> {
  order_by: carrier
  group_by: carrier
  aggregate: flight_count is count()
}

QUERY RESULTS

[
  {
    "carrier": "AA",
    "flight_count": 34577
  },
  {
    "carrier": "AS",
    "flight_count": 8453
  },
  {
    "carrier": "B6",
    "flight_count": 4842
  },
  {
    "carrier": "CO",
    "flight_count": 7139
  },
  {
    "carrier": "DL",
    "flight_count": 32130
  }
]

SELECT 
   base."carrier" as "carrier",
   COUNT(1) as "flight_count"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 1 ASC NULLS LAST
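An explicit order_by also takes the place of the implicit rules. As a minimal sketch, reusing the dep_month query from Rule 1, the following asks for the oldest months first instead of the newest. The asc keyword is written out here only for readability, since ascending is already the default for an explicit order_by. No results are shown because this variant was not part of the original examples.

```malloy
// Sketch: override the "newest first" default with an explicit ascending sort.
run: flights -> {
  group_by: dep_month is dep_time.month
  aggregate: flight_count is count()
  order_by: dep_month asc
}
```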

Malloy also supports lists of items (separated by commas or line breaks) in the order_by: clause. Malloy will sort by the first entry first, then by the second within the first, and so on.
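As an illustration of a multi-item order_by:, the sketch below sorts carriers alphabetically and, within each carrier, puts the flight numbers with the most flights first. It uses only fields that already appear in this document (carrier, flight_num); the particular combination is chosen for illustration, and no results are shown because it was not run as part of the original examples.

```malloy
// Sketch: sort by carrier ascending, then by flight_count descending within each carrier.
run: flights -> {
  group_by: carrier, flight_num
  aggregate: flight_count is count()
  order_by: carrier, flight_count desc
}
```

The same clause could be written with one item per line instead of commas.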

You cannot have multiple order_by: clauses within a query. When an order_by: is applied in a refinement, it will fully override whatever the previous ordering was.

// This query will only be ordered by total_distance and carrier due to the refinement.
run: flights -> {
  group_by: flight_num, carrier
  order_by:
    flight_num desc
    total_distance
    flight_count desc
  aggregate: flight_count is count()
  aggregate: total_distance
} + {
  order_by: total_distance, carrier
}

QUERY RESULTS

[
  {
    "flight_num": "3755",
    "carrier": "RU",
    "flight_count": 1,
    "total_distance": 28
  },
  {
    "flight_num": "3756",
    "carrier": "RU",
    "flight_count": 1,
    "total_distance": 28
  },
  {
    "flight_num": "3302",
    "carrier": "RU",
    "flight_count": 1,
    "total_distance": 35
  },
  {
    "flight_num": "1739",
    "carrier": "NW",
    "flight_count": 1,
    "total_distance": 56
  },
  {
    "flight_num": "1098",
    "carrier": "NW",
    "flight_count": 1,
    "total_distance": 56
  }
]

SELECT 
   base."flight_num" as "flight_num",
   base."carrier" as "carrier",
   COUNT(1) as "flight_count",
   COALESCE(SUM(base."distance"),0) as "total_distance"
FROM '../data/flights.parquet' as base
GROUP BY 1,2
ORDER BY 4 ASC NULLS LAST,2 ASC NULLS LAST

Limiting

In Malloy, you can limit the number of rows returned with either limit: or top:, each followed by an integer literal. Both are provided for readability.

In the example below, the results are limited to 2 rows, which are sorted by dep_month with newest results first (due to Rule 1).

run: flights -> {
  limit: 2
  group_by: dep_month is dep_time.month
  aggregate: flight_count is count()
}

QUERY RESULTS

[
  {
    "dep_month": "2005-12-01T00:00:00.000Z",
    "flight_count": 5770
  },
  {
    "dep_month": "2005-11-01T00:00:00.000Z",
    "flight_count": 5533
  }
]

SELECT 
   DATE_TRUNC('month', base."dep_time") as "dep_month",
   COUNT(1) as "flight_count"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 1 desc NULLS LAST
LIMIT 2
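In the generated SQL above, LIMIT comes after ORDER BY, so the ordering decides which rows are kept. A limit can therefore be combined with an explicit order_by to pick a different slice of the results. The sketch below uses top: in place of limit: and an ascending sort to ask for the three carriers with the fewest flights. It is an illustrative variant of the carrier query from Rule 2 rather than one of the original examples, so no results are shown.

```malloy
// Sketch: keep only the 3 carriers with the smallest flight_count.
run: flights -> {
  group_by: carrier
  aggregate: flight_count is count()
  order_by: flight_count asc
  top: 3
}
```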