Malloy Documentation
search

Filtering which data is used in a query is an incredibly important aspect of data analysis. Malloy makes it easy to target specific parts of a query to apply individual filters.

Filter Syntax

Regardless of the placement of a filter, the syntax looks the same.

{ where: filter_one, filter_two }

Each filter can be any expression of type boolean, whether that's a boolean field is_commercial_flight, a comparison distance > 1000, or any of the other kinds of boolean expressions that Malloy supports. For examples see the table below, or for detailed information on the kinds of expressions Malloy supports, see the Expressions section.

The comma-separated list of filters are anded together, so where: x, y is equivalent to where: x and y.

Filter Placement

A filter can be applied to the source of a query, to just one stage of a query, or even to a particular field or expression (measure or nested view).

Filtering a Query's Source

When filtering a query's source, the filter applies to the whole query.

document
run: flights extend { where: distance > 1000 } -> { aggregate: flight_count } 
QUERY RESULTS
[
  {
    "flight_count": 90002
  }
]
SELECT 
   COUNT(1) as "flight_count"
FROM '../data/flights.parquet' as base
WHERE base."distance">1000

Filtering in a Query Stage

A filter can also be applied to an individual query stage.

document
run: flights -> {
  group_by: carrier
  aggregate: flight_count
} -> {
  where: carrier ? 'UA' | 'AA'
  select: carrier, flight_count
}
QUERY RESULTS
[
  {
    "carrier": "UA",
    "flight_count": 32757
  },
  {
    "carrier": "AA",
    "flight_count": 34577
  }
]
WITH __stage0 AS (
  SELECT 
     base."carrier" as "carrier",
     COUNT(1) as "flight_count"
  FROM '../data/flights.parquet' as base
  GROUP BY 1
)
SELECT 
   base."carrier" as "carrier",
   base."flight_count" as "flight_count"
FROM __stage0 as base
WHERE base."carrier" IN ('UA','AA')

Filtering Aggregate Calculations

Any measure can be filtered by adding a where clause.

document
run: flights -> {
  aggregate:
    ca_flights is flight_count { where: origin.state = 'CA' }
    ny_flights is count() { where: origin.state = 'NY' }
    avg_km_from_ca is avg(distance / 0.621371) { where: origin.state = 'CA' }
}
QUERY RESULTS