Malloy Documentation
search

Fields constitute all kinds of data in Malloy. They can represent dimensional attributes sourced directly from tables in a database, constant values to be used in later analysis, computed metrics derived from other fields, or even nested structures created from aggregating subqueries.

Defining Fields

Fields defined in sources are reusable. A field is a dimension, measure or view, or calculation. When these are used in a query, these fields are invoked with select:, group_by:, aggregate:, nest:, or calculate:. Their definitions are syntactically identical whether defined in a source or a view (with the exception of calculations, which can only be defined in a view, and not in a source). In either case, they are defined using the is keyword.

In a source

document
source: users is duckdb.table('../data/users.parquet') extend {
  dimension: age_in_dog_years is age * 7
}

In a query

document
run: users -> {
  group_by: age_in_dog_years is age * 7
}
QUERY RESULTS
[
  {
    "age_in_dog_years": 84
  },
  {
    "age_in_dog_years": 98
  },
  {
    "age_in_dog_years": 112
  },
  {
    "age_in_dog_years": 126
  },
  {
    "age_in_dog_years": 133
  }
]
SELECT 
   base."age"*7 as "age_in_dog_years"
FROM '../data/users.parquet' as base
GROUP BY 1
ORDER BY 1 asc NULLS LAST

The right hand side of this kind of definition can be any field expression. See the Expressions section for more information.

Like dimensions and measures, views can also be defined as part of a source or in a query's view. When a view is used or defined in another view, it is known as a "nested view" and produces an "aggregating subquery." See the Nesting section for a detailed discussion of nested views.

document
run: flights -> {
  group_by: carrier
  nest: by_month is {
    group_by: departure_month is dep_time.month
    aggregate: flight_count is count()
    limit: 3
  }
}
QUERY RESULTS
[
  {
    "carrier": "AA",
    "by_month": [
      {
        "departure_month": "2005-12-01T00:00:00.000Z",
        "flight_count": 428
      },
      {
        "departure_month": "2005-11-01T00:00:00.000Z",
        "flight_count": 401
      },
      {
        "departure_month": "2005-10-01T00:00:00.000Z",
        "flight_count": 406
      }
    ]
  },
  {
    "carrier": "AS",
    "by_month": [
      {
        "departure_month": "2005-12-01T00:00:00.000Z",
        "flight_count": 104
      },
      {
        "departure_month": "2005-11-01T00:00:00.000Z",
        "flight_count": 94
      },
      {
        "departure_month": "2005-10-01T00:00:00.000Z",
        "flight_count": 97
      }
    ]
  },
  {
    "carrier": "B6",
    "by_month": [
      {
        "departure_month": "2005-12-01T00:00:00.000Z",
        "flight_count": 246
      },
      {
        "departure_month": "2005-11-01T00:00:00.000Z",
        "flight_count": 222
      },
      {
        "departure_month": "2005-10-01T00:00:00.000Z",
        "flight_count": 255
      }
    ]
  },
  {
    "carrier": "CO",
    "by_month": [
      {
        "departure_month": "2005-12-01T00:00:00.000Z",
        "flight_count": 93
      },
      {
        "departure_month": "2005-11-01T00:00:00.000Z",
        "flight_count": 82
      },
      {
        "departure_month": "2005-10-01T00:00:00.000Z",
        "flight_count": 102
      }
    ]
  },
  {
    "carrier": "DL",
    "by_month": [
      {
        "departure_month": "2005-12-01T00:00:00.000Z",
        "flight_count": 139
      },
      {
        "departure_month": "2005-11-01T00:00:00.000Z",
        "flight_count": 185
      },
      {
        "departure_month": "2005-10-01T00:00:00.000Z",
        "flight_count": 352
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    base."carrier" as "carrier__0",
    CASE WHEN group_set=1 THEN
      DATE_TRUNC('month', base."dep_time")
      END as "departure_month__1",
    CASE WHEN group_set=1 THEN
      COUNT(1)
      END as "flight_count__1"
  FROM '../data/flights.parquet' as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,3
)
SELECT
  "carrier__0" as "carrier",
  COALESCE(LIST({
    "departure_month": "departure_month__1", 
    "flight_count": "flight_count__1"}  ORDER BY  "departure_month__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:3],[]) as "by_month"
FROM __stage0
GROUP BY 1
ORDER BY 1 asc NULLS LAST

Field Names

Field names generally must start with a letter or underscore, and can only contain letters, numbers, and underscores. Field names which don't follow these rules, or which conflict with a Malloy keyword, must be enclosed in back ticks, e.g. `year` is dep_time.year.

Kinds of Fields

Malloy includes three different kinds of fields: dimensions, measures, views, and calculations.

Dimensions

Dimensions are fields representing scalar values. All fields inherited directly from a table are dimensions.

Dimensions are defined using expressions that contain no aggregate functions.

document
source: users2 is duckdb.table('../data/users.parquet') extend {
  dimension: full_name is concat(first_name, ' ', last_name)
}

Dimensions may be used in both reductions and projections.

document
// Show the top 10 full names by number of occurrences
run: users2 -> {
  limit: 10
  group_by: full_name
  aggregate: occurrences is count()
}

// Show 10 users' full names
run: users2 -> {
  select: full_name
  limit: 10
}
QUERY RESULTS
[
  {
    "full_name": "FRANK HUGHES"
  },
  {
    "full_name": "HAROLD GATEWOOD"
  },
  {
    "full_name": "SCOTT JOACHIM"
  },
  {
    "full_name": "JAMES ALLEN"
  },
  {
    "full_name": "JEREMY CASAS"
  }
]
SELECT 
   CONCAT(base."first_name",' ',base."last_name") as "full_name"
FROM '../data/users.parquet' as base
LIMIT 10

Measures

Measures are fields representing aggregated data over multiple records.

Measures may not be used in projections (select: views). However, any measures that appear in a reduction are "dimensionalized" as part of the query, and are therefore usable as dimensions in subsequent stages.

document
run: flights -> {
  group_by: carrier
  aggregate: flight_count is count()
} -> {
  select: flight_count
}
QUERY RESULTS
[
  {
    "flight_count": 16074
  },
  {
    "flight_count": 4420
  },
  {
    "flight_count": 7139
  },
  {
    "flight_count": 32130
  },
  {
    "flight_count": 88751
  }
]
WITH __stage0 AS (
  SELECT 
     base."carrier" as "carrier",
     COUNT(1) as "flight_count"
  FROM '../data/flights.parquet' as base
  GROUP BY 1
)
SELECT 
   base."flight_count" as "flight_count"
FROM __stage0 as base

Views

A view represents a pipelined data transformation of one or more stages.

source: flights is duckdb.table('../data/flights.parquet') extend {
  view: by_carrier is {
    group_by: carrier
    aggregate: flight_count is count()
  }
}

A view can always begin with another view from the same source.

source: flights is duckdb.table('../data/flights.parquet') extend {
  ...
  view: top_carriers is by_carrier -> {
    select: carrier
    limit: 5
  }
}

See the Nesting section for more details about nested views.

Calculations (Window Functions)

Calculations are fields based off of groupings and aggregate values in a view, and therefore can only be created in a view with calculate: and can not be predefined in a source. See the Calculations and Window Functions section for details.

document
run: flights -> {
  group_by: carrier
  aggregate: flight_count
  calculate: flight_count_rank is rank()
}
QUERY RESULTS
[
  {
    "carrier": "WN",
    "flight_count": 88751,
    "flight_count_rank": 1
  },
  {
    "carrier": "US",
    "flight_count": 37683,
    "flight_count_rank": 2
  },
  {
    "carrier": "AA",
    "flight_count": 34577,
    "flight_count_rank": 3
  },
  {
    "carrier": "NW",
    "flight_count": 33580,
    "flight_count_rank": 4
  },
  {
    "carrier": "UA",
    "flight_count": 32757,
    "flight_count_rank": 5
  }
]
SELECT 
   base."carrier" as "carrier",
   COUNT(1) as "flight_count",
   RANK() OVER(  ORDER BY  COUNT(1) desc NULLS LAST ) as "flight_count_rank"
FROM '../data/flights.parquet' as base
GROUP BY 1
ORDER BY 2 desc NULLS LAST