Malloy Documentation
search

This guide expands on information shared in Malloy for SQL Folks.

CTE and Refactoring for Reusability.

Malloy has the notion of querys and sources. It can be a little confusing when to use one or the others.

A query is a transformation that returns a table. If you are doing a transfomation of data to be joined in another query or source, make a query.

A source is datasource, like a table with added declarations that can be used against the data it contains.

A source can also be joined with sources and queries to be a network of data that you can query.

Example SQL

This SQL block should be used as a reference when reading the section below it.

-- conneciton: duckdb
WITH orig_facts as (    -- (1) CTE computes a rollup of flights 
    SELECT 
      origin as code,
      COUNT(*) as total_flights
    FROM '../data/flights.parquet' as flights
    GROUP BY 1
)
SELECT
  orig.code, 
  orig.city,
  orig_facts.total_flights as total_flights_to_origin,
  carriers.nickname,
  count(*) as flight_count,
  count(*)/(orig_facts.total_flights*1.0) as percent_of_flights_to_origin   -- (2) division is integer by default
FROM '../data/flights.parquet' as flights
LEFT JOIN '../data/airports.parquet' as orig ON flights.origin = orig.code
LEFT JOIN orig_facts ON flights.origin = orig_facts.code
LEFT JOIN '../data/carriers.parquet' as carriers ON flights.carrier = carriers.code
GROUP BY 1,2,3,4
ORDER BY 5 DESC
LIMIT 10

Translate the structure of the query.

  1. For each table in the query, make a source: and add relevent calculations. Aggregate calculations are declared as measure:s. Scalar calculations are declared as dimension:s. These calculations can be used in queries, but simply naming them. We make sources for the flights, airports and carriers tables.

  2. For each CTE, declare a query:. In SQL, each CTE stage produces a table that is used in subsequent stages in the query. It works the same in malloy. CTEs are declared and named in a query: block. They usually use one of the previously defined source:s as a base. These queries are usually result in fact tables as they show up at some level of granularity. We declare origin_facts.

  3. Finally, declare the source: that you are going base your complex queries o flights_explore. flights_explore inherits from flights and adds joins to the above tables.

document
source: flights is duckdb.table('../data/flights.parquet') extend { -- (1)
  measure: 
    flight_count is count()
    total_distance is distance.sum()        -- add some measures
    avg_distance is distance.avg()
  rename: 
    origin_code is origin
    destination_code is destination
} 
source: airports is duckdb.table('../data/airports.parquet') extend {
   dimension:
      name is concat(code, '-', city)
   measure: 
    airport_count is count()
    avg_elevation is elevation.avg()
}
source: carriers is  duckdb.table('../data/carriers.parquet') extend {
  measure: carrier_count is count()
}
query: origin_facts is flights -> { -- (2) CTEs become fact queries
  group_by: code is origin_code
  aggregate: total_flights is flight_count
}

source: flights_explore is flights extend {  -- (3), (4)
  join_one: carriers on carrier = carriers.code
  join_one: origin is airports on origin_code = origin.code
  join_one: origin_facts on origin_code = origin_facts.code
}
  1. For each table in the query, make a source:. If there are aggregate calculations that are relevent to the source declare them measures or dimensions.

  2. For each CTE, declare a query:. Usually these are 'fact' tables of some kind.

  3. Declare a new source that extends the root table in the query, that joins eveything together into a network.

  4. extend: lets you inhert from an exising source.

Writing the query becomes easy.

document
run: flights_explore -> {
  group_by:
    origin.code
    origin.city
    total_flights_to_origin is  origin_facts.total_flights
    carriers.nickname
  aggregate:
    flight_count
     percent_of_flights_to_origin is flight_count/origin_facts.total_flights
  limit: 10
}
QUERY RESULTS
[
  {
    "code": "DFW",
    "city": "DALLAS-FORT WORTH",
    "total_flights_to_origin": 17782,
    "nickname": "American",
    "flight_count": 8742,
    "percent_of_flights_to_origin": 0.49162074007423234
  },
  {
    "code": "MSP",
    "city": "MINNEAPOLIS",
    "total_flights_to_origin": 9762,
    "nickname": "Northwest",
    "flight_count": 8662,
    "percent_of_flights_to_origin": 0.887318172505634
  },
  {
    "code": "ATL",
    "city": "ATLANTA",
    "total_flights_to_origin": 17875,
    "nickname": "Delta",
    "flight_count": 8419,
    "percent_of_flights_to_origin": 0.470993006993007
  },
  {
    "code": "ATL",
    "city": "ATLANTA",
    "total_flights_to_origin": 17875,
    "nickname": "Atlantic Southeast",
    "flight_count": 7392,
    "percent_of_flights_to_origin": 0.4135384615384615
  },
  {
    "code": "ORD",
    "city": "CHICAGO",
    "total_flights_to_origin": 14214,
    "nickname": "United",
    "flight_count": 6802,
    "percent_of_flights_to_origin": 0.4785422822569298
  }
]
WITH __stage0 AS (
  SELECT 
     base."origin" as "code",
     (COUNT(1)) as "total_flights"
  FROM '../data/flights.parquet' as base
  GROUP BY 1
)
SELECT 
   origin_0."code" as "code",
   origin_0."city" as "city",
   origin_facts_0."total_flights" as "total_flights_to_origin",
   carriers_0."nickname" as "nickname",
   COUNT(1) as "flight_count",
   (COUNT(1))*1.0/origin_facts_0."total_flights" as "percent_of_flights_to_origin"
FROM '../data/flights.parquet' as base
 LEFT JOIN __stage0 AS origin_facts_0
  ON base."origin"=origin_facts_0."code"
 LEFT JOIN '../data/airports.parquet' AS origin_0
  ON base."origin"=origin_0."code"
 LEFT JOIN '../data/carriers.parquet' AS carriers_0
  ON base."carrier"=carriers_0."code"
GROUP BY 1,2,3,4
ORDER BY 5 desc NULLS LAST
LIMIT 10

This semantic model can be used to generate lots of interesting queries.

See Querying a semantic model

document
run: flights_explore -> {
  group_by: origin.name
  aggregate: flight_count, carriers.carrier_count
  limit: 10
  # list_detail                       -- (1)
  nest: carriers is {                 -- (2)
    group_by: carriers.nickname 
    aggregate: flight_count
  }
}
QUERY RESULTS
[
  {
    "name": "ATL-ATLANTA",
    "flight_count": 17875,
    "carrier_count": 12,
    "carriers": [
      {
        "nickname": "Delta",
        "flight_count": 8419
      },
      {
        "nickname": "Atlantic Southeast",
        "flight_count": 7392
      },
      {
        "nickname": "USAir",
        "flight_count": 521
      },
      {
        "nickname": "American",
        "flight_count": 338
      },
      {
        "nickname": "Northwest",
        "flight_count": 310
      },
      {
        "nickname": "United",
        "flight_count": 258
      },
      {
        "nickname": "Continental",
        "flight_count": 248
      },
      {
        "nickname": "Comair",
        "flight_count": 239
      },
      {
        "nickname": "Continental Express",
        "flight_count": 81
      },
      {
        "nickname": "American Eagle",
        "flight_count": 46
      },
      {
        "nickname": "Jetblue",
        "flight_count": 16
      },
      {
        "nickname": "America West",
        "flight_count": 7
      }
    ]
  },
  {
    "name": "DFW-DALLAS-FORT WORTH",
    "flight_count": 17782,
    "carrier_count": 13,
    "carriers": [
      {
        "nickname": "American",
        "flight_count": 8742
      },
      {
        "nickname": "American Eagle",
        "flight_count": 6146
      },
      {
        "nickname": "Delta",
        "flight_count": 668
      },
      {
        "nickname": "USAir",
        "flight_count": 657
      },
      {
        "nickname": "Atlantic Southeast",
        "flight_count": 395
      },
      {
        "nickname": "United",
        "flight_count": 303
      },
      {
        "nickname": "Northwest",
        "flight_count": 282
      },
      {
        "nickname": "America West",
        "flight_count": 275
      },
      {
        "nickname": "ATA",
        "flight_count": 132
      },
      {
        "nickname": "Continental",
        "flight_count": 111
      },
      {
        "nickname": "Continental Express",
        "flight_count": 67
      },
      {
        "nickname": "Comair",
        "flight_count": 3
      },
      {
        "nickname": "Alaska",
        "flight_count": 1
      }
    ]
  },
  {
    "name": "ORD-CHICAGO",
    "flight_count": 14214,
    "carrier_count": 10,
    "carriers": [
      {
        "nickname": "United",
        "flight_count": 6802
      },
      {
        "nickname": "American",
        "flight_count": 5143
      },
      {
        "nickname": "American Eagle",
        "flight_count": 985
      },
      {
        "nickname": "USAir",
        "flight_count": 502
      },
      {
        "nickname": "Northwest",
        "flight_count": 297
      },
      {
        "nickname": "Delta",
        "flight_count": 218
      },
      {
        "nickname": "Continental",
        "flight_count": 141
      },
      {
        "nickname": "Continental Express",
        "flight_count": 90
      },
      {
        "nickname": "America West",
        "flight_count": 19
      },
      {
        "nickname": "Comair",
        "flight_count": 17
      }
    ]
  },
  {
    "name": "PHX-PHOENIX",
    "flight_count": 12476,
    "carrier_count": 12,
    "carriers": [
      {
        "nickname": "Southwest",
        "flight_count": 6456
      },
      {
        "nickname": "America West",
        "flight_count": 3696
      },
      {
        "nickname": "United",
        "flight_count": 662
      },
      {
        "nickname": "Northwest",
        "flight_count": 480
      },
      {
        "nickname": "American",
        "flight_count": 355
      },
      {
        "nickname": "Alaska",
        "flight_count": 346
      },
      {
        "nickname": "USAir",
        "flight_count": 316
      },
      {
        "nickname": "Delta",
        "flight_count": 90
      },
      {
        "nickname": "ATA",
        "flight_count": 42
      },
      {
        "nickname": "Continental",
        "flight_count": 17
      },
      {
        "nickname": "Continental Express",
        "flight_count": 9
      },
      {
        "nickname": "Jetblue",
        "flight_count": 7
      }
    ]
  },
  {
    "name": "LAS-LAS VEGAS",
    "flight_count": 11096,
    "carrier_count": 10,
    "carriers": [
      {
        "nickname": "Southwest",
        "flight_count": 6292
      },
      {
        "nickname": "United",
        "flight_count": 1340
      },
      {
        "nickname": "America West",
        "flight_count": 1221
      },
      {
        "nickname": "Northwest",
        "flight_count": 633
      },
      {
        "nickname": "Alaska",
        "flight_count": 418
      },
      {
        "nickname": "American",
        "flight_count": 391
      },
      {
        "nickname": "Delta",
        "flight_count": 376
      },
      {
        "nickname": "USAir",
        "flight_count": 223
      },
      {
        "nickname": "Jetblue",
        "flight_count": 146
      },
      {
        "nickname": "ATA",
        "flight_count": 56
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    CONCAT(origin_0."code",'-',origin_0."city") as "name__0",
    CASE WHEN group_set=0 THEN
      COUNT(1)
      END as "flight_count__0",
    CASE WHEN group_set=0 THEN
      COUNT(DISTINCT carriers_0."__distinct_key")
      END as "carrier_count__0",
    CASE WHEN group_set=1 THEN
      carriers_0."nickname"
      END as "nickname__1",
    CASE WHEN group_set=1 THEN
      COUNT(1)
      END as "flight_count__1"
  FROM '../data/flights.parquet' as base
   LEFT JOIN '../data/airports.parquet' AS origin_0
    ON base."origin"=origin_0."code"
   LEFT JOIN (SELECT GEN_RANDOM_UUID() as "__distinct_key", x.*  FROM '../data/carriers.parquet' as x) AS carriers_0
    ON base."carrier"=carriers_0."code"
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,5
)
SELECT
  "name__0" as "name",
  MAX(CASE WHEN group_set=0 THEN "flight_count__0" END) as "flight_count",
  MAX(CASE WHEN group_set=0 THEN "carrier_count__0" END) as "carrier_count",
  COALESCE(LIST({
    "nickname": "nickname__1", 
    "flight_count": "flight_count__1"}  ORDER BY  "flight_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "carriers"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 10

Another example query

document
run: flights_explore ->  {
  group_by: carrier is carriers.nickname
  aggregate: flight_count
  # list_detail
  nest: by_origin is {
    group_by: origin.name 
    aggregate: flight_count 
    limit: 10
  }
  # line_chart
  nest: by_month is {
    group_by: dep_year is dep_time.year
    aggregate: flight_count
  }
}
QUERY RESULTS
[
  {
    "carrier": "Southwest",
    "flight_count": 88751,
    "by_origin": [
      {
        "name": "PHX-PHOENIX",
        "flight_count": 6456
      },
      {
        "name": "LAS-LAS VEGAS",
        "flight_count": 6292
      },
      {
        "name": "BWI-BALTIMORE",
        "flight_count": 5471
      },
      {
        "name": "MDW-CHICAGO",
        "flight_count": 4702
      },
      {
        "name": "LAX-LOS ANGELES",
        "flight_count": 4282
      },
      {
        "name": "OAK-OAKLAND",
        "flight_count": 3964
      },
      {
        "name": "HOU-HOUSTON",
        "flight_count": 3219
      },
      {
        "name": "BNA-NASHVILLE",
        "flight_count": 3170
      },
      {
        "name": "SAN-SAN DIEGO",
        "flight_count": 2865
      },
      {
        "name": "MCI-KANSAS CITY",
        "flight_count": 2583
      }
    ],
    "by_month": [
      {
        "dep_year": "2005-01-01T00:00:00.000Z",
        "flight_count": 17549
      },
      {
        "dep_year": "2004-01-01T00:00:00.000Z",
        "flight_count": 14640
      },
      {
        "dep_year": "2003-01-01T00:00:00.000Z",
        "flight_count": 14300
      },
      {
        "dep_year": "2002-01-01T00:00:00.000Z",
        "flight_count": 14708
      },
      {
        "dep_year": "2001-01-01T00:00:00.000Z",
        "flight_count": 14421
      },
      {
        "dep_year": "2000-01-01T00:00:00.000Z",
        "flight_count": 13133
      }
    ]
  },
  {
    "carrier": "USAir",
    "flight_count": 37683,
    "by_origin": [
      {
        "name": "CLT-CHARLOTTE",
        "flight_count": 6427
      },
      {
        "name": "PHL-PHILADELPHIA",
        "flight_count": 5779
      },
      {
        "name": "PIT-PITTSBURGH",
        "flight_count": 4137
      },
      {
        "name": "DCA-WASHINGTON",
        "flight_count": 2900
      },
      {
        "name": "LGA-NEW YORK",
        "flight_count": 1412
      },
      {
        "name": "BOS-BOSTON",
        "flight_count": 1227
      },
      {
        "name": "DFW-DALLAS-FORT WORTH",
        "flight_count": 657
      },
      {
        "name": "RDU-RALEIGH/DURHAM",
        "flight_count": 632
      },
      {
        "name": "IAH-HOUSTON",
        "flight_count": 600
      },
      {
        "name": "ATL-ATLANTA",
        "flight_count": 521
      }
    ],
    "by_month": [
      {
        "dep_year": "2005-01-01T00:00:00.000Z",
        "flight_count": 7446
      },
      {
        "dep_year": "2004-01-01T00:00:00.000Z",
        "flight_count": 5388
      },
      {
        "dep_year": "2003-01-01T00:00:00.000Z",
        "flight_count": 4321
      },
      {
        "dep_year": "2002-01-01T00:00:00.000Z",
        "flight_count": 6501
      },
      {
        "dep_year": "2001-01-01T00:00:00.000Z",
        "flight_count": 6866
      },
      {
        "dep_year": "2000-01-01T00:00:00.000Z",
        "flight_count": 7161
      }
    ]
  },
  {
    "carrier": "American",
    "flight_count": 34577,
    "by_origin": [
      {
        "name": "DFW-DALLAS-FORT WORTH",
        "flight_count": 8742
      },
      {
        "name": "ORD-CHICAGO",
        "flight_count": 5143
      },
      {
        "name": "LAX-LOS ANGELES",
        "flight_count": 1951
      },
      {
        "name": "LGA-NEW YORK",
        "flight_count": 1070
      },
      {
        "name": "MIA-MIAMI",
        "flight_count": 1063
      },
      {
        "name": "JFK-NEW YORK",
        "flight_count": 943
      },
      {
        "name": "STL-ST LOUIS",
        "flight_count": 712
      },
      {
        "name": "SFO-SAN FRANCISCO",
        "flight_count": 688
      },
      {
        "name": "AUS-AUSTIN",
        "flight_count": 671
      },
      {
        "name": "BOS-BOSTON",
        "flight_count": 605
      }
    ],
    "by_month": [
      {
        "dep_year": "2005-01-01T00:00:00.000Z",
        "flight_count": 5543
      },
      {
        "dep_year": "2004-01-01T00:00:00.000Z",
        "flight_count": 5604
      },
      {
        "dep_year": "2003-01-01T00:00:00.000Z",
        "flight_count": 5566
      },
      {
        "dep_year": "2002-01-01T00:00:00.000Z",
        "flight_count": 5963
      },
      {
        "dep_year": "2001-01-01T00:00:00.000Z",
        "flight_count": 5851
      },
      {
        "dep_year": "2000-01-01T00:00:00.000Z",
        "flight_count": 6050
      }
    ]
  },
  {
    "carrier": "Northwest",
    "flight_count": 33580,
    "by_origin": [
      {
        "name": "MSP-MINNEAPOLIS",
        "flight_count": 8662
      },
      {
        "name": "DTW-DETROIT",
        "flight_count": 6604
      },
      {
        "name": "MEM-MEMPHIS",
        "flight_count": 1984
      },
      {
        "name": "LAX-LOS ANGELES",
        "flight_count": 993
      },
      {
        "name": "MCO-ORLANDO",
        "flight_count": 847
      },
      {
        "name": "SEA-SEATTLE",
        "flight_count": 751
      },
      {
        "name": "LGA-NEW YORK",
        "flight_count": 682
      },
      {
        "name": "BOS-BOSTON",
        "flight_count": 667
      },
      {
        "name": "LAS-LAS VEGAS",
        "flight_count": 633
      },
      {
        "name": "DCA-WASHINGTON",
        "flight_count": 601
      }
    ],
    "by_month": [
      {
        "dep_year": "2005-01-01T00:00:00.000Z",
        "flight_count": 5869
      },
      {
        "dep_year": "2004-01-01T00:00:00.000Z",
        "flight_count": 6306
      },
      {
        "dep_year": "2003-01-01T00:00:00.000Z",
        "flight_count": 5187
      },
      {
        "dep_year": "2002-01-01T00:00:00.000Z",
        "flight_count": 5081
      },
      {
        "dep_year": "2001-01-01T00:00:00.000Z",
        "flight_count": 5381
      },
      {
        "dep_year": "2000-01-01T00:00:00.000Z",
        "flight_count": 5756
      }
    ]
  },
  {
    "carrier": "United",
    "flight_count": 32757,
    "by_origin": [
      {
        "name": "ORD-CHICAGO",
        "flight_count": 6802
      },
      {
        "name": "DEN-DENVER",
        "flight_count": 4997
      },
      {
        "name": "IAD-WASHINGTON",
        "flight_count": 2541
      },
      {
        "name": "SFO-SAN FRANCISCO",
        "flight_count": 2464
      },
      {
        "name": "LAX-LOS ANGELES",
        "flight_count": 2319
      },
      {
        "name": "LAS-LAS VEGAS",
        "flight_count": 1340
      },
      {
        "name": "MCO-ORLANDO",
        "flight_count": 852
      },
      {
        "name": "SEA-SEATTLE",
        "flight_count": 728
      },
      {
        "name": "PHX-PHOENIX",
        "flight_count": 662
      },
      {
        "name": "SAN-SAN DIEGO",
        "flight_count": 641
      }
    ],
    "by_month": [
      {
        "dep_year": "2005-01-01T00:00:00.000Z",
        "flight_count": 6876
      },
      {
        "dep_year": "2004-01-01T00:00:00.000Z",
        "flight_count": 6817
      },
      {
        "dep_year": "2003-01-01T00:00:00.000Z",
        "flight_count": 6068
      },
      {
        "dep_year": "2002-01-01T00:00:00.000Z",
        "flight_count": 5590
      },
      {
        "dep_year": "2001-01-01T00:00:00.000Z",
        "flight_count": 3785
      },
      {
        "dep_year": "2000-01-01T00:00:00.000Z",
        "flight_count": 3621
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    carriers_0."nickname" as "carrier__0",
    CASE WHEN group_set=0 THEN
      COUNT(1)
      END as "flight_count__0",
    CASE WHEN group_set=1 THEN
      CONCAT(origin_0."code",'-',origin_0."city")
      END as "name__1",
    CASE WHEN group_set=1 THEN
      COUNT(1)
      END as "flight_count__1",
    CASE WHEN group_set=2 THEN
      DATE_TRUNC('year', base."dep_time")
      END as "dep_year__2",
    CASE WHEN group_set=2 THEN
      COUNT(1)
      END as "flight_count__2"
  FROM '../data/flights.parquet' as base
   LEFT JOIN '../data/carriers.parquet' AS carriers_0
    ON base."carrier"=carriers_0."code"
   LEFT JOIN '../data/airports.parquet' AS origin_0
    ON base."origin"=origin_0."code"
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,2,1)) as group_set  ) as group_set
  GROUP BY 1,2,4,6
)
SELECT
  "carrier__0" as "carrier",
  MAX(CASE WHEN group_set=0 THEN "flight_count__0" END) as "flight_count",
  COALESCE(LIST({
    "name": "name__1", 
    "flight_count": "flight_count__1"}  ORDER BY  "flight_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:10],[]) as "by_origin",
  COALESCE(LIST({
    "dep_year": "dep_year__2", 
    "flight_count": "flight_count__2"}  ORDER BY  "dep_year__2" desc NULLS LAST) FILTER (WHERE group_set=2),[]) as "by_month"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST