This guide maps common dbt patterns to their Malloy equivalents, using the flights dataset.
Staging Models
In dbt, "staging models" are the first transformation layer—typically one model per source table, doing light cleanup like renaming columns or casting types. The rest of your project builds on these staged versions rather than referencing raw tables directly. In Malloy, there's no separate staging step. You define a source pointing at the raw table and add transformations inline. The source definition is your staging layer and source extensions allow you to build on the initial source.
dbt:
-- models/staging/stg_flights.sql select id2, origin, destination, carrier, distance, dep_time from {{ source('raw', 'flights') }}
Malloy:
For transformations at the source level, use extend:
source: flights2 is duckdb.table('../data/flights.parquet') extend { dimension: flight_year is year(dep_time) is_long_haul is distance > 1000 } run: flights2 -> { group_by: flight_year, is_long_haul aggregate: flight_count is count() }
[ { "flight_year": 2005, "is_long_haul": false, "flight_count": 54681 }, { "flight_year": 2004, "is_long_haul": false, "flight_count": 52213 }, { "flight_year": 2003, "is_long_haul": false, "flight_count": 43623 }, { "flight_year": 2001, "is_long_haul": false, "flight_count": 35928 }, { "flight_year": 2002, "is_long_haul": false, "flight_count": 35695 } ]
SELECT EXTRACT(year FROM base."dep_time") as "flight_year", base."distance">1000 as "is_long_haul", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1,2 ORDER BY 3 desc NULLS LAST
Type Casting
Cast columns to the correct types inline:
source: orders is duckdb.table('raw_orders.parquet') extend { dimension: order_id is id::number order_date is raw_date::date total_amount is amount::number }
Joins
dbt:
select f.*, c.nickname as carrier_name from {{ ref('stg_flights') }} f left join {{ ref('stg_carriers') }} c on f.carrier = c.code
Malloy:
source: carriers is duckdb.table('../data/carriers.parquet') extend { primary_key: code } source: flights3 is duckdb.table('../data/flights.parquet') extend { join_one: carriers with carrier } run: flights3 -> { group_by: carriers.nickname aggregate: flight_count is count() limit: 5 }
[ { "nickname": "Southwest", "flight_count": 88751 }, { "nickname": "USAir", "flight_count": 37683 }, { "nickname": "American", "flight_count": 34577 }, { "nickname": "Northwest", "flight_count": 33580 }, { "nickname": "United", "flight_count": 32757 } ]
SELECT carriers_0."nickname" as "nickname", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base LEFT JOIN '../data/carriers.parquet' AS carriers_0 ON carriers_0."code"=base."carrier" GROUP BY 1 ORDER BY 2 desc NULLS LAST LIMIT 5
join_one indicates a many-to-one relationship (one carrier per flight). Use join_many for one-to-many relationships.
Denormalizing Data
To flatten joined data into a wide table, use select: * with joined fields:
source: orders is duckdb.table('orders.parquet') extend { join_one: customers with customer_id join_one: products with product_id } run: orders -> { select: * customers.customer_name customers.region products.product_name products.category }
Reusable Transformations (Macros)
dbt:
-- macros/distance_category.sql {% macro distance_category(column_name) %} case when {{ column_name }} < 500 then 'short' when {{ column_name }} < 1500 then 'medium' else 'long' end {% endmacro %} -- models/flights.sql select {{ distance_category('distance') }} as distance_category
Malloy uses pick expressions:
source: flights4 is duckdb.table('../data/flights.parquet') extend { dimension: distance_category is pick 'short' when distance < 500 pick 'medium' when distance < 1500 else 'long' } run: flights4 -> { group_by: distance_category aggregate: flight_count is count() }
[ { "distance_category": "short", "flight_count": 154125 }, { "distance_category": "medium", "flight_count": 151531 }, { "distance_category": "long", "flight_count": 39171 } ]
SELECT CASE WHEN (base."distance"<500) THEN 'short' WHEN (base."distance"<1500) THEN 'medium' ELSE 'long' END as "distance_category", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
Chained Transformations
dbt requires CTEs when columns reference other computed columns:
with with_hour as ( select *, extract(hour from dep_time) as dep_hour from flights ), with_period as ( select *, case when dep_hour < 6 then 'red-eye' when dep_hour < 12 then 'morning' when dep_hour < 18 then 'afternoon' else 'evening' end as time_of_day from with_hour ) select * from with_period
Malloy dimensions can reference earlier dimensions directly:
source: flights5 is duckdb.table('../data/flights.parquet') extend { dimension: dep_hour is hour(dep_time) time_of_day is pick 'red-eye' when dep_hour < 6 pick 'morning' when dep_hour < 12 pick 'afternoon' when dep_hour < 18 else 'evening' } run: flights5 -> { group_by: time_of_day aggregate: flight_count is count() }
[ { "time_of_day": "morning", "flight_count": 131249 }, { "time_of_day": "afternoon", "flight_count": 128362 }, { "time_of_day": "evening", "flight_count": 77275 }, { "time_of_day": "red-eye", "flight_count": 7941 } ]
SELECT CASE WHEN ((EXTRACT(hour FROM base."dep_time"))<6) THEN 'red-eye' WHEN ((EXTRACT(hour FROM base."dep_time"))<12) THEN 'morning' WHEN ((EXTRACT(hour FROM base."dep_time"))<18) THEN 'afternoon' ELSE 'evening' END as "time_of_day", COUNT(1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST
Aggregations
dbt:
select carrier, count(*) as flight_count, sum(distance) as total_distance, avg(distance) as avg_distance from {{ ref('flights') }} group by 1
Malloy:
source: flights6 is duckdb.table('../data/flights.parquet') extend { measure: flight_count is count() total_distance is sum(distance) avg_distance is avg(distance) } run: flights6 -> { group_by: carrier aggregate: flight_count, total_distance, avg_distance limit: 5 }
[ { "carrier": "WN", "flight_count": 88751, "total_distance": 54619152, "avg_distance": 615.4201304774031 }, { "carrier": "US", "flight_count": 37683, "total_distance": 23721642, "avg_distance": 629.5051349414855 }, { "carrier": "AA", "flight_count": 34577, "total_distance": 37684885, "avg_distance": 1089.8830147207682 }, { "carrier": "NW", "flight_count": 33580, "total_distance": 33376503, "avg_distance": 993.9399344848124 }, { "carrier": "UA", "flight_count": 32757, "total_distance": 38882934, "avg_distance": 1187.0114479347926 } ]
SELECT base."carrier" as "carrier", COUNT(1) as "flight_count", COALESCE(SUM(base."distance"),0) as "total_distance", AVG(base."distance") as "avg_distance" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST LIMIT 5
Measures are defined once and reused across queries.
Documentation
dbt:
models: - name: flights description: "Flight records" columns: - name: distance description: "Flight distance in miles"
Malloy uses tags for documentation:
source: flights is duckdb.table('flights.parquet') extend { # "Flight distance in miles" dimension: distance }
Imports and Refs
dbt:
from {{ ref('stg_flights') }}Malloy:
import "staging/flights.malloy" source: flight_metrics is flights extend { measure: on_time_rate is avg(pick 1 when dep_delay <= 0 else 0) }
Raw SQL
For database-specific features not covered by Malloy:
source: flight_sample is duckdb.sql(""" SELECT origin, destination, distance, dep_time FROM '../data/flights.parquet' WHERE distance > 2000 LIMIT 1000 """) extend { measure: flight_count is count() } run: flight_sample -> { group_by: origin aggregate: flight_count limit: 5 }
[ { "origin": "SFO", "flight_count": 159 }, { "origin": "SEA", "flight_count": 143 }, { "origin": "MSP", "flight_count": 137 }, { "origin": "LAX", "flight_count": 126 }, { "origin": "SLC", "flight_count": 79 } ]
SELECT base."origin" as "origin", COUNT(1) as "flight_count" FROM ( SELECT origin, destination, distance, dep_time FROM '../data/flights.parquet' WHERE distance > 2000 LIMIT 1000 ) as base GROUP BY 1 ORDER BY 2 desc NULLS LAST LIMIT 5
Materialization
To persist query results as tables, views, or files, use MalloySQL—which embeds Malloy queries in DDL:
-- connection:duckdb import "models/flights.malloy" CREATE TABLE daily_stats AS %{ flights -> { group_by: dep_time.day aggregate: flight_count, total_distance } }%
See Transform & Materialize for full details on MalloySQL syntax, exports, and views.
Running Queries
dbt:
dbt compile dbt run --select flights
Malloy:
malloy-cli run models/flights.malloy --query summary malloy-cli compile models/flights.malloy --query summary # SQL only
The VS Code extension runs queries directly with real-time error checking.
Next Steps
Transform & Materialize - CLI setup and MalloySQL patterns
Quick Start: Modeling - Build your first semantic model