Malloy Documentation

Jump to the section that matches your symptom:

| Symptom | Problem |
|---|---|
| Row count multiplied after join | Non-Unique Join Key |
| Aggregates inflated 10x | Grain Mismatch |
| Query takes minutes | Slow Query |
| Reference to undefined object | Circular References |
| Date filter syntax questions | Date Arithmetic |
| Can't reference joined field in join | Multi-hop Joins |
| "Field 'x' not found in source 'y'" | Common Errors |

Problem 1: Non-Unique Join Key

Symptoms

  • Query returns 500,000 rows when you expect 50,000

  • Aggregates are 10x larger than expected

  • Single row multiplies into many after join

Diagnosis

Test join cardinality:

source: airports_p1 is duckdb.table('../data/airports.parquet') extend {
  primary_key: code
}

source: flights_p1 is duckdb.table('../data/flights.parquet') extend {
  join_one: airports_p1 with origin
}

run: flights_p1 -> {
  where: id2 = 30272525  // ONE row
  aggregate:
    source_count is count()
    joined_count is airports_p1.count()
// If source_count=1 and joined_count=1, your join is correct
}
Query results:
[
  {
    "source_count": 1,
    "joined_count": 1
  }
]

Generated SQL:
SELECT 
   COUNT(1) as "source_count",
   COUNT(DISTINCT airports_p1_0."code") as "joined_count"
FROM '../data/flights.parquet' as base
 LEFT JOIN '../data/airports.parquet' AS airports_p1_0
  ON airports_p1_0."code"=base."origin"
WHERE base."id2"=30272525

If joined_count > source_count for a join_one, your join key isn't unique.

The Problem

Your join key doesn't uniquely identify rows in the target table. Consider a flight segments table where each flight can have multiple segments (legs):

// WRONG - flight_segments keyed by (flight_id + segment_number)
join_one: flight_segments on id2 = flight_segments.flight_id
// One flight matches multiple segment rows → fan-out

// CORRECT - include all keys that form the unique identifier
join_one: flight_segments on id2 = flight_segments.flight_id
                          and segment_num = flight_segments.segment_number

Fix

  1. Identify the primary key of the table you're joining to

  2. Include ALL columns that form that key in your join condition

  3. Re-test cardinality using the diagnostic query above
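The fan-out itself is easy to reproduce in plain SQL. Below is a minimal sketch using Python's built-in SQLite instead of DuckDB; the tiny tables and values are made up for illustration, not Malloy's sample data, but the key structure mirrors the flight_segments example above:

```python
import sqlite3

# Illustrative miniature tables (not the real flights data).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE flights (id2 INTEGER, origin TEXT, segment_num INTEGER);
    INSERT INTO flights VALUES (1, 'SFO', 1);

    -- flight_segments is keyed by (flight_id, segment_number):
    -- flight_id alone is NOT unique.
    CREATE TABLE flight_segments
        (flight_id INTEGER, segment_number INTEGER, miles INTEGER);
    INSERT INTO flight_segments VALUES (1, 1, 300), (1, 2, 450);
""")

# WRONG: joining on flight_id alone fans one flight out into two rows.
fanned = con.execute("""
    SELECT COUNT(*) FROM flights f
    JOIN flight_segments s ON f.id2 = s.flight_id
""").fetchone()[0]

# CORRECT: include every column of the target table's primary key.
exact = con.execute("""
    SELECT COUNT(*) FROM flights f
    JOIN flight_segments s
      ON f.id2 = s.flight_id AND f.segment_num = s.segment_number
""").fetchone()[0]

print(fanned, exact)  # 2 1
```

One source row becomes two after the incomplete join, which is exactly the joined_count > source_count signal the diagnostic query looks for.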


Problem 2: Grain Mismatch

Symptoms

  • Joining a daily/monthly summary table to raw transaction data

  • Every row gets the same aggregated value

  • Totals are wildly inflated

The Problem

You have two tables at different granularity:

  • Raw table: One row per event (e.g., every flight, every order)

  • Aggregated table: One row per time period (e.g., daily totals, monthly stats)

When you join them, every raw row matches the same summary row, and aggregates over the joined table answer the wrong question. Here, we join carrier-level stats to raw flights, then group by origin: each origin's wrong_total sums the network-wide totals of every carrier serving that airport, which has nothing to do with the airport's own flight count:

source: carrier_stats is duckdb.sql("""
  SELECT carrier, COUNT(*) as carrier_total
  FROM '../data/flights.parquet'
  GROUP BY carrier
""")

source: flights_p2 is duckdb.table('../data/flights.parquet') extend {
  join_one: carrier_stats on carrier = carrier_stats.carrier
}

// WRONG: grouping by origin but summing carrier totals
run: flights_p2 -> {
  group_by: origin
  aggregate:
    actual_flights is count()
    wrong_total is carrier_stats.carrier_total.sum()
  limit: 5
}
Query results:
[
  {
    "origin": "ATL",
    "actual_flights": 17875,
    "wrong_total": 244590
  },
  {
    "origin": "DFW",
    "actual_flights": 17782,
    "wrong_total": 251234
  },
  {
    "origin": "ORD",
    "actual_flights": 14214,
    "wrong_total": 223979
  },
  {
    "origin": "PHX",
    "actual_flights": 12476,
    "wrong_total": 308769
  },
  {
    "origin": "LAS",
    "actual_flights": 11096,
    "wrong_total": 285556
  }
]

Generated SQL:
SELECT 
   base."origin" as "origin",
   COUNT(1) as "actual_flights",
   COALESCE((
        SELECT SUM(a.val) as value
        FROM (
          SELECT UNNEST(list(distinct {key:carrier_stats_0."__distinct_key", val: carrier_stats_0."carrier_total"})) a
        )
      ),0) as "wrong_total"
FROM '../data/flights.parquet' as base
 LEFT JOIN (SELECT GEN_RANDOM_UUID() as "__distinct_key", x.*  FROM (
  SELECT carrier, COUNT(*) as carrier_total
  FROM '../data/flights.parquet'
  GROUP BY carrier
) as x) AS carrier_stats_0
  ON base."carrier"=carrier_stats_0."carrier"
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 5

Fix

Don't join aggregated tables to raw tables. Query them independently—use the raw table for granular questions and compute aggregations directly from it.
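The inflation is visible even in a four-row example. Here is a minimal sketch with Python's built-in SQLite (illustrative names and values); note it uses a plain SQL SUM over the joined table, which inflates per raw row and is even more aggressive than the SQL Malloy generates, but the root cause, mixing grains in one join, is the same:

```python
import sqlite3

# Illustrative data, not the real flights table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE flights (carrier TEXT, origin TEXT);
    INSERT INTO flights VALUES
        ('AA','ATL'), ('AA','DFW'), ('AA','DFW'), ('UA','ATL');

    -- One row per carrier: a different grain from the raw table.
    CREATE TABLE carrier_stats AS
        SELECT carrier, COUNT(*) AS carrier_total
        FROM flights GROUP BY carrier;
""")

# WRONG: every raw row drags in its carrier's entire precomputed total.
wrong = con.execute("""
    SELECT SUM(s.carrier_total) FROM flights f
    JOIN carrier_stats s ON f.carrier = s.carrier
""").fetchone()[0]

# RIGHT: aggregate the raw table directly.
right = con.execute("SELECT COUNT(*) FROM flights").fetchone()[0]

print(wrong, right)  # 10 4
```

Four flights become a "total" of ten: each AA row re-adds AA's total of 3, each UA row re-adds UA's total of 1.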


Problem 3: Slow Query

Symptoms

  • Queries take minutes instead of seconds

  • Timeouts on large datasets

Fix

Option 0: Check your join cardinality

Join fan-out is a common cause of slow queries. If a join_one isn't actually one-to-one, your dataset explodes silently. See Non-Unique Join Key to diagnose.

Option 1: Add filters to reduce data scanned

run: flights -> {
  where: dep_time ? @2003  // Filter to one year
  aggregate: flight_count
}

Option 2: Materialize a view of the data

If the same aggregation is queried repeatedly—across dashboards, by multiple users, or by LLM agents—pre-aggregate your data using MalloySQL:

-- connection:duckdb
import "models/flights.malloy"

CREATE TABLE monthly_stats AS
%{
  flights -> {
    group_by: dep_time.month
    aggregate: flight_count, total_distance
  }
}%

Then model against the materialized table for faster queries.

See Transform & Materialize for details on materializing views with the Malloy CLI.
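The same materialization idea can be sketched in plain SQL with Python's built-in SQLite (table and column names here are illustrative): precompute the expensive aggregation once, then let every downstream query hit the much smaller table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights (dep_month TEXT, distance INTEGER)")
con.executemany("INSERT INTO flights VALUES (?, ?)",
                [("2003-01", 500), ("2003-01", 700), ("2003-02", 300)])

# Materialize: one row per month instead of one row per flight.
con.execute("""
    CREATE TABLE monthly_stats AS
    SELECT dep_month,
           COUNT(*)      AS flight_count,
           SUM(distance) AS total_distance
    FROM flights
    GROUP BY dep_month
""")

rows = con.execute(
    "SELECT * FROM monthly_stats ORDER BY dep_month").fetchall()
print(rows)  # [('2003-01', 2, 1200), ('2003-02', 1, 300)]
```

Repeated dashboard queries then scan a handful of summary rows rather than the full event table.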


Problem 4: Circular References

Symptoms

  • Can't define two sources that each join to the other

The Problem

Malloy compiles linearly. You can't define orders joining to customers while customers also joins to orders—the second source doesn't exist yet when the first is compiled.

Fix

Define base sources first, then extend them with joins. For cyclical relationships, add a third pass:

// Base sources
source: _orders is duckdb.table('orders') extend { ... }
source: _customers is duckdb.table('customers') extend { ... }

// Modeled sources with one direction of joins
source: orders_modeled is _orders extend {
  join_one: customer is _customers on customer_id = customer.id
}

source: customers_modeled is _customers extend { ... }

// Final sources with cyclical joins added
source: customers is customers_modeled extend {
  join_many: orders is orders_modeled on id = orders.customer_id
}

Problem 5: Date Arithmetic

Symptoms

  • "How do I filter to the last 30 days?"

  • "How do I do current_date - interval '50 days'?"

Fix

Use Malloy's time literals and ranges:

run: events -> {
  // Pick ONE of these filters; multiple where: clauses in a single
  // query stage are ANDed together
  where: event_date > @now - 30 days                 // last 30 days
  // where: event_date ? @2024-01-01 to @2024-03-31  // specific range
  // where: event_date ? @now.month - 1 month        // last complete month

  aggregate: event_count is count()
}
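For intuition about the boundary the last-30-days filter produces, here is a rough Python equivalent using only the standard library, with a fixed "now" so the numbers are reproducible (the function and variable names are our own, not Malloy's):

```python
from datetime import datetime, timedelta

# Fixed "now" for a reproducible example.
now = datetime(2024, 4, 15, 12, 0)

# Roughly what `where: event_date > @now - 30 days` evaluates against:
cutoff = now - timedelta(days=30)
print(cutoff)  # 2024-03-16 12:00:00

def in_last_30_days(event_date: datetime) -> bool:
    # The filter keeps events strictly after the cutoff.
    return event_date > cutoff

print(in_last_30_days(datetime(2024, 4, 1)))  # True
print(in_last_30_days(datetime(2024, 2, 1)))  # False
```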

See Timestamp Operations for more patterns.


Problem 6: Multi-hop Joins

Symptoms

  • Need to join A to C through intermediate table B

  • "Cannot reference joined_table.joined_field in join condition"

The Problem

You can't reference a joined table's fields in another join condition:

// This doesn't work
source: orders is duckdb.table('orders') extend {
  join_one: customers with customer_id
  // Can't use customers.region in this join condition
  join_one: shipments on shipments.region = customers.region
}

Fix

Option 1: Flatten the join path

Join directly if possible:

source: orders is duckdb.table('orders') extend {
  join_one: customers with customer_id
  join_one: shipments with shipment_id  // Direct join
}

Option 2: Denormalize in SQL source

source: orders_with_region is duckdb.sql("""
  SELECT o.*, c.region
  FROM orders o
  JOIN customers c ON o.customer_id = c.customer_id
""") extend {
  join_one: shipments on region = shipments.region
}

Note: Denormalization sacrifices the live semantic relationship. If customers.region changes in the source table, the denormalized source won't reflect the change until the SQL query is re-run.

Option 3: Use separate sources

If you need different join paths for different questions, create purpose-specific sources rather than one uber-source.


Common Error Messages

Quick reference for error strings and their likely causes:

| Error Message | Likely Cause | Solution |
|---|---|---|
| Field 'x' not found in source 'y' | Typo in field name, or missing import | Check spelling; verify the import includes the source |
| Cannot use aggregate in dimension | Using a measure where a dimension is expected | Define it as measure:, not dimension:, or move it to an aggregate: block |
| Reference to undefined object | Circular references | See Circular References |
| Source 'x' not found | Missing import statement | Add import "path/to/file.malloy" |
| Cannot join on expression | Using a joined field in another join condition | See Multi-hop Joins |
| Primary key required for join | Missing primary_key declaration on joined source | Add primary_key: field_name to the target source |
| Syntax error on year, date, timestamp | Using a reserved word as a field name | Use backticks: `year`, `date`, `timestamp` |