Malloy has a special query operator that builds _Dimensional Search Indexes for sources. A Dimensional Search Index is a table with 4 columns. Dimensional indexes are useful for a variety of things including filtering suggestions and LLMs.
When filtering data, you might know a term, but not necessarily which column in the one of the join data contains it. Indexing the data on field names and high cardinality fields let's you qucikly find the term and the associated value.
Indexing could be used by LLMs to find the interesting column/term mapping in the data set.
fieldName - The path to the column in the source
fieldValue - The dimensional value for the field (or range if fieldType is not a string)
fieldType - Type type of the column.
weight - a weighting to use as to the importance the distinct dimensional value. Defaults to cardinality of the field.
Simple Example
We're going to take the airports table and index it. The results are an un ordered list of distinct fieldName/fieldValue pairs appear in the table. The weight, in this case is the number of rows that partciular occurs on.
run: duckdb.table('../data/airports.parquet') -> { index: * }
fieldName | fieldPath | fieldValue | fieldType | weight |
---|---|---|---|---|
act_date | act_date | ∅ | string | 12,040 |
act_date | act_date | 02/2000 | string | 54 |
act_date | act_date | 11/1986 | string | 119 |
act_date | act_date | 11/1992 | string | 92 |
act_date | act_date | 10/1999 | string | 41 |
act_date | act_date | 12/1992 | string | 86 |
act_date | act_date | 09/1994 | string | 20 |
act_date | act_date | 08/1990 | string | 36 |
act_date | act_date | 01/1990 | string | 15 |
act_date | act_date | 08/1981 | string | 40 |
act_date | act_date | 12/1986 | string | 56 |
act_date | act_date | 11/1987 | string | 25 |
act_date | act_date | 04/1996 | string | 3 |
act_date | act_date | 01/1994 | string | 36 |
act_date | act_date | 04/1999 | string | 45 |
act_date | act_date | 03/1992 | string | 17 |
act_date | act_date | 07/1995 | string | 15 |
act_date | act_date | 04/1982 | string | 92 |
act_date | act_date | 05/1991 | string | 31 |
act_date | act_date | 03/1998 | string | 62 |
act_date | act_date | 01/1987 | string | 31 |
act_date | act_date | 05/1993 | string | 8 |
act_date | act_date | 02/1987 | string | 129 |
act_date | act_date | 10/1987 | string | 62 |
act_date | act_date | 09/1999 | string | 31 |
act_date | act_date | 07/1982 | string | 80 |
act_date | act_date | 01/1998 | string | 28 |
act_date | act_date | 04/1990 | string | 16 |
act_date | act_date | 12/1983 | string | 77 |
act_date | act_date | 12/1997 | string | 31 |
act_date | act_date | 09/1990 | string | 11 |
act_date | act_date | 07/1983 | string | 20 |
act_date | act_date | 09/1991 | string | 14 |
act_date | act_date | 09/1995 | string | 15 |
act_date | act_date | 05/1988 | string | 20 |
act_date | act_date | 06/2000 | string | 59 |
act_date | act_date | 06/1982 | string | 13 |
act_date | act_date | 11/1993 | string | 32 |
act_date | act_date | 11/1994 | string | 4 |
act_date | act_date | 10/1989 | string | 25 |
act_date | act_date | 10/1992 | string | 35 |
act_date | act_date | 02/1991 | string | 6 |
aero_cht | aero_cht | BETHEL | string | 34 |
aero_cht | aero_cht | W ALEUTIAN ISLS | string | 3 |
aero_cht | aero_cht | ATLANTA | string | 780 |
aero_cht | aero_cht | JACKSONVILLE | string | 467 |
aero_cht | aero_cht | KANSAS CITY | string | 713 |
aero_cht | aero_cht | HOUSTON | string | 765 |
aero_cht | aero_cht | LAS VEGAS | string | 136 |
aero_cht | aero_cht | LOS ANGELES | string | 498 |
aero_cht | aero_cht | KLAMATH FALLS | string | 260 |
c_ldg_rts | c_ldg_rts | Y | string | 283 |
cbd_dir | cbd_dir | E | string | 2,127 |
cbd_dir | cbd_dir | SE | string | 2,281 |
cbd_dir | cbd_dir | NE | string | 2,393 |
cbd_dir | cbd_dir | W | string | 2,065 |
cbd_dir | cbd_dir | NNW | string | 11 |
cbd_dir | cbd_dir | NNE | string | 13 |
cert | cert | LU 06/1986 | string | 1 |
cert | cert | CS 05/1973 | string | 84 |
cert | cert | LS 05/1973 | string | 1 |
cert | cert | AS 06/1987 | string | 1 |
cert | cert | BS 03/1978 | string | 1 |
cert | cert | CS 03/1973 | string | 1 |
cert | cert | LU 01/1980 | string | 1 |
cert | cert | LU 03/1999 | string | 1 |
cert | cert | LU 06/1975 | string | 2 |
cert | cert | DS 05/1973 | string | 26 |
city | city | TAORA IS MALOELAP ATOLL | string | 1 |
city | city | NAMORIK ATOLL | string | 1 |
city | city | JABOR JALUIT ATOLL | string | 1 |
city | city | ROI-NAMUR | string | 1 |
city | city | JOHNSTON ISLAND | string | 1 |
city | city | KWAJALEIN | string | 1 |
city | city | PALMER | string | 17 |
city | city | DELTA JUNCTION | string | 6 |
city | city | HOMER | string | 10 |
city | city | TRAPPER CREEK/TALKEETNA | string | 1 |
city | city | SHAKTOOLIK | string | 1 |
city | city | CAPE BEAUFORT | string | 1 |
city | city | KAKE | string | 2 |
city | city | LIME VILLAGE | string | 1 |
city | city | SUTTON | string | 5 |
city | city | DIOMEDE | string | 1 |
city | city | HEALY | string | 3 |
city | city | SHISHMAREF | string | 1 |
city | city | CHALKYITSIK | string | 1 |
city | city | NUNAPITCHUK | string | 1 |
city | city | BIG MOUNTAIN | string | 1 |
city | city | OUZINKIE | string | 1 |
city | city | ST MICHAEL | string | 1 |
city | city | BIG LAKE | string | 5 |
city | city | MC GRATH | string | 3 |
city | city | ATKA | string | 1 |
city | city | STEBBINS | string | 1 |
city | city | CHEFORNAK | string | 1 |
city | city | KODIAK | string | 5 |
city | city | BOUNDARY | string | 1 |
city | city | BREVIG MISSION | string | 1 |
city | city | NIKOLSKI | string | 1 |
[ { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": null, "weight": 12040 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "02/2000", "weight": 54 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "11/1986", "weight": 119 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "11/1992", "weight": 92 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "10/1999", "weight": 41 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "12/1992", "weight": 86 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "09/1994", "weight": 20 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "08/1990", "weight": 36 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "01/1990", "weight": 15 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "08/1981", "weight": 40 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "12/1986", "weight": 56 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "11/1987", "weight": 25 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "04/1996", "weight": 3 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "01/1994", "weight": 36 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "04/1999", "weight": 45 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "03/1992", "weight": 17 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "07/1995", "weight": 15 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "04/1982", "weight": 92 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "05/1991", "weight": 31 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "03/1998", "weight": 62 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "01/1987", "weight": 31 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "05/1993", "weight": 8 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "02/1987", "weight": 129 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "10/1987", "weight": 62 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "09/1999", "weight": 31 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "07/1982", "weight": 80 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "01/1998", "weight": 28 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "04/1990", "weight": 16 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "12/1983", "weight": 77 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "12/1997", "weight": 31 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "09/1990", "weight": 11 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "07/1983", "weight": 20 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "09/1991", "weight": 14 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "09/1995", "weight": 15 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "05/1988", "weight": 20 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "06/2000", "weight": 59 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "06/1982", "weight": 13 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "11/1993", "weight": 32 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "11/1994", "weight": 4 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "10/1989", "weight": 25 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "10/1992", "weight": 35 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": "02/1991", "weight": 6 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "BETHEL", "weight": 34 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "W ALEUTIAN ISLS", "weight": 3 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "ATLANTA", "weight": 780 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "JACKSONVILLE", "weight": 467 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "KANSAS CITY", "weight": 713 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "HOUSTON", "weight": 765 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "LAS VEGAS", "weight": 136 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "LOS ANGELES", "weight": 498 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "KLAMATH FALLS", "weight": 260 }, { "fieldName": "c_ldg_rts", "fieldPath": "c_ldg_rts", "fieldType": "string", "fieldValue": "Y", "weight": 283 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "E", "weight": 2127 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "SE", "weight": 2281 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "NE", "weight": 2393 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "W", "weight": 2065 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "NNW", "weight": 11 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "NNE", "weight": 13 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": "LU 06/1986", "weight": 1 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": "CS 05/1973", "weight": 84 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": "LS 05/1973", "weight": 1 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": "AS 06/1987", "weight": 1 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": "BS 03/1978", "weight": 1 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": "CS 03/1973", "weight": 1 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": "LU 01/1980", "weight": 1 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": "LU 03/1999", "weight": 1 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": "LU 06/1975", "weight": 2 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": "DS 05/1973", "weight": 26 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "TAORA IS MALOELAP ATOLL", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "NAMORIK ATOLL", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "JABOR JALUIT ATOLL", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "ROI-NAMUR", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "JOHNSTON ISLAND", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "KWAJALEIN", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "PALMER", "weight": 17 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "DELTA JUNCTION", "weight": 6 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "HOMER", "weight": 10 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "TRAPPER CREEK/TALKEETNA", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SHAKTOOLIK", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "CAPE BEAUFORT", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "KAKE", "weight": 2 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "LIME VILLAGE", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SUTTON", "weight": 5 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "DIOMEDE", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "HEALY", "weight": 3 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SHISHMAREF", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "CHALKYITSIK", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "NUNAPITCHUK", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "BIG MOUNTAIN", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "OUZINKIE", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "ST MICHAEL", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "BIG LAKE", "weight": 5 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "MC GRATH", "weight": 3 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "ATKA", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "STEBBINS", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "CHEFORNAK", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "KODIAK", "weight": 5 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "BOUNDARY", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "BREVIG MISSION", "weight": 1 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "NIKOLSKI", "weight": 1 } ]
WITH __stage0 AS ( SELECT group_set, CASE group_set WHEN 0 THEN 'act_date' WHEN 1 THEN 'aero_cht' WHEN 2 THEN 'c_ldg_rts' WHEN 3 THEN 'cbd_dir' WHEN 4 THEN 'cbd_dist' WHEN 5 THEN 'cert' WHEN 6 THEN 'city' WHEN 7 THEN 'cntl_twr' WHEN 8 THEN 'code' WHEN 9 THEN 'county' WHEN 10 THEN 'cust_intl' WHEN 11 THEN 'elevation' WHEN 12 THEN 'faa_dist' WHEN 13 THEN 'faa_region' WHEN 14 THEN 'fac_type' WHEN 15 THEN 'fac_use' WHEN 16 THEN 'fed_agree' WHEN 17 THEN 'full_name' WHEN 18 THEN 'id' WHEN 19 THEN 'joint_use' WHEN 20 THEN 'latitude' WHEN 21 THEN 'longitude' WHEN 22 THEN 'major' WHEN 23 THEN 'mil_rts' WHEN 24 THEN 'own_type' WHEN 25 THEN 'site_number' WHEN 26 THEN 'state' END as "fieldName", CASE group_set WHEN 0 THEN 'act_date' WHEN 1 THEN 'aero_cht' WHEN 2 THEN 'c_ldg_rts' WHEN 3 THEN 'cbd_dir' WHEN 4 THEN 'cbd_dist' WHEN 5 THEN 'cert' WHEN 6 THEN 'city' WHEN 7 THEN 'cntl_twr' WHEN 8 THEN 'code' WHEN 9 THEN 'county' WHEN 10 THEN 'cust_intl' WHEN 11 THEN 'elevation' WHEN 12 THEN 'faa_dist' WHEN 13 THEN 'faa_region' WHEN 14 THEN 'fac_type' WHEN 15 THEN 'fac_use' WHEN 16 THEN 'fed_agree' WHEN 17 THEN 'full_name' WHEN 18 THEN 'id' WHEN 19 THEN 'joint_use' WHEN 20 THEN 'latitude' WHEN 21 THEN 'longitude' WHEN 22 THEN 'major' WHEN 23 THEN 'mil_rts' WHEN 24 THEN 'own_type' WHEN 25 THEN 'site_number' WHEN 26 THEN 'state' END as "fieldPath", CASE group_set WHEN 0 THEN 'string' WHEN 1 THEN 'string' WHEN 2 THEN 'string' WHEN 3 THEN 'string' WHEN 4 THEN 'number' WHEN 5 THEN 'string' WHEN 6 THEN 'string' WHEN 7 THEN 'string' WHEN 8 THEN 'string' WHEN 9 THEN 'string' WHEN 10 THEN 'string' WHEN 11 THEN 'number' WHEN 12 THEN 'string' WHEN 13 THEN 'string' WHEN 14 THEN 'string' WHEN 15 THEN 'string' WHEN 16 THEN 'string' WHEN 17 THEN 'string' WHEN 18 THEN 'number' WHEN 19 THEN 'string' WHEN 20 THEN 'number' WHEN 21 THEN 'number' WHEN 22 THEN 'string' WHEN 23 THEN 'string' WHEN 24 THEN 'string' WHEN 25 THEN 'string' WHEN 26 THEN 'string' END as "fieldType", CASE group_set WHEN 99999 THEN CAST(NULL as VARCHAR) WHEN 0 THEN base."act_date" WHEN 1 THEN base."aero_cht" WHEN 2 THEN base."c_ldg_rts" WHEN 3 THEN base."cbd_dir" WHEN 5 THEN base."cert" WHEN 6 THEN base."city" WHEN 7 THEN base."cntl_twr" WHEN 8 THEN base."code" WHEN 9 THEN base."county" WHEN 10 THEN base."cust_intl" WHEN 12 THEN base."faa_dist" WHEN 13 THEN base."faa_region" WHEN 14 THEN base."fac_type" WHEN 15 THEN base."fac_use" WHEN 16 THEN base."fed_agree" WHEN 17 THEN base."full_name" WHEN 19 THEN base."joint_use" WHEN 22 THEN base."major" WHEN 23 THEN base."mil_rts" WHEN 24 THEN base."own_type" WHEN 25 THEN base."site_number" WHEN 26 THEN base."state" END as "fieldValue", COUNT(*) as "weight", CASE group_set WHEN 99999 THEN '' WHEN 4 THEN MIN(CAST(base."cbd_dist" as VARCHAR)) || ' to ' || CAST(MAX(base."cbd_dist") as VARCHAR) WHEN 11 THEN MIN(CAST(base."elevation" as VARCHAR)) || ' to ' || CAST(MAX(base."elevation") as VARCHAR) WHEN 18 THEN MIN(CAST(base."id" as VARCHAR)) || ' to ' || CAST(MAX(base."id") as VARCHAR) WHEN 20 THEN MIN(CAST(base."latitude" as VARCHAR)) || ' to ' || CAST(MAX(base."latitude") as VARCHAR) WHEN 21 THEN MIN(CAST(base."longitude" as VARCHAR)) || ' to ' || CAST(MAX(base."longitude") as VARCHAR) END as "fieldRange" FROM '../data/airports.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,27,1)) as group_set ) as group_set GROUP BY 1,2,3,4,5 ) SELECT "fieldName", "fieldPath", "fieldType", COALESCE("fieldValue", "fieldRange") as "fieldValue", "weight" FROM __stage0
Add Ordering
Adding a second query stage to filter on string columns and ordering by weight descending shows us the most common fieldName/fieldValue pairs in the dataset.
All Malloy queries run as a single SQL query. The index:
operator is no different. Click the SQL tab to see how this works.
run: duckdb.table('../data/airports.parquet') -> { index: * } -> { where: fieldType = 'string' select: * order_by: weight desc }
fieldName | fieldPath | fieldType | fieldValue | weight |
---|---|---|---|---|
major | major | string | N | 19,523 |
cert | cert | string | ∅ | 19,142 |
cntl_twr | cntl_twr | string | N | 19,124 |
fed_agree | fed_agree | string | ∅ | 16,252 |
cust_intl | cust_intl | string | ∅ | 15,145 |
c_ldg_rts | c_ldg_rts | string | ∅ | 15,145 |
joint_use | joint_use | string | ∅ | 14,804 |
mil_rts | mil_rts | string | ∅ | 14,716 |
fac_use | fac_use | string | PR | 14,428 |
own_type | own_type | string | PR | 14,306 |
fac_type | fac_type | string | AIRPORT | 13,925 |
act_date | act_date | string | ∅ | 12,040 |
faa_dist | faa_dist | string | NONE | 7,085 |
fac_use | fac_use | string | PU | 5,365 |
own_type | own_type | string | PU | 5,174 |
fac_type | fac_type | string | HELIPORT | 5,135 |
joint_use | joint_use | string | N | 4,779 |
cust_intl | cust_intl | string | N | 4,575 |
faa_region | faa_region | string | AGL | 4,437 |
c_ldg_rts | c_ldg_rts | string | N | 4,365 |
cbd_dir | cbd_dir | string | N | 3,694 |
faa_region | faa_region | string | ASW | 3,268 |
mil_rts | mil_rts | string | Y | 2,958 |
faa_region | faa_region | string | ASO | 2,924 |
faa_region | faa_region | string | AEA | 2,586 |
cbd_dir | cbd_dir | string | NW | 2,438 |
cbd_dir | cbd_dir | string | SW | 2,427 |
cbd_dir | cbd_dir | string | NE | 2,393 |
cbd_dir | cbd_dir | string | SE | 2,281 |
cbd_dir | cbd_dir | string | S | 2,155 |
cbd_dir | cbd_dir | string | E | 2,127 |
mil_rts | mil_rts | string | N | 2,119 |
faa_region | faa_region | string | ANM | 2,102 |
cbd_dir | cbd_dir | string | W | 2,065 |
state | state | string | TX | 1,845 |
fed_agree | fed_agree | string | NGY | 1,682 |
aero_cht | aero_cht | string | NEW YORK | 1,581 |
faa_region | faa_region | string | ACE | 1,579 |
faa_dist | faa_dist | string | CHI | 1,528 |
faa_region | faa_region | string | AWP | 1,503 |
aero_cht | aero_cht | string | CHICAGO | 1,480 |
aero_cht | aero_cht | string | DETROIT | 1,262 |
faa_dist | faa_dist | string | DET | 1,240 |
faa_dist | faa_dist | string | SEA | 1,162 |
faa_dist | faa_dist | string | MSP | 1,046 |
faa_dist | faa_dist | string | ATL | 1,029 |
state | state | string | CA | 984 |
faa_dist | faa_dist | string | NYC | 957 |
faa_dist | faa_dist | string | ORL | 905 |
state | state | string | IL | 890 |
aero_cht | aero_cht | string | DALLAS-FT WORTH | 888 |
aero_cht | aero_cht | string | WASHINGTON | 887 |
state | state | string | FL | 856 |
aero_cht | aero_cht | string | ST LOUIS | 847 |
faa_dist | faa_dist | string | HAR | 846 |
state | state | string | PA | 804 |
aero_cht | aero_cht | string | ATLANTA | 780 |
aero_cht | aero_cht | string | HOUSTON | 765 |
faa_region | faa_region | string | ANE | 763 |
state | state | string | OH | 749 |
aero_cht | aero_cht | string | SEATTLE | 749 |
aero_cht | aero_cht | string | KANSAS CITY | 713 |
aero_cht | aero_cht | string | TWIN CITIES | 699 |
faa_dist | faa_dist | string | DEN | 679 |
cntl_twr | cntl_twr | string | Y | 669 |
faa_dist | faa_dist | string | DCA | 667 |
state | state | string | IN | 643 |
aero_cht | aero_cht | string | CINCINNATI | 624 |
faa_dist | faa_dist | string | BIS | 616 |
state | state | string | AK | 608 |
faa_region | faa_region | string | AAL | 608 |
aero_cht | aero_cht | string | MEMPHIS | 605 |
faa_dist | faa_dist | string | SFO | 585 |
state | state | string | NY | 576 |
aero_cht | aero_cht | string | OMAHA | 551 |
state | state | string | WI | 543 |
state | state | string | MO | 537 |
fed_agree | fed_agree | string | N | 515 |
state | state | string | MN | 507 |
aero_cht | aero_cht | string | SAN ANTONIO | 503 |
faa_dist | faa_dist | string | JAN | 502 |
state | state | string | LA | 500 |
aero_cht | aero_cht | string | LOS ANGELES | 498 |
state | state | string | MI | 489 |
faa_dist | faa_dist | string | MEM | 487 |
state | state | string | WA | 484 |
fac_type | fac_type | string | SEAPLANE BASE | 473 |
aero_cht | aero_cht | string | JACKSONVILLE | 467 |
aero_cht | aero_cht | string | NEW ORLEANS | 444 |
state | state | string | OK | 443 |
state | state | string | OR | 441 |
state | state | string | GA | 440 |
state | state | string | ND | 436 |
aero_cht | aero_cht | string | SAN FRANCISCO | 427 |
state | state | string | CO | 425 |
state | state | string | VA | 421 |
aero_cht | aero_cht | string | GREEN BAY | 418 |
state | state | string | KS | 415 |
state | state | string | NC | 400 |
aero_cht | aero_cht | string | CHARLOTTE | 380 |
[ { "fieldName": "major", "fieldPath": "major", "fieldType": "string", "fieldValue": "N", "weight": 19523 }, { "fieldName": "cert", "fieldPath": "cert", "fieldType": "string", "fieldValue": null, "weight": 19142 }, { "fieldName": "cntl_twr", "fieldPath": "cntl_twr", "fieldType": "string", "fieldValue": "N", "weight": 19124 }, { "fieldName": "fed_agree", "fieldPath": "fed_agree", "fieldType": "string", "fieldValue": null, "weight": 16252 }, { "fieldName": "cust_intl", "fieldPath": "cust_intl", "fieldType": "string", "fieldValue": null, "weight": 15145 }, { "fieldName": "c_ldg_rts", "fieldPath": "c_ldg_rts", "fieldType": "string", "fieldValue": null, "weight": 15145 }, { "fieldName": "joint_use", "fieldPath": "joint_use", "fieldType": "string", "fieldValue": null, "weight": 14804 }, { "fieldName": "mil_rts", "fieldPath": "mil_rts", "fieldType": "string", "fieldValue": null, "weight": 14716 }, { "fieldName": "fac_use", "fieldPath": "fac_use", "fieldType": "string", "fieldValue": "PR", "weight": 14428 }, { "fieldName": "own_type", "fieldPath": "own_type", "fieldType": "string", "fieldValue": "PR", "weight": 14306 }, { "fieldName": "fac_type", "fieldPath": "fac_type", "fieldType": "string", "fieldValue": "AIRPORT", "weight": 13925 }, { "fieldName": "act_date", "fieldPath": "act_date", "fieldType": "string", "fieldValue": null, "weight": 12040 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "NONE", "weight": 7085 }, { "fieldName": "fac_use", "fieldPath": "fac_use", "fieldType": "string", "fieldValue": "PU", "weight": 5365 }, { "fieldName": "own_type", "fieldPath": "own_type", "fieldType": "string", "fieldValue": "PU", "weight": 5174 }, { "fieldName": "fac_type", "fieldPath": "fac_type", "fieldType": "string", "fieldValue": "HELIPORT", "weight": 5135 }, { "fieldName": "joint_use", "fieldPath": "joint_use", "fieldType": "string", "fieldValue": "N", "weight": 4779 }, { "fieldName": "cust_intl", "fieldPath": "cust_intl", "fieldType": "string", "fieldValue": "N", "weight": 4575 }, { "fieldName": "faa_region", "fieldPath": "faa_region", "fieldType": "string", "fieldValue": "AGL", "weight": 4437 }, { "fieldName": "c_ldg_rts", "fieldPath": "c_ldg_rts", "fieldType": "string", "fieldValue": "N", "weight": 4365 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "N", "weight": 3694 }, { "fieldName": "faa_region", "fieldPath": "faa_region", "fieldType": "string", "fieldValue": "ASW", "weight": 3268 }, { "fieldName": "mil_rts", "fieldPath": "mil_rts", "fieldType": "string", "fieldValue": "Y", "weight": 2958 }, { "fieldName": "faa_region", "fieldPath": "faa_region", "fieldType": "string", "fieldValue": "ASO", "weight": 2924 }, { "fieldName": "faa_region", "fieldPath": "faa_region", "fieldType": "string", "fieldValue": "AEA", "weight": 2586 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "NW", "weight": 2438 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "SW", "weight": 2427 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "NE", "weight": 2393 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "SE", "weight": 2281 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "S", "weight": 2155 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "E", "weight": 2127 }, { "fieldName": "mil_rts", "fieldPath": "mil_rts", "fieldType": "string", "fieldValue": "N", "weight": 2119 }, { "fieldName": "faa_region", "fieldPath": "faa_region", "fieldType": "string", "fieldValue": "ANM", "weight": 2102 }, { "fieldName": "cbd_dir", "fieldPath": "cbd_dir", "fieldType": "string", "fieldValue": "W", "weight": 2065 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "TX", "weight": 1845 }, { "fieldName": "fed_agree", "fieldPath": "fed_agree", "fieldType": "string", "fieldValue": "NGY", "weight": 1682 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "NEW YORK", "weight": 1581 }, { "fieldName": "faa_region", "fieldPath": "faa_region", "fieldType": "string", "fieldValue": "ACE", "weight": 1579 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "CHI", "weight": 1528 }, { "fieldName": "faa_region", "fieldPath": "faa_region", "fieldType": "string", "fieldValue": "AWP", "weight": 1503 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "CHICAGO", "weight": 1480 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "DETROIT", "weight": 1262 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "DET", "weight": 1240 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "SEA", "weight": 1162 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "MSP", "weight": 1046 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "ATL", "weight": 1029 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "CA", "weight": 984 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "NYC", "weight": 957 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "ORL", "weight": 905 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "IL", "weight": 890 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "DALLAS-FT WORTH", "weight": 888 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "WASHINGTON", "weight": 887 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "FL", "weight": 856 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "ST LOUIS", "weight": 847 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "HAR", "weight": 846 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "PA", "weight": 804 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "ATLANTA", "weight": 780 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "HOUSTON", "weight": 765 }, { "fieldName": "faa_region", "fieldPath": "faa_region", "fieldType": "string", "fieldValue": "ANE", "weight": 763 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "OH", "weight": 749 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "SEATTLE", "weight": 749 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "KANSAS CITY", "weight": 713 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "TWIN CITIES", "weight": 699 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "DEN", "weight": 679 }, { "fieldName": "cntl_twr", "fieldPath": "cntl_twr", "fieldType": "string", "fieldValue": "Y", "weight": 669 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "DCA", "weight": 667 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "IN", "weight": 643 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "CINCINNATI", "weight": 624 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "BIS", "weight": 616 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "AK", "weight": 608 }, { "fieldName": "faa_region", "fieldPath": "faa_region", "fieldType": "string", "fieldValue": "AAL", "weight": 608 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "MEMPHIS", "weight": 605 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "SFO", "weight": 585 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "NY", "weight": 576 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "OMAHA", "weight": 551 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "WI", "weight": 543 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "MO", "weight": 537 }, { "fieldName": "fed_agree", "fieldPath": "fed_agree", "fieldType": "string", "fieldValue": "N", "weight": 515 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "MN", "weight": 507 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "SAN ANTONIO", "weight": 503 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "JAN", "weight": 502 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "LA", "weight": 500 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "LOS ANGELES", "weight": 498 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "MI", "weight": 489 }, { "fieldName": "faa_dist", "fieldPath": "faa_dist", "fieldType": "string", "fieldValue": "MEM", "weight": 487 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "WA", "weight": 484 }, { "fieldName": "fac_type", "fieldPath": "fac_type", "fieldType": "string", "fieldValue": "SEAPLANE BASE", "weight": 473 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "JACKSONVILLE", "weight": 467 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "NEW ORLEANS", "weight": 444 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "OK", "weight": 443 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "OR", "weight": 441 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "GA", "weight": 440 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "ND", "weight": 436 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "SAN FRANCISCO", "weight": 427 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "CO", "weight": 425 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "VA", "weight": 421 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "GREEN BAY", "weight": 418 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "KS", "weight": 415 }, { "fieldName": "state", "fieldPath": "state", "fieldType": "string", "fieldValue": "NC", "weight": 400 }, { "fieldName": "aero_cht", "fieldPath": "aero_cht", "fieldType": "string", "fieldValue": "CHARLOTTE", "weight": 380 } ]
WITH __stage0 AS ( SELECT group_set, CASE group_set WHEN 0 THEN 'act_date' WHEN 1 THEN 'aero_cht' WHEN 2 THEN 'c_ldg_rts' WHEN 3 THEN 'cbd_dir' WHEN 4 THEN 'cbd_dist' WHEN 5 THEN 'cert' WHEN 6 THEN 'city' WHEN 7 THEN 'cntl_twr' WHEN 8 THEN 'code' WHEN 9 THEN 'county' WHEN 10 THEN 'cust_intl' WHEN 11 THEN 'elevation' WHEN 12 THEN 'faa_dist' WHEN 13 THEN 'faa_region' WHEN 14 THEN 'fac_type' WHEN 15 THEN 'fac_use' WHEN 16 THEN 'fed_agree' WHEN 17 THEN 'full_name' WHEN 18 THEN 'id' WHEN 19 THEN 'joint_use' WHEN 20 THEN 'latitude' WHEN 21 THEN 'longitude' WHEN 22 THEN 'major' WHEN 23 THEN 'mil_rts' WHEN 24 THEN 'own_type' WHEN 25 THEN 'site_number' WHEN 26 THEN 'state' END as "fieldName", CASE group_set WHEN 0 THEN 'act_date' WHEN 1 THEN 'aero_cht' WHEN 2 THEN 'c_ldg_rts' WHEN 3 THEN 'cbd_dir' WHEN 4 THEN 'cbd_dist' WHEN 5 THEN 'cert' WHEN 6 THEN 'city' WHEN 7 THEN 'cntl_twr' WHEN 8 THEN 'code' WHEN 9 THEN 'county' WHEN 10 THEN 'cust_intl' WHEN 11 THEN 'elevation' WHEN 12 THEN 'faa_dist' WHEN 13 THEN 'faa_region' WHEN 14 THEN 'fac_type' WHEN 15 THEN 'fac_use' WHEN 16 THEN 'fed_agree' WHEN 17 THEN 'full_name' WHEN 18 THEN 'id' WHEN 19 THEN 'joint_use' WHEN 20 THEN 'latitude' WHEN 21 THEN 'longitude' WHEN 22 THEN 'major' WHEN 23 THEN 'mil_rts' WHEN 24 THEN 'own_type' WHEN 25 THEN 'site_number' WHEN 26 THEN 'state' END as "fieldPath", CASE group_set WHEN 0 THEN 'string' WHEN 1 THEN 'string' WHEN 2 THEN 'string' WHEN 3 THEN 'string' WHEN 4 THEN 'number' WHEN 5 THEN 'string' WHEN 6 THEN 'string' WHEN 7 THEN 'string' WHEN 8 THEN 'string' WHEN 9 THEN 'string' WHEN 10 THEN 'string' WHEN 11 THEN 'number' WHEN 12 THEN 'string' WHEN 13 THEN 'string' WHEN 14 THEN 'string' WHEN 15 THEN 'string' WHEN 16 THEN 'string' WHEN 17 THEN 'string' WHEN 18 THEN 'number' WHEN 19 THEN 'string' WHEN 20 THEN 'number' WHEN 21 THEN 'number' WHEN 22 THEN 'string' WHEN 23 THEN 'string' WHEN 24 THEN 'string' WHEN 25 THEN 'string' WHEN 26 THEN 'string' END as "fieldType", CASE group_set WHEN 99999 THEN CAST(NULL as VARCHAR) WHEN 0 THEN base."act_date" WHEN 1 THEN base."aero_cht" WHEN 2 THEN base."c_ldg_rts" WHEN 3 THEN base."cbd_dir" WHEN 5 THEN base."cert" WHEN 6 THEN base."city" WHEN 7 THEN base."cntl_twr" WHEN 8 THEN base."code" WHEN 9 THEN base."county" WHEN 10 THEN base."cust_intl" WHEN 12 THEN base."faa_dist" WHEN 13 THEN base."faa_region" WHEN 14 THEN base."fac_type" WHEN 15 THEN base."fac_use" WHEN 16 THEN base."fed_agree" WHEN 17 THEN base."full_name" WHEN 19 THEN base."joint_use" WHEN 22 THEN base."major" WHEN 23 THEN base."mil_rts" WHEN 24 THEN base."own_type" WHEN 25 THEN base."site_number" WHEN 26 THEN base."state" END as "fieldValue", COUNT(*) as "weight", CASE group_set WHEN 99999 THEN '' WHEN 4 THEN MIN(CAST(base."cbd_dist" as VARCHAR)) || ' to ' || CAST(MAX(base."cbd_dist") as VARCHAR) WHEN 11 THEN MIN(CAST(base."elevation" as VARCHAR)) || ' to ' || CAST(MAX(base."elevation") as VARCHAR) WHEN 18 THEN MIN(CAST(base."id" as VARCHAR)) || ' to ' || CAST(MAX(base."id") as VARCHAR) WHEN 20 THEN MIN(CAST(base."latitude" as VARCHAR)) || ' to ' || CAST(MAX(base."latitude") as VARCHAR) WHEN 21 THEN MIN(CAST(base."longitude" as VARCHAR)) || ' to ' || CAST(MAX(base."longitude") as VARCHAR) END as "fieldRange" FROM '../data/airports.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,27,1)) as group_set ) as group_set GROUP BY 1,2,3,4,5 ) , __stage1 AS ( SELECT "fieldName", "fieldPath", "fieldType", COALESCE("fieldValue", "fieldRange") as "fieldValue", "weight" FROM __stage0 ) SELECT base."fieldName" as "fieldName", base."fieldPath" as "fieldPath", base."fieldType" as "fieldType", base."fieldValue" as "fieldValue", base."weight" as "weight" FROM __stage1 as base WHERE base."fieldType"='string' ORDER BY 5 desc NULLS LAST
Index For Filtering User Interfaces
Indexes can be used find the best way to filter a dataset. For example supposed we'd like to find 'SANTA CRUZ' in the dataset. Upon approaching the dataset, but we don't which column might contain it. In a UI you might imagine that you type 'SANTA' and let have suggestons for values that might be appropriate. In the results we can see that top value, 'SANTA ROSA', appears as county on 26 rows in the table. We can also see that 'SANTA CRUZ' is both a city
and a county
..
run: duckdb.table('../data/airports.parquet') -> { index: * } -> { where: fieldValue ~ r'SANTA' select: * order_by: weight desc limit: 15 }
fieldName | fieldPath | fieldType | fieldValue | weight |
---|---|---|---|---|
county | county | string | SANTA ROSA | 26 |
county | county | string | SANTA BARBARA | 22 |
county | county | string | SANTA CLARA | 10 |
county | county | string | SANTA CRUZ | 10 |
county | county | string | SANTA FE | 9 |
city | city | string | SANTA FE | 6 |
city | city | string | SANTA ANA | 6 |
city | city | string | SANTA BARBARA | 5 |
city | city | string | SANTA ROSA | 4 |
city | city | string | SANTA MARIA | 3 |
city | city | string | SANTA YNEZ | 3 |
city | city | string | SANTA ELENA | 2 |
city | city | string | SANTA YSABEL | 2 |
city | city | string | RANCHO SANTA MARGARITA | 2 |
city | city | string | SANTA CRUZ | 2 |
[ { "fieldName": "county", "fieldPath": "county", "fieldType": "string", "fieldValue": "SANTA ROSA", "weight": 26 }, { "fieldName": "county", "fieldPath": "county", "fieldType": "string", "fieldValue": "SANTA BARBARA", "weight": 22 }, { "fieldName": "county", "fieldPath": "county", "fieldType": "string", "fieldValue": "SANTA CLARA", "weight": 10 }, { "fieldName": "county", "fieldPath": "county", "fieldType": "string", "fieldValue": "SANTA CRUZ", "weight": 10 }, { "fieldName": "county", "fieldPath": "county", "fieldType": "string", "fieldValue": "SANTA FE", "weight": 9 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SANTA FE", "weight": 6 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SANTA ANA", "weight": 6 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SANTA BARBARA", "weight": 5 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SANTA ROSA", "weight": 4 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SANTA MARIA", "weight": 3 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SANTA YNEZ", "weight": 3 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SANTA ELENA", "weight": 2 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SANTA YSABEL", "weight": 2 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "RANCHO SANTA MARGARITA", "weight": 2 }, { "fieldName": "city", "fieldPath": "city", "fieldType": "string", "fieldValue": "SANTA CRUZ", "weight": 2 } ]
WITH __stage0 AS ( SELECT group_set, CASE group_set WHEN 0 THEN 'act_date' WHEN 1 THEN 'aero_cht' WHEN 2 THEN 'c_ldg_rts' WHEN 3 THEN 'cbd_dir' WHEN 4 THEN 'cbd_dist' WHEN 5 THEN 'cert' WHEN 6 THEN 'city' WHEN 7 THEN 'cntl_twr' WHEN 8 THEN 'code' WHEN 9 THEN 'county' WHEN 10 THEN 'cust_intl' WHEN 11 THEN 'elevation' WHEN 12 THEN 'faa_dist' WHEN 13 THEN 'faa_region' WHEN 14 THEN 'fac_type' WHEN 15 THEN 'fac_use' WHEN 16 THEN 'fed_agree' WHEN 17 THEN 'full_name' WHEN 18 THEN 'id' WHEN 19 THEN 'joint_use' WHEN 20 THEN 'latitude' WHEN 21 THEN 'longitude' WHEN 22 THEN 'major' WHEN 23 THEN 'mil_rts' WHEN 24 THEN 'own_type' WHEN 25 THEN 'site_number' WHEN 26 THEN 'state' END as "fieldName", CASE group_set WHEN 0 THEN 'act_date' WHEN 1 THEN 'aero_cht' WHEN 2 THEN 'c_ldg_rts' WHEN 3 THEN 'cbd_dir' WHEN 4 THEN 'cbd_dist' WHEN 5 THEN 'cert' WHEN 6 THEN 'city' WHEN 7 THEN 'cntl_twr' WHEN 8 THEN 'code' WHEN 9 THEN 'county' WHEN 10 THEN 'cust_intl' WHEN 11 THEN 'elevation' WHEN 12 THEN 'faa_dist' WHEN 13 THEN 'faa_region' WHEN 14 THEN 'fac_type' WHEN 15 THEN 'fac_use' WHEN 16 THEN 'fed_agree' WHEN 17 THEN 'full_name' WHEN 18 THEN 'id' WHEN 19 THEN 'joint_use' WHEN 20 THEN 'latitude' WHEN 21 THEN 'longitude' WHEN 22 THEN 'major' WHEN 23 THEN 'mil_rts' WHEN 24 THEN 'own_type' WHEN 25 THEN 'site_number' WHEN 26 THEN 'state' END as "fieldPath", CASE group_set WHEN 0 THEN 'string' WHEN 1 THEN 'string' WHEN 2 THEN 'string' WHEN 3 THEN 'string' WHEN 4 THEN 'number' WHEN 5 THEN 'string' WHEN 6 THEN 'string' WHEN 7 THEN 'string' WHEN 8 THEN 'string' WHEN 9 THEN 'string' WHEN 10 THEN 'string' WHEN 11 THEN 'number' WHEN 12 THEN 'string' WHEN 13 THEN 'string' WHEN 14 THEN 'string' WHEN 15 THEN 'string' WHEN 16 THEN 'string' WHEN 17 THEN 'string' WHEN 18 THEN 'number' WHEN 19 THEN 'string' WHEN 20 THEN 'number' WHEN 21 THEN 'number' WHEN 22 THEN 'string' WHEN 23 THEN 'string' WHEN 24 THEN 'string' WHEN 25 THEN 'string' WHEN 26 THEN 'string' END as "fieldType", CASE group_set WHEN 99999 THEN CAST(NULL as VARCHAR) WHEN 0 THEN base."act_date" WHEN 1 THEN base."aero_cht" WHEN 2 THEN base."c_ldg_rts" WHEN 3 THEN base."cbd_dir" WHEN 5 THEN base."cert" WHEN 6 THEN base."city" WHEN 7 THEN base."cntl_twr" WHEN 8 THEN base."code" WHEN 9 THEN base."county" WHEN 10 THEN base."cust_intl" WHEN 12 THEN base."faa_dist" WHEN 13 THEN base."faa_region" WHEN 14 THEN base."fac_type" WHEN 15 THEN base."fac_use" WHEN 16 THEN base."fed_agree" WHEN 17 THEN base."full_name" WHEN 19 THEN base."joint_use" WHEN 22 THEN base."major" WHEN 23 THEN base."mil_rts" WHEN 24 THEN base."own_type" WHEN 25 THEN base."site_number" WHEN 26 THEN base."state" END as "fieldValue", COUNT(*) as "weight", CASE group_set WHEN 99999 THEN '' WHEN 4 THEN MIN(CAST(base."cbd_dist" as VARCHAR)) || ' to ' || CAST(MAX(base."cbd_dist") as VARCHAR) WHEN 11 THEN MIN(CAST(base."elevation" as VARCHAR)) || ' to ' || CAST(MAX(base."elevation") as VARCHAR) WHEN 18 THEN MIN(CAST(base."id" as VARCHAR)) || ' to ' || CAST(MAX(base."id") as VARCHAR) WHEN 20 THEN MIN(CAST(base."latitude" as VARCHAR)) || ' to ' || CAST(MAX(base."latitude") as VARCHAR) WHEN 21 THEN MIN(CAST(base."longitude" as VARCHAR)) || ' to ' || CAST(MAX(base."longitude") as VARCHAR) END as "fieldRange" FROM '../data/airports.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,27,1)) as group_set ) as group_set GROUP BY 1,2,3,4,5 ) , __stage1 AS ( SELECT "fieldName", "fieldPath", "fieldType", COALESCE("fieldValue", "fieldRange") as "fieldValue", "weight" FROM __stage0 ) SELECT base."fieldName" as "fieldName", base."fieldPath" as "fieldPath", base."fieldType" as "fieldType", base."fieldValue" as "fieldValue", base."weight" as "weight" FROM __stage1 as base WHERE REGEXP_MATCHES(base."fieldValue", 'SANTA') ORDER BY 5 desc NULLS LAST LIMIT 15
We can then write a simple query to show the rows. It turns out that 'SANTA CRUZ' is a county in both California and Arizona.
run: duckdb.table('../data/airports.parquet') -> { where: county ~ 'SANTA CRUZ' select: * }
act_date | aero_cht | c_ldg_rts | cbd_dir | cbd_dist | cert | city | cntl_twr | code | county | cust_intl | elevation | faa_dist | faa_region | fac_type | fac_use | fed_agree | full_name | id | joint_use | latitude | longitude | major | mil_rts | own_type | site_number | state |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10/1987 | PHOENIX | ∅ | E | 1 | ∅ | TUBAC | N | 2AZ8 | SANTA CRUZ | ∅ | 3,200 | NONE | AWP | ULTRALIGHT | PR | ∅ | TUBAC ULTRALIGHT FLIGHTPARK | 1,444 | ∅ | 31.61 | -111.03 | N | ∅ | PR | 00811.*U | AZ |
∅ | PHOENIX | N | NE | 7 | ∅ | NOGALES | N | OLS | SANTA CRUZ | Y | 3,955 | NONE | AWP | AIRPORT | PU | NGY | NOGALES INTL | 1,312 | N | 31.41 | -110.84 | N | Y | PU | 00739.*A | AZ |
∅ | LOS ANGELES | ∅ | ∅ | 0 | ∅ | SAN NICOLAS ISLAND | Y | NSI | SANTA CRUZ | ∅ | 504 | NONE | AWP | AIRPORT | PR | ∅ | SAN NICOLAS ISLAND NOLF | 2,287 | ∅ | 33.23 | -119.45 | N | Y | MN | 02220.*A | CA |
07/1986 | SAN FRANCISCO | ∅ | NW | 8 | ∅ | SANTA CRUZ | N | CL77 | SANTA CRUZ | ∅ | 2,020 | SFO | AWP | AIRPORT | PR | ∅ | BONNY DOON VILLAGE | 2,307 | ∅ | 37.07 | -122.12 | N | ∅ | PR | 02241.1*A | CA |
∅ | SAN FRANCISCO | ∅ | NE | 3 | ∅ | SANTA CRUZ | N | CA37 | SANTA CRUZ | ∅ | 115 | SFO | AWP | HELIPORT | PR | ∅ | DOMINICAN SANTA CRUZ HOSPITAL | 2,308 | ∅ | 36.99 | -121.98 | N | ∅ | PR | 02241.12*H | CA |
∅ | SAN FRANCISCO | ∅ | NW | 6 | ∅ | DAVENPORT | N | 6Q6 | SANTA CRUZ | ∅ | 125 | SFO | AWP | AIRPORT | PR | ∅ | LAS TRANCAS | 1,693 | N | 37.08 | -122.27 | N | N | PR | 01486.9*A | CA |
∅ | SAN FRANCISCO | ∅ | NW | 5 | ∅ | WATSONVILLE | N | CA65 | SANTA CRUZ | ∅ | 480 | SFO | AWP | HELIPORT | PR | ∅ | ALTA VISTA | 2,441 | N | 36.97 | -121.86 | N | N | PR | 02429.1*H | CA |
09/1987 | SAN FRANCISCO | ∅ | E | 0 | ∅ | WATSONVILLE | N | CL99 | SANTA CRUZ | ∅ | 111 | SFO | AWP | HELIPORT | PR | ∅ | WATSONVILLE COMMUNITY HOSPITAL | 2,442 | ∅ | 36.93 | -121.77 | N | ∅ | PU | 02429.11*H | CA |
∅ | SAN FRANCISCO | ∅ | W | 4 | ∅ | WATSONVILLE | N | CA66 | SANTA CRUZ | ∅ | 70 | SFO | AWP | AIRPORT | PR | ∅ | MONTEREY BAY ACADEMY | 2,443 | N | 36.9 | -121.84 | N | N | PR | 02429.2*A | CA |
∅ | SAN FRANCISCO | N | NW | 3 | ∅ | WATSONVILLE | N | WVI | SANTA CRUZ | N | 160 | SFO | AWP | AIRPORT | PU | NGPRY | WATSONVILLE MUNI | 2,440 | N | 36.93 | -121.78 | N | Y | PU | 02429.*A | CA |
[ { "act_date": "10/1987", "aero_cht": "PHOENIX", "c_ldg_rts": null, "cbd_dir": "E", "cbd_dist": 1, "cert": null, "city": "TUBAC", "cntl_twr": "N", "code": "2AZ8", "county": "SANTA CRUZ", "cust_intl": null, "elevation": 3200, "faa_dist": "NONE", "faa_region": "AWP", "fac_type": "ULTRALIGHT", "fac_use": "PR", "fed_agree": null, "full_name": "TUBAC ULTRALIGHT FLIGHTPARK", "id": 1444, "joint_use": null, "latitude": 31.61, "longitude": -111.03, "major": "N", "mil_rts": null, "own_type": "PR", "site_number": "00811.*U", "state": "AZ" }, { "act_date": null, "aero_cht": "PHOENIX", "c_ldg_rts": "N", "cbd_dir": "NE", "cbd_dist": 7, "cert": null, "city": "NOGALES", "cntl_twr": "N", "code": "OLS", "county": "SANTA CRUZ", "cust_intl": "Y", "elevation": 3955, "faa_dist": "NONE", "faa_region": "AWP", "fac_type": "AIRPORT", "fac_use": "PU", "fed_agree": "NGY", "full_name": "NOGALES INTL", "id": 1312, "joint_use": "N", "latitude": 31.41, "longitude": -110.84, "major": "N", "mil_rts": "Y", "own_type": "PU", "site_number": "00739.*A", "state": "AZ" }, { "act_date": null, "aero_cht": "LOS ANGELES", "c_ldg_rts": null, "cbd_dir": null, "cbd_dist": 0, "cert": null, "city": "SAN NICOLAS ISLAND", "cntl_twr": "Y", "code": "NSI", "county": "SANTA CRUZ", "cust_intl": null, "elevation": 504, "faa_dist": "NONE", "faa_region": "AWP", "fac_type": "AIRPORT", "fac_use": "PR", "fed_agree": null, "full_name": "SAN NICOLAS ISLAND NOLF", "id": 2287, "joint_use": null, "latitude": 33.23, "longitude": -119.45, "major": "N", "mil_rts": "Y", "own_type": "MN", "site_number": "02220.*A", "state": "CA" }, { "act_date": "07/1986", "aero_cht": "SAN FRANCISCO", "c_ldg_rts": null, "cbd_dir": "NW", "cbd_dist": 8, "cert": null, "city": "SANTA CRUZ", "cntl_twr": "N", "code": "CL77", "county": "SANTA CRUZ", "cust_intl": null, "elevation": 2020, "faa_dist": "SFO", "faa_region": "AWP", "fac_type": "AIRPORT", "fac_use": "PR", "fed_agree": null, "full_name": "BONNY DOON VILLAGE", "id": 2307, "joint_use": null, "latitude": 37.07, "longitude": -122.12, "major": "N", "mil_rts": null, "own_type": "PR", "site_number": "02241.1*A", "state": "CA" }, { "act_date": null, "aero_cht": "SAN FRANCISCO", "c_ldg_rts": null, "cbd_dir": "NE", "cbd_dist": 3, "cert": null, "city": "SANTA CRUZ", "cntl_twr": "N", "code": "CA37", "county": "SANTA CRUZ", "cust_intl": null, "elevation": 115, "faa_dist": "SFO", "faa_region": "AWP", "fac_type": "HELIPORT", "fac_use": "PR", "fed_agree": null, "full_name": "DOMINICAN SANTA CRUZ HOSPITAL", "id": 2308, "joint_use": null, "latitude": 36.99, "longitude": -121.98, "major": "N", "mil_rts": null, "own_type": "PR", "site_number": "02241.12*H", "state": "CA" }, { "act_date": null, "aero_cht": "SAN FRANCISCO", "c_ldg_rts": null, "cbd_dir": "NW", "cbd_dist": 6, "cert": null, "city": "DAVENPORT", "cntl_twr": "N", "code": "6Q6", "county": "SANTA CRUZ", "cust_intl": null, "elevation": 125, "faa_dist": "SFO", "faa_region": "AWP", "fac_type": "AIRPORT", "fac_use": "PR", "fed_agree": null, "full_name": "LAS TRANCAS", "id": 1693, "joint_use": "N", "latitude": 37.08, "longitude": -122.27, "major": "N", "mil_rts": "N", "own_type": "PR", "site_number": "01486.9*A", "state": "CA" }, { "act_date": null, "aero_cht": "SAN FRANCISCO", "c_ldg_rts": null, "cbd_dir": "NW", "cbd_dist": 5, "cert": null, "city": "WATSONVILLE", "cntl_twr": "N", "code": "CA65", "county": "SANTA CRUZ", "cust_intl": null, "elevation": 480, "faa_dist": "SFO", "faa_region": "AWP", "fac_type": "HELIPORT", "fac_use": "PR", "fed_agree": null, "full_name": "ALTA VISTA", "id": 2441, "joint_use": "N", "latitude": 36.97, "longitude": -121.86, "major": "N", "mil_rts": "N", "own_type": "PR", "site_number": "02429.1*H", "state": "CA" }, { "act_date": "09/1987", "aero_cht": "SAN FRANCISCO", "c_ldg_rts": null, "cbd_dir": "E", "cbd_dist": 0, "cert": null, "city": "WATSONVILLE", "cntl_twr": "N", "code": "CL99", "county": "SANTA CRUZ", "cust_intl": null, "elevation": 111, "faa_dist": "SFO", "faa_region": "AWP", "fac_type": "HELIPORT", "fac_use": "PR", "fed_agree": null, "full_name": "WATSONVILLE COMMUNITY HOSPITAL", "id": 2442, "joint_use": null, "latitude": 36.93, "longitude": -121.77, "major": "N", "mil_rts": null, "own_type": "PU", "site_number": "02429.11*H", "state": "CA" }, { "act_date": null, "aero_cht": "SAN FRANCISCO", "c_ldg_rts": null, "cbd_dir": "W", "cbd_dist": 4, "cert": null, "city": "WATSONVILLE", "cntl_twr": "N", "code": "CA66", "county": "SANTA CRUZ", "cust_intl": null, "elevation": 70, "faa_dist": "SFO", "faa_region": "AWP", "fac_type": "AIRPORT", "fac_use": "PR", "fed_agree": null, "full_name": "MONTEREY BAY ACADEMY", "id": 2443, "joint_use": "N", "latitude": 36.9, "longitude": -121.84, "major": "N", "mil_rts": "N", "own_type": "PR", "site_number": "02429.2*A", "state": "CA" }, { "act_date": null, "aero_cht": "SAN FRANCISCO", "c_ldg_rts": "N", "cbd_dir": "NW", "cbd_dist": 3, "cert": null, "city": "WATSONVILLE", "cntl_twr": "N", "code": "WVI", "county": "SANTA CRUZ", "cust_intl": "N", "elevation": 160, "faa_dist": "SFO", "faa_region": "AWP", "fac_type": "AIRPORT", "fac_use": "PU", "fed_agree": "NGPRY", "full_name": "WATSONVILLE MUNI", "id": 2440, "joint_use": "N", "latitude": 36.93, "longitude": -121.78, "major": "N", "mil_rts": "Y", "own_type": "PU", "site_number": "02429.*A", "state": "CA" } ]
SELECT base."act_date" as "act_date", base."aero_cht" as "aero_cht", base."c_ldg_rts" as "c_ldg_rts", base."cbd_dir" as "cbd_dir", base."cbd_dist" as "cbd_dist", base."cert" as "cert", base."city" as "city", base."cntl_twr" as "cntl_twr", base."code" as "code", base."county" as "county", base."cust_intl" as "cust_intl", base."elevation" as "elevation", base."faa_dist" as "faa_dist", base."faa_region" as "faa_region", base."fac_type" as "fac_type", base."fac_use" as "fac_use", base."fed_agree" as "fed_agree", base."full_name" as "full_name", base."id" as "id", base."joint_use" as "joint_use", base."latitude" as "latitude", base."longitude" as "longitude", base."major" as "major", base."mil_rts" as "mil_rts", base."own_type" as "own_type", base."site_number" as "site_number", base."state" as "state" FROM '../data/airports.parquet' as base WHERE base."county" LIKE 'SANTA CRUZ'
Indexing to show top values for each dimension
It is often difficult to approach a new dataset. The index operator provides an intersting way to quickly gain an understanding of the dataset. By piping the results of an index another stage, we can quickly see all the interesting values for each of the interesting dimesions. Again, the weight shows the number of rows for that particular dimension/value.
run: duckdb.table('../data/airports.parquet') -> { index: * } -> { group_by: fieldName nest: values is { group_by: fieldValue, weight order_by: weight desc limit: 10 } order_by: fieldName }
fieldName | values | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
act_date |
| ||||||||||||||||||||||
aero_cht |
| ||||||||||||||||||||||
c_ldg_rts |
| ||||||||||||||||||||||
cbd_dir |
| ||||||||||||||||||||||
cbd_dist |
| ||||||||||||||||||||||
cert |
| ||||||||||||||||||||||
city |
| ||||||||||||||||||||||
cntl_twr |
| ||||||||||||||||||||||
code |
| ||||||||||||||||||||||
county |
| ||||||||||||||||||||||
cust_intl |
| ||||||||||||||||||||||
elevation |
| ||||||||||||||||||||||
faa_dist |
| ||||||||||||||||||||||
faa_region |
| ||||||||||||||||||||||
fac_type |
| ||||||||||||||||||||||
fac_use |
| ||||||||||||||||||||||
fed_agree |
| ||||||||||||||||||||||
full_name |
| ||||||||||||||||||||||
id |
| ||||||||||||||||||||||
joint_use |
| ||||||||||||||||||||||
latitude |
| ||||||||||||||||||||||
longitude |
| ||||||||||||||||||||||
major |
| ||||||||||||||||||||||
mil_rts |
| ||||||||||||||||||||||
own_type |
| ||||||||||||||||||||||
site_number |
| ||||||||||||||||||||||
state |
| ||||||||||||||||||||||
∅ |
|
[ { "fieldName": "act_date", "values": [ { "fieldValue": null, "weight": 12040 }, { "fieldValue": "04/1993", "weight": 145 }, { "fieldValue": "02/1987", "weight": 129 }, { "fieldValue": "03/1993", "weight": 127 }, { "fieldValue": "11/1990", "weight": 122 }, { "fieldValue": "11/1986", "weight": 119 }, { "fieldValue": "02/1982", "weight": 118 }, { "fieldValue": "08/1988", "weight": 115 }, { "fieldValue": "11/1988", "weight": 113 }, { "fieldValue": "03/1988", "weight": 113 } ] }, { "fieldName": "aero_cht", "values": [ { "fieldValue": "NEW YORK", "weight": 1581 }, { "fieldValue": "CHICAGO", "weight": 1480 }, { "fieldValue": "DETROIT", "weight": 1262 }, { "fieldValue": "DALLAS-FT WORTH", "weight": 888 }, { "fieldValue": "WASHINGTON", "weight": 887 }, { "fieldValue": "ST LOUIS", "weight": 847 }, { "fieldValue": "ATLANTA", "weight": 780 }, { "fieldValue": "HOUSTON", "weight": 765 }, { "fieldValue": "SEATTLE", "weight": 749 }, { "fieldValue": "KANSAS CITY", "weight": 713 } ] }, { "fieldName": "c_ldg_rts", "values": [ { "fieldValue": null, "weight": 15145 }, { "fieldValue": "N", "weight": 4365 }, { "fieldValue": "Y", "weight": 283 } ] }, { "fieldName": "cbd_dir", "values": [ { "fieldValue": "N", "weight": 3694 }, { "fieldValue": "NW", "weight": 2438 }, { "fieldValue": "SW", "weight": 2427 }, { "fieldValue": "NE", "weight": 2393 }, { "fieldValue": "SE", "weight": 2281 }, { "fieldValue": "S", "weight": 2155 }, { "fieldValue": "E", "weight": 2127 }, { "fieldValue": "W", "weight": 2065 }, { "fieldValue": null, "weight": 119 }, { "fieldValue": "SSW", "weight": 20 } ] }, { "fieldName": "cbd_dist", "values": [ { "fieldValue": "0 to 73", "weight": 19793 } ] }, { "fieldName": "cert", "values": [ { "fieldValue": null, "weight": 19142 }, { "fieldValue": "AS 05/1973", "weight": 94 }, { "fieldValue": "CS 05/1973", "weight": 84 }, { "fieldValue": "BS 05/1973", "weight": 81 }, { "fieldValue": "LU 05/1973", "weight": 37 }, { "fieldValue": "LU 08/1990", "weight": 33 }, { "fieldValue": "DS 05/1973", "weight": 26 }, { "fieldValue": "AU 05/1973", "weight": 25 }, { "fieldValue": "ES 05/1973", "weight": 15 }, { "fieldValue": "LU 11/1974", "weight": 9 } ] }, { "fieldName": "city", "values": [ { "fieldValue": "HOUSTON", "weight": 108 }, { "fieldValue": "LOS ANGELES", "weight": 60 }, { "fieldValue": "COLUMBUS", "weight": 47 }, { "fieldValue": "SPRINGFIELD", "weight": 45 }, { "fieldValue": "JACKSON", "weight": 42 }, { "fieldValue": "GREENVILLE", "weight": 38 }, { "fieldValue": "WASHINGTON", "weight": 38 }, { "fieldValue": "PHOENIX", "weight": 37 }, { "fieldValue": "CLINTON", "weight": 37 }, { "fieldValue": "PHILADELPHIA", "weight": 35 } ] }, { "fieldName": "cntl_twr", "values": [ { "fieldValue": "N", "weight": 19124 }, { "fieldValue": "Y", "weight": 669 } ] }, { "fieldName": "code", "values": [ { "fieldValue": "ROR", "weight": 1 }, { "fieldValue": "34AK", "weight": 1 }, { "fieldValue": "CJX", "weight": 1 }, { "fieldValue": "AKA", "weight": 1 }, { "fieldValue": "43AK", "weight": 1 }, { "fieldValue": "6K8", "weight": 1 }, { "fieldValue": "10AK", "weight": 1 }, { "fieldValue": "EXI", "weight": 1 }, { "fieldValue": "5Z2", "weight": 1 }, { "fieldValue": "MLL", "weight": 1 } ] }, { "fieldName": "county", "values": [ { "fieldValue": "WASHINGTON", "weight": 214 }, { "fieldValue": "JEFFERSON", "weight": 199 }, { "fieldValue": "LOS ANGELES", "weight": 176 }, { "fieldValue": "MONTGOMERY", "weight": 154 }, { "fieldValue": "FRANKLIN", "weight": 140 }, { "fieldValue": "JACKSON", "weight": 140 }, { "fieldValue": "HARRIS", "weight": 137 }, { "fieldValue": "MARION", "weight": 127 }, { "fieldValue": "ORANGE", "weight": 125 }, { "fieldValue": "MARICOPA", "weight": 117 } ] }, { "fieldName": "cust_intl", "values": [ { "fieldValue": null, "weight": 15145 }, { "fieldValue": "N", "weight": 4575 }, { "fieldValue": "Y", "weight": 73 } ] }, { "fieldName": "elevation", "values": [ { "fieldValue": "-1 to 12442", "weight": 19793 } ] }, { "fieldName": "faa_dist", "values": [ { "fieldValue": "NONE", "weight": 7085 }, { "fieldValue": "CHI", "weight": 1528 }, { "fieldValue": "DET", "weight": 1240 }, { "fieldValue": "SEA", "weight": 1162 }, { "fieldValue": "MSP", "weight": 1046 }, { "fieldValue": "ATL", "weight": 1029 }, { "fieldValue": "NYC", "weight": 957 }, { "fieldValue": "ORL", "weight": 905 }, { "fieldValue": "HAR", "weight": 846 }, { "fieldValue": "DEN", "weight": 679 } ] }, { "fieldName": "faa_region", "values": [ { "fieldValue": "AGL", "weight": 4437 }, { "fieldValue": "ASW", "weight": 3268 }, { "fieldValue": "ASO", "weight": 2924 }, { "fieldValue": "AEA", "weight": 2586 }, { "fieldValue": "ANM", "weight": 2102 }, { "fieldValue": "ACE", "weight": 1579 }, { "fieldValue": "AWP", "weight": 1503 }, { "fieldValue": "ANE", "weight": 763 }, { "fieldValue": "AAL", "weight": 608 }, { "fieldValue": null, "weight": 23 } ] }, { "fieldName": "fac_type", "values": [ { "fieldValue": "AIRPORT", "weight": 13925 }, { "fieldValue": "HELIPORT", "weight": 5135 }, { "fieldValue": "SEAPLANE BASE", "weight": 473 }, { "fieldValue": "ULTRALIGHT", "weight": 125 }, { "fieldValue": "STOLPORT", "weight": 86 }, { "fieldValue": "GLIDERPORT", "weight": 37 }, { "fieldValue": "BALLOONPORT", "weight": 12 } ] }, { "fieldName": "fac_use", "values": [ { "fieldValue": "PR", "weight": 14428 }, { "fieldValue": "PU", "weight": 5365 } ] }, { "fieldName": "fed_agree", "values": [ { "fieldValue": null, "weight": 16252 }, { "fieldValue": "NGY", "weight": 1682 }, { "fieldValue": "N", "weight": 515 }, { "fieldValue": "NGY3", "weight": 219 }, { "fieldValue": "NGPY", "weight": 171 }, { "fieldValue": "NY1", "weight": 123 }, { "fieldValue": "N1", "weight": 97 }, { "fieldValue": "1", "weight": 96 }, { "fieldValue": "NGSY", "weight": 87 }, { "fieldValue": "NGPY3", "weight": 84 } ] }, { "fieldName": "full_name", "values": [ { "fieldValue": "MEMORIAL HOSPITAL", "weight": 21 }, { "fieldValue": "SMITH", "weight": 14 }, { "fieldValue": "JOHNSON", "weight": 13 }, { "fieldValue": "ST MARY'S HOSPITAL", "weight": 11 }, { "fieldValue": "MILLER", "weight": 10 }, { "fieldValue": "DAVIS FIELD", "weight": 10 }, { "fieldValue": "HILLTOP", "weight": 10 }, { "fieldValue": "WILSON", "weight": 9 }, { "fieldValue": "WILLIAMS", "weight": 9 }, { "fieldValue": "ST FRANCIS HOSPITAL", "weight": 9 } ] }, { "fieldName": "id", "values": [ { "fieldValue": "1 to 19793", "weight": 19793 } ] }, { "fieldName": "joint_use", "values": [ { "fieldValue": null, "weight": 14804 }, { "fieldValue": "N", "weight": 4779 }, { "fieldValue": "Y", "weight": 210 } ] }, { "fieldName": "latitude", "values": [ { "fieldValue": "-14.18 to 71.28", "weight": 19793 } ] }, { "fieldName": "longitude", "values": [ { "fieldValue": "-100.0 to 174.11", "weight": 19793 } ] }, { "fieldName": "major", "values": [ { "fieldValue": "N", "weight": 19523 }, { "fieldValue": "Y", "weight": 270 } ] }, { "fieldName": "mil_rts", "values": [ { "fieldValue": null, "weight": 14716 }, { "fieldValue": "Y", "weight": 2958 }, { "fieldValue": "N", "weight": 2119 } ] }, { "fieldName": "own_type", "values": [ { "fieldValue": "PR", "weight": 14306 }, { "fieldValue": "PU", "weight": 5174 }, { "fieldValue": "MR", "weight": 128 }, { "fieldValue": "MA", "weight": 107 }, { "fieldValue": "MN", "weight": 78 } ] }, { "fieldName": "site_number", "values": [ { "fieldValue": "51518.6*A", "weight": 1 }, { "fieldValue": "51541.*A", "weight": 1 }, { "fieldValue": "50410.13*C", "weight": 1 }, { "fieldValue": "50720.04*A", "weight": 1 }, { "fieldValue": "50080.5*A", "weight": 1 }, { "fieldValue": "50870.21*A", "weight": 1 }, { "fieldValue": "50530.*A", "weight": 1 }, { "fieldValue": "50178.01*A", "weight": 1 }, { "fieldValue": "50427.61*A", "weight": 1 }, { "fieldValue": "50425.2*H", "weight": 1 } ] }, { "fieldName": "state", "values": [ { "fieldValue": "TX", "weight": 1845 }, { "fieldValue": "CA", "weight": 984 }, { "fieldValue": "IL", "weight": 890 }, { "fieldValue": "FL", "weight": 856 }, { "fieldValue": "PA", "weight": 804 }, { "fieldValue": "OH", "weight": 749 }, { "fieldValue": "IN", "weight": 643 }, { "fieldValue": "AK", "weight": 608 }, { "fieldValue": "NY", "weight": 576 }, { "fieldValue": "WI", "weight": 543 } ] }, { "fieldName": null, "values": [ { "fieldValue": null, "weight": 19793 } ] } ]
WITH __stage0 AS ( SELECT group_set, CASE group_set WHEN 0 THEN 'act_date' WHEN 1 THEN 'aero_cht' WHEN 2 THEN 'c_ldg_rts' WHEN 3 THEN 'cbd_dir' WHEN 4 THEN 'cbd_dist' WHEN 5 THEN 'cert' WHEN 6 THEN 'city' WHEN 7 THEN 'cntl_twr' WHEN 8 THEN 'code' WHEN 9 THEN 'county' WHEN 10 THEN 'cust_intl' WHEN 11 THEN 'elevation' WHEN 12 THEN 'faa_dist' WHEN 13 THEN 'faa_region' WHEN 14 THEN 'fac_type' WHEN 15 THEN 'fac_use' WHEN 16 THEN 'fed_agree' WHEN 17 THEN 'full_name' WHEN 18 THEN 'id' WHEN 19 THEN 'joint_use' WHEN 20 THEN 'latitude' WHEN 21 THEN 'longitude' WHEN 22 THEN 'major' WHEN 23 THEN 'mil_rts' WHEN 24 THEN 'own_type' WHEN 25 THEN 'site_number' WHEN 26 THEN 'state' END as "fieldName", CASE group_set WHEN 0 THEN 'act_date' WHEN 1 THEN 'aero_cht' WHEN 2 THEN 'c_ldg_rts' WHEN 3 THEN 'cbd_dir' WHEN 4 THEN 'cbd_dist' WHEN 5 THEN 'cert' WHEN 6 THEN 'city' WHEN 7 THEN 'cntl_twr' WHEN 8 THEN 'code' WHEN 9 THEN 'county' WHEN 10 THEN 'cust_intl' WHEN 11 THEN 'elevation' WHEN 12 THEN 'faa_dist' WHEN 13 THEN 'faa_region' WHEN 14 THEN 'fac_type' WHEN 15 THEN 'fac_use' WHEN 16 THEN 'fed_agree' WHEN 17 THEN 'full_name' WHEN 18 THEN 'id' WHEN 19 THEN 'joint_use' WHEN 20 THEN 'latitude' WHEN 21 THEN 'longitude' WHEN 22 THEN 'major' WHEN 23 THEN 'mil_rts' WHEN 24 THEN 'own_type' WHEN 25 THEN 'site_number' WHEN 26 THEN 'state' END as "fieldPath", CASE group_set WHEN 0 THEN 'string' WHEN 1 THEN 'string' WHEN 2 THEN 'string' WHEN 3 THEN 'string' WHEN 4 THEN 'number' WHEN 5 THEN 'string' WHEN 6 THEN 'string' WHEN 7 THEN 'string' WHEN 8 THEN 'string' WHEN 9 THEN 'string' WHEN 10 THEN 'string' WHEN 11 THEN 'number' WHEN 12 THEN 'string' WHEN 13 THEN 'string' WHEN 14 THEN 'string' WHEN 15 THEN 'string' WHEN 16 THEN 'string' WHEN 17 THEN 'string' WHEN 18 THEN 'number' WHEN 19 THEN 'string' WHEN 20 THEN 'number' WHEN 21 THEN 'number' WHEN 22 THEN 'string' WHEN 23 THEN 'string' WHEN 24 THEN 'string' WHEN 25 THEN 'string' WHEN 26 THEN 'string' END as "fieldType", CASE group_set WHEN 99999 THEN CAST(NULL as VARCHAR) WHEN 0 THEN base."act_date" WHEN 1 THEN base."aero_cht" WHEN 2 THEN base."c_ldg_rts" WHEN 3 THEN base."cbd_dir" WHEN 5 THEN base."cert" WHEN 6 THEN base."city" WHEN 7 THEN base."cntl_twr" WHEN 8 THEN base."code" WHEN 9 THEN base."county" WHEN 10 THEN base."cust_intl" WHEN 12 THEN base."faa_dist" WHEN 13 THEN base."faa_region" WHEN 14 THEN base."fac_type" WHEN 15 THEN base."fac_use" WHEN 16 THEN base."fed_agree" WHEN 17 THEN base."full_name" WHEN 19 THEN base."joint_use" WHEN 22 THEN base."major" WHEN 23 THEN base."mil_rts" WHEN 24 THEN base."own_type" WHEN 25 THEN base."site_number" WHEN 26 THEN base."state" END as "fieldValue", COUNT(*) as "weight", CASE group_set WHEN 99999 THEN '' WHEN 4 THEN MIN(CAST(base."cbd_dist" as VARCHAR)) || ' to ' || CAST(MAX(base."cbd_dist") as VARCHAR) WHEN 11 THEN MIN(CAST(base."elevation" as VARCHAR)) || ' to ' || CAST(MAX(base."elevation") as VARCHAR) WHEN 18 THEN MIN(CAST(base."id" as VARCHAR)) || ' to ' || CAST(MAX(base."id") as VARCHAR) WHEN 20 THEN MIN(CAST(base."latitude" as VARCHAR)) || ' to ' || CAST(MAX(base."latitude") as VARCHAR) WHEN 21 THEN MIN(CAST(base."longitude" as VARCHAR)) || ' to ' || CAST(MAX(base."longitude") as VARCHAR) END as "fieldRange" FROM '../data/airports.parquet' as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,27,1)) as group_set ) as group_set GROUP BY 1,2,3,4,5 ) , __stage1 AS ( SELECT "fieldName", "fieldPath", "fieldType", COALESCE("fieldValue", "fieldRange") as "fieldValue", "weight" FROM __stage0 ) , __stage2 AS ( SELECT group_set, base."fieldName" as "fieldName__0", CASE WHEN group_set=1 THEN base."fieldValue" END as "fieldValue__1", CASE WHEN group_set=1 THEN base."weight" END as "weight__1" FROM __stage1 as base CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set ) as group_set GROUP BY 1,2,3,4 ) SELECT "fieldName__0" as "fieldName", COALESCE(LIST({ "fieldValue": "fieldValue__1", "weight": "weight__1"} ORDER BY "weight__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:10],[]) as "values" FROM __stage2 GROUP BY 1 ORDER BY 1 ASC NULLS LAST
Sampling
With large datasets, you can also sample a small subsection using the sample:
parameter. Sampled indexes are great at identifing the important low cardinality fields.
run: duckdb.table('../data/airports.parquet') -> { index: * sample: 5000 // sample only 5000 rows } -> { group_by: fieldName nest: values is { group_by: fieldValue, weight order_by: weight desc limit: 10 } order_by: fieldName }
fieldName | values | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
act_date |
| ||||||||||||||||||||||
aero_cht |
| ||||||||||||||||||||||
c_ldg_rts |
| ||||||||||||||||||||||
cbd_dir |
| ||||||||||||||||||||||
cbd_dist |
| ||||||||||||||||||||||
cert |
| ||||||||||||||||||||||
city |
| ||||||||||||||||||||||
cntl_twr |
| ||||||||||||||||||||||
code |
| ||||||||||||||||||||||
county |
| ||||||||||||||||||||||
cust_intl |
| ||||||||||||||||||||||
elevation |
| ||||||||||||||||||||||
faa_dist |
| ||||||||||||||||||||||
faa_region |
| ||||||||||||||||||||||
fac_type |
| ||||||||||||||||||||||
fac_use |
| ||||||||||||||||||||||
fed_agree |
| ||||||||||||||||||||||
full_name |
| ||||||||||||||||||||||
id |
| ||||||||||||||||||||||
joint_use |
| ||||||||||||||||||||||
latitude |
| ||||||||||||||||||||||
longitude |
| ||||||||||||||||||||||
major |
| ||||||||||||||||||||||
mil_rts |
| ||||||||||||||||||||||
own_type |
| ||||||||||||||||||||||
site_number |
| ||||||||||||||||||||||
state |
| ||||||||||||||||||||||
∅ |
|
[ { "fieldName": "act_date", "values": [ { "fieldValue": null, "weight": 3039 }, { "fieldValue": "04/1993", "weight": 43 }, { "fieldValue": "02/1987", "weight": 41 }, { "fieldValue": "03/1993", "weight": 33 }, { "fieldValue": "08/1985", "weight": 31 }, { "fieldValue": "08/1988", "weight": 30 }, { "fieldValue": "11/1988", "weight": 29 }, { "fieldValue": "02/1982", "weight": 27 }, { "fieldValue": "11/1990", "weight": 26 }, { "fieldValue": "11/1989", "weight": 26 } ] }, { "fieldName": "aero_cht", "values": [ { "fieldValue": "NEW YORK", "weight": 406 }, { "fieldValue": "CHICAGO", "weight": 345 }, { "fieldValue": "DETROIT", "weight": 329 }, { "fieldValue": "DALLAS-FT WORTH", "weight": 247 }, { "fieldValue": "ST LOUIS", "weight": 226 }, { "fieldValue": "SEATTLE", "weight": 221 }, { "fieldValue": "WASHINGTON", "weight": 215 }, { "fieldValue": "HOUSTON", "weight": 210 }, { "fieldValue": "ATLANTA", "weight": 183 }, { "fieldValue": "KANSAS CITY", "weight": 178 } ] }, { "fieldName": "c_ldg_rts", "values": [ { "fieldValue": null, "weight": 3801 }, { "fieldValue": "N", "weight": 1122 }, { "fieldValue": "Y", "weight": 77 } ] }, { "fieldName": "cbd_dir", "values": [ { "fieldValue": "N", "weight": 972 }, { "fieldValue": "SW", "weight": 607 }, { "fieldValue": "NW", "weight": 603 }, { "fieldValue": "SE", "weight": 592 }, { "fieldValue": "NE", "weight": 580 }, { "fieldValue": "S", "weight": 552 }, { "fieldValue": "W", "weight": 539 }, { "fieldValue": "E", "weight": 508 }, { "fieldValue": null, "weight": 25 }, { "fieldValue": "SSW", "weight": 5 } ] }, { "fieldName": "cbd_dist", "values": [ { "fieldValue": "0 to 55", "weight": 5000 } ] }, { "fieldName": "cert", "values": [ { "fieldValue": null, "weight": 4837 }, { "fieldValue": "CS 05/1973", "weight": 26 }, { "fieldValue": "AS 05/1973", "weight": 24 }, { "fieldValue": "BS 05/1973", "weight": 19 }, { "fieldValue": "LU 08/1990", "weight": 10 }, { "fieldValue": "AU 05/1973", "weight": 9 }, { "fieldValue": "LU 05/1973", "weight": 7 }, { "fieldValue": "ES 05/1973", "weight": 6 }, { "fieldValue": "DS 05/1973", "weight": 5 }, { "fieldValue": "LU 08/1976", "weight": 2 } ] }, { "fieldName": "city", "values": [ { "fieldValue": "HOUSTON", "weight": 22 }, { "fieldValue": "LOS ANGELES", "weight": 17 }, { "fieldValue": "COLUMBUS", "weight": 17 }, { "fieldValue": "RICHMOND", "weight": 11 }, { "fieldValue": "SPRINGFIELD", "weight": 10 }, { "fieldValue": "MARION", "weight": 10 }, { "fieldValue": "CLINTON", "weight": 10 }, { "fieldValue": "SAN ANTONIO", "weight": 10 }, { "fieldValue": "COLUMBIA", "weight": 10 }, { "fieldValue": "LEXINGTON", "weight": 9 } ] }, { "fieldName": "cntl_twr", "values": [ { "fieldValue": "N", "weight": 4825 }, { "fieldValue": "Y", "weight": 175 } ] }, { "fieldName": "code", "values": [ { "fieldValue": "M86", "weight": 1 }, { "fieldValue": "TE55", "weight": 1 }, { "fieldValue": "CT93", "weight": 1 }, { "fieldValue": "8D1", "weight": 1 }, { "fieldValue": "0CT1", "weight": 1 }, { "fieldValue": "OH54", "weight": 1 }, { "fieldValue": "2LA8", "weight": 1 }, { "fieldValue": "CT74", "weight": 1 }, { "fieldValue": "17Y", "weight": 1 }, { "fieldValue": "TX24", "weight": 1 } ] }, { "fieldName": "county", "values": [ { "fieldValue": "WASHINGTON", "weight": 55 }, { "fieldValue": "JEFFERSON", "weight": 53 }, { "fieldValue": "FRANKLIN", "weight": 43 }, { "fieldValue": "LOS ANGELES", "weight": 43 }, { "fieldValue": "MONTGOMERY", "weight": 38 }, { "fieldValue": "JACKSON", "weight": 36 }, { "fieldValue": "LINCOLN", "weight": 32 }, { "fieldValue": "ADAMS", "weight": 31 }, { "fieldValue": "MADISON", "weight": 28 }, { "fieldValue": "POLK", "weight": 27 } ] }, { "fieldName": "cust_intl", "values": [ { "fieldValue": null, "weight": 3802 }, { "fieldValue": "N", "weight": 1183 }, { "fieldValue": "Y", "weight": 15 } ] }, { "fieldName": "elevation", "values": [ { "fieldValue": "-55 to 11294", "weight": 5000 } ] }, { "fieldName": "faa_dist", "values": [ { "fieldValue": "NONE", "weight": 1792 }, { "fieldValue": "CHI", "weight": 375 }, { "fieldValue": "DET", "weight": 348 }, { "fieldValue": "SEA", "weight": 330 }, { "fieldValue": "MSP", "weight": 254 }, { "fieldValue": "ATL", "weight": 249 }, { "fieldValue": "NYC", "weight": 223 }, { "fieldValue": "HAR", "weight": 217 }, { "fieldValue": "ORL", "weight": 204 }, { "fieldValue": "DEN", "weight": 170 } ] }, { "fieldName": "faa_region", "values": [ { "fieldValue": "AGL", "weight": 1116 }, { "fieldValue": "ASW", "weight": 865 }, { "fieldValue": "ASO", "weight": 727 }, { "fieldValue": "AEA", "weight": 637 }, { "fieldValue": "ANM", "weight": 569 }, { "fieldValue": "ACE", "weight": 395 }, { "fieldValue": "AWP", "weight": 352 }, { "fieldValue": "ANE", "weight": 197 }, { "fieldValue": "AAL", "weight": 141 }, { "fieldValue": null, "weight": 1 } ] }, { "fieldName": "fac_type", "values": [ { "fieldValue": "AIRPORT", "weight": 3586 }, { "fieldValue": "HELIPORT", "weight": 1258 }, { "fieldValue": "SEAPLANE BASE", "weight": 95 }, { "fieldValue": "ULTRALIGHT", "weight": 27 }, { "fieldValue": "STOLPORT", "weight": 23 }, { "fieldValue": "GLIDERPORT", "weight": 8 }, { "fieldValue": "BALLOONPORT", "weight": 3 } ] }, { "fieldName": "fac_use", "values": [ { "fieldValue": "PR", "weight": 3650 }, { "fieldValue": "PU", "weight": 1350 } ] }, { "fieldName": "fed_agree", "values": [ { "fieldValue": null, "weight": 4113 }, { "fieldValue": "NGY", "weight": 433 }, { "fieldValue": "N", "weight": 126 }, { "fieldValue": "NGY3", "weight": 60 }, { "fieldValue": "NGPY", "weight": 43 }, { "fieldValue": "N1", "weight": 23 }, { "fieldValue": "NY1", "weight": 22 }, { "fieldValue": "NGSY", "weight": 21 }, { "fieldValue": "NGPY3", "weight": 19 }, { "fieldValue": "1", "weight": 16 } ] }, { "fieldName": "full_name", "values": [ { "fieldValue": "FLYING S RANCH", "weight": 6 }, { "fieldValue": "MILLER", "weight": 5 }, { "fieldValue": "FLYING W", "weight": 5 }, { "fieldValue": "TAYLOR", "weight": 5 }, { "fieldValue": "MEMORIAL HOSPITAL", "weight": 4 }, { "fieldValue": "ST MARY'S HOSPITAL", "weight": 4 }, { "fieldValue": "MATTHEWS", "weight": 3 }, { "fieldValue": "LANCASTER", "weight": 3 }, { "fieldValue": "MERCY HOSPITAL", "weight": 3 }, { "fieldValue": "ANDERSON", "weight": 3 } ] }, { "fieldName": "id", "values": [ { "fieldValue": "10 to 19781", "weight": 5000 } ] }, { "fieldName": "joint_use", "values": [ { "fieldValue": null, "weight": 3734 }, { "fieldValue": "N", "weight": 1219 }, { "fieldValue": "Y", "weight": 47 } ] }, { "fieldName": "latitude", "values": [ { "fieldValue": "-14.22 to 71.05", "weight": 5000 } ] }, { "fieldName": "longitude", "values": [ { "fieldValue": "-100.0 to 174.11", "weight": 5000 } ] }, { "fieldName": "major", "values": [ { "fieldValue": "N", "weight": 4921 }, { "fieldValue": "Y", "weight": 79 } ] }, { "fieldName": "mil_rts", "values": [ { "fieldValue": null, "weight": 3697 }, { "fieldValue": "Y", "weight": 751 }, { "fieldValue": "N", "weight": 552 } ] }, { "fieldName": "own_type", "values": [ { "fieldValue": "PR", "weight": 3643 }, { "fieldValue": "PU", "weight": 1276 }, { "fieldValue": "MR", "weight": 38 }, { "fieldValue": "MA", "weight": 26 }, { "fieldValue": "MN", "weight": 17 } ] }, { "fieldName": "site_number", "values": [ { "fieldValue": "12809.122*A", "weight": 1 }, { "fieldValue": "24835.1*A", "weight": 1 }, { "fieldValue": "17399.58*A", "weight": 1 }, { "fieldValue": "09123.*A", "weight": 1 }, { "fieldValue": "02978.15*H", "weight": 1 }, { "fieldValue": "07939.*A", "weight": 1 }, { "fieldValue": "23522.*A", "weight": 1 }, { "fieldValue": "09450.2*H", "weight": 1 }, { "fieldValue": "03308.35*A", "weight": 1 }, { "fieldValue": "03025.1*A", "weight": 1 } ] }, { "fieldName": "state", "values": [ { "fieldValue": "TX", "weight": 500 }, { "fieldValue": "CA", "weight": 225 }, { "fieldValue": "OH", "weight": 218 }, { "fieldValue": "IL", "weight": 218 }, { "fieldValue": "PA", "weight": 204 }, { "fieldValue": "FL", "weight": 195 }, { "fieldValue": "IN", "weight": 157 }, { "fieldValue": "WA", "weight": 144 }, { "fieldValue": "AK", "weight": 141 }, { "fieldValue": "MO", "weight": 135 } ] }, { "fieldName": null, "values": [ { "fieldValue": null, "weight": 5000 } ] } ]
WITH __stage0 AS ( SELECT * from (SELECT * FROM '../data/airports.parquet' USING SAMPLE 5000) as x limit 100000 ) , __stage1 AS ( SELECT group_set, CASE group_set WHEN 0 THEN 'act_date' WHEN 1 THEN 'aero_cht' WHEN 2 THEN 'c_ldg_rts' WHEN 3 THEN 'cbd_dir' WHEN 4 THEN 'cbd_dist' WHEN 5 THEN 'cert' WHEN 6 THEN 'city' WHEN 7 THEN 'cntl_twr' WHEN 8 THEN 'code' WHEN 9 THEN 'county' WHEN 10 THEN 'cust_intl' WHEN 11 THEN 'elevation' WHEN 12 THEN 'faa_dist' WHEN 13 THEN 'faa_region' WHEN 14 THEN 'fac_type' WHEN 15 THEN 'fac_use' WHEN 16 THEN 'fed_agree' WHEN 17 THEN 'full_name' WHEN 18 THEN 'id' WHEN 19 THEN 'joint_use' WHEN 20 THEN 'latitude' WHEN 21 THEN 'longitude' WHEN 22 THEN 'major' WHEN 23 THEN 'mil_rts' WHEN 24 THEN 'own_type' WHEN 25 THEN 'site_number' WHEN 26