SELECT e.id, e.name, s.skill
FROM emp_skills e
LATERAL VIEW EXPLODE(e.skills) s AS skill
ORDER BY e.id, s.skill;
执行结果:
+----+-------------+--------+
| id | name | skill |
+----+-------------+--------+
| 1 | John Doe | Java |
| 1 | John Doe | Python |
| 1 | John Doe | SQL |
| 2 | Jane Smith | C++ |
| 2 | Jane Smith | Hadoop |
| 2 | Jane Smith | SQL |
| 3 | Bob Johnson | Docker |
| 3 | Bob Johnson | Python |
+----+-------------+--------+
注意:
id=4
id=4
(Alice Wang,skills 为 NULL)的行被过滤掉了。
示例 2:LATERAL VIEW OUTER — 保留空数组/NULL 行
SELECT e.id, e.name, s.skill
FROM emp_skills e
LATERAL VIEW OUTER EXPLODE(e.skills) s AS skill
ORDER BY e.id, s.skill;
执行结果:
+----+-------------+--------+
| id | name | skill |
+----+-------------+--------+
| 1 | John Doe | Java |
| 1 | John Doe | Python |
| 1 | John Doe | SQL |
| 2 | Jane Smith | C++ |
| 2 | Jane Smith | Hadoop |
| 2 | Jane Smith | SQL |
| 3 | Bob Johnson | Docker |
| 3 | Bob Johnson | Python |
| 4 | Alice Wang | NULL |
+----+-------------+--------+
加上
OUTER
OUTER
后,
id=4
id=4
的行被保留,
skill
skill
列为 NULL。
示例 3:POSEXPLODE — 展开数组并附带位置索引
POSEXPLODE
POSEXPLODE
在展开数组的同时返回每个元素的位置索引(从 0 开始)。
SELECT e.id, e.name, ps.pos, ps.skill
FROM emp_skills e
LATERAL VIEW POSEXPLODE(e.skills) ps AS pos, skill
ORDER BY e.id, ps.pos;
执行结果:
+----+-------------+-----+--------+
| id | name | pos | skill |
+----+-------------+-----+--------+
| 1 | John Doe | 0 | Java |
| 1 | John Doe | 1 | Python |
| 1 | John Doe | 2 | SQL |
| 2 | Jane Smith | 0 | C++ |
| 2 | Jane Smith | 1 | Hadoop |
| 2 | Jane Smith | 2 | SQL |
| 3 | Bob Johnson | 0 | Python |
| 3 | Bob Johnson | 1 | Docker |
+----+-------------+-----+--------+
示例 4:多个 LATERAL VIEW — 展开多列数组
一个查询中可以链式使用多个
LATERAL VIEW
LATERAL VIEW
,分别展开不同的数组列。
-- 假设表中同时有 skills 和 languages 两个数组列
SELECT e.id, e.name, s.skill, l.lang
FROM emp_skills e
LATERAL VIEW EXPLODE(e.skills) s AS skill
LATERAL VIEW EXPLODE(ARRAY('Chinese', 'English')) l AS lang
ORDER BY e.id, s.skill, l.lang
LIMIT 6;
示例 5:展开 MAP 类型
EXPLODE
EXPLODE
也支持 MAP 类型,展开后生成
key
key
和
value
value
两列。
-- 假设 scores 列类型为 MAP<STRING, INT>
SELECT id, name, kv.subject, kv.score
FROM student_scores
LATERAL VIEW EXPLODE(scores) kv AS subject, score
ORDER BY id, subject;
注意事项
LATERAL VIEW
LATERAL VIEW
必须紧跟
FROM
FROM
子句中的表引用,不能放在
WHERE
WHERE
之后。
不加
OUTER
OUTER
时,数组为空(
ARRAY()
ARRAY()
)或 NULL 的行会被过滤,行为类似 INNER JOIN。
POSEXPLODE
POSEXPLODE
输出两列(位置索引和值),
AS
AS
后需要提供两个列别名。
多个
LATERAL VIEW
LATERAL VIEW
之间是笛卡尔积关系,展开多个数组时结果行数会成倍增加,使用时需注意数据量。
JOIN LATERAL
JOIN LATERAL
JOIN LATERAL
与
LATERAL VIEW
LATERAL VIEW
不同——它允许在 JOIN 的右侧使用相关子查询,子查询可以引用左侧表的列。
语法
SELECT ...
FROM table1 t1
[INNER | LEFT] JOIN LATERAL (subquery) t2 ON condition
INNER JOIN LATERAL
INNER JOIN LATERAL
:左表行在子查询返回空时被过滤(默认)
LEFT JOIN LATERAL
LEFT JOIN LATERAL
(或
LEFT OUTER JOIN LATERAL
LEFT OUTER JOIN LATERAL
):左表行即使子查询返回空也保留,用 NULL 填充
使用示例
-- 找出每个分类中价值最高的商品(相关子查询)
SELECT c.cat, top.id, top.v
FROM (SELECT DISTINCT cat FROM doc_sel2) c
JOIN LATERAL (
SELECT id, v FROM doc_sel2
WHERE doc_sel2.cat = c.cat
ORDER BY v DESC
LIMIT 1
) top ON true;