累计计算(Running Total / Cumulative Aggregation)是数据分析中的高频需求:统计截至某时间点的累计销售额、计算滚动 N 日均值、追踪指标相对起点的变化幅度。云器 Lakehouse 通过窗口函数的
ORDER BY
ORDER BY
子句和帧规范(
ROWS BETWEEN
ROWS BETWEEN
)实现这类计算,无需自连接或子查询。
涉及的 SQL 语法
语法
用途
SUM(...) OVER (ORDER BY ...)
SUM(...) OVER (ORDER BY ...)
累计求和(默认帧:从第一行到当前行)
ROWS BETWEEN N PRECEDING AND CURRENT ROW
ROWS BETWEEN N PRECEDING AND CURRENT ROW
滚动 N 行窗口
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
从分区起点到当前行
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
整个分区
FIRST_VALUE / LAST_VALUE
FIRST_VALUE / LAST_VALUE
取窗口内第一行/最后一行的值
LAG(col, n)
LAG(col, n)
取前 n 行的值,用于环比计算
前置数据
以下示例使用两张测试表:
-- 日销售明细表
CREATE TABLE IF NOT EXISTS doc_running_sales (
sale_date DATE,
region VARCHAR(20),
amount DECIMAL(10,2)
);
INSERT INTO doc_running_sales VALUES
(CAST('2024-01-01' AS DATE), '华东', 1200.00),
(CAST('2024-01-02' AS DATE), '华东', 850.00),
(CAST('2024-01-03' AS DATE), '华东', 1500.00),
(CAST('2024-01-04' AS DATE), '华东', 600.00),
(CAST('2024-01-05' AS DATE), '华东', 2100.00),
(CAST('2024-01-01' AS DATE), '华西', 900.00),
(CAST('2024-01-02' AS DATE), '华西', 1100.00),
(CAST('2024-01-03' AS DATE), '华西', 750.00),
(CAST('2024-01-04' AS DATE), '华西', 1300.00),
(CAST('2024-01-05' AS DATE), '华西', 980.00);
-- 月度收入表
CREATE TABLE IF NOT EXISTS doc_running_monthly (
month_date DATE,
category VARCHAR(20),
revenue DECIMAL(12,2)
);
INSERT INTO doc_running_monthly VALUES
(CAST('2024-01-01' AS DATE), '服装', 50000.00),
(CAST('2024-02-01' AS DATE), '服装', 45000.00),
(CAST('2024-03-01' AS DATE), '服装', 62000.00),
(CAST('2024-04-01' AS DATE), '服装', 58000.00),
(CAST('2024-01-01' AS DATE), '电子', 120000.00),
(CAST('2024-02-01' AS DATE), '电子', 135000.00),
(CAST('2024-03-01' AS DATE), '电子', 98000.00),
(CAST('2024-04-01' AS DATE), '电子', 145000.00);
场景一:按分区累计求和
计算每个大区每天的累计销售额。
SELECT
sale_date,
region,
amount,
SUM(amount) OVER (
PARTITION BY region
ORDER BY sale_date
) AS running_total
FROM doc_running_sales
ORDER BY region, sale_date;
结果:
sale_date
region
amount
running_total
2024-01-01
华东
1200.00
1200.00
2024-01-02
华东
850.00
2050.00
2024-01-03
华东
1500.00
3550.00
2024-01-04
华东
600.00
4150.00
2024-01-05
华东
2100.00
6250.00
2024-01-01
华西
900.00
900.00
2024-01-02
华西
1100.00
2000.00
2024-01-03
华西
750.00
2750.00
2024-01-04
华西
1300.00
4050.00
2024-01-05
华西
980.00
5030.00
ORDER BY
ORDER BY
不带
ROWS BETWEEN
ROWS BETWEEN
时,默认帧是
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
,即从分区起点到当前行。
场景二:累计总量占比(YTD 进度)
计算每个品类每月的年累计收入及其占全年总收入的比例。
SELECT
month_date,
category,
revenue,
SUM(revenue) OVER (
PARTITION BY category
ORDER BY month_date
) AS ytd_revenue,
SUM(revenue) OVER (PARTITION BY category) AS annual_total,
ROUND(
SUM(revenue) OVER (PARTITION BY category ORDER BY month_date)
/ SUM(revenue) OVER (PARTITION BY category) * 100,
2
) AS ytd_pct
FROM doc_running_monthly
ORDER BY category, month_date;
结果:
month_date
category
revenue
ytd_revenue
annual_total
ytd_pct
2024-01-01
服装
50000.00
50000.00
215000.00
23.26
2024-02-01
服装
45000.00
95000.00
215000.00
44.19
2024-03-01
服装
62000.00
157000.00
215000.00
73.02
2024-04-01
服装
58000.00
215000.00
215000.00
100.00
2024-01-01
电子
120000.00
120000.00
498000.00
24.10
2024-02-01
电子
135000.00
255000.00
498000.00
51.20
2024-03-01
电子
98000.00
353000.00
498000.00
70.88
2024-04-01
电子
145000.00
498000.00
498000.00
100.00
场景三:滚动 N 行窗口(移动平均)
计算每个大区最近 3 天的移动平均销售额。
SELECT
sale_date,
region,
amount,
AVG(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3day
FROM doc_running_sales
ORDER BY region, sale_date;
结果:
sale_date
region
amount
moving_avg_3day
2024-01-01
华东
1200.00
1200.000000
2024-01-02
华东
850.00
1025.000000
2024-01-03
华东
1500.00
1183.333333
2024-01-04
华东
600.00
983.333333
2024-01-05
华东
2100.00
1400.000000
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
表示当前行及前 2 行,共 3 行。分区起点处不足 3 行时,只对已有行求均值。
场景四:先聚合再滚动(两层窗口)
先按天汇总全部大区的销售额,再计算 7 天滚动总和。
SELECT
sale_date,
SUM(amount) AS daily_total,
SUM(SUM(amount)) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7day_sum,
AVG(SUM(amount)) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7day_avg
FROM doc_running_sales
GROUP BY sale_date
ORDER BY sale_date;
结果:
sale_date
daily_total
rolling_7day_sum
rolling_7day_avg
2024-01-01
2100.00
2100.00
2100.000000
2024-01-02
1950.00
4050.00
2025.000000
2024-01-03
2250.00
6300.00
2100.000000
2024-01-04
1900.00
8200.00
2050.000000
2024-01-05
3080.00
11280.00
2256.000000
对聚合结果再做窗口计算时,窗口函数的参数必须是聚合表达式(
SUM(SUM(amount))
SUM(SUM(amount))
),不能直接写列名。
场景五:累计最大值 / 最小值
追踪每个大区历史最高和最低单日销售额。
SELECT
sale_date,
region,
amount,
MAX(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_max,
MIN(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_min
FROM doc_running_sales
ORDER BY region, sale_date;
结果(华东部分):
sale_date
region
amount
running_max
running_min
2024-01-01
华东
1200.00
1200.00
1200.00
2024-01-02
华东
850.00
1200.00
850.00
2024-01-03
华东
1500.00
1500.00
850.00
2024-01-04
华东
600.00
1500.00
600.00
2024-01-05
华东
2100.00
2100.00
600.00
场景六:环比变化(MoM / DoD)
计算每个品类每月的环比变化额和变化率。
SELECT
month_date,
category,
revenue,
LAG(revenue, 1) OVER (
PARTITION BY category
ORDER BY month_date
) AS prev_month,
revenue - LAG(revenue, 1) OVER (
PARTITION BY category
ORDER BY month_date
) AS mom_change,
ROUND(
(revenue - LAG(revenue, 1) OVER (PARTITION BY category ORDER BY month_date))
/ LAG(revenue, 1) OVER (PARTITION BY category ORDER BY month_date) * 100,
2
) AS mom_pct
FROM doc_running_monthly
ORDER BY category, month_date;
结果(服装品类):
month_date
category
revenue
prev_month
mom_change
mom_pct
2024-01-01
服装
50000.00
NULL
NULL
NULL
2024-02-01
服装
45000.00
50000.00
-5000.00
-10.00
2024-03-01
服装
62000.00
45000.00
17000.00
37.78
2024-04-01
服装
58000.00
62000.00
-4000.00
-6.45
第一行没有前一期数据,
LAG
LAG
返回 NULL,环比计算结果也为 NULL。如需替换为 0,用
COALESCE(LAG(...), 0)
COALESCE(LAG(...), 0)
。
场景七:相对起点的变化量(FIRST_VALUE)
计算每个大区每天相对于第一天的销售额变化,以及最终一天的销售额。
SELECT
sale_date,
region,
amount,
FIRST_VALUE(amount) OVER (
PARTITION BY region
ORDER BY sale_date
) AS first_day_amount,
LAST_VALUE(amount) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_day_amount,
amount - FIRST_VALUE(amount) OVER (
PARTITION BY region
ORDER BY sale_date
) AS change_from_start
FROM doc_running_sales
ORDER BY region, sale_date;
结果(华东部分):
sale_date
region
amount
first_day_amount
last_day_amount
change_from_start
2024-01-01
华东
1200.00
1200.00
2100.00
0.00
2024-01-02
华东
850.00
1200.00
2100.00
-350.00
2024-01-03
华东
1500.00
1200.00
2100.00
300.00
2024-01-04
华东
600.00
1200.00
2100.00
-600.00
2024-01-05
华东
2100.00
1200.00
2100.00
900.00
LAST_VALUE
LAST_VALUE
默认帧只到当前行,必须显式指定
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
才能取到分区最后一行的值。
注意事项
帧规范默认值:
ORDER BY
ORDER BY
存在时,默认帧是
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
(基于值范围)。对于日期列,如果有重复日期,
RANGE
RANGE
和
ROWS
ROWS
的结果可能不同。需要精确控制时,显式写
ROWS BETWEEN
ROWS BETWEEN
。
两层窗口:对
GROUP BY
GROUP BY
聚合结果再做窗口计算时,窗口函数参数必须包裹聚合函数(如
SUM(SUM(col))
SUM(SUM(col))
),否则报错。
LAST_VALUE 注意事项:不加
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING