累计计算与运行总计

概述

累计计算(Running Total / Cumulative Aggregation)是数据分析中的高频需求:统计截至某时间点的累计销售额、计算滚动 N 日均值、追踪指标相对起点的变化幅度。云器 Lakehouse 通过窗口函数的

ORDER BY
ORDER BY
子句和帧规范(
ROWS BETWEEN
ROWS BETWEEN
)实现这类计算,无需自连接或子查询。

涉及的 SQL 语法

语法用途
SUM(...) OVER (ORDER BY ...)
SUM(...) OVER (ORDER BY ...)
累计求和(默认帧:从第一行到当前行)
ROWS BETWEEN N PRECEDING AND CURRENT ROW
ROWS BETWEEN N PRECEDING AND CURRENT ROW
滚动 N 行窗口
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
从分区起点到当前行
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
整个分区
FIRST_VALUE / LAST_VALUE
FIRST_VALUE / LAST_VALUE
取窗口内第一行/最后一行的值
LAG(col, n)
LAG(col, n)
取前 n 行的值,用于环比计算

前置数据

以下示例使用两张测试表:

-- 日销售明细表 CREATE TABLE IF NOT EXISTS doc_running_sales ( sale_date DATE, region VARCHAR(20), amount DECIMAL(10,2) ); INSERT INTO doc_running_sales VALUES (CAST('2024-01-01' AS DATE), '华东', 1200.00), (CAST('2024-01-02' AS DATE), '华东', 850.00), (CAST('2024-01-03' AS DATE), '华东', 1500.00), (CAST('2024-01-04' AS DATE), '华东', 600.00), (CAST('2024-01-05' AS DATE), '华东', 2100.00), (CAST('2024-01-01' AS DATE), '华西', 900.00), (CAST('2024-01-02' AS DATE), '华西', 1100.00), (CAST('2024-01-03' AS DATE), '华西', 750.00), (CAST('2024-01-04' AS DATE), '华西', 1300.00), (CAST('2024-01-05' AS DATE), '华西', 980.00); -- 月度收入表 CREATE TABLE IF NOT EXISTS doc_running_monthly ( month_date DATE, category VARCHAR(20), revenue DECIMAL(12,2) ); INSERT INTO doc_running_monthly VALUES (CAST('2024-01-01' AS DATE), '服装', 50000.00), (CAST('2024-02-01' AS DATE), '服装', 45000.00), (CAST('2024-03-01' AS DATE), '服装', 62000.00), (CAST('2024-04-01' AS DATE), '服装', 58000.00), (CAST('2024-01-01' AS DATE), '电子', 120000.00), (CAST('2024-02-01' AS DATE), '电子', 135000.00), (CAST('2024-03-01' AS DATE), '电子', 98000.00), (CAST('2024-04-01' AS DATE), '电子', 145000.00);


场景一:按分区累计求和

计算每个大区每天的累计销售额。

SELECT sale_date, region, amount, SUM(amount) OVER ( PARTITION BY region ORDER BY sale_date ) AS running_total FROM doc_running_sales ORDER BY region, sale_date;

结果:

sale_dateregionamountrunning_total
2024-01-01华东1200.001200.00
2024-01-02华东850.002050.00
2024-01-03华东1500.003550.00
2024-01-04华东600.004150.00
2024-01-05华东2100.006250.00
2024-01-01华西900.00900.00
2024-01-02华西1100.002000.00
2024-01-03华西750.002750.00
2024-01-04华西1300.004050.00
2024-01-05华西980.005030.00

场景二:累计总量占比(YTD 进度)

计算每个品类每月的年累计收入及其占全年总收入的比例。

SELECT month_date, category, revenue, SUM(revenue) OVER ( PARTITION BY category ORDER BY month_date ) AS ytd_revenue, SUM(revenue) OVER (PARTITION BY category) AS annual_total, ROUND( SUM(revenue) OVER (PARTITION BY category ORDER BY month_date) / SUM(revenue) OVER (PARTITION BY category) * 100, 2 ) AS ytd_pct FROM doc_running_monthly ORDER BY category, month_date;

结果:

month_datecategoryrevenueytd_revenueannual_totalytd_pct
2024-01-01服装50000.0050000.00215000.0023.26
2024-02-01服装45000.0095000.00215000.0044.19
2024-03-01服装62000.00157000.00215000.0073.02
2024-04-01服装58000.00215000.00215000.00100.00
2024-01-01电子120000.00120000.00498000.0024.10
2024-02-01电子135000.00255000.00498000.0051.20
2024-03-01电子98000.00353000.00498000.0070.88
2024-04-01电子145000.00498000.00498000.00100.00

场景三:滚动 N 行窗口(移动平均)

计算每个大区最近 3 天的移动平均销售额。

SELECT sale_date, region, amount, AVG(amount) OVER ( PARTITION BY region ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS moving_avg_3day FROM doc_running_sales ORDER BY region, sale_date;

结果:

sale_dateregionamountmoving_avg_3day
2024-01-01华东1200.001200.000000
2024-01-02华东850.001025.000000
2024-01-03华东1500.001183.333333
2024-01-04华东600.00983.333333
2024-01-05华东2100.001400.000000

场景四:先聚合再滚动(两层窗口)

先按天汇总全部大区的销售额,再计算 7 天滚动总和。

SELECT sale_date, SUM(amount) AS daily_total, SUM(SUM(amount)) OVER ( ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS rolling_7day_sum, AVG(SUM(amount)) OVER ( ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS rolling_7day_avg FROM doc_running_sales GROUP BY sale_date ORDER BY sale_date;

结果:

sale_datedaily_totalrolling_7day_sumrolling_7day_avg
2024-01-012100.002100.002100.000000
2024-01-021950.004050.002025.000000
2024-01-032250.006300.002100.000000
2024-01-041900.008200.002050.000000
2024-01-053080.0011280.002256.000000

场景五:累计最大值 / 最小值

追踪每个大区历史最高和最低单日销售额。

SELECT sale_date, region, amount, MAX(amount) OVER ( PARTITION BY region ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_max, MIN(amount) OVER ( PARTITION BY region ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_min FROM doc_running_sales ORDER BY region, sale_date;

结果(华东部分):

sale_dateregionamountrunning_maxrunning_min
2024-01-01华东1200.001200.001200.00
2024-01-02华东850.001200.00850.00
2024-01-03华东1500.001500.00850.00
2024-01-04华东600.001500.00600.00
2024-01-05华东2100.002100.00600.00

场景六:环比变化(MoM / DoD)

计算每个品类每月的环比变化额和变化率。

SELECT month_date, category, revenue, LAG(revenue, 1) OVER ( PARTITION BY category ORDER BY month_date ) AS prev_month, revenue - LAG(revenue, 1) OVER ( PARTITION BY category ORDER BY month_date ) AS mom_change, ROUND( (revenue - LAG(revenue, 1) OVER (PARTITION BY category ORDER BY month_date)) / LAG(revenue, 1) OVER (PARTITION BY category ORDER BY month_date) * 100, 2 ) AS mom_pct FROM doc_running_monthly ORDER BY category, month_date;

结果(服装品类):

month_datecategoryrevenueprev_monthmom_changemom_pct
2024-01-01服装50000.00NULLNULLNULL
2024-02-01服装45000.0050000.00-5000.00-10.00
2024-03-01服装62000.0045000.0017000.0037.78
2024-04-01服装58000.0062000.00-4000.00-6.45

场景七:相对起点的变化量(FIRST_VALUE)

计算每个大区每天相对于第一天的销售额变化,以及最终一天的销售额。

SELECT sale_date, region, amount, FIRST_VALUE(amount) OVER ( PARTITION BY region ORDER BY sale_date ) AS first_day_amount, LAST_VALUE(amount) OVER ( PARTITION BY region ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_day_amount, amount - FIRST_VALUE(amount) OVER ( PARTITION BY region ORDER BY sale_date ) AS change_from_start FROM doc_running_sales ORDER BY region, sale_date;

结果(华东部分):

sale_dateregionamountfirst_day_amountlast_day_amountchange_from_start
2024-01-01华东1200.001200.002100.000.00
2024-01-02华东850.001200.002100.00-350.00
2024-01-03华东1500.001200.002100.00300.00
2024-01-04华东600.001200.002100.00-600.00
2024-01-05华东2100.001200.002100.00900.00

注意事项

  • 帧规范默认值
    ORDER BY
    ORDER BY
    存在时,默认帧是
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    (基于值范围)。对于日期列,如果有重复日期,
    RANGE
    RANGE
    ROWS
    ROWS
    的结果可能不同。需要精确控制时,显式写
    ROWS BETWEEN
    ROWS BETWEEN
  • 两层窗口:对
    GROUP BY
    GROUP BY
    聚合结果再做窗口计算时,窗口函数参数必须包裹聚合函数(如
    SUM(SUM(col))
    SUM(SUM(col))
    ),否则报错。
  • LAST_VALUE 注意事项:不加
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    时,
    LAST_VALUE
    LAST_VALUE
    只返回当前行的值(因为默认帧到当前行为止)。
  • NULL 处理
    LAG
    LAG
    /
    LEAD
    LEAD
    在边界处返回 NULL,可用
    COALESCE(LAG(col, 1), 0)
    COALESCE(LAG(col, 1), 0)
    替换默认值。
联系我们
预约咨询
微信咨询
电话咨询