构建零售连锁门店运营数仓

整合全国 100 家连锁门店的 POS 交易数据与库存数据,构建门店运营分析与 SKU 周转数仓。本文使用 Retail Data Warehouse 12-Table Dataset,端到端演示从 ODS 原始 POS 交易到 ADS 畅滞销 SKU 分析和门店排行的完整构建过程,覆盖 MySQL CDC 多表全库镜像、Dynamic Table 增量聚合、按 store_id 静态分区、Bloomfilter Index、External Schema(Hive)和 Time Travel 月末对账六项核心能力。


概述

连锁零售数仓面临的核心挑战:

问题云器解决方案
全国各区域门店 POS 系统分库分表,需统一汇聚MySQL CDC 多表全库镜像,分库合并写入同一 ODS 层
ODS→DWD→DWS→ADS 多层聚合,手工调度易出错Dynamic Table 级联刷新,声明式 SQL,系统自动维护依赖链
单门店历史订单查询慢,跨分区全扫代价高
PARTITIONED BY (store_id)
PARTITIONED BY (store_id)
+
SESSION_CONFIGS
SESSION_CONFIGS
参数化分区,按门店刷新
订单明细中
product_id
product_id
是高基数列,点查频繁
Bloomfilter Index,毫秒级跳过不匹配数据块
数年前存量历史数据仍在 Hive 集群,不想迁移External Schema(Hive),SQL 直接透视外部历史表
月末对账时需核对历史销售快照与系统当前数据Time Travel,
TIMESTAMP AS OF
TIMESTAMP AS OF
回溯任意历史版本

涉及的 SQL 命令

命令 / 功能用途说明
CREATE TABLE
CREATE TABLE
创建 ODS 原始 POS 交易表和维度表普通表,作为 Dynamic Table 上游
CREATE BLOOMFILTER INDEX
CREATE BLOOMFILTER INDEX
product_id
product_id
列创建 Bloomfilter 索引
适合高基数列的点查过滤
CREATE DYNAMIC TABLE
CREATE DYNAMIC TABLE
创建 DWD / DWS / ADS 增量计算表声明式 SQL,系统自动增量刷新
PARTITIONED BY
PARTITIONED BY
+
SESSION_CONFIGS
SESSION_CONFIGS
按门店 ID 静态分区刷新 DWS 层参数化分区,按门店精确回刷历史数据
REFRESH DYNAMIC TABLE ... PARTITION
REFRESH DYNAMIC TABLE ... PARTITION
刷新 DWS 指定门店分区Studio Task 按门店 + 日期粒度调度
CREATE EXTERNAL SCHEMA
CREATE EXTERNAL SCHEMA
挂载 Hive 存量历史订单数据两层命名查询,无需迁移数据
DESC HISTORY
DESC HISTORY
查看表历史版本列表返回每个版本的时间戳、行数变化
SELECT ... TIMESTAMP AS OF
SELECT ... TIMESTAMP AS OF
月末对账:回溯历史快照定位月末截止时刻的订单总量与收入

前置准备

所有示例在

best_practice_retail_pos
best_practice_retail_pos
Schema 下运行。

CREATE SCHEMA IF NOT EXISTS best_practice_retail_pos;

下载数据集并解压到本地:

kaggle datasets download -d datarspectrum/retail-data-warehouse-12-table-1m-rows-dataset \ --unzip -p /tmp/retail_pos/

解压后得到 12 个 CSV 文件:

stores.csv
stores.csv
(100 行)、
orders.csv
orders.csv
(30 万行)、
order_items.csv
order_items.csv
(60 万行)、
products.csv
products.csv
(1 万行)等。本文使用其中 100 家门店、100 笔订单、200 条订单明细作为演示数据集。


ODS 层:多门店 POS 原始数据

ODS 层直接承接各区域 POS 系统的原始数据,建表时按业务主键自然分布,不做汇总变换。

MySQL CDC 接入方式

生产环境中,每个区域 POS 系统通常为独立 MySQL 实例,通过 Studio 多表实时同步任务(task_type=281)以 CDC 方式将 Binlog 变更实时同步到 ODS。

源端 MySQL 准备

在源端 MySQL 上确认以下参数已正确配置:

参数要求值查询方法
log_bin
log_bin
ON
SHOW GLOBAL VARIABLES LIKE 'log_bin'
SHOW GLOBAL VARIABLES LIKE 'log_bin'
binlog_format
binlog_format
ROW
SHOW GLOBAL VARIABLES LIKE 'binlog_format'
SHOW GLOBAL VARIABLES LIKE 'binlog_format'
binlog_row_image
binlog_row_image
FULL
SHOW GLOBAL VARIABLES LIKE 'binlog_row_image'
SHOW GLOBAL VARIABLES LIKE 'binlog_row_image'

同步账号需具备以下权限:

GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'cdc_user'@'%';

Studio 多表实时同步配置步骤

  1. 在 Studio 数据源管理中添加 MySQL 数据源(每个区域库一个数据源,如
    ds_mysql_pos_north
    ds_mysql_pos_north
    ds_mysql_pos_south
    ds_mysql_pos_south
  2. 在 Studio 开发 -> 多表实时同步中新建任务,选择同步模式:
    • 整库镜像:以数据库为粒度,自动适配新增表,适合首次接入整库
    • 多表镜像:选定指定表同步,适合只需部分表的场景
  3. 配置来源数据源,选择目标 workspace 和 schema(
    best_practice_retail_pos
    best_practice_retail_pos
  4. sync_mode
    sync_mode
    选择全量+增量,首次启动先拉取历史数据再切换 CDC
  5. 提交任务,在 Studio 中手动启动

任务启动后经历:初始化 → 全量同步 → 增量 CDC 三个阶段,增量阶段端到端延迟秒级。多个区域库可分别创建任务,目标端写入同一个 schema 下各自的表。

建表

门店主数据表(分区表)

CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_stores ( store_id INT, city STRING );

商品维度表

CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_categories ( category_id INT, category_name STRING ); CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_products ( product_id INT, category_id INT, supplier_id INT, price DOUBLE ); CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_promotions ( promotion_id INT, discount INT -- 折扣百分比,如 24 表示打 76 折 );

客户与订单表

CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_customers ( customer_id INT, city STRING, signup_date DATE ); CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_orders ( order_id INT, customer_id INT, store_id INT, order_date DATE, promotion_id INT ) PARTITIONED BY (store_id); CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_order_items ( order_item_id INT, order_id INT, product_id INT, qty INT, price DOUBLE );

支付与退货表

CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_payments ( payment_id INT, order_id INT, amount DOUBLE ); CREATE TABLE IF NOT EXISTS best_practice_retail_pos.doc_ods_returns ( return_id INT, order_item_id INT, refund DOUBLE );

写入数据

将 Kaggle CSV 文件通过 cz-cli 导入(实际项目替换为 PIPE 自动摄取)。

从本地 CSV 导入(推荐)

将各表数据保存为 CSV 文件后,通过 User Volume 批量导入:

-- 第一步:通过 SQL PUT 将本地 CSV 文件上传到 User Volume PUT '/path/to/stores.csv' TO USER VOLUME FILE 'stores.csv';

-- 第二步:从 User Volume COPY INTO 表 COPY INTO best_practice_retail_pos.doc_ods_stores FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('stores.csv');

对其余各表(

doc_ods_orders
doc_ods_orders
doc_ods_order_items
doc_ods_order_items
doc_ods_products
doc_ods_products
doc_ods_customers
doc_ods_customers
doc_ods_payments
doc_ods_payments
doc_ods_returns
doc_ods_returns
等)重复上述步骤,分别上传对应 CSV 文件并执行 COPY INTO。

也可直接内联插入小批量测试数据(不需要 CSV 文件):

-- 示例:写入门店主数据(100 家门店) INSERT INTO best_practice_retail_pos.doc_ods_stores VALUES (1,'Pune'),(2,'Pune'),(3,'Delhi'),(4,'Mumbai'),(5,'Mumbai'), -- ... 共 100 行 (100,'Delhi'); -- 写入订单(100 笔)及订单明细(200 条) INSERT INTO best_practice_retail_pos.doc_ods_orders VALUES (1,45,33,CAST('2021-08-26' AS DATE),24), (2,10,81,CAST('2022-03-19' AS DATE),3), -- ... (100,31,63,CAST('2022-03-30' AS DATE),33);

验证 ODS 各表行数:

SELECT (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_stores) AS stores, (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_orders) AS orders, (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_order_items) AS items, (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_products) AS products, (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_payments) AS payments, (SELECT COUNT(*) FROM best_practice_retail_pos.doc_ods_returns) AS returns;

stores | orders | items | products | payments | returns -------+--------+-------+----------+----------+-------- 100 | 100 | 200 | 100 | 100 | 20

创建 Bloomfilter Index

订单明细表按

product_id
product_id
过滤是高频操作(SKU 销量排行、畅滞销分析),
product_id
product_id
基数约为商品数量级,适合 Bloomfilter Index:

CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_product_id ON TABLE doc_ods_order_items (product_id);

External Schema:接入 Hive 存量历史数据

历史归档数据仍在 Hive 集群时,通过 External Schema 直接查询,无需迁移:

-- 步骤 1:创建指向 Hive Metastore 的 Catalog Connection CREATE CATALOG CONNECTION IF NOT EXISTS conn_hive_pos TYPE HMS HIVE_METASTORE_URIS = 'thrift://hive-metastore:9083' STORAGE_CONNECTION = 'conn_oss_archive'; -- 步骤 2:挂载 Hive 数据库为 External Schema CREATE EXTERNAL SCHEMA IF NOT EXISTS pos_hive_archive CONNECTION conn_hive_pos OPTIONS (SCHEMA = 'pos_archive_db'); -- 步骤 3:直接查询历史归档订单(两层命名,像本地表一样) SELECT order_id, store_id, order_date, total_amount FROM pos_hive_archive.historical_orders WHERE store_id = 33 AND order_date >= CAST('2019-01-01' AS DATE) ORDER BY order_date DESC LIMIT 5;


DWD 层:标准化销售明细

DWD 层将 ODS 的多张事实表与维度表 JOIN 打宽,派生折后金额和退货标记,形成每笔交易的完整视图。

建表

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_retail_pos.doc_dwd_sales_detail AS SELECT oi.order_item_id, o.order_id, o.store_id, s.city AS store_city, o.order_date, o.customer_id, c.city AS customer_city, oi.product_id, p.category_id, cat.category_name, p.price AS list_price, oi.qty, oi.price AS unit_price, CAST(oi.qty * oi.price AS DOUBLE) AS gross_amount, COALESCE(pr.discount, 0) AS discount_pct, ROUND(oi.qty * oi.price * (1.0 - COALESCE(pr.discount, 0) / 100.0), 2) AS net_amount, CASE WHEN r.return_id IS NOT NULL THEN 1 ELSE 0 END AS is_returned, COALESCE(r.refund, 0) AS refund_amount FROM best_practice_retail_pos.doc_ods_order_items oi JOIN best_practice_retail_pos.doc_ods_orders o ON oi.order_id = o.order_id JOIN best_practice_retail_pos.doc_ods_stores s ON o.store_id = s.store_id JOIN best_practice_retail_pos.doc_ods_customers c ON o.customer_id = c.customer_id JOIN best_practice_retail_pos.doc_ods_products p ON oi.product_id = p.product_id JOIN best_practice_retail_pos.doc_ods_categories cat ON p.category_id = cat.category_id LEFT JOIN best_practice_retail_pos.doc_ods_promotions pr ON o.promotion_id = pr.promotion_id LEFT JOIN best_practice_retail_pos.doc_ods_returns r ON oi.order_item_id = r.order_item_id;

手动触发首次刷新:

REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_dwd_sales_detail;

验证结果:

SELECT COUNT(*) AS dwd_rows FROM best_practice_retail_pos.doc_dwd_sales_detail;

dwd_rows -------- 200

查看前几行,确认

net_amount
net_amount
折扣计算正确:

SELECT order_item_id, store_id, store_city, order_date, product_id, category_name, qty, unit_price, gross_amount, discount_pct, net_amount, is_returned FROM best_practice_retail_pos.doc_dwd_sales_detail ORDER BY order_item_id LIMIT 5;

order_item_id | store_id | store_city | order_date | product_id | category_name | qty | unit_price | gross_amount | discount_pct | net_amount | is_returned --------------+----------+------------+------------+------------+---------------+-----+------------+--------------+--------------+------------+------------- 1 | 33 | Delhi | 2021-08-26 | 72 | Cat_12 | 3 | 176 | 528 | 5 | 501.6 | 0 2 | 33 | Delhi | 2021-08-26 | 10 | Cat_29 | 2 | 316 | 632 | 5 | 600.4 | 0 3 | 81 | Mumbai | 2022-03-19 | 45 | Cat_16 | 1 | 1345 | 1345 | 27 | 981.85 | 0 4 | 17 | Delhi | 2021-01-21 | 23 | Cat_15 | 4 | 2116 | 8464 | 34 | 5586.24 | 0 5 | 85 | Hyderabad | 2021-01-16 | 87 | Cat_24 | 2 | 4567 | 9134 | 22 | 7124.52 | 0

结果解读:第 4 行(product 23,促销折扣 34%)的

net_amount
net_amount
(5586.24)=
8464 × (1 - 0.34)
8464 × (1 - 0.34)
,折扣计算正确。第 3 行
is_returned = 0
is_returned = 0
说明该订单明细未发生退货,LEFT JOIN
doc_ods_returns
doc_ods_returns
返回 NULL 后被
CASE WHEN
CASE WHEN
转为 0。


DWS 层:门店日销售汇总

DWS 层以门店 + 交易日期为粒度汇总 DWD 明细,支持日报表、周趋势、环比分析。

门店日汇总 Dynamic Table

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_retail_pos.doc_dws_store_daily_sales AS SELECT store_id, store_city, order_date, COUNT(DISTINCT order_id) AS order_count, COUNT(order_item_id) AS item_count, SUM(qty) AS total_qty, ROUND(SUM(gross_amount), 2) AS gross_revenue, ROUND(SUM(net_amount), 2) AS net_revenue, ROUND(AVG(discount_pct), 2) AS avg_discount_pct, SUM(is_returned) AS return_count, ROUND(SUM(refund_amount), 2) AS total_refund, COUNT(DISTINCT product_id) AS sku_count FROM best_practice_retail_pos.doc_dwd_sales_detail GROUP BY store_id, store_city, order_date;

REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_dws_store_daily_sales; SELECT store_id, store_city, order_date, order_count, total_qty, gross_revenue, net_revenue, avg_discount_pct, return_count, sku_count FROM best_practice_retail_pos.doc_dws_store_daily_sales ORDER BY net_revenue DESC LIMIT 8;

store_id | store_city | order_date | order_count | total_qty | gross_revenue | net_revenue | avg_discount_pct | return_count | sku_count ---------+------------+------------+-------------+-----------+---------------+-------------+------------------+--------------+---------- 17 | Delhi | 2021-12-09 | 1 | 6 | 25089 | 22830.99 | 9 | 0 | 2 24 | Pune | 2021-08-05 | 1 | 6 | 22694 | 21105.42 | 7 | 0 | 2 68 | Pune | 2022-01-14 | 1 | 5 | 21725 | 20638.75 | 5 | 0 | 2 54 | Bangalore | 2021-04-26 | 1 | 7 | 21983 | 19564.87 | 11 | 0 | 2 43 | Bangalore | 2022-08-17 | 1 | 5 | 20948 | 18853.2 | 10 | 0 | 2 1 | Pune | 2023-11-27 | 1 | 6 | 22167 | 18398.61 | 17 | 0 | 2 99 | Bangalore | 2022-02-04 | 1 | 5 | 18178 | 17087.32 | 6 | 0 | 2 55 | Hyderabad | 2022-11-06 | 1 | 9 | 31919 | 16934.26 | 25.5 | 0 | 2

结果解读:Delhi 门店 17(2021-12-09)以 ¥22,831 净收入居单日冠军,折扣率仅 9%,说明该门店在节假日前后促销力度小但客单价高。Hyderabad 门店 55(折扣率 25.5%)交易量更大(qty=9),但净收入受折扣压低。

按门店静态分区刷新

当需要精确回刷某门店某月数据时,使用

PARTITIONED BY (store_id)
PARTITIONED BY (store_id)
+
SESSION_CONFIGS
SESSION_CONFIGS
参数化分区:

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_retail_pos.doc_dws_store_date_partition ( store_id, order_date, order_count, item_count, total_qty, net_revenue ) PARTITIONED BY (store_id) AS SELECT store_id, order_date, COUNT(DISTINCT order_id) AS order_count, COUNT(order_item_id) AS item_count, SUM(qty) AS total_qty, ROUND(SUM(net_amount), 2) AS net_revenue FROM best_practice_retail_pos.doc_dwd_sales_detail WHERE store_id = CAST(SESSION_CONFIGS()['dt.args.store_id'] AS INT) GROUP BY store_id, order_date;

刷新门店 33 的分区数据:

SET dt.args.store_id = 33; REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_dws_store_date_partition PARTITION (store_id = 33);

SELECT * FROM best_practice_retail_pos.doc_dws_store_date_partition WHERE store_id = 33;

store_id | order_date | order_count | item_count | total_qty | net_revenue ---------+------------+-------------+------------+-----------+------------ 33 | 2021-08-26 | 1 | 3 | 6 | 5335.2

每次只刷新指定门店分区,不影响其他门店数据,适合多门店并行补数场景。

配置 Studio 刷新任务

DWD 和 DWS 层的刷新通过 Studio Task 调度,不在 DDL 里写

REFRESH INTERVAL
REFRESH INTERVAL

# 创建 DWD 刷新任务 cz-cli task create refresh_dwd_sales_detail --type SQL -p skill_test # 返回示例: {"data":{"id":10353698,...}} # 保存 SQL 内容 cz-cli task save-content 10353698 \ --content "REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_dwd_sales_detail;" \ -p skill_test # 设置调度:每天凌晨 1:30 执行 cz-cli task save-cron 10353698 --cron "0 30 1 * * ?" -p skill_test # 创建 DWS 刷新任务(同样步骤) cz-cli task create refresh_dws_store_daily --type SQL -p skill_test # task id: 10354652 cz-cli task save-content 10354652 \ --content "REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_dws_store_daily_sales;" \ -p skill_test cz-cli task save-cron 10354652 --cron "0 30 1 * * ?" -p skill_test


ADS 层:畅滞销 SKU 分析与门店排行

SKU 销售速度分析

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_retail_pos.doc_ads_sku_velocity AS SELECT product_id, category_id, category_name, SUM(qty) AS total_sold_qty, COUNT(DISTINCT order_id) AS order_count, ROUND(SUM(net_amount), 2) AS total_net_revenue, COUNT(DISTINCT store_id) AS store_coverage, SUM(is_returned) AS return_count, ROUND(SUM(is_returned) * 100.0 / NULLIF(COUNT(*), 0), 2) AS return_rate_pct, CASE WHEN SUM(qty) >= 10 THEN 'fast_moving' WHEN SUM(qty) >= 5 THEN 'normal' ELSE 'slow_moving' END AS velocity_label, ROUND(SUM(net_amount) / NULLIF(COUNT(DISTINCT store_id), 0), 2) AS revenue_per_store FROM best_practice_retail_pos.doc_dwd_sales_detail GROUP BY product_id, category_id, category_name;

REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_ads_sku_velocity; -- 查看各速度档位 SKU 数和收入分布 SELECT velocity_label, COUNT(*) AS sku_count, ROUND(SUM(total_net_revenue), 2) AS label_revenue FROM best_practice_retail_pos.doc_ads_sku_velocity GROUP BY velocity_label ORDER BY label_revenue DESC;

velocity_label | sku_count | label_revenue ---------------+-----------+-------------- normal | 37 | 520537.33 slow_moving | 61 | 392062.88

结果解读:在 100 个 SKU 中,37 个属于正常动销(qty 5–9),61 个为滞销(qty < 5)。滞销 SKU 贡献了 ¥392,062 的净收入,占总收入约 43%,说明单价高的滞销品依然贡献了相当规模——采购决策时需同时考虑

total_net_revenue
total_net_revenue
total_sold_qty
total_sold_qty
,不能仅凭销量定性。

查看畅销前 10 SKU:

SELECT product_id, category_name, total_sold_qty, order_count, total_net_revenue, store_coverage, velocity_label FROM best_practice_retail_pos.doc_ads_sku_velocity ORDER BY total_sold_qty DESC LIMIT 10;

product_id | category_name | total_sold_qty | order_count | total_net_revenue | store_coverage | velocity_label -----------+---------------+----------------+-------------+-------------------+----------------+--------------- 60 | Cat_30 | 9 | 3 | 7922.28 | 3 | normal 12 | Cat_14 | 9 | 3 | 27547.1 | 3 | normal 37 | Cat_24 | 8 | 3 | 12089 | 3 | normal 87 | Cat_24 | 8 | 4 | 31740.65 | 4 | normal 23 | Cat_15 | 8 | 3 | 10833.92 | 3 | normal 65 | Cat_11 | 7 | 2 | 16566.66 | 2 | normal 56 | Cat_2 | 7 | 3 | 23981.55 | 3 | normal 30 | Cat_4 | 7 | 2 | 11288.1 | 2 | normal 55 | Cat_13 | 7 | 3 | 20519.61 | 3 | normal 14 | Cat_10 | 7 | 3 | 15307.11 | 3 | normal

查看最滞销 SKU(重点清货候选):

SELECT product_id, category_name, total_sold_qty, total_net_revenue, store_coverage, return_rate_pct, velocity_label FROM best_practice_retail_pos.doc_ads_sku_velocity WHERE velocity_label = 'slow_moving' ORDER BY total_sold_qty ASC, total_net_revenue ASC LIMIT 8;

product_id | category_name | total_sold_qty | total_net_revenue | store_coverage | return_rate_pct | velocity_label -----------+---------------+----------------+-------------------+----------------+-----------------+--------------- 54 | Cat_20 | 1 | 2360.82 | 1 | 0.00 | slow_moving 66 | Cat_22 | 1 | 2599.1 | 1 | 0.00 | slow_moving 2 | Cat_18 | 1 | 2835.88 | 1 | 0.00 | slow_moving 20 | Cat_27 | 2 | 954.8 | 2 | 0.00 | slow_moving 89 | Cat_29 | 2 | 2218.36 | 1 | 0.00 | slow_moving 49 | Cat_18 | 2 | 2468.82 | 1 | 0.00 | slow_moving 70 | Cat_3 | 2 | 2620.8 | 2 | 0.00 | slow_moving 98 | Cat_3 | 2 | 2649.92 | 2 | 0.00 | slow_moving

结果解读:product 20(Cat_27)仅售 2 件,每件单价约 ¥477,是典型的低价低销量双滞产品,优先纳入促销清货名单。product 54 / 66 / 2 各仅在 1 家门店完成 1 次销售,建议下架或集中调配到高流量门店试销。

门店收入排行

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_retail_pos.doc_ads_store_ranking AS SELECT store_id, store_city, SUM(order_count) AS total_orders, SUM(item_count) AS total_items, SUM(total_qty) AS total_qty, ROUND(SUM(gross_revenue), 2) AS gross_revenue, ROUND(SUM(net_revenue), 2) AS net_revenue, ROUND(AVG(avg_discount_pct), 2) AS avg_discount_pct, SUM(return_count) AS total_returns, ROUND(SUM(return_count) * 100.0 / NULLIF(SUM(item_count), 0), 2) AS return_rate_pct, RANK() OVER (ORDER BY SUM(net_revenue) DESC) AS revenue_rank FROM best_practice_retail_pos.doc_dws_store_daily_sales GROUP BY store_id, store_city;

REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_ads_store_ranking; SELECT store_id, store_city, total_orders, total_qty, gross_revenue, net_revenue, avg_discount_pct, total_returns, return_rate_pct, revenue_rank FROM best_practice_retail_pos.doc_ads_store_ranking ORDER BY revenue_rank LIMIT 10;

store_id | store_city | total_orders | total_qty | gross_revenue | net_revenue | avg_discount_pct | total_returns | return_rate_pct | revenue_rank ---------+------------+--------------+-----------+---------------+-------------+------------------+---------------+-----------------+------------- 85 | Hyderabad | 3 | 11 | 41227 | 31331.48 | 24.33 | 1 | 16.67 | 1 17 | Delhi | 2 | 11 | 37009 | 30698.19 | 21.5 | 0 | 0.00 | 2 54 | Bangalore | 2 | 12 | 30081 | 25719.35 | 17.5 | 0 | 0.00 | 3 55 | Hyderabad | 2 | 9 | 31919 | 24637.9 | 25.5 | 0 | 0.00 | 4 87 | Delhi | 2 | 12 | 36889 | 24199.49 | 34 | 1 | 25.00 | 5 81 | Mumbai | 2 | 9 | 29513 | 21114.41 | 28 | 0 | 0.00 | 6 24 | Pune | 1 | 6 | 22694 | 21105.42 | 7 | 0 | 0.00 | 7 68 | Pune | 1 | 5 | 21725 | 20638.75 | 5 | 0 | 0.00 | 8 11 | Delhi | 2 | 10 | 22606 | 19327.32 | 18 | 0 | 0.00 | 9 43 | Bangalore | 1 | 5 | 20948 | 18853.2 | 10 | 0 | 0.00 | 10

结果解读

  • Hyderabad 门店 85(排名第 1)净收入 ¥31,331,但
    avg_discount_pct = 24.33
    avg_discount_pct = 24.33
    (折扣率偏高),且退货率 16.67%——高营收背后存在品质隐患,建议审查高退货 SKU 组合。
  • Delhi 门店 17(排名第 2)折扣率仅 21.5%、退货率 0%,是真正的高质量门店,可作为运营标杆推广运营经验。
  • 排名第 5 的 Delhi 门店 87 折扣率高达 34%,退货率 25%,高促销驱动的高营收不可持续,建议逐步调整促销策略。

配置 ADS 层刷新任务:

cz-cli task create refresh_ads_sku_velocity --type SQL -p skill_test # task id: 10353699 cz-cli task save-content 10353699 \ --content "REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_ads_sku_velocity;" \ -p skill_test cz-cli task save-cron 10353699 --cron "0 0 2 * * ?" -p skill_test cz-cli task create refresh_ads_store_ranking --type SQL -p skill_test # task id: 10354653 cz-cli task save-content 10354653 \ --content "REFRESH DYNAMIC TABLE best_practice_retail_pos.doc_ads_store_ranking;" \ -p skill_test cz-cli task save-cron 10354653 --cron "0 0 2 * * ?" -p skill_test

完整调度链:

refresh_dwd_sales_detail
refresh_dwd_sales_detail
(01:30)→
refresh_dws_store_daily
refresh_dws_store_daily
(01:30,依赖前者)→
refresh_ads_*
refresh_ads_*
(02:00)。


Time Travel:月末对账

月末财务核对时,需要回溯到月末截止时刻的订单快照,对比当前数据检测是否有后补录入的订单。

-- 查看 ODS 订单表的历史版本 DESC HISTORY best_practice_retail_pos.doc_ods_orders;

version | time | total_rows | operation | stats --------+-----------------------------+------------+-------------+---------------------------- 4 | 2026-06-06T14:41:30.348 | 100 | INSERT_INTO | rows_inserted: 40 3 | 2026-06-06T14:41:15.488 | 60 | INSERT_INTO | rows_inserted: 30 2 | 2026-06-06T14:41:02.329 | 30 | INSERT_INTO | rows_inserted: 30 1 | 2026-06-06T14:38:18.807 | 0 | CREATE | —

-- 回溯月末截止时刻的门店收入快照 SELECT snap.order_date, snap.store_id, COUNT(*) AS orders_in_snapshot, SUM(p.amount) AS snapshot_revenue FROM best_practice_retail_pos.doc_ods_orders TIMESTAMP AS OF '2026-06-06 14:41:15.488' snap JOIN best_practice_retail_pos.doc_ods_payments p ON snap.order_id = p.order_id GROUP BY snap.order_date, snap.store_id ORDER BY snapshot_revenue DESC LIMIT 5;

order_date | store_id | orders_in_snapshot | snapshot_revenue -----------+----------+--------------------+----------------- 2021-04-26 | 54 | 1 | 13688 2022-03-08 | 87 | 1 | 12036 2023-11-15 | 100 | 1 | 11156 2020-11-14 | 57 | 1 | 9702 2023-11-27 | 1 | 1 | 9465

-- 找出月末截止后补录的订单(在当前库中有,但历史快照中没有) SELECT o_cur.order_id, o_cur.store_id, o_cur.order_date FROM best_practice_retail_pos.doc_ods_orders o_cur LEFT JOIN ( SELECT order_id FROM best_practice_retail_pos.doc_ods_orders TIMESTAMP AS OF '2026-06-06 14:41:15.488' ) snap ON o_cur.order_id = snap.order_id WHERE snap.order_id IS NULL ORDER BY o_cur.order_id LIMIT 10;

order_id | store_id | order_date ---------+----------+----------- 61 | 46 | 2022-04-07 62 | 89 | 2023-09-22 63 | 13 | 2021-05-16 64 | 71 | 2022-12-30 65 | 38 | 2023-04-14 66 | 92 | 2021-06-28 67 | 27 | 2022-01-12 68 | 50 | 2023-05-27 69 | 65 | 2021-07-11 70 | 3 | 2022-02-25

结果解读

order_id
order_id
61–100 共 40 笔订单是在月末截止后补录进来的。将这些 order_id 提交给财务审核,决定是纳入本月核算还是顺延至下月——这正是 Time Travel 在月末对账场景的核心价值。


数仓对象总览

best_practice_retail_pos
best_practice_retail_pos
Schema 中的全部对象:

SHOW TABLES IN best_practice_retail_pos;

schema_name | table_name | is_dynamic -------------------------+---------------------------------+----------- best_practice_retail_pos | doc_ods_stores | false best_practice_retail_pos | doc_ods_categories | false best_practice_retail_pos | doc_ods_products | false best_practice_retail_pos | doc_ods_promotions | false best_practice_retail_pos | doc_ods_customers | false best_practice_retail_pos | doc_ods_orders | false best_practice_retail_pos | doc_ods_order_items | false best_practice_retail_pos | doc_ods_payments | false best_practice_retail_pos | doc_ods_returns | false best_practice_retail_pos | doc_dwd_sales_detail | true best_practice_retail_pos | doc_dws_store_daily_sales | true best_practice_retail_pos | doc_dws_store_date_partition | true best_practice_retail_pos | doc_ads_sku_velocity | true best_practice_retail_pos | doc_ads_store_ranking | true


注意事项

  • Bloomfilter Index 对存量数据不自动生效

    CREATE BLOOMFILTER INDEX
    CREATE BLOOMFILTER INDEX
    只加速创建后写入的新数据。已有存量数据需先建好 Index 再重新写入,或接受存量数据不走 Bloomfilter 加速的现状。

  • 分区 Dynamic Table 必须显式声明

    PARTITIONED BY
    PARTITIONED BY
    :不能依赖系统自动推断分区列。
    SESSION_CONFIGS()['dt.args.xxx']
    SESSION_CONFIGS()['dt.args.xxx']
    返回 STRING,必须
    CAST
    CAST
    到目标类型后才能与分区列匹配,否则刷新时报类型不兼容错误。

  • REFRESH DYNAMIC TABLE
    REFRESH DYNAMIC TABLE
    不写 REFRESH INTERVAL:所有 Dynamic Table 的定期刷新通过 Studio Task 管理。Studio Task 支持在同一任务上附加数据质量规则和告警,DDL 里写
    REFRESH INTERVAL
    REFRESH INTERVAL
    会绕过这套管理机制。

  • Time Travel

    TIMESTAMP AS OF
    TIMESTAMP AS OF
    仅接受常量:不支持
    NOW() - INTERVAL N DAY
    NOW() - INTERVAL N DAY
    等运行时表达式,调用前先计算好目标时间戳。
    DESC HISTORY
    DESC HISTORY
    返回 UTC 时间,换算为本地时间(UTC+8)需加 8 小时。

  • External Schema 只读

    pos_hive_archive
    pos_hive_archive
    下的外部表不支持 INSERT / UPDATE / DELETE。如需将历史数据导入 Lakehouse 本地表,用
    INSERT INTO ... SELECT ... FROM pos_hive_archive.xxx
    INSERT INTO ... SELECT ... FROM pos_hive_archive.xxx
    显式迁移。

  • ODS 层

    doc_ods_orders
    doc_ods_orders
    使用
    PARTITIONED BY (store_id, order_date)
    PARTITIONED BY (store_id, order_date)
    :如果上游 CDC 使用
    INSERT OVERWRITE
    INSERT OVERWRITE
    写入(而非
    INSERT INTO
    INSERT INTO
    ),会导致 Dynamic Table 退化为全量刷新。推荐使用追加写入模式(仅 INSERT),保留 Dynamic Table 的增量刷新能力。


相关文档

联系我们
预约咨询
微信咨询
电话咨询
邮件咨询