游戏运营数据仓库最佳实践
将全量游戏行为事件流构建为多层数仓,输出用户 LTV 分层、付费转化漏斗和留存矩阵。本文以 Steam 平台游戏目录(15 款主流游戏维表)为基础,模拟玩家登录、关卡完成、付费等行为事件,端到端演示 Kafka PIPE → Bronze → Silver → Gold 的完整搭建过程,覆盖 Dynamic Table、BITMAP 函数、LAG/LEAD 窗口函数、Bloomfilter Index 四项关键平台能力的落地用法。
概述
游戏运营数仓的核心挑战是将高频、多类型的行为事件流转化为可决策的业务指标。
| 问题 | 云器解决方案 |
|---|---|
| 游戏事件(登录/付费/关卡)实时高频写入,单日千万级 | Kafka PIPE 持续摄取,无需手写消费者 |
| 客户端日志每日批量上传 OSS,需自动入库 | OSS PIPE(LIST_PURGE),文件落地即触发导入 |
| Bronze → Silver(会话化)→ Gold(LTV/漏斗)自动增量更新 | Dynamic Table 声明式 SQL,系统自动识别上游变更 |
| user_id 和 app_id 是高基数列,漏斗查询按用户过滤频繁 | Bloomfilter Index,快速判断用户是否在数据块中 |
| 付费路径分析需要还原前后行为序列 | LAG/LEAD 窗口函数,还原付费决策上下文 |
| 用户日活去重、N 日留存跨天 COUNT DISTINCT | GROUP_BITMAP 系列函数,精确基数统计 |
涉及的 SQL 命令
| 命令 / 函数 | 用途 | 说明 |
|---|---|---|
CREATE TABLE | 建 Bronze 层事件表和游戏维表 | 普通表,作为 Dynamic Table 上游 |
CREATE BLOOMFILTER INDEX | 在 user_id 、app_id 列创建布隆过滤索引 | 高基数列点查过滤,减少扫描数据块 |
CREATE PIPE | 创建 Kafka 持续摄取管道 | BATCH_INTERVAL 控制摄取延迟 |
CREATE DYNAMIC TABLE | 创建 Silver / Gold 层增量计算表 | 系统自动按依赖链刷新 |
LAG / LEAD | 取同会话前/后一条事件 | 还原付费路径(前一步、后一步) |
GROUP_BITMAP | 精确基数统计(DAU 计算) | 返回集合基数,不是 bitmap 对象本身 |
GROUP_BITMAP_STATE | 生成每日 bitmap 状态快照 | 用于跨天合并去重(MAU) |
GROUP_BITMAP_MERGE | 合并多个 bitmap 状态 | 与 GROUP_BITMAP_STATE 配合使用 |
REFRESH DYNAMIC TABLE | 手动触发一次刷新 | 首次建表或调试时使用 |
前置准备
本文所有示例在
best_practice_gaming_dwCREATE SCHEMA IF NOT EXISTS best_practice_gaming_dw COMMENT 'Gaming Operations DW Best Practices';
维度表:游戏目录与分类
建表
游戏维表来自 Steam Games Dataset(Kaggle,~1GB),包含游戏名称、标签、价格、评测数等元数据。本文抽取 15 款主流游戏作为代表。
CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_dim_game ( app_id BIGINT COMMENT 'Steam App ID', name STRING COMMENT 'Game title', release_date DATE COMMENT 'Release date', developer STRING COMMENT 'Developer name', publisher STRING COMMENT 'Publisher name', genres STRING COMMENT 'Comma-separated genre tags', tags STRING COMMENT 'User-defined tags (top 20)', price_usd DOUBLE COMMENT 'Current price in USD', is_free BOOLEAN COMMENT 'True if free-to-play', positive_reviews BIGINT COMMENT 'Number of positive reviews', negative_reviews BIGINT COMMENT 'Number of negative reviews', estimated_owners STRING COMMENT 'Estimated owner range', avg_playtime_forever INT COMMENT 'Average playtime in minutes (lifetime)', ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() ) COMMENT 'Dim: Steam game catalog (sourced from Steam Games Dataset on Kaggle)';
CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_dim_genre ( genre_id INT COMMENT 'Genre surrogate key', genre_name STRING COMMENT 'Genre name', category STRING COMMENT 'Meta-category: Core / Casual / Strategy' ) COMMENT 'Dim: Game genre taxonomy';
写入游戏维表数据(15 款 Steam 热门游戏)后,查询 TOP 5:
SELECT app_id, name, genres, price_usd, is_free, positive_reviews FROM best_practice_gaming_dw.doc_dim_game ORDER BY positive_reviews DESC LIMIT 5;
app_id | name | genres | price_usd | is_free | positive_reviews ---------+-----------------------+-----------------------------+-----------+---------+----------------- 730 | Counter-Strike 2 | Action,Free to Play | 0 | true | 1200000 570 | Dota 2 | Action,Free to Play,Strategy| 0 | true | 950000 578080 | PUBG: BATTLEGROUNDS | Action | 29.99 | false | 680000 1151640 | Baldurs Gate 3 | RPG | 59.99 | false | 600000 413150 | Stardew Valley | RPG,Simulation | 14.99 | false | 520000
Steam 游戏目录覆盖了免费(CS2、Dota 2)和付费(PUBG、BG3、Stardew Valley、Cyberpunk 2077)两类主要模式,评测数量级分布在 14 万到 120 万之间。
Bronze 层:原始游戏事件表
建表
Bronze 层接收来自 Kafka PIPE 的实时事件流,字段设计覆盖所有事件类型的公共属性,事件特有属性以 JSON 形式存储在
extra_propsCREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_bronze_game_events ( event_id STRING COMMENT 'Unique event identifier', user_id STRING COMMENT 'Player user ID', app_id BIGINT COMMENT 'Steam App ID (FK to dim_game)', event_type STRING COMMENT 'login / logout / level_complete / purchase / achievement', event_time TIMESTAMP COMMENT 'Client-side event timestamp (UTC)', session_id STRING COMMENT 'Session identifier (pre-assigned by client SDK)', level_id INT COMMENT 'Level/map number (null if not applicable)', level_name STRING COMMENT 'Level name or map name', amount_usd DOUBLE COMMENT 'Payment amount in USD (null if not a purchase)', item_id STRING COMMENT 'Item SKU for purchase events', item_name STRING COMMENT 'Item display name', country_code STRING COMMENT 'ISO 3166-1 alpha-2 country code', device_type STRING COMMENT 'PC / Console / Mobile', client_version STRING COMMENT 'Game client version string', extra_props STRING COMMENT 'JSON blob for event-specific extra properties', ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() ) COMMENT 'Bronze: raw in-game event stream (Kafka PIPE target)';
创建 Bloomfilter Index
后续 Silver 和 Gold 层都会按
user_idapp_idCREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_user_id ON TABLE doc_bronze_game_events (user_id); CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_app_id ON TABLE doc_bronze_game_events (app_id);
⚠️ 注意:
CREATE BLOOMFILTER INDEXUSE SCHEMA best_practice_gaming_dw-s best_practice_gaming_dw配置 Kafka PIPE
游戏行为事件经客户端 SDK 采集后发往 Kafka topic,PIPE 负责持续消费并写入 Bronze 表。
-- 先建原始字符串接收表(Kafka PIPE 写入 JSON 字符串) CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.kafka_raw_game_events (value STRING); -- 创建 Kafka PIPE CREATE PIPE IF NOT EXISTS best_practice_gaming_dw.pipe_game_events VIRTUAL_CLUSTER = 'DEFAULT' BATCH_INTERVAL_IN_SECONDS = '60' AS COPY INTO best_practice_gaming_dw.kafka_raw_game_events FROM ( SELECT CAST(value AS STRING) AS value FROM READ_KAFKA( '<kafka-broker>:9092', -- 替换为实际 broker 地址 'game_tracking_events', -- topic 名称 '', 'cz_gaming_consumer', -- consumer group ID '','','','', 'raw', 'raw', 0, map() ) );
💡 提示:游戏事件的 Kafka topic 一般按事件类型分拆(
login_eventspurchase_eventsevent_typeBATCH_INTERVAL_IN_SECONDS = 6010配置 OSS PIPE(每日离线日志)
客户端在弱网环境下的缓存日志每日批量上传到阿里云 OSS,需要先创建 Storage Connection,再挂载 External Volume,最后配置 PIPE 自动扫描导入。
第一步:创建 OSS Storage Connection
-- 替换 ACCESS_ID / ACCESS_KEY 为有 OSS 读写权限的 RAM 用户凭证 -- ENDPOINT 替换为 OSS bucket 所在地域的访问地址 CREATE STORAGE CONNECTION IF NOT EXISTS best_practice_gaming_dw.gaming_oss_conn TYPE oss ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com' ACCESS_ID = '<your_access_key_id>' ACCESS_KEY = '<your_access_key_secret>';
第二步:创建挂载该 OSS 路径的 External Volume
-- LOCATION 替换为实际的 OSS bucket 路径 CREATE EXTERNAL VOLUME IF NOT EXISTS best_practice_gaming_dw.gaming_offline_logs LOCATION 'oss://<your-bucket>/game_offline_logs/' USING CONNECTION best_practice_gaming_dw.gaming_oss_conn DIRECTORY = (enable = true, auto_refresh = true) RECURSIVE = true;
第三步:创建 OSS PIPE
CREATE PIPE IF NOT EXISTS best_practice_gaming_dw.pipe_offline_logs VIRTUAL_CLUSTER = 'DEFAULT' INGEST_MODE = 'LIST_PURGE' AS COPY INTO best_practice_gaming_dw.doc_bronze_game_events FROM VOLUME best_practice_gaming_dw.gaming_offline_logs USING json;
💡 提示:LIST_PURGE 模式会在导入成功后删除 Volume 中的原始文件,避免重复导入。如果需要保留原文件,使用 LIST(不删除)模式。
写入模拟行为数据
本文模拟 7 个玩家、45 条事件,覆盖 login/logout/level_complete/purchase/achievement 五种类型。
从本地 CSV 导入(推荐)
将事件数据保存为 CSV 文件后,通过 User Volume 批量导入:
-- 第一步:通过 SQL PUT 将本地 CSV 文件上传到 User Volume PUT '/path/to/game_events.csv' TO USER VOLUME FILE 'game_events.csv';
-- 第二步:从 User Volume COPY INTO 表 COPY INTO best_practice_gaming_dw.doc_bronze_game_events FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('game_events.csv');
也可直接内联插入模拟测试数据(不需要 CSV 文件):
INSERT INTO best_practice_gaming_dw.doc_bronze_game_events (event_id, user_id, app_id, event_type, event_time, session_id, level_id, level_name, amount_usd, item_id, item_name, country_code, device_type, client_version, extra_props) VALUES ('E0001','U001',730,'login',CAST('2026-05-20 09:00:00' AS TIMESTAMP),'S001',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0002','U001',730,'level_complete',CAST('2026-05-20 09:12:00' AS TIMESTAMP),'S001',1,'de_dust2',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":18,"deaths":5,"mvp":true}'), ('E0003','U001',730,'purchase',CAST('2026-05-20 09:15:00' AS TIMESTAMP),'S001',NULL,NULL,1.99,'skin_001','AK-47 | Redline','CN','PC','v1.40.1','{}'), ('E0004','U001',730,'level_complete',CAST('2026-05-20 09:30:00' AS TIMESTAMP),'S001',2,'de_inferno',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":22,"deaths":8,"mvp":false}'), ('E0005','U001',730,'logout',CAST('2026-05-20 10:00:00' AS TIMESTAMP),'S001',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0006','U001',730,'login',CAST('2026-05-21 09:00:00' AS TIMESTAMP),'S005',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0007','U001',730,'level_complete',CAST('2026-05-21 09:30:00' AS TIMESTAMP),'S005',3,'de_mirage',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":15,"deaths":6}'), ('E0008','U001',730,'purchase',CAST('2026-05-21 09:36:00' AS TIMESTAMP),'S005',NULL,NULL,14.99,'skin_002','M4A4 | Howl','CN','PC','v1.40.1','{}'), ('E0009','U001',730,'logout',CAST('2026-05-21 10:00:00' AS TIMESTAMP),'S005',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0010','U002',570,'login',CAST('2026-05-20 10:00:00' AS TIMESTAMP),'S002',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0011','U002',570,'level_complete',CAST('2026-05-20 11:00:00' AS TIMESTAMP),'S002',1,'tutorial',NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0012','U002',570,'purchase',CAST('2026-05-20 11:10:00' AS TIMESTAMP),'S002',NULL,NULL,4.99,'hero_001','Crystal Maiden Set','US','PC','v7.34','{}'), ('E0013','U002',570,'logout',CAST('2026-05-20 12:30:00' AS TIMESTAMP),'S002',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0014','U002',570,'login',CAST('2026-05-21 10:00:00' AS TIMESTAMP),'S010',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0015','U002',570,'level_complete',CAST('2026-05-21 11:00:00' AS TIMESTAMP),'S010',2,'ranked',NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0016','U002',570,'logout',CAST('2026-05-21 12:00:00' AS TIMESTAMP),'S010',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0017','U003',1172470,'login',CAST('2026-05-20 11:00:00' AS TIMESTAMP),'S003',NULL,NULL,NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0018','U003',1172470,'level_complete',CAST('2026-05-20 11:20:00' AS TIMESTAMP),'S003',1,'Kings Canyon',NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0019','U003',1172470,'level_complete',CAST('2026-05-20 11:40:00' AS TIMESTAMP),'S003',2,'World Edge',NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0020','U003',1172470,'purchase',CAST('2026-05-20 12:10:00' AS TIMESTAMP),'S003',NULL,NULL,9.99,'legend_001','Revenant Legend Skin','KR','PC','v3.24','{}'), ('E0021','U003',1172470,'logout',CAST('2026-05-20 13:00:00' AS TIMESTAMP),'S003',NULL,NULL,NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0022','U004',578080,'login',CAST('2026-05-10 10:00:00' AS TIMESTAMP),'S012',NULL,NULL,NULL,NULL,NULL,'US','PC','v30.5','{}'), ('E0023','U004',578080,'logout',CAST('2026-05-10 11:00:00' AS TIMESTAMP),'S012',NULL,NULL,NULL,NULL,NULL,'US','PC','v30.5','{}'), ('E0024','U004',578080,'login',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S004',NULL,NULL,NULL,NULL,NULL,'US','PC','v31.0','{}'), ('E0025','U004',578080,'level_complete',CAST('2026-05-20 14:30:00' AS TIMESTAMP),'S004',1,'Erangel',NULL,NULL,NULL,'US','PC','v31.0','{}'), ('E0026','U004',578080,'logout',CAST('2026-05-20 15:00:00' AS TIMESTAMP),'S004',NULL,NULL,NULL,NULL,NULL,'US','PC','v31.0','{}'), ('E0027','U005',1091500,'login',CAST('2026-05-18 09:00:00' AS TIMESTAMP),'S006',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0028','U005',1091500,'level_complete',CAST('2026-05-18 10:00:00' AS TIMESTAMP),'S006',1,'Prologue',NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0029','U005',1091500,'purchase',CAST('2026-05-18 10:05:00' AS TIMESTAMP),'S006',NULL,NULL,59.99,'dlc_phantom','Phantom Liberty DLC','DE','PC','v2.12','{}'), ('E0030','U005',1091500,'logout',CAST('2026-05-18 11:00:00' AS TIMESTAMP),'S006',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0031','U005',1091500,'login',CAST('2026-05-19 09:00:00' AS TIMESTAMP),'S007',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0032','U005',1091500,'level_complete',CAST('2026-05-19 11:00:00' AS TIMESTAMP),'S007',1,'Heist',NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0033','U005',1091500,'logout',CAST('2026-05-19 12:00:00' AS TIMESTAMP),'S007',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0034','U005',1091500,'login',CAST('2026-05-20 09:00:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0035','U005',1091500,'purchase',CAST('2026-05-20 09:05:00' AS TIMESTAMP),'S008',NULL,NULL,9.99,'stash_001','Stash Tab','DE','PC','v2.12','{}'), ('E0036','U005',1091500,'logout',CAST('2026-05-20 10:30:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0037','U005',1091500,'achievement',CAST('2026-05-20 09:10:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{"trophy":"platinum"}'), ('E0038','U006',892970,'login',CAST('2026-05-20 13:00:00' AS TIMESTAMP),'S009',NULL,NULL,NULL,NULL,NULL,'JP','Console','v1.15','{}'), ('E0039','U006',892970,'level_complete',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S009',1,'Limgrave',NULL,NULL,NULL,'JP','Console','v1.15','{}'), ('E0040','U006',892970,'logout',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S009',NULL,NULL,NULL,NULL,NULL,'JP','Console','v1.15','{}'), ('E0041','U007',2512000,'login',CAST('2026-05-21 14:00:00' AS TIMESTAMP),'S011',NULL,NULL,NULL,NULL,NULL,'BR','PC','v2.1.0','{}'), ('E0042','U007',2512000,'purchase',CAST('2026-05-21 14:25:00' AS TIMESTAMP),'S011',NULL,NULL,29.99,'hero_tank','Reinhardt Skin','BR','PC','v2.1.0','{}'), ('E0043','U007',2512000,'purchase',CAST('2026-05-21 14:30:00' AS TIMESTAMP),'S011',NULL,NULL,29.99,'hero_healer','Mercy Skin','BR','PC','v2.1.0','{}'), ('E0044','U007',2512000,'level_complete',CAST('2026-05-21 14:20:00' AS TIMESTAMP),'S011',1,'Control Point',NULL,NULL,NULL,'BR','PC','v2.1.0','{}'), ('E0045','U007',2512000,'logout',CAST('2026-05-21 16:00:00' AS TIMESTAMP),'S011',NULL,NULL,NULL,NULL,NULL,'BR','PC','v2.1.0','{}');
验证 Bronze 层数据分布:
SELECT event_type, COUNT(*) AS cnt FROM best_practice_gaming_dw.doc_bronze_game_events GROUP BY event_type ORDER BY cnt DESC;
event_type | cnt ----------------+---- login | 12 level_complete | 12 logout | 12 purchase | 8 achievement | 1
Bronze 层共 45 条事件,涵盖 5 种类型,付费事件 8 条(换算付费率:8 个付费次数 / 12 次登录 ≈ 67%,注意这是模拟数据场景下的高付费率,真实场景一般为 2-5%)。
Silver 层:会话化用户行为序列
Silver 层有两张 Dynamic Table:
silver_user_sessionssilver_event_sequence会话聚合表
以
session_id💡 提示:真实场景中"会话"通常由客户端 SDK 生成
session_idsession_idsession_idLAGCREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_silver_user_sessions REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT COMMENT 'Silver: sessionized user behavior with 30-min idle cutoff' AS SELECT e.session_id, e.user_id, e.app_id, g.name AS game_name, g.genres AS game_genres, g.is_free AS game_is_free, e.country_code, e.device_type, MIN(e.event_time) AS session_start, MAX(e.event_time) AS session_end, TIMESTAMPDIFF(SECOND, MIN(e.event_time), MAX(e.event_time)) / 60.0 AS session_duration_min, COUNT(*) AS total_events, SUM(CASE WHEN e.event_type = 'level_complete' THEN 1 ELSE 0 END) AS levels_completed, SUM(CASE WHEN e.event_type = 'purchase' THEN 1 ELSE 0 END) AS purchase_count, SUM(COALESCE(e.amount_usd, 0.0)) AS session_revenue_usd, MAX(CASE WHEN e.event_type = 'purchase' THEN 1 ELSE 0 END) AS has_purchase, CAST(MIN(e.event_time) AS DATE) AS session_date FROM best_practice_gaming_dw.doc_bronze_game_events e LEFT JOIN best_practice_gaming_dw.doc_dim_game g ON e.app_id = g.app_id GROUP BY e.session_id, e.user_id, e.app_id, g.name, g.genres, g.is_free, e.country_code, e.device_type;
REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_silver_user_sessions;
查询会话汇总:
SELECT session_id, user_id, game_name, session_duration_min, levels_completed, purchase_count, session_revenue_usd FROM best_practice_gaming_dw.doc_silver_user_sessions ORDER BY session_start;
session_id | user_id | game_name | duration_min | levels | purchases | revenue -----------+---------+--------------------+--------------+--------+-----------+-------- S012 | U004 | PUBG: BATTLEGROUNDS| 60.00 | 0 | 0 | 0 S006 | U005 | Cyberpunk 2077 | 120.00 | 1 | 1 | 59.99 S007 | U005 | Cyberpunk 2077 | 180.00 | 1 | 0 | 0 S008 | U005 | Cyberpunk 2077 | 90.00 | 0 | 1 | 9.99 S001 | U001 | Counter-Strike 2 | 60.00 | 2 | 1 | 1.99 S002 | U002 | Dota 2 | 150.00 | 1 | 1 | 4.99 S003 | U003 | Apex Legends | 120.00 | 2 | 1 | 9.99 S009 | U006 | ELDEN RING | 60.00 | 1 | 0 | 0 S004 | U004 | PUBG: BATTLEGROUNDS| 60.00 | 1 | 0 | 0 S005 | U001 | Counter-Strike 2 | 60.00 | 1 | 1 | 14.99 S010 | U002 | Dota 2 | 120.00 | 1 | 0 | 0 S011 | U007 | Overwatch 2 | 120.00 | 1 | 2 | 59.98
12 个会话中 7 个产生付费,会话付费率 58.3%;总会话时长从 60 分钟(单局场景)到 180 分钟(RPG 深度游戏)。
事件序列表(LAG/LEAD 付费路径分析)
使用
LAGLEADCREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_silver_event_sequence REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT COMMENT 'Silver: event sequence with LAG/LEAD for payment path analysis' AS SELECT event_id, user_id, app_id, event_type, event_time, session_id, amount_usd, item_id, item_name, country_code, -- 同会话中前一个事件 LAG(event_type, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS prev_event_type, LAG(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS prev_event_time, -- 同会话中后一个事件 LEAD(event_type, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS next_event_type, LEAD(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS next_event_time, -- 距离前一个事件的秒数 TIMESTAMPDIFF(SECOND, LAG(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time), event_time ) AS seconds_since_prev, -- 付费前累计完成的关卡数(跨会话,按用户全局计算) SUM(CASE WHEN event_type = 'level_complete' THEN 1 ELSE 0 END) OVER (PARTITION BY user_id ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS levels_before, CAST(event_time AS DATE) AS event_date FROM best_practice_gaming_dw.doc_bronze_game_events;
⚠️ 注意:
LAST_VALUEROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGREFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_silver_event_sequence;
查询所有付费事件的前后步骤:
SELECT user_id, event_type, prev_event_type AS prev_step, next_event_type AS next_step, levels_before AS levels_done_before_pay, seconds_since_prev AS secs_to_decide FROM best_practice_gaming_dw.doc_silver_event_sequence WHERE event_type = 'purchase' ORDER BY event_time;
user_id | event_type | prev_step | next_step | levels_before | secs_to_decide --------+------------+----------------+----------------+---------------+--------------- U005 | purchase | level_complete | logout | 1 | 300 U005 | purchase | login | achievement | 2 | 300 U001 | purchase | level_complete | level_complete | 1 | 180 U002 | purchase | level_complete | logout | 1 | 600 U003 | purchase | level_complete | logout | 2 | 1800 U001 | purchase | level_complete | logout | 3 | 360 U007 | purchase | level_complete | purchase | 1 | 300 U007 | purchase | purchase | logout | 1 | 300
进一步聚合——付费前置步骤分布:
SELECT prev_event_type AS prev_step, COUNT(*) AS pay_count, ROUND(AVG(seconds_since_prev), 0) AS avg_decision_secs FROM best_practice_gaming_dw.doc_silver_event_sequence WHERE event_type = 'purchase' GROUP BY prev_event_type ORDER BY pay_count DESC;
prev_step | pay_count | avg_decision_secs ----------------+-----------+------------------ level_complete | 6 | 590 purchase | 1 | 300 login | 1 | 300
结果解读:75% 的付费发生在完成关卡之后(
level_complete → purchaselevel_completeGold 层:LTV 分层、漏斗、留存矩阵
用户 LTV 分层
以
user_idCREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_user_ltv REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT COMMENT 'Gold: user LTV segmentation - total spend, sessions, active days' AS SELECT s.user_id, COUNT(DISTINCT s.session_id) AS total_sessions, COUNT(DISTINCT s.session_date) AS active_days, SUM(s.session_duration_min) AS total_playtime_min, SUM(s.session_revenue_usd) AS total_revenue_usd, COUNT(DISTINCT s.app_id) AS distinct_games_played, MAX(s.session_date) AS last_active_date, MIN(s.session_date) AS first_active_date, -- LTV 分层:按累计充值金额 CASE WHEN SUM(s.session_revenue_usd) >= 50 THEN 'Whale' WHEN SUM(s.session_revenue_usd) >= 10 THEN 'Dolphin' WHEN SUM(s.session_revenue_usd) > 0 THEN 'Minnow' ELSE 'Free' END AS ltv_tier, -- 参与度评分:活跃天数 40% + 游戏时长 40% + 付费 20% ROUND( 0.4 * LEAST(COUNT(DISTINCT s.session_date) / 7.0, 1.0) * 100 + 0.4 * LEAST(SUM(s.session_duration_min) / 300.0, 1.0) * 100 + 0.2 * LEAST(SUM(s.session_revenue_usd) / 20.0, 1.0) * 100, 1) AS engagement_score FROM best_practice_gaming_dw.doc_silver_user_sessions s GROUP BY s.user_id;
REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_gold_user_ltv;
查询各用户 LTV 分层:
SELECT user_id, total_sessions, active_days, total_playtime_min, total_revenue_usd, ltv_tier, engagement_score FROM best_practice_gaming_dw.doc_gold_user_ltv ORDER BY total_revenue_usd DESC;
user_id | sessions | active_days | playtime_min | revenue_usd | ltv_tier | engagement_score --------+----------+-------------+--------------+-------------+----------+----------------- U005 | 3 | 3 | 390.00 | 69.98 | Whale | 77.1 U007 | 1 | 1 | 120.00 | 59.98 | Whale | 41.7 U001 | 2 | 2 | 120.00 | 16.98 | Dolphin | 44.4 U003 | 1 | 1 | 120.00 | 9.99 | Minnow | 31.7 U002 | 2 | 2 | 270.00 | 4.99 | Minnow | 52.4 U004 | 2 | 2 | 120.00 | 0 | Free | 27.4 U006 | 1 | 1 | 60.00 | 0 | Free | 13.7
结果解读:7 个用户中 2 个 Whale(累计付费 ≥50 美元),贡献了 71.3% 的总收入(129.96/161.94)——这是 F2P 游戏的典型"二八效应"。U005 参与度评分 77.1 最高,因为他/她连续 3 天活跃、游戏时长最长,说明高参与度与高付费存在正相关。
LTV 分层汇总:
SELECT ltv_tier, COUNT(*) AS user_count, ROUND(SUM(total_revenue_usd), 2) AS total_rev, ROUND(AVG(total_revenue_usd), 2) AS avg_rev FROM best_practice_gaming_dw.doc_gold_user_ltv GROUP BY ltv_tier ORDER BY avg_rev DESC;
ltv_tier | user_count | total_rev | avg_rev ---------+------------+-----------+-------- Whale | 2 | 129.96 | 64.98 Dolphin | 1 | 16.98 | 16.98 Minnow | 2 | 14.98 | 7.49 Free | 2 | 0 | 0
使用 BITMAP 函数计算 DAU
BITMAP 函数适合需要按天统计 DAU,并跨天合并计算 MAU 的场景。前提:用户 ID 需要转换为整数(
CAST(SUBSTR(user_id, 2) AS INT)方案一:GROUP_BITMAP 直接计算每日唯一用户数
SELECT CAST(event_time AS DATE) AS event_date, GROUP_BITMAP(CAST(SUBSTR(user_id, 2) AS INT)) AS dau FROM best_practice_gaming_dw.doc_bronze_game_events GROUP BY CAST(event_time AS DATE) ORDER BY event_date;
event_date | dau ------------+---- 2026-05-10 | 1 2026-05-18 | 1 2026-05-19 | 1 2026-05-20 | 6 2026-05-21 | 3
方案二:GROUP_BITMAP_STATE + GROUP_BITMAP_MERGE 跨天去重(MAU 场景)
GROUP_BITMAP_STATEGROUP_BITMAP_MERGEWITH daily_bitmaps AS ( SELECT CAST(event_time AS DATE) AS event_date, GROUP_BITMAP_STATE(CAST(SUBSTR(user_id, 2) AS INT)) AS bm FROM best_practice_gaming_dw.doc_bronze_game_events GROUP BY CAST(event_time AS DATE) ) SELECT event_date, GROUP_BITMAP_MERGE(bm) AS cumulative_unique_users FROM daily_bitmaps GROUP BY event_date ORDER BY event_date;
event_date | cumulative_unique_users ------------+------------------------ 2026-05-10 | 1 2026-05-18 | 1 2026-05-19 | 1 2026-05-20 | 6 2026-05-21 | 3
⚠️ 注意:
GROUP_BITMAPGROUP_BITMAPGROUP_BITMAP_ANDGROUP_BITMAP_ORGROUP_BITMAP_MERGEGROUP_BITMAP_STATE付费转化漏斗
按日统计"登录 → 参与关卡 → 完成付费"三步漏斗的转化率。
CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_payment_funnel REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT COMMENT 'Gold: payment conversion funnel - login → engage → purchase' AS WITH user_daily AS ( SELECT user_id, event_date, MAX(CASE WHEN event_type = 'login' THEN 1 ELSE 0 END) AS has_login, MAX(CASE WHEN event_type = 'level_complete' THEN 1 ELSE 0 END) AS has_level, MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS has_purchase, SUM(COALESCE(amount_usd, 0.0)) AS day_revenue FROM best_practice_gaming_dw.doc_silver_event_sequence GROUP BY user_id, event_date ), funnel_agg AS ( SELECT event_date, SUM(has_login) AS step1_login, SUM(has_level) AS step2_engage, SUM(has_purchase) AS step3_purchase, SUM(day_revenue) AS total_revenue FROM user_daily GROUP BY event_date ) SELECT event_date, step1_login, step2_engage, step3_purchase, ROUND(step2_engage * 100.0 / NULLIF(step1_login, 0), 1) AS engage_rate_pct, ROUND(step3_purchase * 100.0 / NULLIF(step2_engage, 0), 1) AS purchase_rate_pct, ROUND(step3_purchase * 100.0 / NULLIF(step1_login, 0), 1) AS overall_conversion_pct, total_revenue FROM funnel_agg ORDER BY event_date;
REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_gold_payment_funnel;
SELECT * FROM best_practice_gaming_dw.doc_gold_payment_funnel ORDER BY event_date;
event_date | login | engage | purchase | engage_rate | pay_rate | cvr_overall | revenue ------------+-------+--------+----------+-------------+----------+-------------+-------- 2026-05-10 | 1 | 0 | 0 | 0.0 | null | 0.0 | 0 2026-05-18 | 1 | 1 | 1 | 100.0 | 100.0 | 100.0 | 59.99 2026-05-19 | 1 | 1 | 0 | 100.0 | 0.0 | 0.0 | 0 2026-05-20 | 6 | 5 | 4 | 83.3 | 80.0 | 66.7 | 26.96 2026-05-21 | 3 | 3 | 2 | 100.0 | 66.7 | 66.7 | 74.97
💡 提示:5 月 10 日只有登录事件(U004 登录后未完成任何关卡),
engage_rate = 0.0purchase_rateNULLIFcvr_overall = 66.7%用户留存矩阵
以用户首次登录日期为队列(Cohort),统计 D1、D7 留存率。
CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_retention_matrix REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT COMMENT 'Gold: N-day retention cohort matrix' AS WITH first_login AS ( SELECT user_id, MIN(CAST(event_time AS DATE)) AS cohort_date FROM best_practice_gaming_dw.doc_bronze_game_events WHERE event_type = 'login' GROUP BY user_id ), active_days AS ( SELECT DISTINCT e.user_id, CAST(e.event_time AS DATE) AS active_date, f.cohort_date, DATEDIFF(CAST(e.event_time AS DATE), f.cohort_date) AS day_offset FROM best_practice_gaming_dw.doc_bronze_game_events e JOIN first_login f ON e.user_id = f.user_id WHERE DATEDIFF(CAST(e.event_time AS DATE), f.cohort_date) BETWEEN 0 AND 30 ) SELECT cohort_date, COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END) AS d0_users, COUNT(DISTINCT CASE WHEN day_offset = 1 THEN user_id END) AS d1_retained, COUNT(DISTINCT CASE WHEN day_offset = 7 THEN user_id END) AS d7_retained, ROUND(COUNT(DISTINCT CASE WHEN day_offset = 1 THEN user_id END) * 100.0 / NULLIF(COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END), 0), 1) AS d1_retention_pct, ROUND(COUNT(DISTINCT CASE WHEN day_offset = 7 THEN user_id END) * 100.0 / NULLIF(COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END), 0), 1) AS d7_retention_pct FROM active_days GROUP BY cohort_date ORDER BY cohort_date;
REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_gold_retention_matrix;
SELECT * FROM best_practice_gaming_dw.doc_gold_retention_matrix ORDER BY cohort_date;
cohort_date | d0_users | d1_retained | d7_retained | d1_pct | d7_pct ------------+----------+-------------+-------------+--------+------- 2026-05-10 | 1 | 0 | 0 | 0.0 | 0.0 2026-05-18 | 1 | 1 | 0 | 100.0 | 0.0 2026-05-20 | 4 | 2 | 0 | 50.0 | 0.0 2026-05-21 | 1 | 0 | 0 | 0.0 | 0.0
结果解读:5 月 18 日入队的用户(U005)次日留存 100%(连续 3 天登录);5 月 20 日入队的 4 个用户中 2 个次日回归(D1 留存 50%);模拟数据时间跨度较短(10 天内),所有队列 D7 留存均为 0,生产环境下需要积累至少 7 天的用户行为数据。
注意事项
-
Kafka PIPE DDL 阶段会尝试连接 broker:
CREATE PIPECREATE PIPE时系统会验证 broker 地址和 topic 是否可达。开发环境没有 Kafka 时,先创建目标表,PIPE 建立后通过直接 INSERT 模拟摄取流程。 -
BLOOMFILTER INDEX 创建后对存量数据不生效:
CREATE BLOOMFILTER INDEXCREATE BLOOMFILTER INDEX只对新写入的数据块生效,存量数据需执行BUILD INDEX ON TABLE ... (column)BUILD INDEX ON TABLE ... (column)重建(BLOOMFILTER 类型不支持BUILD INDEXBUILD INDEX,因此存量数据不能补充建索引;考虑用 INVERTED INDEX 替代需要范围查询的场景)。 -
GROUP_BITMAP 要求整数类型输入:用户 ID 如果是字符串(如 "U001"),需要先转换为整数后才能使用 BITMAP 函数。常见做法是在 Bronze 层维护一张
user_id → int_iduser_id → int_id映射表,在 Silver 层 JOIN 后使用int_idint_id参与 BITMAP 运算。 -
漏斗"付费率 > 100%"是正常现象:当天同一用户多次付费时,按"付费次数/登录人数"的计算方式会超过 100%。报表中建议同时提供"付费人数"(COUNT DISTINCT user_id)和"付费次数"两个口径,避免误读。
-
Dynamic Table 首次刷新需要手动触发:
CREATE DYNAMIC TABLECREATE DYNAMIC TABLE后系统不会立即执行首次计算,需要REFRESH DYNAMIC TABLE <table>REFRESH DYNAMIC TABLE <table>触发。后续按REFRESH INTERVALREFRESH INTERVAL自动增量刷新。 -
LAST_VALUE 窗口帧陷阱:
LAST_VALUELAST_VALUE的默认帧是ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,导致在每行返回当前行的值而不是最后一行。需要显式指定ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING。
相关文档
- CREATE DYNAMIC TABLE — 动态表创建语法和刷新模式说明
- CREATE PIPE — PIPE 创建语法,含 Kafka 和 OSS 两种来源
- 窗口函数参考 — LAG、LEAD、SUM OVER 完整参数说明
- BITMAP 函数参考 — GROUP_BITMAP 系列函数语法
- CREATE BLOOMFILTER INDEX — Bloomfilter 索引创建和使用限制
- Medallion 架构实践:纯 SQL Dynamic Table 方案 — 三层数仓完整示例(NHL 数据集)
