游戏运营数据仓库最佳实践

将全量游戏行为事件流构建为多层数仓,输出用户 LTV 分层、付费转化漏斗和留存矩阵。本文以 Steam 平台游戏目录(15 款主流游戏维表)为基础,模拟玩家登录、关卡完成、付费等行为事件,端到端演示 Kafka PIPE → Bronze → Silver → Gold 的完整搭建过程,覆盖 Dynamic Table、BITMAP 函数、LAG/LEAD 窗口函数、Bloomfilter Index 四项关键平台能力的落地用法。


概述

游戏运营数仓的核心挑战是将高频、多类型的行为事件流转化为可决策的业务指标。

问题云器解决方案
游戏事件(登录/付费/关卡)实时高频写入,单日千万级Kafka PIPE 持续摄取,无需手写消费者
客户端日志每日批量上传 OSS,需自动入库OSS PIPE(LIST_PURGE),文件落地即触发导入
Bronze → Silver(会话化)→ Gold(LTV/漏斗)自动增量更新Dynamic Table 声明式 SQL,系统自动识别上游变更
user_id 和 app_id 是高基数列,漏斗查询按用户过滤频繁Bloomfilter Index,快速判断用户是否在数据块中
付费路径分析需要还原前后行为序列LAG/LEAD 窗口函数,还原付费决策上下文
用户日活去重、N 日留存跨天 COUNT DISTINCTGROUP_BITMAP 系列函数,精确基数统计

涉及的 SQL 命令

命令 / 函数用途说明
CREATE TABLE
CREATE TABLE
建 Bronze 层事件表和游戏维表普通表,作为 Dynamic Table 上游
CREATE BLOOMFILTER INDEX
CREATE BLOOMFILTER INDEX
user_id
user_id
app_id
app_id
列创建布隆过滤索引
高基数列点查过滤,减少扫描数据块
CREATE PIPE
CREATE PIPE
创建 Kafka 持续摄取管道BATCH_INTERVAL 控制摄取延迟
CREATE DYNAMIC TABLE
CREATE DYNAMIC TABLE
创建 Silver / Gold 层增量计算表系统自动按依赖链刷新
LAG
LAG
/
LEAD
LEAD
取同会话前/后一条事件还原付费路径(前一步、后一步)
GROUP_BITMAP
GROUP_BITMAP
精确基数统计(DAU 计算)返回集合基数,不是 bitmap 对象本身
GROUP_BITMAP_STATE
GROUP_BITMAP_STATE
生成每日 bitmap 状态快照用于跨天合并去重(MAU)
GROUP_BITMAP_MERGE
GROUP_BITMAP_MERGE
合并多个 bitmap 状态
GROUP_BITMAP_STATE
GROUP_BITMAP_STATE
配合使用
REFRESH DYNAMIC TABLE
REFRESH DYNAMIC TABLE
手动触发一次刷新首次建表或调试时使用

前置准备

本文所有示例在

best_practice_gaming_dw
best_practice_gaming_dw
Schema 下运行。

CREATE SCHEMA IF NOT EXISTS best_practice_gaming_dw COMMENT 'Gaming Operations DW Best Practices';


维度表:游戏目录与分类

建表

游戏维表来自 Steam Games Dataset(Kaggle,~1GB),包含游戏名称、标签、价格、评测数等元数据。本文抽取 15 款主流游戏作为代表。

CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_dim_game ( app_id BIGINT COMMENT 'Steam App ID', name STRING COMMENT 'Game title', release_date DATE COMMENT 'Release date', developer STRING COMMENT 'Developer name', publisher STRING COMMENT 'Publisher name', genres STRING COMMENT 'Comma-separated genre tags', tags STRING COMMENT 'User-defined tags (top 20)', price_usd DOUBLE COMMENT 'Current price in USD', is_free BOOLEAN COMMENT 'True if free-to-play', positive_reviews BIGINT COMMENT 'Number of positive reviews', negative_reviews BIGINT COMMENT 'Number of negative reviews', estimated_owners STRING COMMENT 'Estimated owner range', avg_playtime_forever INT COMMENT 'Average playtime in minutes (lifetime)', ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() ) COMMENT 'Dim: Steam game catalog (sourced from Steam Games Dataset on Kaggle)';

CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_dim_genre ( genre_id INT COMMENT 'Genre surrogate key', genre_name STRING COMMENT 'Genre name', category STRING COMMENT 'Meta-category: Core / Casual / Strategy' ) COMMENT 'Dim: Game genre taxonomy';

写入游戏维表数据(15 款 Steam 热门游戏)后,查询 TOP 5:

SELECT app_id, name, genres, price_usd, is_free, positive_reviews FROM best_practice_gaming_dw.doc_dim_game ORDER BY positive_reviews DESC LIMIT 5;

app_id | name | genres | price_usd | is_free | positive_reviews ---------+-----------------------+-----------------------------+-----------+---------+----------------- 730 | Counter-Strike 2 | Action,Free to Play | 0 | true | 1200000 570 | Dota 2 | Action,Free to Play,Strategy| 0 | true | 950000 578080 | PUBG: BATTLEGROUNDS | Action | 29.99 | false | 680000 1151640 | Baldurs Gate 3 | RPG | 59.99 | false | 600000 413150 | Stardew Valley | RPG,Simulation | 14.99 | false | 520000

Steam 游戏目录覆盖了免费(CS2、Dota 2)和付费(PUBG、BG3、Stardew Valley、Cyberpunk 2077)两类主要模式,评测数量级分布在 14 万到 120 万之间。


Bronze 层:原始游戏事件表

建表

Bronze 层接收来自 Kafka PIPE 的实时事件流,字段设计覆盖所有事件类型的公共属性,事件特有属性以 JSON 形式存储在

extra_props
extra_props
中。

CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.doc_bronze_game_events ( event_id STRING COMMENT 'Unique event identifier', user_id STRING COMMENT 'Player user ID', app_id BIGINT COMMENT 'Steam App ID (FK to dim_game)', event_type STRING COMMENT 'login / logout / level_complete / purchase / achievement', event_time TIMESTAMP COMMENT 'Client-side event timestamp (UTC)', session_id STRING COMMENT 'Session identifier (pre-assigned by client SDK)', level_id INT COMMENT 'Level/map number (null if not applicable)', level_name STRING COMMENT 'Level name or map name', amount_usd DOUBLE COMMENT 'Payment amount in USD (null if not a purchase)', item_id STRING COMMENT 'Item SKU for purchase events', item_name STRING COMMENT 'Item display name', country_code STRING COMMENT 'ISO 3166-1 alpha-2 country code', device_type STRING COMMENT 'PC / Console / Mobile', client_version STRING COMMENT 'Game client version string', extra_props STRING COMMENT 'JSON blob for event-specific extra properties', ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() ) COMMENT 'Bronze: raw in-game event stream (Kafka PIPE target)';

创建 Bloomfilter Index

后续 Silver 和 Gold 层都会按

user_id
user_id
app_id
app_id
过滤,这两列基数高(用户 ID 数百万级、游戏 ID 数万级),适合 Bloomfilter Index。

CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_user_id ON TABLE doc_bronze_game_events (user_id); CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_app_id ON TABLE doc_bronze_game_events (app_id);

配置 Kafka PIPE

游戏行为事件经客户端 SDK 采集后发往 Kafka topic,PIPE 负责持续消费并写入 Bronze 表。

-- 先建原始字符串接收表(Kafka PIPE 写入 JSON 字符串) CREATE TABLE IF NOT EXISTS best_practice_gaming_dw.kafka_raw_game_events (value STRING); -- 创建 Kafka PIPE CREATE PIPE IF NOT EXISTS best_practice_gaming_dw.pipe_game_events VIRTUAL_CLUSTER = 'DEFAULT' BATCH_INTERVAL_IN_SECONDS = '60' AS COPY INTO best_practice_gaming_dw.kafka_raw_game_events FROM ( SELECT CAST(value AS STRING) AS value FROM READ_KAFKA( '<kafka-broker>:9092', -- 替换为实际 broker 地址 'game_tracking_events', -- topic 名称 '', 'cz_gaming_consumer', -- consumer group ID '','','','', 'raw', 'raw', 0, map() ) );

配置 OSS PIPE(每日离线日志)

客户端在弱网环境下的缓存日志每日批量上传到阿里云 OSS,需要先创建 Storage Connection,再挂载 External Volume,最后配置 PIPE 自动扫描导入。

第一步:创建 OSS Storage Connection

-- 替换 ACCESS_ID / ACCESS_KEY 为有 OSS 读写权限的 RAM 用户凭证 -- ENDPOINT 替换为 OSS bucket 所在地域的访问地址 CREATE STORAGE CONNECTION IF NOT EXISTS best_practice_gaming_dw.gaming_oss_conn TYPE oss ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com' ACCESS_ID = '<your_access_key_id>' ACCESS_KEY = '<your_access_key_secret>';

第二步:创建挂载该 OSS 路径的 External Volume

-- LOCATION 替换为实际的 OSS bucket 路径 CREATE EXTERNAL VOLUME IF NOT EXISTS best_practice_gaming_dw.gaming_offline_logs LOCATION 'oss://<your-bucket>/game_offline_logs/' USING CONNECTION best_practice_gaming_dw.gaming_oss_conn DIRECTORY = (enable = true, auto_refresh = true) RECURSIVE = true;

第三步:创建 OSS PIPE

CREATE PIPE IF NOT EXISTS best_practice_gaming_dw.pipe_offline_logs VIRTUAL_CLUSTER = 'DEFAULT' INGEST_MODE = 'LIST_PURGE' AS COPY INTO best_practice_gaming_dw.doc_bronze_game_events FROM VOLUME best_practice_gaming_dw.gaming_offline_logs USING json;

写入模拟行为数据

本文模拟 7 个玩家、45 条事件,覆盖 login/logout/level_complete/purchase/achievement 五种类型。

从本地 CSV 导入(推荐)

将事件数据保存为 CSV 文件后,通过 User Volume 批量导入:

-- 第一步:通过 SQL PUT 将本地 CSV 文件上传到 User Volume PUT '/path/to/game_events.csv' TO USER VOLUME FILE 'game_events.csv';

-- 第二步:从 User Volume COPY INTO 表 COPY INTO best_practice_gaming_dw.doc_bronze_game_events FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('game_events.csv');

也可直接内联插入模拟测试数据(不需要 CSV 文件):

INSERT INTO best_practice_gaming_dw.doc_bronze_game_events (event_id, user_id, app_id, event_type, event_time, session_id, level_id, level_name, amount_usd, item_id, item_name, country_code, device_type, client_version, extra_props) VALUES ('E0001','U001',730,'login',CAST('2026-05-20 09:00:00' AS TIMESTAMP),'S001',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0002','U001',730,'level_complete',CAST('2026-05-20 09:12:00' AS TIMESTAMP),'S001',1,'de_dust2',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":18,"deaths":5,"mvp":true}'), ('E0003','U001',730,'purchase',CAST('2026-05-20 09:15:00' AS TIMESTAMP),'S001',NULL,NULL,1.99,'skin_001','AK-47 | Redline','CN','PC','v1.40.1','{}'), ('E0004','U001',730,'level_complete',CAST('2026-05-20 09:30:00' AS TIMESTAMP),'S001',2,'de_inferno',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":22,"deaths":8,"mvp":false}'), ('E0005','U001',730,'logout',CAST('2026-05-20 10:00:00' AS TIMESTAMP),'S001',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0006','U001',730,'login',CAST('2026-05-21 09:00:00' AS TIMESTAMP),'S005',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0007','U001',730,'level_complete',CAST('2026-05-21 09:30:00' AS TIMESTAMP),'S005',3,'de_mirage',NULL,NULL,NULL,'CN','PC','v1.40.1','{"kills":15,"deaths":6}'), ('E0008','U001',730,'purchase',CAST('2026-05-21 09:36:00' AS TIMESTAMP),'S005',NULL,NULL,14.99,'skin_002','M4A4 | Howl','CN','PC','v1.40.1','{}'), ('E0009','U001',730,'logout',CAST('2026-05-21 10:00:00' AS TIMESTAMP),'S005',NULL,NULL,NULL,NULL,NULL,'CN','PC','v1.40.1','{}'), ('E0010','U002',570,'login',CAST('2026-05-20 10:00:00' AS TIMESTAMP),'S002',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0011','U002',570,'level_complete',CAST('2026-05-20 11:00:00' AS TIMESTAMP),'S002',1,'tutorial',NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0012','U002',570,'purchase',CAST('2026-05-20 11:10:00' AS TIMESTAMP),'S002',NULL,NULL,4.99,'hero_001','Crystal Maiden Set','US','PC','v7.34','{}'), ('E0013','U002',570,'logout',CAST('2026-05-20 12:30:00' AS TIMESTAMP),'S002',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0014','U002',570,'login',CAST('2026-05-21 10:00:00' AS TIMESTAMP),'S010',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0015','U002',570,'level_complete',CAST('2026-05-21 11:00:00' AS TIMESTAMP),'S010',2,'ranked',NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0016','U002',570,'logout',CAST('2026-05-21 12:00:00' AS TIMESTAMP),'S010',NULL,NULL,NULL,NULL,NULL,'US','PC','v7.34','{}'), ('E0017','U003',1172470,'login',CAST('2026-05-20 11:00:00' AS TIMESTAMP),'S003',NULL,NULL,NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0018','U003',1172470,'level_complete',CAST('2026-05-20 11:20:00' AS TIMESTAMP),'S003',1,'Kings Canyon',NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0019','U003',1172470,'level_complete',CAST('2026-05-20 11:40:00' AS TIMESTAMP),'S003',2,'World Edge',NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0020','U003',1172470,'purchase',CAST('2026-05-20 12:10:00' AS TIMESTAMP),'S003',NULL,NULL,9.99,'legend_001','Revenant Legend Skin','KR','PC','v3.24','{}'), ('E0021','U003',1172470,'logout',CAST('2026-05-20 13:00:00' AS TIMESTAMP),'S003',NULL,NULL,NULL,NULL,NULL,'KR','PC','v3.24','{}'), ('E0022','U004',578080,'login',CAST('2026-05-10 10:00:00' AS TIMESTAMP),'S012',NULL,NULL,NULL,NULL,NULL,'US','PC','v30.5','{}'), ('E0023','U004',578080,'logout',CAST('2026-05-10 11:00:00' AS TIMESTAMP),'S012',NULL,NULL,NULL,NULL,NULL,'US','PC','v30.5','{}'), ('E0024','U004',578080,'login',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S004',NULL,NULL,NULL,NULL,NULL,'US','PC','v31.0','{}'), ('E0025','U004',578080,'level_complete',CAST('2026-05-20 14:30:00' AS TIMESTAMP),'S004',1,'Erangel',NULL,NULL,NULL,'US','PC','v31.0','{}'), ('E0026','U004',578080,'logout',CAST('2026-05-20 15:00:00' AS TIMESTAMP),'S004',NULL,NULL,NULL,NULL,NULL,'US','PC','v31.0','{}'), ('E0027','U005',1091500,'login',CAST('2026-05-18 09:00:00' AS TIMESTAMP),'S006',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0028','U005',1091500,'level_complete',CAST('2026-05-18 10:00:00' AS TIMESTAMP),'S006',1,'Prologue',NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0029','U005',1091500,'purchase',CAST('2026-05-18 10:05:00' AS TIMESTAMP),'S006',NULL,NULL,59.99,'dlc_phantom','Phantom Liberty DLC','DE','PC','v2.12','{}'), ('E0030','U005',1091500,'logout',CAST('2026-05-18 11:00:00' AS TIMESTAMP),'S006',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0031','U005',1091500,'login',CAST('2026-05-19 09:00:00' AS TIMESTAMP),'S007',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0032','U005',1091500,'level_complete',CAST('2026-05-19 11:00:00' AS TIMESTAMP),'S007',1,'Heist',NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0033','U005',1091500,'logout',CAST('2026-05-19 12:00:00' AS TIMESTAMP),'S007',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0034','U005',1091500,'login',CAST('2026-05-20 09:00:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0035','U005',1091500,'purchase',CAST('2026-05-20 09:05:00' AS TIMESTAMP),'S008',NULL,NULL,9.99,'stash_001','Stash Tab','DE','PC','v2.12','{}'), ('E0036','U005',1091500,'logout',CAST('2026-05-20 10:30:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{}'), ('E0037','U005',1091500,'achievement',CAST('2026-05-20 09:10:00' AS TIMESTAMP),'S008',NULL,NULL,NULL,NULL,NULL,'DE','PC','v2.12','{"trophy":"platinum"}'), ('E0038','U006',892970,'login',CAST('2026-05-20 13:00:00' AS TIMESTAMP),'S009',NULL,NULL,NULL,NULL,NULL,'JP','Console','v1.15','{}'), ('E0039','U006',892970,'level_complete',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S009',1,'Limgrave',NULL,NULL,NULL,'JP','Console','v1.15','{}'), ('E0040','U006',892970,'logout',CAST('2026-05-20 14:00:00' AS TIMESTAMP),'S009',NULL,NULL,NULL,NULL,NULL,'JP','Console','v1.15','{}'), ('E0041','U007',2512000,'login',CAST('2026-05-21 14:00:00' AS TIMESTAMP),'S011',NULL,NULL,NULL,NULL,NULL,'BR','PC','v2.1.0','{}'), ('E0042','U007',2512000,'purchase',CAST('2026-05-21 14:25:00' AS TIMESTAMP),'S011',NULL,NULL,29.99,'hero_tank','Reinhardt Skin','BR','PC','v2.1.0','{}'), ('E0043','U007',2512000,'purchase',CAST('2026-05-21 14:30:00' AS TIMESTAMP),'S011',NULL,NULL,29.99,'hero_healer','Mercy Skin','BR','PC','v2.1.0','{}'), ('E0044','U007',2512000,'level_complete',CAST('2026-05-21 14:20:00' AS TIMESTAMP),'S011',1,'Control Point',NULL,NULL,NULL,'BR','PC','v2.1.0','{}'), ('E0045','U007',2512000,'logout',CAST('2026-05-21 16:00:00' AS TIMESTAMP),'S011',NULL,NULL,NULL,NULL,NULL,'BR','PC','v2.1.0','{}');

验证 Bronze 层数据分布:

SELECT event_type, COUNT(*) AS cnt FROM best_practice_gaming_dw.doc_bronze_game_events GROUP BY event_type ORDER BY cnt DESC;

event_type | cnt ----------------+---- login | 12 level_complete | 12 logout | 12 purchase | 8 achievement | 1

Bronze 层共 45 条事件,涵盖 5 种类型,付费事件 8 条(换算付费率:8 个付费次数 / 12 次登录 ≈ 67%,注意这是模拟数据场景下的高付费率,真实场景一般为 2-5%)。


Silver 层:会话化用户行为序列

Silver 层有两张 Dynamic Table:

silver_user_sessions
silver_user_sessions
(会话聚合)和
silver_event_sequence
silver_event_sequence
(带前后序列的事件流)。

会话聚合表

session_id
session_id
为粒度聚合:统计会话时长、完成关卡数、付费金额,并关联游戏维表补充游戏名称和类型信息。

CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_silver_user_sessions REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT COMMENT 'Silver: sessionized user behavior with 30-min idle cutoff' AS SELECT e.session_id, e.user_id, e.app_id, g.name AS game_name, g.genres AS game_genres, g.is_free AS game_is_free, e.country_code, e.device_type, MIN(e.event_time) AS session_start, MAX(e.event_time) AS session_end, TIMESTAMPDIFF(SECOND, MIN(e.event_time), MAX(e.event_time)) / 60.0 AS session_duration_min, COUNT(*) AS total_events, SUM(CASE WHEN e.event_type = 'level_complete' THEN 1 ELSE 0 END) AS levels_completed, SUM(CASE WHEN e.event_type = 'purchase' THEN 1 ELSE 0 END) AS purchase_count, SUM(COALESCE(e.amount_usd, 0.0)) AS session_revenue_usd, MAX(CASE WHEN e.event_type = 'purchase' THEN 1 ELSE 0 END) AS has_purchase, CAST(MIN(e.event_time) AS DATE) AS session_date FROM best_practice_gaming_dw.doc_bronze_game_events e LEFT JOIN best_practice_gaming_dw.doc_dim_game g ON e.app_id = g.app_id GROUP BY e.session_id, e.user_id, e.app_id, g.name, g.genres, g.is_free, e.country_code, e.device_type;

REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_silver_user_sessions;

查询会话汇总:

SELECT session_id, user_id, game_name, session_duration_min, levels_completed, purchase_count, session_revenue_usd FROM best_practice_gaming_dw.doc_silver_user_sessions ORDER BY session_start;

session_id | user_id | game_name | duration_min | levels | purchases | revenue -----------+---------+--------------------+--------------+--------+-----------+-------- S012 | U004 | PUBG: BATTLEGROUNDS| 60.00 | 0 | 0 | 0 S006 | U005 | Cyberpunk 2077 | 120.00 | 1 | 1 | 59.99 S007 | U005 | Cyberpunk 2077 | 180.00 | 1 | 0 | 0 S008 | U005 | Cyberpunk 2077 | 90.00 | 0 | 1 | 9.99 S001 | U001 | Counter-Strike 2 | 60.00 | 2 | 1 | 1.99 S002 | U002 | Dota 2 | 150.00 | 1 | 1 | 4.99 S003 | U003 | Apex Legends | 120.00 | 2 | 1 | 9.99 S009 | U006 | ELDEN RING | 60.00 | 1 | 0 | 0 S004 | U004 | PUBG: BATTLEGROUNDS| 60.00 | 1 | 0 | 0 S005 | U001 | Counter-Strike 2 | 60.00 | 1 | 1 | 14.99 S010 | U002 | Dota 2 | 120.00 | 1 | 0 | 0 S011 | U007 | Overwatch 2 | 120.00 | 1 | 2 | 59.98

12 个会话中 7 个产生付费,会话付费率 58.3%;总会话时长从 60 分钟(单局场景)到 180 分钟(RPG 深度游戏)。

事件序列表(LAG/LEAD 付费路径分析)

使用

LAG
LAG
LEAD
LEAD
还原每条事件的前置步骤和后续步骤,以及累计完成关卡数,为付费决策路径分析提供基础数据。

CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_silver_event_sequence REFRESH INTERVAL 5 MINUTE VCLUSTER DEFAULT COMMENT 'Silver: event sequence with LAG/LEAD for payment path analysis' AS SELECT event_id, user_id, app_id, event_type, event_time, session_id, amount_usd, item_id, item_name, country_code, -- 同会话中前一个事件 LAG(event_type, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS prev_event_type, LAG(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS prev_event_time, -- 同会话中后一个事件 LEAD(event_type, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS next_event_type, LEAD(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS next_event_time, -- 距离前一个事件的秒数 TIMESTAMPDIFF(SECOND, LAG(event_time, 1) OVER (PARTITION BY user_id, session_id ORDER BY event_time), event_time ) AS seconds_since_prev, -- 付费前累计完成的关卡数(跨会话,按用户全局计算) SUM(CASE WHEN event_type = 'level_complete' THEN 1 ELSE 0 END) OVER (PARTITION BY user_id ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS levels_before, CAST(event_time AS DATE) AS event_date FROM best_practice_gaming_dw.doc_bronze_game_events;

REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_silver_event_sequence;

查询所有付费事件的前后步骤:

SELECT user_id, event_type, prev_event_type AS prev_step, next_event_type AS next_step, levels_before AS levels_done_before_pay, seconds_since_prev AS secs_to_decide FROM best_practice_gaming_dw.doc_silver_event_sequence WHERE event_type = 'purchase' ORDER BY event_time;

user_id | event_type | prev_step | next_step | levels_before | secs_to_decide --------+------------+----------------+----------------+---------------+--------------- U005 | purchase | level_complete | logout | 1 | 300 U005 | purchase | login | achievement | 2 | 300 U001 | purchase | level_complete | level_complete | 1 | 180 U002 | purchase | level_complete | logout | 1 | 600 U003 | purchase | level_complete | logout | 2 | 1800 U001 | purchase | level_complete | logout | 3 | 360 U007 | purchase | level_complete | purchase | 1 | 300 U007 | purchase | purchase | logout | 1 | 300

进一步聚合——付费前置步骤分布:

SELECT prev_event_type AS prev_step, COUNT(*) AS pay_count, ROUND(AVG(seconds_since_prev), 0) AS avg_decision_secs FROM best_practice_gaming_dw.doc_silver_event_sequence WHERE event_type = 'purchase' GROUP BY prev_event_type ORDER BY pay_count DESC;

prev_step | pay_count | avg_decision_secs ----------------+-----------+------------------ level_complete | 6 | 590 purchase | 1 | 300 login | 1 | 300

结果解读:75% 的付费发生在完成关卡之后(

level_complete → purchase
level_complete → purchase
),平均决策时间 590 秒(约 10 分钟);另有 12.5% 的付费发生在登录后直接进行,这类用户是主动回来充值的高意愿玩家,平均等待 300 秒。该分布可用于触发时机优化:在
level_complete
level_complete
后 5-10 分钟推送商品卡片。


Gold 层:LTV 分层、漏斗、留存矩阵

用户 LTV 分层

user_id
user_id
为粒度,统计累计付费、活跃天数、游戏种类,并输出 LTV 分层(Whale/Dolphin/Minnow/Free)和综合参与度评分。

CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_user_ltv REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT COMMENT 'Gold: user LTV segmentation - total spend, sessions, active days' AS SELECT s.user_id, COUNT(DISTINCT s.session_id) AS total_sessions, COUNT(DISTINCT s.session_date) AS active_days, SUM(s.session_duration_min) AS total_playtime_min, SUM(s.session_revenue_usd) AS total_revenue_usd, COUNT(DISTINCT s.app_id) AS distinct_games_played, MAX(s.session_date) AS last_active_date, MIN(s.session_date) AS first_active_date, -- LTV 分层:按累计充值金额 CASE WHEN SUM(s.session_revenue_usd) >= 50 THEN 'Whale' WHEN SUM(s.session_revenue_usd) >= 10 THEN 'Dolphin' WHEN SUM(s.session_revenue_usd) > 0 THEN 'Minnow' ELSE 'Free' END AS ltv_tier, -- 参与度评分:活跃天数 40% + 游戏时长 40% + 付费 20% ROUND( 0.4 * LEAST(COUNT(DISTINCT s.session_date) / 7.0, 1.0) * 100 + 0.4 * LEAST(SUM(s.session_duration_min) / 300.0, 1.0) * 100 + 0.2 * LEAST(SUM(s.session_revenue_usd) / 20.0, 1.0) * 100, 1) AS engagement_score FROM best_practice_gaming_dw.doc_silver_user_sessions s GROUP BY s.user_id;

REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_gold_user_ltv;

查询各用户 LTV 分层:

SELECT user_id, total_sessions, active_days, total_playtime_min, total_revenue_usd, ltv_tier, engagement_score FROM best_practice_gaming_dw.doc_gold_user_ltv ORDER BY total_revenue_usd DESC;

user_id | sessions | active_days | playtime_min | revenue_usd | ltv_tier | engagement_score --------+----------+-------------+--------------+-------------+----------+----------------- U005 | 3 | 3 | 390.00 | 69.98 | Whale | 77.1 U007 | 1 | 1 | 120.00 | 59.98 | Whale | 41.7 U001 | 2 | 2 | 120.00 | 16.98 | Dolphin | 44.4 U003 | 1 | 1 | 120.00 | 9.99 | Minnow | 31.7 U002 | 2 | 2 | 270.00 | 4.99 | Minnow | 52.4 U004 | 2 | 2 | 120.00 | 0 | Free | 27.4 U006 | 1 | 1 | 60.00 | 0 | Free | 13.7

结果解读:7 个用户中 2 个 Whale(累计付费 ≥50 美元),贡献了 71.3% 的总收入(129.96/161.94)——这是 F2P 游戏的典型"二八效应"。U005 参与度评分 77.1 最高,因为他/她连续 3 天活跃、游戏时长最长,说明高参与度与高付费存在正相关。

LTV 分层汇总:

SELECT ltv_tier, COUNT(*) AS user_count, ROUND(SUM(total_revenue_usd), 2) AS total_rev, ROUND(AVG(total_revenue_usd), 2) AS avg_rev FROM best_practice_gaming_dw.doc_gold_user_ltv GROUP BY ltv_tier ORDER BY avg_rev DESC;

ltv_tier | user_count | total_rev | avg_rev ---------+------------+-----------+-------- Whale | 2 | 129.96 | 64.98 Dolphin | 1 | 16.98 | 16.98 Minnow | 2 | 14.98 | 7.49 Free | 2 | 0 | 0

使用 BITMAP 函数计算 DAU

BITMAP 函数适合需要按天统计 DAU,并跨天合并计算 MAU 的场景。前提:用户 ID 需要转换为整数(

CAST(SUBSTR(user_id, 2) AS INT)
CAST(SUBSTR(user_id, 2) AS INT)
将 "U001" 转为 1)。

方案一:GROUP_BITMAP 直接计算每日唯一用户数

SELECT CAST(event_time AS DATE) AS event_date, GROUP_BITMAP(CAST(SUBSTR(user_id, 2) AS INT)) AS dau FROM best_practice_gaming_dw.doc_bronze_game_events GROUP BY CAST(event_time AS DATE) ORDER BY event_date;

event_date | dau ------------+---- 2026-05-10 | 1 2026-05-18 | 1 2026-05-19 | 1 2026-05-20 | 6 2026-05-21 | 3

方案二:GROUP_BITMAP_STATE + GROUP_BITMAP_MERGE 跨天去重(MAU 场景)

GROUP_BITMAP_STATE
GROUP_BITMAP_STATE
生成每天的 bitmap 状态对象,
GROUP_BITMAP_MERGE
GROUP_BITMAP_MERGE
将多天的 bitmap 合并后统计跨天去重用户数:

WITH daily_bitmaps AS ( SELECT CAST(event_time AS DATE) AS event_date, GROUP_BITMAP_STATE(CAST(SUBSTR(user_id, 2) AS INT)) AS bm FROM best_practice_gaming_dw.doc_bronze_game_events GROUP BY CAST(event_time AS DATE) ) SELECT event_date, GROUP_BITMAP_MERGE(bm) AS cumulative_unique_users FROM daily_bitmaps GROUP BY event_date ORDER BY event_date;

event_date | cumulative_unique_users ------------+------------------------ 2026-05-10 | 1 2026-05-18 | 1 2026-05-19 | 1 2026-05-20 | 6 2026-05-21 | 3

付费转化漏斗

按日统计"登录 → 参与关卡 → 完成付费"三步漏斗的转化率。

CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_payment_funnel REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT COMMENT 'Gold: payment conversion funnel - login → engage → purchase' AS WITH user_daily AS ( SELECT user_id, event_date, MAX(CASE WHEN event_type = 'login' THEN 1 ELSE 0 END) AS has_login, MAX(CASE WHEN event_type = 'level_complete' THEN 1 ELSE 0 END) AS has_level, MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS has_purchase, SUM(COALESCE(amount_usd, 0.0)) AS day_revenue FROM best_practice_gaming_dw.doc_silver_event_sequence GROUP BY user_id, event_date ), funnel_agg AS ( SELECT event_date, SUM(has_login) AS step1_login, SUM(has_level) AS step2_engage, SUM(has_purchase) AS step3_purchase, SUM(day_revenue) AS total_revenue FROM user_daily GROUP BY event_date ) SELECT event_date, step1_login, step2_engage, step3_purchase, ROUND(step2_engage * 100.0 / NULLIF(step1_login, 0), 1) AS engage_rate_pct, ROUND(step3_purchase * 100.0 / NULLIF(step2_engage, 0), 1) AS purchase_rate_pct, ROUND(step3_purchase * 100.0 / NULLIF(step1_login, 0), 1) AS overall_conversion_pct, total_revenue FROM funnel_agg ORDER BY event_date;

REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_gold_payment_funnel;

SELECT * FROM best_practice_gaming_dw.doc_gold_payment_funnel ORDER BY event_date;

event_date | login | engage | purchase | engage_rate | pay_rate | cvr_overall | revenue ------------+-------+--------+----------+-------------+----------+-------------+-------- 2026-05-10 | 1 | 0 | 0 | 0.0 | null | 0.0 | 0 2026-05-18 | 1 | 1 | 1 | 100.0 | 100.0 | 100.0 | 59.99 2026-05-19 | 1 | 1 | 0 | 100.0 | 0.0 | 0.0 | 0 2026-05-20 | 6 | 5 | 4 | 83.3 | 80.0 | 66.7 | 26.96 2026-05-21 | 3 | 3 | 2 | 100.0 | 66.7 | 66.7 | 74.97

用户留存矩阵

以用户首次登录日期为队列(Cohort),统计 D1、D7 留存率。

CREATE OR REPLACE DYNAMIC TABLE best_practice_gaming_dw.doc_gold_retention_matrix REFRESH INTERVAL 10 MINUTE VCLUSTER DEFAULT COMMENT 'Gold: N-day retention cohort matrix' AS WITH first_login AS ( SELECT user_id, MIN(CAST(event_time AS DATE)) AS cohort_date FROM best_practice_gaming_dw.doc_bronze_game_events WHERE event_type = 'login' GROUP BY user_id ), active_days AS ( SELECT DISTINCT e.user_id, CAST(e.event_time AS DATE) AS active_date, f.cohort_date, DATEDIFF(CAST(e.event_time AS DATE), f.cohort_date) AS day_offset FROM best_practice_gaming_dw.doc_bronze_game_events e JOIN first_login f ON e.user_id = f.user_id WHERE DATEDIFF(CAST(e.event_time AS DATE), f.cohort_date) BETWEEN 0 AND 30 ) SELECT cohort_date, COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END) AS d0_users, COUNT(DISTINCT CASE WHEN day_offset = 1 THEN user_id END) AS d1_retained, COUNT(DISTINCT CASE WHEN day_offset = 7 THEN user_id END) AS d7_retained, ROUND(COUNT(DISTINCT CASE WHEN day_offset = 1 THEN user_id END) * 100.0 / NULLIF(COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END), 0), 1) AS d1_retention_pct, ROUND(COUNT(DISTINCT CASE WHEN day_offset = 7 THEN user_id END) * 100.0 / NULLIF(COUNT(DISTINCT CASE WHEN day_offset = 0 THEN user_id END), 0), 1) AS d7_retention_pct FROM active_days GROUP BY cohort_date ORDER BY cohort_date;

REFRESH DYNAMIC TABLE best_practice_gaming_dw.doc_gold_retention_matrix;

SELECT * FROM best_practice_gaming_dw.doc_gold_retention_matrix ORDER BY cohort_date;

cohort_date | d0_users | d1_retained | d7_retained | d1_pct | d7_pct ------------+----------+-------------+-------------+--------+------- 2026-05-10 | 1 | 0 | 0 | 0.0 | 0.0 2026-05-18 | 1 | 1 | 0 | 100.0 | 0.0 2026-05-20 | 4 | 2 | 0 | 50.0 | 0.0 2026-05-21 | 1 | 0 | 0 | 0.0 | 0.0

结果解读:5 月 18 日入队的用户(U005)次日留存 100%(连续 3 天登录);5 月 20 日入队的 4 个用户中 2 个次日回归(D1 留存 50%);模拟数据时间跨度较短(10 天内),所有队列 D7 留存均为 0,生产环境下需要积累至少 7 天的用户行为数据。


注意事项

  • Kafka PIPE DDL 阶段会尝试连接 broker

    CREATE PIPE
    CREATE PIPE
    时系统会验证 broker 地址和 topic 是否可达。开发环境没有 Kafka 时,先创建目标表,PIPE 建立后通过直接 INSERT 模拟摄取流程。

  • BLOOMFILTER INDEX 创建后对存量数据不生效

    CREATE BLOOMFILTER INDEX
    CREATE BLOOMFILTER INDEX
    只对新写入的数据块生效,存量数据需执行
    BUILD INDEX ON TABLE ... (column)
    BUILD INDEX ON TABLE ... (column)
    重建(BLOOMFILTER 类型不支持
    BUILD INDEX
    BUILD INDEX
    ,因此存量数据不能补充建索引;考虑用 INVERTED INDEX 替代需要范围查询的场景)。

  • GROUP_BITMAP 要求整数类型输入:用户 ID 如果是字符串(如 "U001"),需要先转换为整数后才能使用 BITMAP 函数。常见做法是在 Bronze 层维护一张

    user_id → int_id
    user_id → int_id
    映射表,在 Silver 层 JOIN 后使用
    int_id
    int_id
    参与 BITMAP 运算。

  • 漏斗"付费率 > 100%"是正常现象:当天同一用户多次付费时,按"付费次数/登录人数"的计算方式会超过 100%。报表中建议同时提供"付费人数"(COUNT DISTINCT user_id)和"付费次数"两个口径,避免误读。

  • Dynamic Table 首次刷新需要手动触发

    CREATE DYNAMIC TABLE
    CREATE DYNAMIC TABLE
    后系统不会立即执行首次计算,需要
    REFRESH DYNAMIC TABLE <table>
    REFRESH DYNAMIC TABLE <table>
    触发。后续按
    REFRESH INTERVAL
    REFRESH INTERVAL
    自动增量刷新。

  • LAST_VALUE 窗口帧陷阱

    LAST_VALUE
    LAST_VALUE
    的默认帧是
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ,导致在每行返回当前行的值而不是最后一行。需要显式指定
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING


相关文档

联系我们
预约咨询
微信咨询
电话咨询
邮件咨询