构建 SaaS 多租户运营数仓

将 SaaS 平台的功能用量事件、账号订阅数据和流失记录整合为四层数仓,输出租户健康评分和流失预警信号。本文以 RavenStack SaaS 数据集为基础(500 个租户、5,000 条订阅记录、5,000 条用量事件、600 条流失记录),端到端演示 Kafka PIPE → ODS → DWD → DWS → ADS 的完整构建过程,并覆盖 Dynamic Table 级联刷新、SQL UDF 健康评分、Column Masking、RBAC 和 Semantic View 五项平台能力的落地用法。


概述

SaaS 运营团队面临的核心数据挑战是:功能用量事件来自高频 Kafka 流,账号和订阅数据来自 MySQL 业务库,两类数据需要按租户维度整合后才能输出有意义的流失预警信号。

云器 Lakehouse 通过以下组合解决几个核心问题:

问题云器解决方案
功能用量事件实时写入,每秒大量消息Kafka PIPE 持续摄取,60 秒批量写入 ODS 原始表
账号、订阅、流失表来自 MySQL,需要 CDC 同步Studio MySQL CDC 实时同步任务,自动捕捉增删改
ODS → DWD → DWS → ADS 多层计算依赖链长Dynamic Table 级联刷新,声明式 SQL,系统自动管理依赖顺序
健康评分逻辑复杂,多层复用SQL UDF
calc_tenant_health_score
calc_tenant_health_score
,封装加权公式
销售、客户成功、分析师对同一张表权限不同Column Masking + RBAC,按角色控制字段可见性
Analytics Agent 需要自然语言查询租户数据Semantic View,封装业务语义,供 Agent 直接调用

涉及的 SQL 命令

命令 / 函数用途说明
CREATE TABLE
CREATE TABLE
建 ODS 层原始表静态表,作为 Dynamic Table 上游
CREATE PIPE
CREATE PIPE
创建 Kafka 持续摄取管道绑定到 ODS 用量原始表
CREATE FUNCTION
CREATE FUNCTION
创建 SQL UDF
calc_tenant_health_score
calc_tenant_health_score
封装租户健康评分加权公式
ALTER TABLE ... CHANGE COLUMN ... SET MASK
ALTER TABLE ... CHANGE COLUMN ... SET MASK
绑定 Column Masking 策略
account_name
account_name
等敏感字段脱敏
CREATE DYNAMIC TABLE
CREATE DYNAMIC TABLE
创建 DWD / DWS / ADS 层增量计算表系统自动识别上游变更并增量刷新
REFRESH DYNAMIC TABLE
REFRESH DYNAMIC TABLE
手动触发一次刷新初次构建或调试时使用
CREATE VIEW
CREATE VIEW
创建 Semantic View封装业务语义供 Analytics Agent 查询

前置准备

本文所有示例在

best_practice_saas_dw
best_practice_saas_dw
Schema 下运行。

CREATE SCHEMA IF NOT EXISTS best_practice_saas_dw;

数据集来自 RavenStack SaaS Subscription & Churn Analytics Dataset(MIT 协议),包含五张表:

ravenstack_accounts
ravenstack_accounts
ravenstack_subscriptions
ravenstack_subscriptions
ravenstack_feature_usage
ravenstack_feature_usage
ravenstack_churn_events
ravenstack_churn_events
ravenstack_support_tickets
ravenstack_support_tickets
。本文使用前四张。

kaggle datasets download -d rivalytics/saas-subscription-and-churn-analytics-dataset \ --unzip -p /tmp/saas_dw/


ODS 层:原始数据接入

建表

ODS 层保留三类数据源的原始数据:Kafka 实时用量事件、MySQL CDC 账号订阅数据、MySQL CDC 流失事件。

功能用量 Kafka 接收表

CREATE TABLE IF NOT EXISTS best_practice_saas_dw.ods_kafka_raw_usage ( value STRING, ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() );

ods_kafka_raw_usage
ods_kafka_raw_usage
接收 Kafka 消息的原始 JSON 字符串,下游通过解析
value
value
字段写入
ods_feature_usage
ods_feature_usage

功能用量明细表

CREATE TABLE IF NOT EXISTS best_practice_saas_dw.ods_feature_usage ( usage_id STRING, subscription_id STRING, usage_date DATE, feature_name STRING, usage_count INT, usage_duration_secs INT, error_count INT, is_beta_feature BOOLEAN, ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() );

账号主数据表

CREATE TABLE IF NOT EXISTS best_practice_saas_dw.ods_accounts ( account_id STRING, account_name STRING, industry STRING, country STRING, signup_date DATE, referral_source STRING, plan_tier STRING, seats INT, is_trial BOOLEAN, churn_flag BOOLEAN, ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() );

订阅表

CREATE TABLE IF NOT EXISTS best_practice_saas_dw.ods_subscriptions ( subscription_id STRING, account_id STRING, start_date DATE, end_date DATE, plan_tier STRING, seats INT, mrr_amount DOUBLE, arr_amount DOUBLE, is_trial BOOLEAN, upgrade_flag BOOLEAN, downgrade_flag BOOLEAN, churn_flag BOOLEAN, billing_frequency STRING, auto_renew_flag BOOLEAN, ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() );

流失事件表

CREATE TABLE IF NOT EXISTS best_practice_saas_dw.ods_churn_events ( churn_event_id STRING, account_id STRING, churn_date DATE, reason_code STRING, refund_amount_usd DOUBLE, preceding_upgrade_flag BOOLEAN, preceding_downgrade_flag BOOLEAN, is_reactivation BOOLEAN, feedback_text STRING, ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() );

配置 Kafka PIPE

通过 Studio 的"新建 Kafka 实时同步任务"接入功能用量事件流。Kafka PIPE 在 DDL 阶段验证 broker 连接,正式环境替换

KAFKA_BROKER
KAFKA_BROKER
TOPIC
TOPIC
后执行:

CREATE PIPE IF NOT EXISTS best_practice_saas_dw.pipe_feature_usage VIRTUAL_CLUSTER = 'DEFAULT' BATCH_INTERVAL_IN_SECONDS = '60' AS COPY INTO best_practice_saas_dw.ods_kafka_raw_usage FROM ( SELECT CAST(value AS STRING) AS value FROM READ_KAFKA( '<kafka-broker>:9092', -- 替换为实际 broker 地址 'saas_feature_usage', -- topic 名称 '', 'cz_saas_consumer', -- consumer group ID '', '', '', '', 'raw', 'raw', 0, map() ) );

方式一:通过 Kafka 实际写入(推荐)

在正式 Kafka 环境中,向 topic

saas_feature_usage
saas_feature_usage
发送消息即可触发 PIPE 摄取:

from kafka import KafkaProducer import json, uuid from datetime import date producer = KafkaProducer( bootstrap_servers=['<kafka-broker>:9092'], value_serializer=lambda v: json.dumps(v).encode('utf-8') ) # 构造功能用量消息 message = { "usage_id": f"U-{uuid.uuid4().hex[:6]}", "subscription_id": "S-0fcf7d", "usage_date": str(date.today()), "feature_name": "feature_10", "usage_count": 7, "usage_duration_secs": 3120, "error_count": 0, "is_beta_feature": False } producer.send('saas_feature_usage', value=message) producer.flush()

方式二:INSERT 模拟(无 Kafka 环境时)

如果暂未配置 Kafka,可以通过以下方式写入

ods_feature_usage
ods_feature_usage

从本地 CSV 导入(推荐)

-- 第一步:通过 SQL PUT 将本地 CSV 文件上传到 User Volume PUT '/path/to/your/ods_feature_usage.csv' TO USER VOLUME FILE 'ods_feature_usage.csv';

-- 第二步:从 User Volume COPY INTO 表 COPY INTO best_practice_saas_dw.ods_feature_usage FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('ods_feature_usage.csv');

也可直接内联插入小批量测试数据(不需要 CSV 文件):

INSERT INTO best_practice_saas_dw.ods_feature_usage (usage_id, subscription_id, usage_date, feature_name, usage_count, usage_duration_secs, error_count, is_beta_feature) VALUES ('U-1c6c24','S-0fcf7d', CAST('2023-07-27' AS DATE),'feature_20',9,5004,0,FALSE), ('U-f07cb8','S-c25263', CAST('2023-08-07' AS DATE),'feature_5', 9,369, 0,FALSE), ('U-a3b9d1','S-8cec59', CAST('2024-03-15' AS DATE),'feature_10',12,3200,1,TRUE), ('U-c4e2f0','S-0f6f44', CAST('2024-06-20' AS DATE),'feature_26',5,2800,0,FALSE);

验证 ODS 层数据量(以本文使用的完整数据集为例):

SELECT (SELECT COUNT(*) FROM best_practice_saas_dw.ods_accounts) AS accounts, (SELECT COUNT(*) FROM best_practice_saas_dw.ods_subscriptions) AS subscriptions, (SELECT COUNT(*) FROM best_practice_saas_dw.ods_feature_usage) AS feature_usage, (SELECT COUNT(*) FROM best_practice_saas_dw.ods_churn_events) AS churn_events;

accounts | subscriptions | feature_usage | churn_events ---------+---------------+---------------+------------- 500 | 5000 | 5000 | 600

Column Masking:账号名脱敏

account_name
account_name
是客户公司名称,属于敏感数据。方案:管理员看原始名称,其他角色只显示前 3 个字符加掩码。

CREATE OR REPLACE FUNCTION best_practice_saas_dw.mask_account_name(name STRING) RETURNS STRING AS CASE WHEN current_user() IN ('privileged_user') THEN name -- 替换为实际授权用户名 ELSE CONCAT(SUBSTR(name, 1, 3), '****') END;

ALTER TABLE best_practice_saas_dw.ods_accounts CHANGE COLUMN account_name SET MASK best_practice_saas_dw.mask_account_name;

验证脱敏效果(管理员账号看到原始名称):

SELECT account_id, account_name, plan_tier FROM best_practice_saas_dw.ods_accounts LIMIT 5;

account_id | account_name | plan_tier -----------+--------------+---------- A-3be56b | Company_200 | Pro A-354f12 | Company_201 | Pro A-68f37c | Company_202 | Basic A-aa9511 | Company_203 | Pro A-af0cd2 | Company_204 | Enterprise

MySQL CDC 同步配置

账号、订阅、流失三张表通过 Studio 的 MySQL CDC 实时同步任务持续接入。在 Studio 中进入 数据集成 → 新建任务 → MySQL 实时 CDC,填写:

  • 源端:MySQL 地址、账号密码、要同步的库表(
    accounts
    accounts
    subscriptions
    subscriptions
    churn_events
    churn_events
  • 目标端:
    best_practice_saas_dw.ods_accounts
    best_practice_saas_dw.ods_accounts
    等对应表
  • 任务路径:
    best_practices/saas_dw/
    best_practices/saas_dw/

CDC 任务启动后会先全量同步一次存量数据,随后进入增量捕获模式,INSERT / UPDATE / DELETE 实时写入 ODS 层。


DWD 层:租户-功能用量事件宽表

DWD 层将

ods_feature_usage
ods_feature_usage
(用量明细)、
ods_subscriptions
ods_subscriptions
(订阅)、
ods_accounts
ods_accounts
(账号)三张表 JOIN 展开为一张宽表,后续 DWS 和 ADS 层直接基于此表聚合,无需重复 JOIN。

建表

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_saas_dw.dwd_tenant_feature_usage AS SELECT fu.usage_id, fu.subscription_id, s.account_id, a.account_name, a.industry, a.country, a.plan_tier, a.seats, a.churn_flag AS account_churn_flag, fu.usage_date, fu.feature_name, fu.usage_count, fu.usage_duration_secs, fu.error_count, fu.is_beta_feature, s.mrr_amount, s.billing_frequency, s.upgrade_flag, s.downgrade_flag, s.churn_flag AS subscription_churn_flag, fu.ingest_time FROM best_practice_saas_dw.ods_feature_usage fu JOIN best_practice_saas_dw.ods_subscriptions s ON fu.subscription_id = s.subscription_id JOIN best_practice_saas_dw.ods_accounts a ON s.account_id = a.account_id;

在 Studio 创建刷新任务

在 Studio 中进入 开发 → 任务,路径

best_practices/saas_dw/
best_practices/saas_dw/
,新建任务:

  • 任务类型:刷新动态表
  • 目标表:
    best_practice_saas_dw.dwd_tenant_feature_usage
    best_practice_saas_dw.dwd_tenant_feature_usage
  • 调度周期:每 5 分钟(按实际业务延迟需求调整)
  • 可在此任务上追加:数据质量规则(行数波动告警)、运行超时告警

首次手动刷新

REFRESH DYNAMIC TABLE best_practice_saas_dw.dwd_tenant_feature_usage;

SELECT COUNT(*) AS dwd_row_count FROM best_practice_saas_dw.dwd_tenant_feature_usage;

dwd_row_count ------------- 5000

验证宽表关联正确:

SELECT account_id, account_name, plan_tier, feature_name, usage_count, usage_date FROM best_practice_saas_dw.dwd_tenant_feature_usage LIMIT 5;

account_id | account_name | plan_tier | feature_name | usage_count | usage_date -----------+--------------+------------+--------------+-------------+----------- A-8b0451 | Company_245 | Enterprise | feature_10 | 7 | 2023-01-11 A-337bc1 | Company_424 | Pro | feature_36 | 5 | 2023-01-12 A-05d3b3 | Company_213 | Basic | feature_35 | 13 | 2024-11-08 A-039727 | Company_107 | Pro | feature_10 | 10 | 2024-08-13 A-bcf664 | Company_248 | Pro | feature_26 | 12 | 2024-11-09


DWS 层:租户月度指标聚合

DWS 层以

account_id
account_id
+ 月份为粒度聚合 DWD 层数据,计算每个租户每月的功能使用广度(
distinct_features_used
distinct_features_used
)、深度(
total_usage_count
total_usage_count
)、错误率、MRR 等核心指标,作为 ADS 层健康评分的直接输入。

建表

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_saas_dw.dws_tenant_monthly_metrics AS SELECT account_id, MAX(account_name) AS account_name, MAX(industry) AS industry, MAX(country) AS country, MAX(plan_tier) AS plan_tier, MAX(seats) AS seats, DATE_TRUNC('month', usage_date) AS usage_month, COUNT(DISTINCT feature_name) AS distinct_features_used, COUNT(DISTINCT CASE WHEN is_beta_feature THEN feature_name END) AS beta_features_used, SUM(usage_count) AS total_usage_count, SUM(usage_duration_secs) AS total_duration_secs, ROUND(AVG(usage_duration_secs), 1) AS avg_duration_secs, SUM(error_count) AS total_errors, COUNT(*) AS usage_event_count, MAX(mrr_amount) AS mrr_amount, MAX(CAST(account_churn_flag AS INT)) AS churn_flag, MAX(CAST(upgrade_flag AS INT)) AS upgrade_flag, MAX(CAST(downgrade_flag AS INT)) AS downgrade_flag FROM best_practice_saas_dw.dwd_tenant_feature_usage GROUP BY account_id, DATE_TRUNC('month', usage_date);

同样在 Studio

best_practices/saas_dw/
best_practices/saas_dw/
路径下创建刷新任务,调度周期可设为每小时(DWS 刷新时间晚于 DWD)。

手动触发首次刷新:

REFRESH DYNAMIC TABLE best_practice_saas_dw.dws_tenant_monthly_metrics; SELECT COUNT(*) AS dws_row_count FROM best_practice_saas_dw.dws_tenant_monthly_metrics;

dws_row_count ------------- 4032

查看最近一个月的高活跃租户:

SELECT account_id, account_name, plan_tier, usage_month, distinct_features_used, total_usage_count, total_errors, mrr_amount, churn_flag FROM best_practice_saas_dw.dws_tenant_monthly_metrics ORDER BY usage_month DESC, total_usage_count DESC LIMIT 8;

account_id | account_name | plan_tier | usage_month | distinct_features_used | total_usage_count | total_errors | mrr_amount | churn_flag -----------+--------------+------------+---------------------+------------------------+-------------------+--------------+------------+----------- A-4ef964 | Company_262 | Pro | 2024-12-01T00:00:00 | 4 | 62 | 6 | 3383 | 0 A-81edc3 | Company_236 | Basic | 2024-12-01T00:00:00 | 3 | 38 | 2 | 2009 | 0 A-503d5a | Company_99 | Enterprise | 2024-12-01T00:00:00 | 3 | 37 | 4 | 2388 | 1 A-a1bbb6 | Company_74 | Basic | 2024-12-01T00:00:00 | 2 | 34 | 0 | 2189 | 0 A-9b9fe9 | Company_71 | Basic | 2024-12-01T00:00:00 | 3 | 33 | 0 | 3332 | 0 A-9f9299 | Company_133 | Enterprise | 2024-12-01T00:00:00 | 3 | 31 | 2 | 1372 | 0 A-c7ffc2 | Company_392 | Pro | 2024-12-01T00:00:00 | 3 | 31 | 3 | 5970 | 0 A-4c38bc | Company_55 | Basic | 2024-12-01T00:00:00 | 3 | 31 | 2 | 7164 | 1

结果解读

A-503d5a
A-503d5a
A-4c38bc
A-4c38bc
在当月属于高活跃租户(
total_usage_count
total_usage_count
较高),但
churn_flag=1
churn_flag=1
,说明高用量并不总是流失防火墙——可能是在流失前的最后一次集中使用,ADS 层需要结合历史趋势综合判断。


ADS 层:租户健康评分与流失风险分层

健康评分 SQL UDF

将评分逻辑封装为 SQL UDF,所有下游 Dynamic Table 和临时查询都可复用。

评分公式:

  • 功能广度(50 分):使用的功能数量越多,参与度越高,上限 5 个功能满分
  • 使用量(20 分):月度用量越高,说明产品融入业务越深,上限 50 次满分
  • 错误率扣分(-15 分):错误占比越高,用户体验越差,最多扣 15 分
  • 升级奖励(+20 分):发生升级说明客户满意度高
  • 降级惩罚(-20 分):降级是流失的前兆信号
  • 基础分(+10 分):有活跃用量的租户基础保底

CREATE FUNCTION best_practice_saas_dw.calc_tenant_health_score( distinct_features_used DOUBLE, total_usage_count DOUBLE, total_errors DOUBLE, usage_event_count DOUBLE, upgrade_flag DOUBLE, downgrade_flag DOUBLE ) RETURNS DOUBLE AS GREATEST(0.0, LEAST(100.0, 50.0 * LEAST(1.0, distinct_features_used / 5.0) + 20.0 * LEAST(1.0, total_usage_count / 50.0) - 15.0 * CASE WHEN usage_event_count > 0.0 THEN LEAST(1.0, total_errors / usage_event_count) ELSE 0.0 END + 20.0 * upgrade_flag - 20.0 * downgrade_flag + 10.0 ));

验证函数效果:

SELECT -- 高健康:6 个功能、80 次用量、无错误、有升级 best_practice_saas_dw.calc_tenant_health_score( CAST(6 AS INT), CAST(80 AS BIGINT), CAST(0 AS BIGINT), CAST(20 AS BIGINT), CAST(1 AS INT), CAST(0 AS INT) ) AS high_health, -- 中等:4 个功能、40 次用量、2/15 错误率 best_practice_saas_dw.calc_tenant_health_score( CAST(4 AS INT), CAST(40 AS BIGINT), CAST(2 AS BIGINT), CAST(15 AS BIGINT), CAST(0 AS INT), CAST(0 AS INT) ) AS mid_health, -- 低健康:2 个功能、10 次用量、5/10 错误率、有降级 best_practice_saas_dw.calc_tenant_health_score( CAST(2 AS INT), CAST(10 AS BIGINT), CAST(5 AS BIGINT), CAST(10 AS BIGINT), CAST(0 AS INT), CAST(1 AS INT) ) AS low_health;

high_health | mid_health | low_health ------------+------------+----------- 100 | 64 | 6.5

建表

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_saas_dw.ads_tenant_health_score AS SELECT m.account_id, m.account_name, m.industry, m.country, m.plan_tier, m.seats, m.usage_month, m.distinct_features_used, m.total_usage_count, m.total_errors, m.usage_event_count, m.beta_features_used, m.mrr_amount, m.churn_flag, m.upgrade_flag, m.downgrade_flag, ROUND(best_practice_saas_dw.calc_tenant_health_score( CAST(m.distinct_features_used AS INT), m.total_usage_count, m.total_errors, m.usage_event_count, CAST(m.upgrade_flag AS INT), CAST(m.downgrade_flag AS INT) ), 1) AS health_score, CASE WHEN best_practice_saas_dw.calc_tenant_health_score( CAST(m.distinct_features_used AS INT), m.total_usage_count, m.total_errors, m.usage_event_count, CAST(m.upgrade_flag AS INT), CAST(m.downgrade_flag AS INT) ) >= 70 THEN 'HEALTHY' WHEN best_practice_saas_dw.calc_tenant_health_score( CAST(m.distinct_features_used AS INT), m.total_usage_count, m.total_errors, m.usage_event_count, CAST(m.upgrade_flag AS INT), CAST(m.downgrade_flag AS INT) ) >= 40 THEN 'AT_RISK' ELSE 'CHURN_RISK' END AS health_tier, ce.churn_date, ce.reason_code AS churn_reason FROM best_practice_saas_dw.dws_tenant_monthly_metrics m LEFT JOIN best_practice_saas_dw.ods_churn_events ce ON m.account_id = ce.account_id AND DATE_TRUNC('month', ce.churn_date) = m.usage_month;

在 Studio

best_practices/saas_dw/
best_practices/saas_dw/
下创建 ADS 层刷新任务,调度时间晚于 DWS 刷新完成时间(例如 DWS 每整点刷新,ADS 设为整点后 10 分钟)。

手动触发首次刷新:

REFRESH DYNAMIC TABLE best_practice_saas_dw.ads_tenant_health_score; SELECT COUNT(*) AS ads_count FROM best_practice_saas_dw.ads_tenant_health_score;

ads_count --------- 4046

健康分层分布

SELECT health_tier, COUNT(*) AS tenant_month_count, ROUND(AVG(health_score), 1) AS avg_score FROM best_practice_saas_dw.ads_tenant_health_score GROUP BY health_tier ORDER BY avg_score DESC;

health_tier | tenant_month_count | avg_score ------------+--------------------+---------- HEALTHY | 9 | 74.1 AT_RISK | 487 | 47.4 CHURN_RISK | 3550 | 20.7

结果解读:数据集中 CHURN_RISK 比例高(87.7%),反映了 RavenStack 数据集的特点——该数据集以中小功能用量为主,多数租户每月仅使用 1-2 个功能,触发

distinct_features_used / 5.0
distinct_features_used / 5.0
这一项权重使基础分偏低。这也符合真实 SaaS 场景:大多数租户处于浅度使用阶段,深度参与的头部租户才能进入 HEALTHY 区间。

健康分层与实际流失的交叉验证:

SELECT health_tier, churn_flag, COUNT(*) AS cnt FROM best_practice_saas_dw.ads_tenant_health_score GROUP BY health_tier, churn_flag ORDER BY health_tier, churn_flag;

health_tier | churn_flag | cnt ------------+------------+----- AT_RISK | 0 | 380 AT_RISK | 1 | 107 CHURN_RISK | 0 | 2724 CHURN_RISK | 1 | 826 HEALTHY | 0 | 9

AT_RISK 层中约 22% 的租户已流失,CHURN_RISK 层中约 23% 已流失。两层之间流失率接近,说明评分模型对当前数据集的分层区分度有提升空间——可以考虑引入历史趋势特征(连续下降月数)和外部预测模型(External Function)进一步优化。

流失原因分析

SELECT reason_code, COUNT(*) AS churn_count, ROUND(AVG(refund_amount_usd), 2) AS avg_refund FROM best_practice_saas_dw.ods_churn_events GROUP BY reason_code ORDER BY churn_count DESC;

reason_code | churn_count | avg_refund ------------+-------------+----------- features | 114 | 16.72 budget | 104 | 12.00 support | 104 | 11.73 unknown | 95 | 18.34 competitor | 92 | 13.08 pricing | 91 | 14.65

功能缺失(

features
features
)是第一大流失原因,说明产品功能竞争力是首要改进方向。
unknown
unknown
原因(95 条)和平均退款金额偏高,提示在客户流失时收集更结构化的反馈有重要价值。


跨套餐流失率分析

SELECT plan_tier, COUNT(*) AS accounts, ROUND(AVG(seats), 1) AS avg_seats, SUM(CAST(churn_flag AS INT)) AS churned_accounts, ROUND(100.0 * SUM(CAST(churn_flag AS INT)) / COUNT(*), 1) AS churn_rate_pct FROM best_practice_saas_dw.ods_accounts GROUP BY plan_tier ORDER BY churn_rate_pct DESC;

plan_tier | accounts | avg_seats | churned_accounts | churn_rate_pct -----------+----------+-----------+------------------+--------------- Enterprise | 154 | 19.7 | 34 | 22.1 Basic | 168 | 22.0 | 37 | 22.0 Pro | 178 | 19.9 | 39 | 21.9

三个套餐的流失率高度接近(均约 22%),说明流失问题不集中在某一套餐,而是整体性的产品留存挑战。这也意味着针对特定套餐的优惠策略效果有限,更应该聚焦在功能使用深度和支持质量上。


功能用量热度分析

SELECT feature_name, COUNT(DISTINCT subscription_id) AS user_count, SUM(usage_count) AS total_usage, ROUND(AVG(usage_duration_secs) / 60.0, 1) AS avg_minutes, SUM(error_count) AS total_errors, ROUND(100.0 * SUM(CAST(is_beta_feature AS INT)) / COUNT(*), 1) AS beta_pct FROM best_practice_saas_dw.ods_feature_usage GROUP BY feature_name ORDER BY user_count DESC LIMIT 6;

feature_name | user_count | total_usage | avg_minutes | total_errors | beta_pct -------------+------------+-------------+-------------+--------------+--------- feature_26 | 144 | 1438 | 52.9 | 101 | 11.8 feature_12 | 140 | 1358 | 47.6 | 81 | 11.4 feature_10 | 140 | 1456 | 50.1 | 84 | 8.3 feature_6 | 139 | 1423 | 49.2 | 55 | 13.5 feature_17 | 138 | 1388 | 52.1 | 97 | 7.9 feature_3 | 135 | 1327 | 52.6 | 80 | 8.8

feature_26
feature_26
用户覆盖最广(144 个订阅),平均使用时长也最高(52.9 分钟),是核心功能。其 Beta 占比 11.8%,说明有相当比例的用户在尝鲜版本中使用,可以重点监测 Beta 版错误率,及时修复。


Semantic View:为 Analytics Agent 封装业务语义

Semantic View 将 ADS 层的健康评分表封装为面向业务的视图,添加

risk_signal
risk_signal
字段将复杂条件归纳为单一信号,Analytics Agent 可以直接用自然语言查询"哪些租户面临流失风险"。

CREATE OR REPLACE VIEW best_practice_saas_dw.v_tenant_churn_risk AS SELECT account_id, account_name, industry, country, plan_tier, seats, usage_month, ROUND(mrr_amount, 2) AS mrr_usd, distinct_features_used, total_usage_count, total_errors, beta_features_used, health_score, health_tier, churn_flag, churn_reason, CASE WHEN churn_flag = 1 AND churn_reason IS NOT NULL THEN churn_reason WHEN health_tier = 'CHURN_RISK' AND churn_flag = 0 THEN 'predicted_risk' ELSE 'stable' END AS risk_signal FROM best_practice_saas_dw.ads_tenant_health_score;

查询各流失信号的规模和平均 MRR(用于优先级排序):

SELECT risk_signal, COUNT(*) AS cnt, ROUND(AVG(mrr_usd), 0) AS avg_mrr FROM best_practice_saas_dw.v_tenant_churn_risk GROUP BY risk_signal ORDER BY cnt DESC;

risk_signal | cnt | avg_mrr ---------------+------+-------- predicted_risk | 2724 | 2351 stable | 1279 | 2828 support | 12 | 3098 budget | 11 | 2496 pricing | 7 | 5459 competitor | 5 | 1003 unknown | 4 | 2289 features | 4 | 7096

结果解读

pricing
pricing
流失的平均 MRR 高达 5,459 美元,
features
features
流失的更高(7,096 美元),说明高价值客户流失的主因是产品功能竞争力和价格敏感性,而非支持质量问题。客户成功团队应优先接触这两类风险客户,提供针对性的留存方案。


RBAC 权限配置

不同角色对租户数据的权限需求不同:

角色可访问范围限制
数据分析师所有 ADS / DWS 聚合数据无法查看 ODS 原始账号名(脱敏后可见)
客户成功(CS)
v_tenant_churn_risk
v_tenant_churn_risk
,含真实账号名
需要管理员授权
销售
v_tenant_churn_risk
v_tenant_churn_risk
,仅自己负责的客户
行级过滤由业务层实现
运营管理员所有层,含 ODS 原始数据无限制

通过 GRANT 控制视图访问:

-- 授权 CS 角色查询 Semantic View GRANT SELECT ON VIEW best_practice_saas_dw.v_tenant_churn_risk TO ROLE cs_team; -- 授权分析师查询 DWS / ADS 层 GRANT SELECT ON DYNAMIC TABLE best_practice_saas_dw.dws_tenant_monthly_metrics TO ROLE analyst; GRANT SELECT ON DYNAMIC TABLE best_practice_saas_dw.ads_tenant_health_score TO ROLE analyst; -- ODS 层仅管理员可查 GRANT SELECT ON TABLE best_practice_saas_dw.ods_accounts TO ROLE admin; GRANT SELECT ON TABLE best_practice_saas_dw.ods_subscriptions TO ROLE admin;


数仓对象总览

SHOW TABLES IN best_practice_saas_dw;

schema_name | table_name | is_view | is_dynamic ------------------------+----------------------------+---------+----------- best_practice_saas_dw | ods_accounts | false | false best_practice_saas_dw | ods_subscriptions | false | false best_practice_saas_dw | ods_feature_usage | false | false best_practice_saas_dw | ods_churn_events | false | false best_practice_saas_dw | ods_kafka_raw_usage | false | false best_practice_saas_dw | dwd_tenant_feature_usage | false | true best_practice_saas_dw | dws_tenant_monthly_metrics | false | true best_practice_saas_dw | ads_tenant_health_score | false | true best_practice_saas_dw | v_tenant_churn_risk | true | false


注意事项

  • Dynamic Table 不写 REFRESH INTERVAL:DDL 中不包含

    REFRESH INTERVAL
    REFRESH INTERVAL
    参数,刷新调度统一通过 Studio Task 管理。这样可以在同一任务上附加监控告警和数据质量检查,也方便修改调度周期而无需重建表。

  • Column Masking 对 Dynamic Table 透明生效:DWD 层 JOIN

    ods_accounts
    ods_accounts
    时,非特权用户查询到的
    account_name
    account_name
    已是脱敏后的值(
    Com****
    Com****
    ),DWD / DWS / ADS 中存储的也是脱敏后的内容。若 CS 角色需要看原始账号名,建议直接查询
    v_tenant_churn_risk
    v_tenant_churn_risk
    (CS 角色已被授权该视图)并在视图定义中对
    account_name
    account_name
    不加脱敏函数。

  • 健康评分 UDF 参数类型

    calc_tenant_health_score
    calc_tenant_health_score
    定义为 DOUBLE 参数。在 Dynamic Table 的 SELECT 子句中调用时,DWS 表的
    distinct_features_used
    distinct_features_used
    upgrade_flag
    upgrade_flag
    downgrade_flag
    downgrade_flag
    是 BIGINT 类型,需要
    CAST(... AS INT)
    CAST(... AS INT)
    后传入,否则会因类型不匹配报解析错误。

  • Kafka PIPE 位点管理:PIPE 停止期间发生的消息不会丢失,重启后会从上次位点继续消费。若需要重置消费位点(如历史数据补录),需要在 Studio 任务中配置起始偏移量,或重建 PIPE。

  • Dynamic Table 增量刷新限制:ODS 层如果使用

    INSERT OVERWRITE
    INSERT OVERWRITE
    全量替换,会导致 Dynamic Table 退化为全量刷新。应使用
    INSERT INTO
    INSERT INTO
    追加写入,配合 CDC 保留
    ingest_time
    ingest_time
    字段用于增量识别。

  • Column Masking:绑定

    SET MASK
    SET MASK
    后,所有通过标准 SQL 查询该列的用户都会经过脱敏,包括 Dynamic Table、视图和临时查询。


相关文档

联系我们
预约咨询
微信咨询
电话咨询
邮件咨询