留存与同期群分析指南


快速选型

分析类型业务问题核心技法复杂度
基础留存注册后第 N 天还有多少用户回来?
DATEDIFF
DATEDIFF
+ 条件聚合
⭐⭐
同期群矩阵不同月份注册的用户留存趋势对比?
DATE_TRUNC
DATE_TRUNC
+ 行转列
⭐⭐⭐
连续留存用户最长连续活跃几天?
ROW_NUMBER
ROW_NUMBER
+ 差值分组
⭐⭐⭐
流失分析哪些用户 N 天没来了?
MAX
MAX
+
DATEDIFF
DATEDIFF
+ 阈值分类
⭐⭐
留存曲线留存率随时间衰减的趋势?周/月聚合 + 折线图数据⭐⭐

测试数据

本文所有示例基于以下用户行为数据:

-- 用户注册表 CREATE TABLE users ( user_id BIGINT, register_date VARCHAR -- 实际生产中建议用 DATE 类型 ); -- 用户活跃日志表 CREATE TABLE activity ( user_id BIGINT, activity_date VARCHAR );

模拟 3 个同期群(1 月、2 月、3 月各注册 5 人),共 15 用户,活跃记录覆盖 3 个月。


1. 基础留存:N 日留存率

留存定义

  • 第 0 日留存: 注册当天活跃的用户(应为 100%)
  • 第 N 日留存: 注册后第 N 天仍有活跃的用户数 / 当期注册用户数

SQL 实现

WITH retention_base AS ( SELECT u.user_id, u.register_date, a.activity_date, DATEDIFF(a.activity_date, u.register_date) AS days_since_register FROM users u JOIN activity a ON u.user_id = a.user_id ), cohort_size AS ( SELECT DATE_TRUNC('MONTH', CAST(register_date AS DATE)) AS cohort_month, COUNT(DISTINCT user_id) AS cohort_users FROM users GROUP BY 1 ), retention_by_day AS ( SELECT DATE_TRUNC('MONTH', CAST(rb.register_date AS DATE)) AS cohort_month, rb.days_since_register, COUNT(DISTINCT rb.user_id) AS retained_users FROM retention_base rb GROUP BY 1, 2 ) SELECT r.cohort_month, r.days_since_register, r.retained_users, c.cohort_users, ROUND(r.retained_users * 100.0 / c.cohort_users, 1) AS retention_pct FROM retention_by_day r JOIN cohort_size c ON r.cohort_month = c.cohort_month WHERE r.days_since_register IN (0, 1, 3, 7, 14, 30) ORDER BY r.cohort_month, r.days_since_register;

输出示例:

cohort_monthdays_since_registerretained_userscohort_usersretention_pct
2026-01-01055100.0
2026-01-01155100.0
2026-01-0132540.0
2026-01-0173560.0
2026-02-01055100.0
2026-02-0174580.0

关键说明

  • DATEDIFF(end, start)
    DATEDIFF(end, start)
    返回两个日期之间的天数差
  • DATE_TRUNC('MONTH', date)
    DATE_TRUNC('MONTH', date)
    将日期截断到月初,用于同期群分组
  • 留存天数选择应根据业务周期:游戏看 1/3/7 日,SaaS 看 7/14/30 日

2. 同期群留存矩阵(行转列)

适用场景

BI 报表中常见的留存矩阵:行 = 同期群,列 = 留存天数,单元格 = 留存率。

SQL 实现

-- 在基础留存查询基础上,使用 CASE 行转列 SELECT r.cohort_month, c.cohort_users, MAX(CASE WHEN r.days_since_register = 0 THEN ROUND(r.retained_users * 100.0 / c.cohort_users, 1) END) AS day_0, MAX(CASE WHEN r.days_since_register = 1 THEN ROUND(r.retained_users * 100.0 / c.cohort_users, 1) END) AS day_1, MAX(CASE WHEN r.days_since_register = 3 THEN ROUND(r.retained_users * 100.0 / c.cohort_users, 1) END) AS day_3, MAX(CASE WHEN r.days_since_register = 7 THEN ROUND(r.retained_users * 100.0 / c.cohort_users, 1) END) AS day_7, MAX(CASE WHEN r.days_since_register = 14 THEN ROUND(r.retained_users * 100.0 / c.cohort_users, 1) END) AS day_14, MAX(CASE WHEN r.days_since_register = 30 THEN ROUND(r.retained_users * 100.0 / c.cohort_users, 1) END) AS day_30 FROM retention_by_day r JOIN cohort_size c ON r.cohort_month = c.cohort_month GROUP BY r.cohort_month, c.cohort_users ORDER BY r.cohort_month;

输出示例:

cohort_monthcohort_usersday_0day_1day_3day_7day_14day_30
2026-01-0151001004060NULLNULL
2026-02-015100100208020NULL
2026-03-015100100206020NULL

动态列数

如需支持更多留存天数(如 60/90 日),只需在

SELECT
SELECT
中追加
MAX(CASE WHEN ...)
MAX(CASE WHEN ...)
列。


3. 连续留存:最长连续活跃天数

核心逻辑

使用窗口函数

ROW_NUMBER()
ROW_NUMBER()
标记连续区间:

  • 如果日期连续,则
    activity_date - row_number
    activity_date - row_number
    的差值相同
  • 同一差值即为同一连续区间

SQL 实现

WITH ranked AS ( SELECT user_id, CAST(activity_date AS DATE) AS activity_date, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date) AS rn FROM activity ), grouped AS ( SELECT user_id, activity_date, rn, -- 核心技巧: 用 DATEDIFF 计算相对于固定基准日的天数差,再减去 rn DATEDIFF(activity_date, CAST('2000-01-01' AS DATE)) - rn AS grp FROM ranked ), streaks AS ( SELECT user_id, grp, MIN(activity_date) AS streak_start, MAX(activity_date) AS streak_end, COUNT(*) AS streak_length FROM grouped GROUP BY user_id, grp ), max_streaks AS ( SELECT user_id, MAX(streak_length) AS max_consecutive_days FROM streaks GROUP BY user_id ) SELECT ms.user_id, ms.max_consecutive_days, s.streak_start, s.streak_end FROM max_streaks ms JOIN streaks s ON ms.user_id = s.user_id AND ms.max_consecutive_days = s.streak_length ORDER BY ms.max_consecutive_days DESC, ms.user_id;

输出示例:

user_idmax_consecutive_daysstreak_startstreak_end
122026-01-052026-01-06
222026-01-102026-01-11
............

扩展:连续 N 天活跃用户数

-- 统计连续活跃 ≥ 3 天的用户数 SELECT COUNT(DISTINCT user_id) AS users_with_3day_streak FROM streaks WHERE streak_length >= 3;


4. 流失分析:N 天未活跃用户识别

用户状态分类

状态定义运营动作
active7 天内有活跃保持体验
warning7-13 天未活跃推送提醒
at_risk14-29 天未活跃优惠挽回
churned30+ 天未活跃召回活动

SQL 实现

-- 假设分析基准日为 2026-03-25 WITH user_last_active AS ( SELECT u.user_id, u.register_date, MAX(a.activity_date) AS last_active_date, DATEDIFF(CAST('2026-03-25' AS DATE), CAST(MAX(a.activity_date) AS DATE)) AS days_since_active FROM users u LEFT JOIN activity a ON u.user_id = a.user_id GROUP BY u.user_id, u.register_date ) SELECT user_id, register_date, last_active_date, days_since_active, CASE WHEN days_since_active >= 30 THEN 'churned' WHEN days_since_active >= 14 THEN 'at_risk' WHEN days_since_active >= 7 THEN 'warning' ELSE 'active' END AS user_status FROM user_last_active ORDER BY days_since_active DESC;

输出示例:

user_idregister_datelast_active_datedays_since_activeuser_status
52026-01-252026-01-2658churned
42026-01-202026-01-2757churned
82026-02-102026-02-2528at_risk
122026-03-052026-03-205active

流失率统计

SELECT user_status, COUNT(*) AS user_count, ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct FROM user_last_active GROUP BY user_status ORDER BY user_count DESC;


5. 留存曲线:按周聚合的留存趋势

适用场景

生成折线图数据,展示留存率随时间衰减的趋势。

SQL 实现

WITH retention_base AS ( SELECT u.user_id, u.register_date, a.activity_date, DATEDIFF(a.activity_date, u.register_date) AS days_since_register FROM users u JOIN activity a ON u.user_id = a.user_id ), cohort_size AS ( SELECT DATE_TRUNC('MONTH', CAST(register_date AS DATE)) AS cohort_month, COUNT(DISTINCT user_id) AS cohort_users FROM users GROUP BY 1 ), retention_by_week AS ( SELECT DATE_TRUNC('MONTH', CAST(rb.register_date AS DATE)) AS cohort_month, FLOOR(rb.days_since_register / 7.0) AS week_number, COUNT(DISTINCT rb.user_id) AS retained_users FROM retention_base rb GROUP BY 1, 2 ) SELECT r.cohort_month, r.week_number, r.retained_users, c.cohort_users, ROUND(r.retained_users * 100.0 / c.cohort_users, 1) AS retention_pct FROM retention_by_week r JOIN cohort_size c ON r.cohort_month = c.cohort_month ORDER BY r.cohort_month, r.week_number;

输出示例:

cohort_monthweek_numberretained_userscohort_usersretention_pct
2026-01-01055100.0
2026-01-0114580.0
2026-01-0122540.0
2026-01-0132540.0
2026-02-01055100.0
2026-02-0114580.0

可视化建议

  • X 轴:
    week_number
    week_number
    (周数)
  • Y 轴:
    retention_pct
    retention_pct
    (留存率)
  • 图例:
    cohort_month
    cohort_month
    (不同颜色代表不同注册月份)

6. 高级场景:同期群 + 行为特征交叉分析

场景:不同渠道用户的留存差异

SELECT u.channel, DATE_TRUNC('MONTH', CAST(u.register_date AS DATE)) AS cohort_month, COUNT(DISTINCT u.user_id) AS cohort_users, COUNT(DISTINCT CASE WHEN DATEDIFF(a.activity_date, u.register_date) = 7 THEN u.user_id END) AS day_7_retained, ROUND( COUNT(DISTINCT CASE WHEN DATEDIFF(a.activity_date, u.register_date) = 7 THEN u.user_id END) * 100.0 / COUNT(DISTINCT u.user_id), 1 ) AS day_7_retention_pct FROM users u LEFT JOIN activity a ON u.user_id = a.user_id GROUP BY u.channel, DATE_TRUNC('MONTH', CAST(u.register_date AS DATE)) ORDER BY cohort_month, day_7_retention_pct DESC;

场景:同期群 LTV(生命周期价值)

SELECT DATE_TRUNC('MONTH', CAST(u.register_date AS DATE)) AS cohort_month, COUNT(DISTINCT u.user_id) AS cohort_users, SUM(o.order_amount) AS total_revenue, ROUND(SUM(o.order_amount) / COUNT(DISTINCT u.user_id), 2) AS avg_ltv FROM users u LEFT JOIN orders o ON u.user_id = o.user_id GROUP BY 1 ORDER BY 1;


常见问题

1.
DATEDIFF
DATEDIFF
参数顺序

-- 正确: DATEDIFF(end, start) DATEDIFF('2026-03-25', '2026-01-01') -- 返回 83 -- 错误: 参数反了会得到负数 DATEDIFF('2026-01-01', '2026-03-25') -- 返回 -83

2.
DATE_ADD
DATE_ADD
语法

-- 正确: DATE_ADD(date, INTERVAL n unit) DATE_ADD(CAST('2026-01-10' AS DATE), INTERVAL 3 DAY) -- 错误: 不支持 3 参数形式 DATE_ADD('2026-01-10', 3, 'DAY') -- 语法错误

3. 连续区间计算的基准日

-- 正确: 使用固定基准日(如 '2000-01-01') DATEDIFF(activity_date, CAST('2000-01-01' AS DATE)) - rn AS grp -- 错误: 使用动态基准日会导致分组错误 DATEDIFF(activity_date, MIN(activity_date)) - rn -- 每组基准不同

4. 留存率分母为零

-- 安全写法: 使用 NULLIF 避免除零错误 ROUND(retained_users * 100.0 / NULLIF(cohort_users, 0), 1)

5. 同期群分组粒度

-- 按月分组(推荐) DATE_TRUNC('MONTH', CAST(register_date AS DATE)) -- 按周分组 DATE_TRUNC('WEEK', CAST(register_date AS DATE)) -- 按天分组(数据量大时不推荐) CAST(register_date AS DATE)


性能优化建议

场景优化策略
大数据量留存计算先按
user_id
user_id
去重活跃日志,再 JOIN
同期群矩阵
retention_base
retention_base
物化为 Dynamic Table
连续活跃统计
activity
activity
表按
(user_id, activity_date)
(user_id, activity_date)
建 Bloom Filter 索引
流失分析基准日参数化,避免硬编码

-- 优化示例: 先去重再计算 WITH distinct_activity AS ( SELECT DISTINCT user_id, CAST(activity_date AS DATE) AS activity_date FROM activity ), retention_base AS ( SELECT u.user_id, u.register_date, a.activity_date, DATEDIFF(a.activity_date, u.register_date) AS days_since_register FROM users u JOIN distinct_activity a ON u.user_id = a.user_id ) -- 后续逻辑不变...

联系我们
预约咨询
微信咨询
电话咨询