会话分析 (Sessionization) 指南


快速选型

分析类型业务问题核心技法复杂度
会话切分如何把连续事件分组为独立会话?
LAG
LAG
+
TIMESTAMPDIFF
TIMESTAMPDIFF
+
SUM
SUM
⭐⭐⭐
会话聚合平均会话时长、页面数是多少?
MIN/MAX
MIN/MAX
+
COUNT
COUNT
+
GROUP BY
GROUP BY
⭐⭐
跳出率多少用户只看了一页就离开?
CASE WHEN COUNT = 1
CASE WHEN COUNT = 1
⭐⭐
路径分析用户在会话中访问了哪些页面?
GROUP_CONCAT
GROUP_CONCAT
+
ORDER BY
ORDER BY
⭐⭐
转化分析多少会话完成了目标行为?
MAX(CASE WHEN event_type = ...)
MAX(CASE WHEN event_type = ...)
⭐⭐

测试数据

本文所有示例基于以下用户行为事件流:

-- 用户行为事件表 CREATE TABLE user_events ( user_id BIGINT, event_time TIMESTAMP_LTZ, page VARCHAR, event_type VARCHAR -- view, add_to_cart, complete, etc. );

模拟 3 个用户,共 18 个事件,覆盖 5 个会话(含跳出会话和转化会话)。


1. 会话切分(核心)

会话定义

  • 会话超时: 用户连续 30 分钟无活动,则下次活动视为新会话开始
  • 会话 ID: 按用户分区,从 1 开始递增

四步切分法

WITH events AS ( SELECT user_id, CAST(event_time AS TIMESTAMP_LTZ) AS event_time, page, event_type FROM user_events ), -- Step 1: 获取上一个事件时间 with_prev AS ( SELECT user_id, event_time, page, event_type, LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_event_time FROM events ), -- Step 2: 计算时间间隔,标记新会话开始 with_gap AS ( SELECT *, CASE WHEN prev_event_time IS NULL THEN 1 -- 第一个事件,新会话 WHEN TIMESTAMPDIFF(MINUTE, prev_event_time, event_time) > 30 THEN 1 -- 超时,新会话 ELSE 0 END AS is_new_session FROM with_prev ), -- Step 3: 累积标记生成会话 ID with_session AS ( SELECT *, SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id FROM with_gap ) SELECT user_id, session_id, event_time, page, event_type FROM with_session ORDER BY user_id, session_id, event_time;

输出示例:

user_idsession_idevent_timepageevent_type
112026-03-01 10:00:00homeview
112026-03-01 10:05:00productsview
112026-03-01 10:15:00product_detailview
122026-03-01 14:00:00homeview
122026-03-01 14:10:00cartview
212026-03-01 09:00:00landingview
222026-03-01 11:00:00homeview

关键说明

  • TIMESTAMPDIFF(MINUTE, start, end)
    TIMESTAMPDIFF(MINUTE, start, end)
    返回两个时间戳之间的分钟差
  • DATEDIFF
    DATEDIFF
    TIMESTAMP_LTZ
    TIMESTAMP_LTZ
    类型默认返回天数差(不是分钟),必须用
    TIMESTAMPDIFF
    TIMESTAMPDIFF
  • 会话超时阈值可根据业务调整:电商常用 30 分钟,内容平台可用 15 分钟

2. 会话聚合指标

SQL 实现

-- 在会话切分基础上聚合 SELECT user_id, session_id, MIN(event_time) AS session_start, MAX(event_time) AS session_end, TIMESTAMPDIFF(MINUTE, MIN(event_time), MAX(event_time)) AS session_duration_min, COUNT(*) AS page_views, COUNT(DISTINCT page) AS unique_pages, -- 跳出会话: 只有 1 个页面视图 CASE WHEN COUNT(*) = 1 THEN 1 ELSE 0 END AS is_bounce, -- 转化会话: 包含目标事件 MAX(CASE WHEN event_type IN ('add_to_cart', 'complete') THEN 1 ELSE 0 END) AS is_converted FROM with_session GROUP BY user_id, session_id ORDER BY user_id, session_id;

输出示例:

user_idsession_idsession_startsession_endduration_minpage_viewsis_bounceis_converted
112026-03-01 10:00:002026-03-01 10:15:0015300
122026-03-01 14:00:002026-03-01 14:10:0010200
212026-03-01 09:00:002026-03-01 09:00:000110
222026-03-01 11:00:002026-03-01 11:12:0012501
312026-03-01 08:00:002026-03-01 08:30:0030701

3. 整体会话统计

SQL 实现

SELECT COUNT(*) AS total_sessions, COUNT(DISTINCT user_id) AS unique_users, ROUND(AVG(page_views), 1) AS avg_pages_per_session, ROUND(AVG(session_duration_min), 1) AS avg_duration_min, ROUND(SUM(CASE WHEN page_views = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS bounce_rate_pct, ROUND(SUM(is_converted) * 100.0 / COUNT(*), 1) AS conversion_rate_pct FROM session_agg;

输出示例:

total_sessionsunique_usersavg_pagesavg_duration_minbounce_rateconversion_rate
533.613.420.040.0

4. 用户路径分析

SQL 实现

SELECT user_id, session_id, GROUP_CONCAT(page ORDER BY event_time SEPARATOR ' → ') AS page_path, COUNT(*) AS path_length FROM with_session GROUP BY user_id, session_id ORDER BY user_id, session_id;

输出示例:

user_idsession_idpage_pathpath_length
11home → products → product_detail3
12home → cart2
21landing1
22home → products → product_detail → cart → checkout5
31home → products → product_detail → reviews → cart → checkout7

常见路径模式统计

-- 统计最常见的 5 条路径 SELECT page_path, COUNT(*) AS session_count, COUNT(DISTINCT user_id) AS unique_users FROM ( SELECT user_id, session_id, GROUP_CONCAT(page ORDER BY event_time SEPARATOR ' → ') AS page_path FROM with_session GROUP BY user_id, session_id ) GROUP BY page_path ORDER BY session_count DESC LIMIT 5;


5. 会话质量分层

分层标准

层级页面数时长定义
跳出10 分钟只看一页即离开
浅层2-3< 5 分钟快速浏览
中等4-65-15 分钟有一定参与
深度7+> 15 分钟深度参与

SQL 实现

SELECT user_id, session_id, page_views, session_duration_min, CASE WHEN page_views = 1 THEN 'bounce' WHEN page_views <= 3 AND session_duration_min < 5 THEN 'shallow' WHEN page_views <= 6 AND session_duration_min < 15 THEN 'medium' ELSE 'deep' END AS session_quality FROM session_agg ORDER BY user_id, session_id;

质量分布统计

SELECT session_quality, COUNT(*) AS session_count, ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct FROM ( SELECT CASE WHEN page_views = 1 THEN 'bounce' WHEN page_views <= 3 AND session_duration_min < 5 THEN 'shallow' WHEN page_views <= 6 AND session_duration_min < 15 THEN 'medium' ELSE 'deep' END AS session_quality FROM session_agg ) GROUP BY session_quality ORDER BY session_count DESC;


6. 进阶场景

场景 1: 按来源渠道分析会话质量

SELECT u.traffic_source, COUNT(DISTINCT s.session_id) AS sessions, ROUND(AVG(s.page_views), 1) AS avg_pages, ROUND(SUM(CASE WHEN s.page_views = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS bounce_rate FROM session_agg s JOIN users u ON s.user_id = u.user_id GROUP BY u.traffic_source ORDER BY bounce_rate;

场景 2: 会话内事件序列标记

-- 标记每个事件在会话中的位置(第一步、第二步...) SELECT user_id, session_id, event_time, page, ROW_NUMBER() OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS step_number, CASE WHEN event_type = 'complete' THEN 'conversion' WHEN event_type = 'add_to_cart' THEN 'intent' ELSE 'browse' END AS event_category FROM with_session ORDER BY user_id, session_id, step_number;

场景 3: 跨会话用户行为分析

-- 统计每个用户的总会话数、平均会话间隔 WITH session_times AS ( SELECT user_id, session_id, MIN(event_time) AS session_start FROM with_session GROUP BY user_id, session_id ), session_gaps AS ( SELECT user_id, session_id, session_start, LAG(session_start) OVER (PARTITION BY user_id ORDER BY session_start) AS prev_session_start, TIMESTAMPDIFF(HOUR, LAG(session_start) OVER (PARTITION BY user_id ORDER BY session_start), session_start) AS hours_between_sessions FROM session_times ) SELECT user_id, COUNT(*) AS total_sessions, ROUND(AVG(hours_between_sessions), 1) AS avg_hours_between_sessions FROM session_gaps WHERE hours_between_sessions IS NOT NULL GROUP BY user_id;


常见问题

1.
DATEDIFF
DATEDIFF
vs
TIMESTAMPDIFF
TIMESTAMPDIFF

-- 错误: DATEDIFF 对 TIMESTAMP_LTZ 返回天数差(分钟级事件会显示 0) DATEDIFF(event_time, prev_event_time) -- 返回 0 -- 正确: 使用 TIMESTAMPDIFF 指定单位 TIMESTAMPDIFF(MINUTE, prev_event_time, event_time) -- 返回 5 TIMESTAMPDIFF(SECOND, prev_event_time, event_time) -- 返回 300 TIMESTAMPDIFF(HOUR, prev_event_time, event_time) -- 返回 0

2. 会话超时阈值选择

业务类型推荐阈值原因
电商30 分钟用户可能离开去比较价格
内容/新闻15 分钟阅读完成后即离开
SaaS 工具60 分钟用户可能长时间操作
游戏10 分钟活跃度高,间隔短

3. 单页会话的时长计算

-- 跳出会话的 session_start = session_end,时长为 0 -- 这是正确的,不应使用 TIMESTAMPDIFF 计算相邻事件间隔作为会话时长 -- 应使用 MIN/MAX 计算首尾事件差值 TIMESTAMPDIFF(MINUTE, MIN(event_time), MAX(event_time)) -- 正确

4.
GROUP_CONCAT
GROUP_CONCAT
排序

-- 错误: 不指定 ORDER BY,路径顺序不确定 GROUP_CONCAT(page SEPARATOR ' → ') -- 正确: 按时间排序保证路径顺序 GROUP_CONCAT(page ORDER BY event_time SEPARATOR ' → ')

5. 大规模数据性能

-- 优化: 先按日期分区过滤,再会话切分 WITH filtered_events AS ( SELECT * FROM user_events WHERE event_time >= '2026-03-01' AND event_time < '2026-03-02' ), -- 后续会话切分逻辑不变...


性能优化建议

场景优化策略
大数据量会话切分按天分区处理,避免全表窗口计算
高频查询将会话聚合结果物化为 Dynamic Table
路径分析
(user_id, event_time)
(user_id, event_time)
建 Bloom Filter 索引
跨天会话使用
DATE_TRUNC('DAY', event_time)
DATE_TRUNC('DAY', event_time)
作为额外分区键

-- 推荐: 将会话切分逻辑封装为 Dynamic Table CREATE DYNAMIC TABLE dws_user_sessions REFRESH INTERVAL 1 HOUR AS WITH with_prev AS (...), with_gap AS (...), with_session AS (...) SELECT user_id, session_id, event_time, page, event_type FROM with_session;

联系我们
预约咨询
微信咨询
电话咨询