HR 员工全生命周期数仓实践

整合 HRIS 系统、绩效评估和考勤数据,构建覆盖员工入职到离职全链路的分析数仓,输出离职风险预测、薪酬公平分析和组织效能诊断报告。本文以 Kaggle HR Analytics 数据集(1470 条员工记录)为基础,端到端演示 ODS → DWD → DWS → ADS 四层架构,涵盖 Column Masking(薪酬字段脱敏)、Dynamic Table(月度指标自动聚合)和窗口函数(LEAD/LAG 晋升分析)的完整落地方法。


概述

HR 数据仓库的核心挑战是数据高度敏感(薪酬、绩效、离职原因)同时又需要支持不同角色的分析需求(HR BP 看全量、员工看自己、分析师看脱敏数据)。云器 Lakehouse 通过以下组合解决这几个核心问题:

问题解决方案
薪酬、绩效字段高度敏感,不同角色看到的数据范围不同Column Masking,绑定到列,非特权用户自动得到脱敏值
ODS → DWD → ADS 多层聚合需要按月自动刷新Dynamic Table,声明式 SQL,系统自动维护上下游依赖链
需要识别晋升停滞员工和薪酬低于同级基准的风险窗口函数
RANK / AVG OVER / LAG
RANK / AVG OVER / LAG
,在 DWD 层直接计算
刷新调度需要附加数据质量规则和告警Studio Task 管理调度,同一任务上可附加告警和质检规则
高基数员工 ID 点查频繁按需添加 Bloomfilter Index,加速部门过滤

涉及的 SQL 命令

命令 / 函数用途说明
CREATE TABLE
CREATE TABLE
建 ODS 原始员工表普通静态表,作为 Dynamic Table 上游
CREATE FUNCTION
CREATE FUNCTION
创建薪酬脱敏 UDF
mask_salary
mask_salary
用于 Column Masking 策略绑定
ALTER TABLE ... CHANGE COLUMN ... SET MASK
ALTER TABLE ... CHANGE COLUMN ... SET MASK
绑定 Column Masking 到
monthly_income
monthly_income
非特权用户查询自动返回 -1
CREATE DYNAMIC TABLE
CREATE DYNAMIC TABLE
创建 DWD / DWS / ADS 层增量计算表系统自动识别上游变更并增量刷新
REFRESH DYNAMIC TABLE
REFRESH DYNAMIC TABLE
手动触发一次刷新初次构建或调试时使用
AVG() OVER (PARTITION BY ...)
AVG() OVER (PARTITION BY ...)
计算同部门同级别薪酬均值用于薪酬公平分析
RANK() OVER (ORDER BY ...)
RANK() OVER (ORDER BY ...)
计算员工在部门内的薪酬排名用于薪酬竞争力报告
LAG()
LAG()
计算晋升间隔相对变化识别晋升停滞员工

前置准备

本文所有示例在

best_practice_hr_analytics
best_practice_hr_analytics
Schema 下运行。

CREATE SCHEMA IF NOT EXISTS best_practice_hr_analytics;

数据集来自 Kaggle HR Analytics Case Study(Apache 2.0),共 1470 条记录、35 列,覆盖员工年龄、部门、职级、年薪、绩效评分、离职标签等核心字段。本文取前 40 行作为演示数据集。


ODS 层:原始员工数据

建表

CREATE TABLE IF NOT EXISTS best_practice_hr_analytics.doc_ods_employees ( emp_id INT, age INT, attrition STRING, -- 'Yes'/'No',是否已离职 business_travel STRING, daily_rate INT, department STRING, distance_from_home INT, education INT, -- 1=Below College ... 5=Doctor education_field STRING, employee_number INT, env_satisfaction INT, -- 1-4 gender STRING, hourly_rate INT, job_involvement INT, -- 1-4 job_level INT, -- 1=Entry ... 5=C-Level job_role STRING, job_satisfaction INT, -- 1=Low 4=Very High marital_status STRING, monthly_income INT, -- 敏感字段,将绑定 Column Masking monthly_rate INT, num_companies_worked INT, overtime STRING, -- 'Yes'/'No' pct_salary_hike INT, -- 上次调薪幅度(%) performance_rating INT, -- 3=Excellent 4=Outstanding relationship_satisfaction INT, -- 1-4 stock_option_level INT, total_working_years INT, training_times_last_year INT, work_life_balance INT, -- 1-4 years_at_company INT, years_in_current_role INT, years_since_last_promo INT, years_with_curr_mgr INT );

导入数据

本文以本地 CSV 文件(从 Kaggle 下载后取前 40 行)通过 INSERT 方式载入。生产环境可使用 MySQL 批量同步或 COPY INTO 从 Volume 导入完整数据集:

kaggle datasets download -d bhanupratapbiswas/hr-analytics-case-study \ --unzip -p /tmp/hr_analytics/

从本地 CSV 导入数据(推荐):

-- 第一步:通过 SQL PUT 将本地 CSV 文件上传到 User Volume PUT '/path/to/your/data.csv' TO USER VOLUME FILE 'data.csv';

-- 第二步:从 User Volume COPY INTO 表 COPY INTO best_practice_hr_analytics.doc_ods_employees FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('data.csv');

也可直接内联插入小批量测试数据(不需要 CSV 文件):

-- 通过 INSERT INTO 载入前 40 行演示数据(实际执行,共 40 行) INSERT INTO best_practice_hr_analytics.doc_ods_employees VALUES (1,41,'Yes','Travel_Rarely',1102,'Sales',1,2,'Life Sciences',1,2, 'Female',94,3,2,'Sales Executive',4,'Single',5993,19479,8,'Yes',11,3,1,0,8,0,1,6,4,0,5), (2,49,'No','Travel_Frequently',279,'Research & Development',8,1,'Life Sciences',2,3, 'Male',61,2,2,'Research Scientist',2,'Married',5130,24907,1,'No',23,4,4,1,10,3,3,10,7,1,7), (3,37,'Yes','Travel_Rarely',1373,'Research & Development',2,2,'Other',4,4, 'Male',92,2,1,'Laboratory Technician',3,'Single',2090,2396,6,'Yes',15,3,2,0,7,3,3,0,0,0,0), (4,33,'No','Travel_Frequently',1392,'Research & Development',3,4,'Life Sciences',5,4, 'Female',56,3,1,'Research Scientist',3,'Married',2909,23159,1,'Yes',11,3,3,0,8,3,3,8,7,3,0), (5,27,'No','Travel_Rarely',591,'Research & Development',2,1,'Medical',7,1, 'Male',40,3,1,'Laboratory Technician',2,'Married',3468,16632,9,'No',12,3,4,1,6,3,3,2,2,2,2), (6,32,'No','Travel_Frequently',1005,'Research & Development',2,2,'Life Sciences',8,4, 'Male',79,3,1,'Laboratory Technician',4,'Single',3068,11864,0,'No',13,3,3,0,8,2,2,7,7,3,6), (7,59,'No','Travel_Rarely',1324,'Research & Development',3,3,'Medical',10,3, 'Female',81,4,1,'Laboratory Technician',1,'Married',2670,9964,4,'Yes',20,4,1,3,12,3,2,1,0,0,0), (8,30,'No','Travel_Rarely',1358,'Research & Development',24,1,'Life Sciences',11,4, 'Male',67,3,1,'Laboratory Technician',3,'Divorced',2693,13335,1,'No',22,4,2,1,1,2,3,1,0,0,0), (9,38,'No','Travel_Frequently',216,'Research & Development',23,3,'Life Sciences',12,4, 'Male',44,2,3,'Manufacturing Director',3,'Single',9526,8787,0,'No',21,4,2,0,10,2,3,9,7,1,8), (10,36,'No','Travel_Rarely',1299,'Research & Development',27,3,'Medical',13,3, 'Male',94,3,2,'Healthcare Representative',3,'Married',5237,16577,6,'No',13,3,2,2,17,3,2,7,7,7,7), (11,35,'No','Travel_Rarely',809,'Research & Development',16,3,'Medical',14,1, 'Male',84,4,1,'Laboratory Technician',2,'Married',2426,16479,0,'No',13,3,3,1,6,5,3,5,4,0,3), (12,29,'No','Travel_Rarely',153,'Research & Development',15,2,'Life Sciences',15,4, 'Female',49,2,2,'Laboratory Technician',3,'Single',4193,12682,0,'Yes',12,3,4,0,10,3,3,9,5,0,8), (13,31,'No','Travel_Rarely',670,'Research & Development',26,1,'Life Sciences',16,1, 'Male',31,3,1,'Research Scientist',3,'Divorced',2911,15170,1,'No',17,3,4,1,5,1,2,5,2,4,3), (14,34,'No','Travel_Rarely',1346,'Research & Development',19,2,'Medical',18,2, 'Male',93,3,1,'Laboratory Technician',4,'Divorced',2661,8758,0,'No',11,3,3,1,3,2,3,2,2,1,2), (15,28,'Yes','Travel_Rarely',103,'Research & Development',24,3,'Life Sciences',19,3, 'Male',50,2,1,'Laboratory Technician',3,'Single',2028,12947,5,'Yes',14,3,2,0,6,4,3,4,2,0,3), (16,29,'No','Travel_Rarely',1389,'Research & Development',21,4,'Life Sciences',20,2, 'Female',51,4,3,'Manufacturing Director',1,'Divorced',9980,10195,1,'No',11,3,3,1,10,1,3,10,9,8,8), (17,32,'No','Travel_Rarely',334,'Research & Development',5,2,'Life Sciences',21,1, 'Male',80,4,1,'Research Scientist',2,'Divorced',3298,15053,0,'Yes',12,3,4,2,7,5,2,6,2,0,5), (18,22,'No','Non-Travel',1123,'Research & Development',16,2,'Medical',22,4, 'Male',96,4,1,'Laboratory Technician',4,'Divorced',2935,7324,1,'Yes',13,3,2,2,1,2,2,1,0,0,0), (19,53,'No','Travel_Rarely',1219,'Sales',2,4,'Life Sciences',23,1, 'Female',78,2,4,'Manager',4,'Married',15427,22021,2,'No',16,3,3,0,31,3,3,25,8,3,7), (20,38,'No','Travel_Rarely',371,'Research & Development',2,3,'Life Sciences',24,4, 'Male',45,3,1,'Research Scientist',4,'Single',3944,4306,5,'Yes',11,3,3,0,6,3,3,3,2,1,2), (21,24,'No','Non-Travel',673,'Research & Development',11,2,'Other',26,1, 'Female',96,4,2,'Manufacturing Director',3,'Divorced',4011,8232,0,'No',18,3,4,1,5,5,2,4,2,1,3), (22,36,'Yes','Travel_Rarely',1218,'Sales',9,4,'Life Sciences',27,3, 'Male',82,2,1,'Sales Representative',1,'Single',3407,6986,7,'No',23,4,2,0,10,4,3,5,3,0,3), (23,34,'No','Travel_Rarely',419,'Research & Development',7,4,'Life Sciences',28,1, 'Female',53,3,3,'Research Director',2,'Single',11994,21293,0,'No',11,3,3,0,13,4,3,12,6,2,11), (24,21,'No','Travel_Rarely',391,'Research & Development',15,2,'Life Sciences',30,3, 'Male',96,3,1,'Research Scientist',4,'Single',1232,19281,1,'No',14,3,4,0,0,6,3,0,0,0,0), (25,34,'Yes','Travel_Rarely',699,'Research & Development',6,1,'Medical',31,2, 'Male',83,3,1,'Research Scientist',1,'Single',2960,17102,2,'No',11,3,3,0,8,2,3,4,2,1,3), (26,53,'No','Travel_Rarely',1282,'Research & Development',5,3,'Other',32,3, 'Female',58,3,5,'Manager',3,'Divorced',19094,10735,4,'No',11,3,4,1,26,3,2,14,13,4,8), (27,32,'Yes','Travel_Frequently',1125,'Research & Development',16,1,'Life Sciences',33,2, 'Female',72,1,1,'Research Scientist',1,'Single',3919,4681,1,'Yes',22,4,2,0,10,5,3,10,2,6,7), (28,42,'No','Travel_Rarely',691,'Sales',8,4,'Marketing',35,3, 'Male',48,3,2,'Sales Executive',2,'Married',6825,21173,0,'No',11,3,4,1,10,2,3,9,7,4,2), (29,44,'No','Travel_Rarely',477,'Research & Development',7,4,'Medical',36,1, 'Female',42,2,3,'Healthcare Representative',4,'Married',10248,2094,3,'No',14,3,4,1,24,4,3,22,6,5,17), (30,46,'No','Travel_Rarely',705,'Sales',2,4,'Marketing',38,2, 'Female',83,3,5,'Manager',1,'Single',18947,22822,3,'No',12,3,4,0,22,2,2,2,2,2,1), (31,33,'No','Travel_Rarely',924,'Research & Development',2,3,'Medical',39,3, 'Male',78,3,1,'Laboratory Technician',4,'Single',2496,6670,4,'No',11,3,4,0,7,3,3,1,1,0,0), (32,44,'No','Travel_Rarely',1459,'Research & Development',10,4,'Other',40,4, 'Male',41,3,2,'Healthcare Representative',4,'Married',6465,19121,2,'Yes',13,3,4,0,9,5,4,4,2,1,3), (33,30,'No','Travel_Rarely',125,'Research & Development',9,2,'Medical',41,4, 'Male',83,2,1,'Laboratory Technician',3,'Single',2206,16117,1,'No',13,3,4,0,10,5,3,10,0,1,8), (34,39,'Yes','Travel_Rarely',895,'Sales',5,3,'Technical Degree',42,4, 'Male',56,3,2,'Sales Representative',4,'Married',2086,3335,3,'No',14,3,3,1,19,6,4,1,0,0,0), (35,24,'Yes','Travel_Rarely',813,'Research & Development',1,3,'Medical',45,2, 'Male',61,3,1,'Research Scientist',4,'Married',2293,3020,2,'Yes',16,3,1,1,6,2,2,2,0,2,0), (36,43,'No','Travel_Rarely',1273,'Research & Development',2,2,'Medical',46,4, 'Female',72,4,1,'Research Scientist',3,'Divorced',2645,21923,1,'No',12,3,4,2,6,3,2,5,3,1,4), (37,50,'Yes','Travel_Rarely',869,'Sales',3,2,'Marketing',47,1, 'Male',86,2,1,'Sales Representative',3,'Married',2683,3810,1,'Yes',14,3,3,0,3,2,3,3,2,0,2), (38,35,'No','Travel_Rarely',890,'Sales',2,3,'Marketing',49,4, 'Female',97,3,1,'Sales Representative',4,'Married',2014,9687,1,'No',13,3,1,0,2,3,3,2,2,2,2), (39,36,'No','Travel_Rarely',852,'Research & Development',5,4,'Life Sciences',51,2, 'Female',82,2,1,'Research Scientist',1,'Married',3419,13072,9,'Yes',14,3,4,1,6,3,4,1,1,0,0), (40,33,'No','Travel_Frequently',1141,'Sales',1,3,'Life Sciences',52,3, 'Female',42,4,2,'Sales Executive',1,'Married',5376,3193,2,'No',19,3,1,2,10,3,3,5,3,1,3) ;

验证行数:

SELECT COUNT(*) AS ods_row_count FROM best_practice_hr_analytics.doc_ods_employees;

ods_row_count ------------- 40

Column Masking:薪酬字段脱敏

monthly_income
monthly_income
是高度敏感字段。方案:HR 管理员看原始值,其他用户查询返回 -1。

-- 创建脱敏函数 CREATE OR REPLACE FUNCTION best_practice_hr_analytics.mask_salary(salary INT) RETURNS INT AS CASE WHEN current_user() IN ('privileged_user') THEN salary -- 替换为实际获授权的用户名 ELSE -1 END; -- 绑定到 monthly_income 列 ALTER TABLE best_practice_hr_analytics.doc_ods_employees CHANGE COLUMN monthly_income SET MASK best_practice_hr_analytics.mask_salary;

验证脱敏效果(管理员账号可见原始值):

SELECT emp_id, department, job_role, monthly_income FROM best_practice_hr_analytics.doc_ods_employees LIMIT 5;

emp_id | department | job_role | monthly_income -------+--------------------------+------------------------+--------------- 31 | Research & Development | Laboratory Technician | 2496 32 | Research & Development | Healthcare Representative | 6465 33 | Research & Development | Laboratory Technician | 2206 34 | Sales | Sales Representative | 2086 35 | Research & Development | Research Scientist | 2293


DWD 层:员工事件时间线

DWD 层基于 ODS 原始数据,衍生出三类分析字段:任期分层(

tenure_band
tenure_band
)、晋升停滞标志(
promotion_stalled_flag
promotion_stalled_flag
)和留任风险评分(
retention_risk_score
retention_risk_score
)。

建表

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_hr_analytics.doc_dwd_employee_timeline AS SELECT emp_id, department, job_role, job_level, age, gender, marital_status, education, education_field, years_at_company, years_in_current_role, years_since_last_promo, years_with_curr_mgr, total_working_years, monthly_income, pct_salary_hike, performance_rating, job_satisfaction, work_life_balance, env_satisfaction, overtime, business_travel, attrition, -- 任期分层,便于分组分析 CASE WHEN years_at_company <= 1 THEN 'New' WHEN years_at_company <= 3 THEN 'Junior' WHEN years_at_company <= 7 THEN 'Mid' WHEN years_at_company <= 15 THEN 'Senior' ELSE 'Veteran' END AS tenure_band, -- 晋升停滞标志:2 年以上未晋升且绩效评分 >= 3 CASE WHEN years_since_last_promo >= 2 AND performance_rating >= 3 THEN 1 ELSE 0 END AS promotion_stalled_flag, -- 留任风险评分(0.0 ~ 1.0,越高越危险) ROUND( CASE WHEN overtime = 'Yes' THEN 0.25 ELSE 0.0 END + CASE WHEN job_satisfaction <= 2 THEN 0.25 ELSE 0.0 END + CASE WHEN work_life_balance <= 2 THEN 0.20 ELSE 0.0 END + CASE WHEN years_since_last_promo >= 3 THEN 0.15 ELSE 0.0 END + CASE WHEN pct_salary_hike <= 12 THEN 0.15 ELSE 0.0 END , 2) AS retention_risk_score FROM best_practice_hr_analytics.doc_ods_employees;

配置 Studio 刷新任务

# 在 Studio best_practices/hr_analytics/ 路径下创建任务 cz-cli task create-folder hr_analytics --parent 186117 -p skill_test # 返回 folder_id: 186127 cz-cli task create refresh_hr_dwd_timeline --type SQL --folder 186127 -p skill_test cz-cli task save-content refresh_hr_dwd_timeline \ --content "REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_dwd_employee_timeline;" \ -p skill_test cz-cli task save-cron refresh_hr_dwd_timeline --cron "0 1 * * *" -p skill_test

任务路径:

best_practices/hr_analytics/refresh_hr_dwd_timeline
best_practices/hr_analytics/refresh_hr_dwd_timeline
,每日 01:00 触发。在此任务上可进一步配置:数据行数告警(若 DWD 行数骤降触发通知)、数据质量规则(
retention_risk_score
retention_risk_score
不得全为 0)等。

手动触发首次刷新

REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_dwd_employee_timeline; SELECT COUNT(*) AS dwd_count FROM best_practice_hr_analytics.doc_dwd_employee_timeline;

dwd_count --------- 40

按任期分层的离职风险分析

任期分层帮助 HR 识别哪个阶段员工流失风险最高:

SELECT tenure_band, COUNT(*) AS emp_count, ROUND(AVG(retention_risk_score), 2) AS avg_risk, SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) AS attrition_cnt FROM best_practice_hr_analytics.doc_dwd_employee_timeline GROUP BY tenure_band ORDER BY avg_risk DESC;

tenure_band | emp_count | avg_risk | attrition_cnt ------------+-----------+----------+-------------- Senior | 10 | 0.39 | 1 Mid | 13 | 0.36 | 4 Junior | 7 | 0.32 | 2 New | 8 | 0.26 | 2 Veteran | 2 | 0.15 | 0

结果解读:Senior(7-15 年)和 Mid(3-7 年)员工的留任风险均值最高,分别为 0.39 和 0.36,提示这两个群体承担较多加班和晋升停滞压力。Veteran(15 年以上)员工风险最低,可能已达到稳定职级。


DWS 层:部门人力指标聚合

DWS 层以

department
department
为粒度聚合 DWD 层数据,输出部门级离职率、平均任期、薪酬基准和风险指数。

建表

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_hr_analytics.doc_dws_dept_headcount_metrics AS SELECT department, COUNT(*) AS headcount, SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) AS attrition_count, ROUND(SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS attrition_rate_pct, ROUND(AVG(years_at_company), 2) AS avg_tenure_years, ROUND(AVG(monthly_income), 0) AS avg_monthly_income, ROUND(AVG(performance_rating), 2) AS avg_performance, ROUND(AVG(job_satisfaction), 2) AS avg_job_satisfaction, ROUND(AVG(retention_risk_score), 2) AS avg_retention_risk, SUM(CASE WHEN overtime = 'Yes' THEN 1 ELSE 0 END) AS overtime_headcount, SUM(promotion_stalled_flag) AS promotion_stalled_count FROM best_practice_hr_analytics.doc_dwd_employee_timeline GROUP BY department;

cz-cli task create refresh_hr_dws_dept_metrics --type SQL --folder 186127 -p skill_test cz-cli task save-content refresh_hr_dws_dept_metrics \ --content "REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_dws_dept_headcount_metrics;" \ -p skill_test cz-cli task save-cron refresh_hr_dws_dept_metrics --cron "30 1 * * *" -p skill_test

手动刷新并查询:

REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_dws_dept_headcount_metrics; SELECT department, headcount, attrition_count, attrition_rate_pct, avg_tenure_years, avg_monthly_income, avg_performance, avg_retention_risk FROM best_practice_hr_analytics.doc_dws_dept_headcount_metrics ORDER BY attrition_rate_pct DESC;

department | headcount | attrition_count | attrition_rate_pct | avg_tenure_years | avg_monthly_income | avg_performance | avg_retention_risk ---------------------------+-----------+-----------------+--------------------+------------------+--------------------+-----------------+------------------- Sales | 9 | 4 | 44.44 | 6.44 | 6973 | 3.11 | 0.29 Research & Development | 31 | 5 | 16.13 | 5.77 | 4650 | 3.16 | 0.34

结果解读

  • Sales 部门离职率(44.44%)远高于 R&D(16.13%),但 Sales 的平均薪酬(6973)高于 R&D(4650),说明薪酬不是 Sales 离职的主因。结合
    avg_retention_risk
    avg_retention_risk
    数据,R&D 团队整体风险均值(0.34)反而高于 Sales(0.29),可能与 R&D 加班和晋升停滞更突出有关。
  • 两个部门的绩效均值相近(3.11 vs 3.16),说明离职者并非低绩效员工,HR 应优先关注组织氛围和职业发展通道。

ADS 层:离职风险报告与薪酬分析

离职风险报告

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_hr_analytics.doc_ads_attrition_risk_report AS SELECT emp_id, department, job_role, job_level, tenure_band, years_at_company, years_since_last_promo, overtime, job_satisfaction, work_life_balance, performance_rating, monthly_income, pct_salary_hike, promotion_stalled_flag, retention_risk_score, attrition, -- 风险等级 CASE WHEN retention_risk_score >= 0.6 THEN 'HIGH' WHEN retention_risk_score >= 0.3 THEN 'MEDIUM' ELSE 'LOW' END AS risk_level, -- 主要离职信号 CASE WHEN overtime = 'Yes' AND job_satisfaction <= 2 THEN 'Overwork+LowSatisfaction' WHEN overtime = 'Yes' THEN 'Overwork' WHEN job_satisfaction <= 2 THEN 'LowSatisfaction' WHEN years_since_last_promo >= 3 THEN 'PromotionStalled' WHEN pct_salary_hike <= 12 THEN 'LowPayRaise' ELSE 'Normal' END AS primary_risk_signal FROM best_practice_hr_analytics.doc_dwd_employee_timeline;

cz-cli task create refresh_hr_ads_risk_report --type SQL --folder 186127 -p skill_test cz-cli task save-content refresh_hr_ads_risk_report \ --content "REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_ads_attrition_risk_report;" \ -p skill_test cz-cli task save-cron refresh_hr_ads_risk_report --cron "0 2 * * *" -p skill_test

刷新后查看风险分布:

REFRESH DYNAMIC TABLE best_practice_hr_analytics.doc_ads_attrition_risk_report; SELECT risk_level, COUNT(*) AS emp_count, ROUND(AVG(retention_risk_score), 2) AS avg_risk, SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) AS actual_attrition FROM best_practice_hr_analytics.doc_ads_attrition_risk_report GROUP BY risk_level ORDER BY avg_risk DESC;

risk_level | emp_count | avg_risk | actual_attrition -----------+-----------+----------+----------------- HIGH | 5 | 0.68 | 2 MEDIUM | 16 | 0.43 | 2 LOW | 19 | 0.15 | 5

查看 HIGH 风险员工明细:

SELECT emp_id, department, job_role, years_at_company, years_since_last_promo, monthly_income, pct_salary_hike, retention_risk_score, risk_level, primary_risk_signal FROM best_practice_hr_analytics.doc_ads_attrition_risk_report WHERE risk_level = 'HIGH' ORDER BY retention_risk_score DESC;

emp_id | department | job_role | years_at_company | years_since_last_promo | monthly_income | pct_salary_hike | retention_risk_score | risk_level | primary_risk_signal -------+--------------------------+----------------------+------------------+------------------------+----------------+-----------------+----------------------+------------+-------------------------- 17 | Research & Development | Research Scientist | 6 | 0 | 3298 | 12 | 0.85 | HIGH | Overwork+LowSatisfaction 7 | Research & Development | Laboratory Technician| 1 | 0 | 2670 | 20 | 0.70 | HIGH | Overwork+LowSatisfaction 27 | Research & Development | Research Scientist | 10 | 6 | 3919 | 22 | 0.65 | HIGH | Overwork+LowSatisfaction 30 | Sales | Manager | 2 | 2 | 18947 | 12 | 0.60 | HIGH | LowSatisfaction 1 | Sales | Sales Executive | 6 | 0 | 5993 | 11 | 0.60 | HIGH | Overwork

结果解读

  • emp_id=17(R&D Research Scientist)风险最高(0.85),触发因素为同时存在加班和低工作满意度,且调薪幅度仅 12%,处于触发 LowPayRaise 的临界值。建议优先面谈并考虑薪酬调整。
  • emp_id=27(R&D Research Scientist,10年资历)连续 6 年未晋升(
    years_since_last_promo=6
    years_since_last_promo=6
    ),尽管薪酬调幅相对合理(22%),但职业发展停滞是主要风险信号。
  • emp_id=30(Sales Manager,薪酬高达 18947)高薪但工作满意度低,说明管理层的挽留需要关注非物质因素。

加班 × 满意度交叉分析

SELECT overtime, job_satisfaction, COUNT(*) AS emp_count, ROUND(AVG(retention_risk_score), 2) AS avg_risk, SUM(CASE WHEN attrition = 'Yes' THEN 1 ELSE 0 END) AS attrition_cnt FROM best_practice_hr_analytics.doc_dwd_employee_timeline GROUP BY overtime, job_satisfaction ORDER BY avg_risk DESC LIMIT 8;

overtime | job_satisfaction | emp_count | avg_risk | attrition_cnt ---------+------------------+-----------+----------+-------------- Yes | 2 | 1 | 0.85 | 0 Yes | 1 | 3 | 0.62 | 1 Yes | 4 | 5 | 0.43 | 2 No | 1 | 5 | 0.41 | 2 No | 2 | 5 | 0.37 | 0 Yes | 3 | 5 | 0.34 | 3 No | 3 | 8 | 0.22 | 0 No | 4 | 8 | 0.12 | 1

结果解读

overtime=Yes
overtime=Yes
+
job_satisfaction <= 2
job_satisfaction <= 2
的组合风险均值(0.62-0.85)显著高于其他组合,是最需优先干预的交叉维度。值得注意的是,不加班但满意度低(
overtime=No, satisfaction=1
overtime=No, satisfaction=1
)的风险均值(0.41)接近加班+中等满意度群体,说明工作满意度的独立效应不容忽视。


薪酬公平分析(窗口函数)

以下查询不构建 Dynamic Table,直接在 DWD 层运行,作为临时分析工单使用。

同级薪酬公平性分析

SELECT emp_id, department, job_role, job_level, monthly_income, years_at_company, pct_salary_hike, AVG(monthly_income) OVER (PARTITION BY department, job_level) AS dept_level_avg_income, ROUND( (monthly_income - AVG(monthly_income) OVER (PARTITION BY department, job_level)) / AVG(monthly_income) OVER (PARTITION BY department, job_level) * 100 , 1) AS income_vs_peer_pct, RANK() OVER (PARTITION BY department ORDER BY monthly_income DESC) AS income_rank_in_dept FROM best_practice_hr_analytics.doc_dwd_employee_timeline WHERE department = 'Sales' ORDER BY job_level DESC, monthly_income DESC;

emp_id | department | job_role | job_level | monthly_income | years_at_company | pct_salary_hike | dept_level_avg_income | income_vs_peer_pct | income_rank_in_dept -------+------------+-----------------------+-----------+----------------+------------------+-----------------+-----------------------+--------------------+-------------------- 30 | Sales | Manager | 5 | 18947 | 2 | 12 | 18947 | 0 | 1 19 | Sales | Manager | 4 | 15427 | 25 | 16 | 15427 | 0 | 2 28 | Sales | Sales Executive | 2 | 6825 | 9 | 11 | 5070 | 34.6 | 3 1 | Sales | Sales Executive | 2 | 5993 | 6 | 11 | 5070 | 18.2 | 4 40 | Sales | Sales Executive | 2 | 5376 | 5 | 19 | 5070 | 6.0 | 5 34 | Sales | Sales Representative | 2 | 2086 | 1 | 14 | 5070 | -58.9 | 8 22 | Sales | Sales Representative | 1 | 3407 | 5 | 23 | 2701 | 26.1 | 6 37 | Sales | Sales Representative | 1 | 2683 | 3 | 14 | 2701 | -0.7 | 7 38 | Sales | Sales Representative | 1 | 2014 | 2 | 13 | 2701 | -25.4 | 9

结果解读

  • emp_id=34(Sales Representative,job_level=2)薪酬比同级别均值低 58.9%,且仅入职 1 年,属于低薪高风险员工,应检查录用时薪酬设置是否合理。
  • emp_id=28(Sales Executive)薪酬比同级均值高 34.6%,结合其 9 年任期,属于正常的资深溢价;但其调薪幅度(11%)低于同类员工,若持续则可能积累不满。

晋升间隔滞后分析(LAG)

SELECT emp_id, department, job_role, job_level, years_at_company, years_since_last_promo, performance_rating, LAG(performance_rating) OVER ( PARTITION BY department ORDER BY years_at_company ) AS prev_emp_perf, years_since_last_promo - LAG(years_since_last_promo) OVER ( PARTITION BY department ORDER BY years_at_company ) AS promo_lag_delta FROM best_practice_hr_analytics.doc_dwd_employee_timeline WHERE department = 'Research & Development' ORDER BY years_since_last_promo DESC LIMIT 10;

emp_id | department | job_role | job_level | years_at_company | years_since_last_promo | performance_rating | prev_emp_perf | promo_lag_delta -------+--------------------------+-----------------------------+-----------+------------------+------------------------+--------------------+---------------+---------------- 16 | Research & Development | Manufacturing Director | 3 | 10 | 8 | 3 | 4 | 7 10 | Research & Development | Healthcare Representative | 2 | 7 | 7 | 3 | 3 | 4 27 | Research & Development | Research Scientist | 1 | 10 | 6 | 4 | 3 | -2 29 | Research & Development | Healthcare Representative | 3 | 22 | 5 | 3 | 3 | 1 13 | Research & Development | Research Scientist | 1 | 5 | 4 | 3 | 3 | 4 26 | Research & Development | Manager | 5 | 14 | 4 | 3 | 3 | 2

结果解读

  • emp_id=16(Manufacturing Director)已 8 年未晋升,绩效评分维持在 3(Excellent),
    promo_lag_delta=7
    promo_lag_delta=7
    表示其晋升等待时间比部门中任职年限相近的前一名员工多出 7 年,属于明显的晋升异常。
  • emp_id=27(Research Scientist,10 年任期,连续 6 年未晋升,绩效评分 4 = Outstanding)是最典型的"高绩效被卡级"案例,已在 HIGH 风险名单中出现,需要 HR BP 介入评估晋升通道。

数仓对象总览

全部构建完成后,

best_practice_hr_analytics
best_practice_hr_analytics
Schema 下的对象:

SHOW TABLES IN best_practice_hr_analytics;

schema_name | table_name | is_dynamic -------------------------------+----------------------------------+----------- best_practice_hr_analytics | doc_ads_attrition_risk_report | true best_practice_hr_analytics | doc_dwd_employee_timeline | true best_practice_hr_analytics | doc_dws_dept_headcount_metrics | true best_practice_hr_analytics | doc_ods_employees | false

Studio 任务路径(

best_practices/hr_analytics/
best_practices/hr_analytics/
):

任务名刷新目标调度时间
refresh_hr_dwd_timeline
refresh_hr_dwd_timeline
doc_dwd_employee_timeline
doc_dwd_employee_timeline
每日 01:00
refresh_hr_dws_dept_metrics
refresh_hr_dws_dept_metrics
doc_dws_dept_headcount_metrics
doc_dws_dept_headcount_metrics
每日 01:30
refresh_hr_ads_risk_report
refresh_hr_ads_risk_report
doc_ads_attrition_risk_report
doc_ads_attrition_risk_report
每日 02:00

注意事项

  • Column Masking 对 Dynamic Table 透明生效:DWD 层继承 ODS 的

    monthly_income
    monthly_income
    后,非特权用户在 DWD / DWS / ADS 中查到的薪酬字段均为脱敏值(-1)。若需要薪酬公平分析,必须使用已在 Column Masking 策略中授权的账号(即在 mask 函数的
    IN()
    IN()
    列表中显式列出的用户名)运行该查询,才能看到未脱敏的原始值。

  • Dynamic Table 不写 REFRESH INTERVAL:所有 Dynamic Table DDL 中均不包含

    REFRESH INTERVAL
    REFRESH INTERVAL
    参数,刷新调度统一通过 Studio Task 管理。这样做的好处是在同一任务节点上可以附加数据质量规则(如检测
    retention_risk_score
    retention_risk_score
    是否全为 0)和告警通知。

  • 留任风险评分为启发式模型

    retention_risk_score
    retention_risk_score
    基于加班、满意度、晋升停滞和调薪幅度等规则叠加,适合快速筛选高风险群体,不能替代专业的统计预测模型。生产环境建议用 ZettaPark Python Task 运行机器学习模型并将结果写回 ADS 层。

  • Dynamic Table 增量刷新依赖 ODS 变更追踪:第一次

    REFRESH
    REFRESH
    会做全量快照;后续增量刷新只处理 ODS 层自上次刷新以来新增或变更的行。若 ODS 使用
    INSERT OVERWRITE
    INSERT OVERWRITE
    或全量重写,会导致 Dynamic Table 退化为全量刷新,显著增加计算开销。

  • 晋升间隔 LAG 分析的窗口语义

    LAG
    LAG
    PARTITION BY department ORDER BY years_at_company
    PARTITION BY department ORDER BY years_at_company
    以同部门员工的任职年限排序,比较的是年限相近员工之间的晋升停滞差异,不是同一员工的历史时序。如需同一员工在不同时间点的对比,需要员工历史快照表(含时间维度)。


相关文档

联系我们
预约咨询
微信咨询
电话咨询
邮件咨询