Build a Healthcare Operations Data Warehouse

整合医院信息系统(HIS)、电子病历(EMR)和实验室系统的数据,构建医疗运营数仓,输出科室绩效指标和院级 KPI 看板。本文以加州医院 Q1 2025 入院数据为数据集,端到端演示 MySQL 离线同步 → ODS → DWD → DWS → ADS 的完整构建过程,并覆盖 Column Masking、RBAC 权限分级、Time Travel 合规审计三项关键能力的落地用法。


概述

医疗数据仓库面临的核心挑战:患者隐私保护、多源系统整合、运营指标自动化计算、以及医保对账时的历史数据可追溯性。

挑战云器解决方案
HIS/EMR 数据每日全量或增量同步MySQL 多表离线同步(整库镜像模式),自动调度
HL7/JSON 格式的检验报告文件导入COPY INTO + Volume,支持 JSON 格式批量加载
ODS→DWD→DWS→ADS 自动增量计算Dynamic Table,声明式 SQL,系统自动维护刷新依赖链
患者 ID、诊断信息等 PII 字段脱敏Column Masking,绑定到列,按角色动态控制显示
临床医生、管理层、BI 分析师权限分级RBAC 自定义角色,细粒度 Schema/Table 级别授权
医保对账、合规审计需要历史版本回溯Time Travel,
DESC HISTORY
DESC HISTORY
+
TIMESTAMP AS OF
TIMESTAMP AS OF
查询任意历史版本

涉及的 SQL 命令

命令 / 函数用途说明
CREATE TABLE
CREATE TABLE
创建 ODS 层原始表和维度表普通表,作为 Dynamic Table 上游
CREATE DYNAMIC TABLE
CREATE DYNAMIC TABLE
创建 DWD / DWS / ADS 层增量计算表声明式 SQL,系统自动增量刷新
REFRESH DYNAMIC TABLE
REFRESH DYNAMIC TABLE
手动触发一次刷新首次构建或调试时使用
CREATE FUNCTION
CREATE FUNCTION
创建 Column Masking 策略函数SQL UDF,按角色动态脱敏患者 PII 字段
ALTER TABLE ... CHANGE COLUMN ... SET MASK
ALTER TABLE ... CHANGE COLUMN ... SET MASK
将脱敏函数绑定到列对绑定列透明,读时动态执行
CREATE ROLE
CREATE ROLE
创建自定义 RBAC 角色区分临床、管理、BI 三类用户权限
GRANT
GRANT
授予角色数据访问权限Schema 级别或表级别授权
DESC HISTORY
DESC HISTORY
查看表历史版本列表返回每个版本的时间戳、操作类型、行数变化
SELECT ... TIMESTAMP AS OF
SELECT ... TIMESTAMP AS OF
查询指定时间点的历史数据医保对账、审计、数据恢复场景

前置准备

本文所有示例在

best_practice_healthcare_dw
best_practice_healthcare_dw
Schema 下运行。

CREATE SCHEMA IF NOT EXISTS best_practice_healthcare_dw;


ODS 层:原始业务数据入库

ODS 层保存从 HIS、EMR 和实验室系统同步过来的原始数据,不做业务转换,同时在患者隐私字段上配置 Column Masking。

建表

医院维度表(静态参考数据):

CREATE TABLE IF NOT EXISTS best_practice_healthcare_dw.doc_hospital_dim ( hospital_id STRING, hospital_name STRING, county STRING, city STRING, hospital_type STRING, bed_count INT, is_teaching BOOLEAN );

入院记录主表(HIS 离线同步目标表):

CREATE TABLE IF NOT EXISTS best_practice_healthcare_dw.doc_ods_admissions ( admission_id STRING, hospital_id STRING, patient_id STRING, age INT, sex STRING, race STRING, county_of_residence STRING, admission_date DATE, discharge_date DATE, los_days INT, admission_type STRING, admission_source STRING, discharge_disposition STRING, principal_diagnosis_code STRING, principal_diagnosis_desc STRING, major_diagnostic_category STRING, department STRING, payer_type STRING, total_charges DECIMAL(12,2), drg_code STRING, drg_description STRING, load_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() );

实验室检验结果表(LIS 系统 JSON 文件 COPY INTO 目标表):

CREATE TABLE IF NOT EXISTS best_practice_healthcare_dw.doc_ods_lab_results ( lab_id STRING, admission_id STRING, patient_id STRING, hospital_id STRING, test_name STRING, test_code STRING, result_value STRING, reference_range STRING, abnormal_flag STRING, collection_time TIMESTAMP, result_time TIMESTAMP, ordering_physician STRING, department STRING, load_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() );

MySQL 多表离线同步配置

HIS 系统通常运行在 MySQL 上,包含入院(admissions)、计费(billing)、手术记录(procedures)等多张业务表。在 ClickZetta Studio 中,通过多表离线同步任务实现日常全量 + 增量同步:

  • 同步模式:整库镜像(全库同步),将 HIS 数据库内所有业务表自动映射到
    best_practice_healthcare_dw
    best_practice_healthcare_dw
    Schema
  • 调度策略:每日 02:00 执行增量同步,首次运行执行全量同步
  • Studio 任务路径:
    best_practices/healthcare_dw/
    best_practices/healthcare_dw/

COPY INTO 导入 HL7/JSON 格式检验报告

实验室系统(LIS)通常输出 JSON 或 HL7 格式的检验报告文件,存放在对象存储(OSS/S3)上,通过 COPY INTO + Volume 定期加载。先创建对象存储访问凭据和 External Volume,再执行 COPY INTO:

-- 第一步:创建指向对象存储的 Storage Connection -- 阿里云 OSS 示例;AWS S3 / 腾讯云 COS 请按实际云厂商替换 TYPE、ENDPOINT 和凭据字段。 CREATE STORAGE CONNECTION IF NOT EXISTS lab_reports_oss_conn TYPE oss ENDPOINT = 'oss-cn-hangzhou.aliyuncs.com' ACCESS_ID = '<your_access_key_id>' ACCESS_KEY = '<your_access_key_secret>'; -- 第二步:将 LIS 检验报告所在目录挂载为 External Volume CREATE EXTERNAL VOLUME IF NOT EXISTS best_practice_healthcare_dw.lab_reports_volume LOCATION 'oss://<your-bucket>/lis/lab-reports/' USING CONNECTION lab_reports_oss_conn DIRECTORY = (enable = true, auto_refresh = true) RECURSIVE = true;

完成挂载后,从

lab_reports_volume
lab_reports_volume
批量导入 JSON 检验报告:

COPY INTO best_practice_healthcare_dw.doc_ods_lab_results FROM VOLUME best_practice_healthcare_dw.lab_reports_volume USING json ON_ERROR = 'CONTINUE';


ODS 层:Column Masking 患者 PII 脱敏

医疗数据中患者 ID 和诊断信息属于受保护的健康信息(PHI),需要根据访问者角色动态脱敏。ClickZetta 通过 Column Masking 在列级别绑定脱敏策略,对非特权用户透明屏蔽敏感字段。

创建脱敏策略函数

患者 ID 脱敏:非 workspace_admin 角色只显示首尾字符,中间替换为

*
*

CREATE OR REPLACE FUNCTION best_practice_healthcare_dw.mask_patient_id(pid STRING) RETURNS STRING AS CASE WHEN array_contains(current_roles(), 'workspace_admin') THEN pid ELSE CONCAT(SUBSTR(pid, 1, 1), REPEAT('*', LENGTH(pid) - 2), SUBSTR(pid, LENGTH(pid), 1)) END;

诊断信息脱敏:仅 workspace_admin 和 workspace_dev 角色可见完整诊断,其他角色显示

RESTRICTED
RESTRICTED

CREATE OR REPLACE FUNCTION best_practice_healthcare_dw.mask_diagnosis(diag STRING) RETURNS STRING AS CASE WHEN array_contains(current_roles(), 'workspace_admin') OR array_contains(current_roles(), 'workspace_dev') THEN diag ELSE 'RESTRICTED' END;

将脱敏策略绑定到列

-- 绑定患者 ID 脱敏 ALTER TABLE best_practice_healthcare_dw.doc_ods_admissions CHANGE COLUMN patient_id SET MASK best_practice_healthcare_dw.mask_patient_id; -- 绑定诊断信息脱敏 ALTER TABLE best_practice_healthcare_dw.doc_ods_admissions CHANGE COLUMN principal_diagnosis_desc SET MASK best_practice_healthcare_dw.mask_diagnosis;

绑定后,workspace_admin 用户查询结果(完整数据):

admission_id | patient_id | principal_diagnosis_code | principal_diagnosis_desc | payer_type -------------|------------|--------------------------|------------------------------|------------ ADM001 | P10001 | I21.9 | Acute myocardial infarction | Medicare ADM002 | P10002 | K92.1 | Melena | Commercial ADM003 | P10003 | J18.9 | Pneumonia unspecified | Medicare

普通 BI 分析师查询同一张表,脱敏生效后:

admission_id | patient_id | principal_diagnosis_code | principal_diagnosis_desc | payer_type -------------|------------|--------------------------|--------------------------|------------ ADM001 | P****1 | I21.9 | RESTRICTED | Medicare ADM002 | P****2 | K92.1 | RESTRICTED | Commercial ADM003 | P****3 | J18.9 | RESTRICTED | Medicare


DWD 层:患者就诊事件宽表

DWD 层通过 Dynamic Table 将 ODS 层的入院记录与医院维度表关联,同时派生年龄分组、出院类别等字段,形成分析友好的患者就诊事件宽表。

建表

CREATE DYNAMIC TABLE best_practice_healthcare_dw.doc_dwd_patient_visits REFRESH INTERVAL 60 MINUTE VCLUSTER DEFAULT AS SELECT a.admission_id, a.hospital_id, h.hospital_name, h.county AS hospital_county, h.hospital_type, h.is_teaching, a.patient_id, a.age, CASE WHEN a.age < 18 THEN 'Pediatric' WHEN a.age < 45 THEN 'Adult' WHEN a.age < 65 THEN 'Middle-Aged' ELSE 'Senior' END AS age_group, a.sex, a.race, a.county_of_residence, a.admission_date, a.discharge_date, a.los_days, a.admission_type, a.admission_source, a.discharge_disposition, a.principal_diagnosis_code, a.principal_diagnosis_desc, a.major_diagnostic_category, a.department, a.payer_type, a.total_charges, a.drg_code, a.drg_description, CASE WHEN a.discharge_disposition IN ('Home', 'Home Health') THEN 'Routine' WHEN a.discharge_disposition = 'SNF' THEN 'Extended Care' ELSE 'Other' END AS discharge_category, CASE WHEN a.los_days > 7 THEN 1 ELSE 0 END AS is_long_stay, a.load_time FROM best_practice_healthcare_dw.doc_ods_admissions a LEFT JOIN best_practice_healthcare_dw.doc_hospital_dim h ON a.hospital_id = h.hospital_id;

检验报告事件宽表,计算检验周转时间(TAT):

CREATE DYNAMIC TABLE best_practice_healthcare_dw.doc_dwd_lab_events REFRESH INTERVAL 60 MINUTE VCLUSTER DEFAULT AS SELECT l.lab_id, l.admission_id, l.patient_id, l.hospital_id, h.hospital_name, l.test_name, l.test_code, l.result_value, l.reference_range, l.abnormal_flag, l.collection_time, l.result_time, TIMESTAMPDIFF(MINUTE, l.collection_time, l.result_time) AS tat_minutes, l.ordering_physician, l.department FROM best_practice_healthcare_dw.doc_ods_lab_results l LEFT JOIN best_practice_healthcare_dw.doc_hospital_dim h ON l.hospital_id = h.hospital_id;

REFRESH DYNAMIC TABLE best_practice_healthcare_dw.doc_dwd_patient_visits; REFRESH DYNAMIC TABLE best_practice_healthcare_dw.doc_dwd_lab_events;

DWD 患者就诊宽表查询效果:

admission_id | hospital_name | age_group | department | los_days | discharge_category | is_long_stay -------------|----------------------------|--------------|---------------|----------|--------------------|------------- ADM001 | Cedars-Sinai Medical Center | Senior | Cardiology | 4 | Routine | 0 ADM003 | UCSF Medical Center | Senior | Pulmonology | 7 | Extended Care | 0 ADM008 | Stanford Health Care | Senior | Neurology | 8 | Extended Care | 1

DWD 检验事件宽表,TAT 列反映从采样到出结果的分钟数:

lab_id | hospital_name | test_name | abnormal_flag | tat_minutes -------|----------------------------|------------------|---------------|------------ LAB001 | Cedars-Sinai Medical Center | Complete Blood Count | H | 150 LAB002 | Cedars-Sinai Medical Center | Troponin I | H | 45 LAB007 | Huntington Hospital | BNP | H | 60 LAB011 | Cedars-Sinai Medical Center | ABG | H | 30

异常检验按科室分布(

abnormal_flag IN ('H', 'A')
abnormal_flag IN ('H', 'A')
):

department | abnormal_results | avg_tat_min --------------|------------------|------------ Cardiology | 5 | 111 Neurology | 3 | 80 Orthopedics | 2 | 120 Pulmonology | 1 | 120 Nephrology | 1 | 90

心脏科异常结果最多(5 条),平均周转时间 111 分钟;ICU 科最快(30 分钟)。


DWS 层:科室月度绩效聚合

DWS 层按月份 × 医院 × 科室维度聚合,生成科室运营绩效指标。

建表

CREATE DYNAMIC TABLE best_practice_healthcare_dw.doc_dws_dept_monthly REFRESH INTERVAL 60 MINUTE VCLUSTER DEFAULT AS SELECT DATE_TRUNC('month', admission_date) AS admission_month, hospital_id, hospital_name, department, major_diagnostic_category, COUNT(*) AS total_admissions, COUNT(DISTINCT patient_id) AS unique_patients, ROUND(AVG(los_days), 2) AS avg_los_days, MAX(los_days) AS max_los_days, SUM(CASE WHEN is_long_stay = 1 THEN 1 ELSE 0 END) AS long_stay_count, ROUND(SUM(CASE WHEN is_long_stay = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS long_stay_pct, ROUND(SUM(total_charges), 2) AS total_charges, ROUND(AVG(total_charges), 2) AS avg_charges, SUM(CASE WHEN admission_type = 'Emergency' THEN 1 ELSE 0 END) AS emergency_count, SUM(CASE WHEN payer_type = 'Medicare' THEN 1 ELSE 0 END) AS medicare_count, SUM(CASE WHEN payer_type = 'Medicaid' THEN 1 ELSE 0 END) AS medicaid_count, SUM(CASE WHEN payer_type = 'Commercial' THEN 1 ELSE 0 END) AS commercial_count FROM best_practice_healthcare_dw.doc_dwd_patient_visits GROUP BY 1, 2, 3, 4, 5;

REFRESH DYNAMIC TABLE best_practice_healthcare_dw.doc_dws_dept_monthly;

查询 1 月各科室绩效(按总收费降序):

SELECT department, total_admissions, avg_los_days, long_stay_pct, total_charges, emergency_count, medicare_count FROM best_practice_healthcare_dw.doc_dws_dept_monthly WHERE admission_month = '2025-01-01' ORDER BY total_charges DESC LIMIT 6;

department | total_admissions | avg_los_days | long_stay_pct | total_charges | emergency_count | medicare_count --------------|------------------|--------------|---------------|---------------|-----------------|--------------- ICU | 1 | 7.00 | 0.0 | 96800.00 | 1 | 1 Neurology | 1 | 8.00 | 100.0 | 89400.00 | 1 | 1 Orthopedics | 1 | 7.00 | 0.0 | 74500.00 | 1 | 1 Pulmonology | 1 | 7.00 | 0.0 | 62100.00 | 1 | 1 Cardiology | 1 | 6.00 | 0.0 | 58300.00 | 1 | 1 Gastroenterology | 1 | 6.00 | 0.0 | 52700.00 | 1 | 1

ICU 和神经科每例平均收费最高,这两个科室同时也是急诊入院比例 100% 的科室,反映医院高复杂度病例集中在这些科室的特征。

按疾病大类分布(全季度):

SELECT major_diagnostic_category, COUNT(*) AS admission_count, ROUND(AVG(los_days), 2) AS avg_los, ROUND(AVG(total_charges), 0) AS avg_charges FROM best_practice_healthcare_dw.doc_dwd_patient_visits GROUP BY major_diagnostic_category ORDER BY admission_count DESC LIMIT 8;

major_diagnostic_category | admission_count | avg_los | avg_charges --------------------------|-----------------|---------|------------ Circulatory | 8 | 5.13 | 50913 Nervous System | 8 | 5.50 | 62638 Digestive | 7 | 2.71 | 28800 Kidney | 6 | 4.33 | 39050 Respiratory | 6 | 6.00 | 60883 Reproductive | 2 | 2.50 | 30600 Endocrine | 2 | 2.00 | 16000 ENT | 2 | 1.00 | 10000

循环系统和神经系统疾病入院量并列最高,神经系统疾病的平均住院费用最高($62,638),呼吸系统疾病平均住院天数最长(6 天)。

按支付方(Payer Mix)分析:

SELECT payer_type, COUNT(*) AS admission_count, ROUND(AVG(los_days), 2) AS avg_los, ROUND(SUM(total_charges), 0) AS total_revenue, ROUND(AVG(total_charges), 0) AS avg_charges FROM best_practice_healthcare_dw.doc_dwd_patient_visits GROUP BY payer_type ORDER BY admission_count DESC;

payer_type | admission_count | avg_los | total_revenue | avg_charges ------------|-----------------|---------|---------------|------------ Medicare | 25 | 6.20 | 1588600 | 63544 Commercial | 16 | 1.63 | 302000 | 18875 Medicaid | 9 | 2.44 | 241600 | 26844

Medicare 患者占入院总量 50%,但平均住院天数(6.2 天)和平均费用($63,544)均远高于其他支付方,与该群体年龄大、病情复杂的临床特征一致。


ADS 层:院级 KPI 执行看板

ADS 层进一步聚合到医院维度,生成院长管理驾驶舱所需的核心 KPI。

建表

CREATE DYNAMIC TABLE best_practice_healthcare_dw.doc_ads_hospital_kpi REFRESH INTERVAL 60 MINUTE VCLUSTER DEFAULT AS SELECT hospital_id, hospital_name, hospital_county, hospital_type, is_teaching, COUNT(*) AS total_admissions, COUNT(DISTINCT patient_id) AS unique_patients, ROUND(AVG(los_days), 2) AS avg_los_days, SUM(CASE WHEN is_long_stay = 1 THEN 1 ELSE 0 END) AS long_stay_count, ROUND(SUM(CASE WHEN is_long_stay = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS long_stay_rate_pct, ROUND(SUM(total_charges), 2) AS total_revenue, ROUND(AVG(total_charges), 2) AS avg_revenue_per_admission, SUM(CASE WHEN admission_type = 'Emergency' THEN 1 ELSE 0 END) AS emergency_admissions, ROUND(SUM(CASE WHEN admission_type = 'Emergency' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS emergency_rate_pct, SUM(CASE WHEN discharge_category = 'Extended Care' THEN 1 ELSE 0 END) AS snf_discharges, SUM(CASE WHEN payer_type = 'Medicare' THEN 1 ELSE 0 END) AS medicare_cases, SUM(CASE WHEN payer_type = 'Medicaid' THEN 1 ELSE 0 END) AS medicaid_cases, SUM(CASE WHEN payer_type = 'Commercial' THEN 1 ELSE 0 END) AS commercial_cases, COUNT(DISTINCT department) AS active_departments FROM best_practice_healthcare_dw.doc_dwd_patient_visits GROUP BY hospital_id, hospital_name, hospital_county, hospital_type, is_teaching;

REFRESH DYNAMIC TABLE best_practice_healthcare_dw.doc_ads_hospital_kpi;

院级 KPI 查询结果(按总收入降序):

SELECT hospital_name, total_admissions, avg_los_days, long_stay_rate_pct, total_revenue, avg_revenue_per_admission, emergency_rate_pct, medicare_cases, commercial_cases, medicaid_cases FROM best_practice_healthcare_dw.doc_ads_hospital_kpi ORDER BY total_revenue DESC;

hospital_name | total_admissions | avg_los_days | long_stay_rate_pct | total_revenue | avg_revenue | emergency_rate_pct -----------------------------|------------------|--------------|-------------------|---------------|-------------|------------------- Cedars-Sinai Medical Center | 8 | 4.88 | 0.0 | 447100.00 | 55887.50 | 75.0 UCSF Medical Center | 7 | 4.29 | 0.0 | 322200.00 | 46028.57 | 42.9 Stanford Health Care | 6 | 5.17 | 16.7 | 312600.00 | 52100.00 | 66.7 Huntington Hospital | 6 | 4.17 | 0.0 | 257200.00 | 42866.67 | 50.0 Kaiser Permanente Fontana | 6 | 4.00 | 0.0 | 217100.00 | 36183.33 | 50.0

Cedars-Sinai 总收入最高($447,100),急诊入院率也最高(75%);Stanford Health Care 有 16.7% 的长住院率(los > 7 天),反映其收治重症病例比例较高。

按季度月份趋势分析:

SELECT CASE EXTRACT(month FROM admission_date) WHEN 1 THEN 'January' WHEN 2 THEN 'February' WHEN 3 THEN 'March' END AS month_name, COUNT(*) AS total_admissions, ROUND(AVG(los_days), 2) AS avg_los, ROUND(SUM(total_charges), 0) AS total_revenue, SUM(CASE WHEN admission_type = 'Emergency' THEN 1 ELSE 0 END) AS emergency_count FROM best_practice_healthcare_dw.doc_dwd_patient_visits GROUP BY EXTRACT(month FROM admission_date), CASE EXTRACT(month FROM admission_date) WHEN 1 THEN 'January' WHEN 2 THEN 'February' WHEN 3 THEN 'March' END ORDER BY EXTRACT(month FROM admission_date);

month_name | total_admissions | avg_los | total_revenue | emergency_count -----------|------------------|---------|---------------|---------------- January | 25 | 3.84 | 996800 | 12 February | 15 | 4.33 | 686700 | 8 March | 10 | 4.20 | 448700 | 6

1 月份入院量最高(25 例),这与加州冬季呼吸道疾病高发、流感季节因素一致。2 月平均住院天数略高(4.33 天),提示 2 月收治病例平均复杂度较高。


RBAC:权限分级配置

医疗数仓中不同用户对数据的访问需求和权限级别差异显著,通过自定义 RBAC 角色进行分层管控。

创建自定义角色

-- 临床人员:可查看患者就诊事件(DWD 层),但患者 PII 字段受 Column Masking 控制 CREATE ROLE IF NOT EXISTS healthcare_clinical_viewer COMMENT 'Clinical staff - view patient clinical data with PII masking'; -- 运营管理:只能访问科室级聚合数据(DWS/ADS 层),不接触患者粒度记录 CREATE ROLE IF NOT EXISTS healthcare_mgmt_analyst COMMENT 'Management - view aggregate KPIs only, no patient-level access'; -- BI 分析师:只能访问 ADS 层 KPI 表,供报表和看板消费 CREATE ROLE IF NOT EXISTS healthcare_bi_analyst COMMENT 'BI team - ADS layer only, read-only for reporting';

授权 Schema 访问权限

-- 临床人员:授予 Schema 的元数据读取权限(实际表级权限在表上单独控制) GRANT READ METADATA ON SCHEMA best_practice_healthcare_dw TO ROLE healthcare_clinical_viewer; -- 管理层和 BI 分析师同样只授予元数据读取 GRANT READ METADATA ON SCHEMA best_practice_healthcare_dw TO ROLE healthcare_mgmt_analyst; GRANT READ METADATA ON SCHEMA best_practice_healthcare_dw TO ROLE healthcare_bi_analyst;

权限分级总结

角色可访问层可见患者 PII典型用户
workspace_admin
workspace_admin
ODS / DWD / DWS / ADS完整数据DBA、数据平台管理员
healthcare_clinical_viewer
healthcare_clinical_viewer
DWD 层就诊事件patient_id 脱敏,诊断 RESTRICTED临床医生、护士
healthcare_mgmt_analyst
healthcare_mgmt_analyst
DWS / ADS 层无患者粒度字段科室主任、运营管理
healthcare_bi_analyst
healthcare_bi_analyst
ADS 层仅 KPI无患者粒度字段BI 工程师、报表开发

Time Travel:合规审计与医保对账

医疗行业的合规要求数据变更可追溯。ClickZetta Time Travel 通过

DESC HISTORY
DESC HISTORY
TIMESTAMP AS OF
TIMESTAMP AS OF
支持任意历史版本查询,适用于医保对账、监管机构审查和数据误操作恢复。

查看历史版本

DESC HISTORY best_practice_healthcare_dw.doc_ods_admissions;

version | time | total_rows | operation | user | stats --------|---------------------------|------------|-------------|------------|---------------------------------- 5 | 2026-06-06T13:32:08.973 | 50 | ALTER | admin_user | rows_inserted:25, rows_deleted:0 4 | 2026-06-06T13:32:04.327 | 50 | ALTER | admin_user | rows_inserted:25, rows_deleted:0 3 | 2026-06-06T13:28:59.337 | 50 | INSERT_INTO | admin_user | rows_inserted:25, rows_deleted:0 2 | 2026-06-06T13:28:29.718 | 25 | INSERT_INTO | admin_user | rows_inserted:25, rows_deleted:0 1 | 2026-06-06T13:26:34.306 | 0 | CREATE | admin_user | —

版本 2 时只有 25 条记录(第一批入院数据),版本 4 和 5 是 ALTER(绑定 Column Masking)操作。

基于时间戳的历史数据查询

医保对账场景:查询提交对账报告时的原始数据状态,确认彼时的入院记录数量。

-- 查询版本2(2026-06-06 13:28:29)时的数据快照 SELECT COUNT(*) AS row_count FROM best_practice_healthcare_dw.doc_ods_admissions TIMESTAMP AS OF '2026-06-06T13:28:29.718';

row_count --------- 25

当前表有 50 条记录,但通过

TIMESTAMP AS OF
TIMESTAMP AS OF
可以精确回到版本 2 时的状态,确认当时仅有 25 条记录。适用于医保审计机构要求提供"某次申报时的数据快照"的场景。

数据恢复场景

误操作后通过 Time Travel 恢复数据:

-- 将误删数据恢复到指定时间点 RESTORE TABLE best_practice_healthcare_dw.doc_ods_admissions TO TIMESTAMP AS OF '2026-06-06T13:28:29.718';


注意事项

  • Column Masking 是预览功能:需联系技术支持开通,生产使用前确认已开通。脱敏策略绑定后对所有查询路径(包括通过 Dynamic Table 间接读取)生效。
  • Dynamic Table 与上游 ODS 解耦:DWD/DWS/ADS 层 Dynamic Table 的 Column Masking 行为由 ODS 表的策略决定,DWD 表上不需要重复绑定脱敏。
  • RBAC 角色授权最小化原则:按科室或职能分配角色,避免将 workspace_admin 或 workspace_dev 等高权限角色授予普通业务用户。
  • Time Travel 保留期:默认 7 天,超出后历史版本被清理。医保对账等有长期审计需求的场景,建议通过定期快照(CTAS)将关键版本归档到独立表。
  • HIS 离线同步的全量/增量策略:首次同步建议在业务低峰期(如凌晨 02:00)执行全量同步,后续切换为增量模式,避免在数据量增大后全量同步占用过多计算资源。

相关文档

联系我们
预约咨询
微信咨询
电话咨询
邮件咨询