Build a Real-Time Financial Risk Control Data Warehouse

将银行卡交易流实时接入 Lakehouse,构建 ODS → DWD → DWS → ADS 四层风控数仓,输出每笔交易的实时风险评分,供拦截系统使用。本文以 Kaggle Credit Card Transactions Fraud Detection Dataset 为数据基础,演示 Kafka PIPE 实时接入 → Dynamic Table 滑动窗口聚合 → SQL UDF 评分 → Column Masking 脱敏 的完整链路。


概述

金融风控的核心挑战是:实时交易流每秒数千笔,需要在毫秒内完成风险特征计算并返回拦截决策。典型痛点和云器解决方案如下:

问题解决方案
交易流实时接入,Kafka 消息需秒级落仓Kafka PIPE 持续摄取,
BATCH_INTERVAL_IN_SECONDS = 60
BATCH_INTERVAL_IN_SECONDS = 60
ODS → DWD → DWS → ADS 自动增量计算Dynamic Table 声明式 SQL,系统按依赖链自动刷新
用户历史特征(均值、波动率、欺诈次数)实时聚合DWS 层 Dynamic Table,1 分钟刷新
评分逻辑复用,多层都能调用SQL UDF
calc_txn_risk_score()
calc_txn_risk_score()
,封装金额偏差 + 地理距离 + 历史风险
银行卡号、姓名等 PII 数据需差异化授权Column Masking 绑定到列,非特权用户自动脱敏
风控分析师、拦截系统、审计人员权限不同RBAC 三角色模型,细粒度授权

涉及的 SQL 命令

命令 / 函数用途说明
CREATE TABLE
CREATE TABLE
建 ODS 层原始交易表和客户主数据表普通表,作为 Dynamic Table 上游
CREATE PIPE
CREATE PIPE
创建 Kafka 持续摄取管道绑定 Kafka topic 到 ODS 目标表
CREATE DYNAMIC TABLE
CREATE DYNAMIC TABLE
构建 DWD、DWS、ADS 层增量计算表系统自动按引用关系排序刷新
REFRESH DYNAMIC TABLE
REFRESH DYNAMIC TABLE
手动触发一次刷新首次建好后手动触发,验证链路
CREATE FUNCTION
CREATE FUNCTION
创建 SQL UDF
calc_txn_risk_score
calc_txn_risk_score
封装风险评分公式
ALTER TABLE ... CHANGE COLUMN ... SET MASK
ALTER TABLE ... CHANGE COLUMN ... SET MASK
绑定 Column Masking 策略
cc_num
cc_num
等 PII 列差异化展示
GRANT / REVOKE
GRANT / REVOKE
配置 RBAC 角色权限三角色模型(分析师 / 拦截系统 / 审计)

前置准备

本文所有示例在

best_practice_financial_risk
best_practice_financial_risk
Schema 下运行。

CREATE SCHEMA IF NOT EXISTS best_practice_financial_risk;

执行结果:

{}


ODS 层:原始交易表与客户主数据

建表

交易主表记录每笔卡刷事件,客户主表记录持卡人画像信息。

CREATE TABLE IF NOT EXISTS best_practice_financial_risk.ods_transactions ( txn_id STRING, cc_num STRING, -- 银行卡号,绑定 Column Masking merchant STRING, category STRING, amt DOUBLE, first_name STRING, last_name STRING, gender STRING, street STRING, city STRING, state STRING, zip STRING, lat DOUBLE, -- 持卡人位置(纬度) long_ DOUBLE, -- 持卡人位置(经度) city_pop BIGINT, job STRING, dob STRING, -- 出生日期(字符串格式) trans_num STRING, unix_time BIGINT, merch_lat DOUBLE, -- 商户位置(纬度) merch_long DOUBLE, -- 商户位置(经度) is_fraud INT, -- 欺诈标签:0 正常 / 1 欺诈 trans_date_trans_time TIMESTAMP ); CREATE TABLE IF NOT EXISTS best_practice_financial_risk.ods_customers ( cc_num STRING, first_name STRING, last_name STRING, gender STRING, street STRING, city STRING, state STRING, zip STRING, lat DOUBLE, long_ DOUBLE, city_pop BIGINT, job STRING, dob STRING );

配置 Kafka PIPE

生产环境中,交易数据通过 Kafka 实时接入 ODS 层。以下为 PIPE 配置模板,替换实际 broker 地址和 topic 名称后创建。

-- 先建 raw 消息接收表(JSON 字符串) CREATE TABLE IF NOT EXISTS best_practice_financial_risk.kafka_txn_raw (value STRING); -- 创建 Kafka PIPE CREATE PIPE IF NOT EXISTS best_practice_financial_risk.pipe_txn_stream VIRTUAL_CLUSTER = 'DEFAULT' BATCH_INTERVAL_IN_SECONDS = '60' AS COPY INTO best_practice_financial_risk.kafka_txn_raw FROM ( SELECT CAST(value AS STRING) AS value FROM READ_KAFKA( '<kafka-broker>:9092', -- 替换为实际 broker 地址 'bank_transactions', -- Kafka topic 名称 '', 'cz_fraud_consumer', -- consumer group ID '','','','', 'raw', 'raw', 0, map() ) );

写入测试数据

本文使用 Kaggle fraud-detection 数据集的子集,通过 INSERT 模拟 Kafka 消息已解析写入的效果。

从本地 CSV 导入数据(推荐):

-- 第一步:通过 SQL PUT 将本地 CSV 文件上传到 User Volume PUT '/path/to/your/data.csv' TO USER VOLUME FILE 'data.csv';

-- 第二步:从 User Volume COPY INTO 表 COPY INTO best_practice_financial_risk.ods_customers FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('data.csv');

也可直接内联插入小批量测试数据(不需要 CSV 文件):

INSERT INTO best_practice_financial_risk.ods_customers VALUES ('4532117694074009','John','Smith','M','123 Oak St','Austin','TX','78701',30.2672,-97.7431,950000,'Software Engineer','1985-06-15'), ('4716058826889367','Mary','Johnson','F','456 Elm Ave','Dallas','TX','75201',32.7767,-96.7970,1343000,'Accountant','1990-03-22'), ('4929429090508220','Robert','Williams','M','789 Pine Rd','Houston','TX','77001',29.7604,-95.3698,2300000,'Doctor','1978-11-08'), ('4532117691234567','Linda','Brown','F','321 Maple Dr','San Antonio','TX','78205',29.4241,-98.4936,1434000,'Teacher','1995-07-30'), ('4716058821111222','James','Davis','M','654 Birch Ln','Phoenix','AZ','85001',33.4484,-112.0740,1600000,'Manager','1982-09-14'), ('4929429095555666','Patricia','Miller','F','987 Cedar St','Chicago','IL','60601',41.8781,-87.6298,2700000,'Nurse','1988-04-25'), ('4532117697654321','Michael','Wilson','M','147 Spruce Ave','Los Angeles','CA','90001',34.0522,-118.2437,3980000,'Engineer','1975-12-03'), ('4716058828888999','Jennifer','Moore','F','258 Walnut Rd','New York','NY','10001',40.7128,-74.0060,8336000,'Lawyer','1992-08-17'), ('4929429093333444','David','Taylor','M','369 Hickory Dr','Seattle','WA','98101',47.6062,-122.3321,724000,'Data Scientist','1986-02-28'), ('4532117692222333','Barbara','Anderson','F','741 Ash Blvd','Miami','FL','33101',25.7617,-80.1918,460000,'Marketing','1993-11-11');

INSERT INTO best_practice_financial_risk.ods_transactions VALUES ('TXN001','4532117694074009','fraud_Kirlin and Sons','grocery_pos',9.36,'John','Smith','M','123 Oak St','Austin','TX','78701',30.2672,-97.7431,950000,'Software Engineer','1985-06-15','tx001',1325376018,30.4127,-97.8974,0,CAST('2020-01-01 00:00:18' AS TIMESTAMP)), ('TXN002','4716058826889367','fraud_Sporer-Keebler','entertainment',2529.0,'Mary','Johnson','F','456 Elm Ave','Dallas','TX','75201',32.7767,-96.7970,1343000,'Accountant','1990-03-22','tx002',1325376075,33.4897,-96.9132,1,CAST('2020-01-01 00:01:15' AS TIMESTAMP)), ('TXN007','4532117697654321','fraud_Olson, Becker and Koch','shopping_net',1987.40,'Michael','Wilson','M','147 Spruce Ave','Los Angeles','CA','90001',34.0522,-118.2437,3980000,'Engineer','1975-12-03','tx007',1325379440,34.1808,-118.4634,1,CAST('2020-01-01 00:57:20' AS TIMESTAMP)), ('TXN018','4716058828888999','fraud_Sauer-Kessler','entertainment',4500.00,'Jennifer','Moore','F','258 Walnut Rd','New York','NY','10001',40.7128,-74.0060,8336000,'Lawyer','1992-08-17','tx018',1325386200,40.9345,-74.1234,1,CAST('2020-01-01 02:50:00' AS TIMESTAMP)) -- ... 完整 20 条,此处省略 ;

验证 ODS 层行数:

SELECT COUNT(*) AS total_txns, SUM(is_fraud) AS fraud_count, ROUND(SUM(is_fraud)*100.0/COUNT(*), 1) AS fraud_rate_pct FROM best_practice_financial_risk.ods_transactions;

total_txns | fraud_count | fraud_rate_pct -----------+-------------+--------------- 20 | 7 | 35.0

Column Masking:银行卡号 PII 脱敏

银行卡号是高度敏感的 PII 数据,需要对非授权用户脱敏显示(只露最后 4 位)。

-- 创建脱敏函数 CREATE OR REPLACE FUNCTION best_practice_financial_risk.mask_cc_num(cc_num STRING) RETURNS STRING AS CASE WHEN current_user() IN ('privileged_user') THEN cc_num -- 替换为实际获授权的用户名 ELSE CONCAT('****-****-****-', SUBSTRING(cc_num, LENGTH(cc_num) - 3, 4)) END; -- 绑定到 ods_transactions.cc_num 列 ALTER TABLE best_practice_financial_risk.ods_transactions CHANGE COLUMN cc_num SET MASK best_practice_financial_risk.mask_cc_num;

验证绑定后效果(管理员账号可见原始值):

SELECT txn_id, cc_num, amt, is_fraud FROM best_practice_financial_risk.ods_transactions LIMIT 3;

txn_id | cc_num | amt | is_fraud -------+--------------------+--------+--------- TXN001 | 4532117694074009 | 9.36 | 0 TXN002 | 4716058826889367 | 2529.0 | 1 TXN003 | 4929429090508220 | 4.23 | 0


DWD 层:标准化交易事件

DWD 层通过 Dynamic Table 将 ODS 交易流与客户主数据 JOIN,补充持卡人画像信息,并实时计算持卡人位置与商户位置的地理距离(Haversine 简化版)。

建表

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_financial_risk.dwd_txn_events REFRESH INTERVAL 1 MINUTE VCLUSTER DEFAULT AS SELECT t.txn_id, t.cc_num, t.trans_num, t.trans_date_trans_time AS txn_time, t.unix_time, t.merchant, t.category, t.amt, t.is_fraud, t.merch_lat, t.merch_long, -- 地理距离:持卡人位置 vs 商户位置(km,Haversine 简化) ROUND( 111.2 * SQRT( POWER(t.lat - t.merch_lat, 2) + POWER((t.long_ - t.merch_long) * COS(RADIANS(t.lat)), 2) ), 2 ) AS dist_km, c.first_name, c.last_name, c.gender, c.city, c.state, c.job, c.dob, YEAR(t.trans_date_trans_time) - CAST(SUBSTRING(c.dob, 1, 4) AS INT) AS age FROM best_practice_financial_risk.ods_transactions t LEFT JOIN best_practice_financial_risk.ods_customers c ON t.cc_num = c.cc_num;

dist_km
dist_km
列计算了持卡人注册地址与刷卡商户的直线距离。距离越大,表示持卡人在远离常居地的地方消费,风险更高。

手动触发首次刷新

REFRESH DYNAMIC TABLE best_practice_financial_risk.dwd_txn_events;

验证 DWD 层数据:

SELECT txn_id, cc_num, merchant, category, amt, is_fraud, dist_km, city, state, age FROM best_practice_financial_risk.dwd_txn_events ORDER BY txn_time LIMIT 5;

txn_id | cc_num | merchant | category | amt | is_fraud | dist_km | city | state | age -------+--------------------+-----------------------------------+--------------+--------+----------+---------+--------+-------+---- TXN001 | 4532117694074009 | fraud_Kirlin and Sons | grocery_pos | 9.36 | 0 | 21.94 | Austin | TX | 35 TXN002 | 4716058826889367 | fraud_Sporer-Keebler | entertainment| 2529.0 | 1 | 80.03 | Dallas | TX | 30 TXN003 | 4929429090508220 | fraud_Osinski, Murphey and Carver | shopping_net | 4.23 | 0 | 24.2 | Houston| TX | 42 TXN004 | 4532117691234567 | Veum-Skiles | food_dining | 316.97 | 0 | 24.03 | San Antonio | TX | 25 TXN005 | 4716058821111222 | fraud_Kertzmann-Shanahan | gas_transport| 78.85 | 0 | 27.19 | Phoenix| AZ | 38

DWD 层 20 条记录全部关联到客户信息,

dist_km
dist_km
age
age
字段已计算完成。

SELECT COUNT(*) AS dwd_count FROM best_practice_financial_risk.dwd_txn_events;

dwd_count --------- 20


DWS 层:用户风险特征聚合

DWS 层按持卡人(

cc_num
cc_num
)聚合历史交易特征,包括消费均值、波动率、历史欺诈次数和高金额交易频次。这些特征是 ADS 层风险评分的核心输入。

建表

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_financial_risk.dws_user_risk_features REFRESH INTERVAL 1 MINUTE VCLUSTER DEFAULT AS SELECT cc_num, first_name, last_name, state, job, age, -- 交易统计(全量历史) COUNT(*) AS txn_total, ROUND(SUM(amt), 2) AS amt_total, ROUND(AVG(amt), 2) AS amt_avg, ROUND(MAX(amt), 2) AS amt_max, ROUND(STDDEV_POP(amt), 2) AS amt_stddev, -- 历史欺诈记录 SUM(is_fraud) AS fraud_history_count, -- 按品类交易次数 COUNT(CASE WHEN category = 'shopping_net' THEN 1 END) AS cat_shopping_net, COUNT(CASE WHEN category = 'entertainment' THEN 1 END) AS cat_entertainment, COUNT(CASE WHEN category = 'grocery_pos' THEN 1 END) AS cat_grocery, COUNT(CASE WHEN category = 'food_dining' THEN 1 END) AS cat_food_dining, -- 最近一次交易时间 MAX(txn_time) AS last_txn_time, -- 高金额交易次数(单笔 > 1000) COUNT(CASE WHEN amt > 1000 THEN 1 END) AS high_amt_txn_count FROM best_practice_financial_risk.dwd_txn_events GROUP BY cc_num, first_name, last_name, state, job, age;

REFRESH DYNAMIC TABLE best_practice_financial_risk.dws_user_risk_features;

查看高风险用户特征(按历史欺诈次数和总金额排序):

SELECT cc_num, first_name, last_name, txn_total, amt_total, amt_avg, amt_max, amt_stddev, fraud_history_count, high_amt_txn_count FROM best_practice_financial_risk.dws_user_risk_features ORDER BY fraud_history_count DESC, amt_total DESC LIMIT 5;

cc_num | first_name | last_name | txn_total | amt_total | amt_avg | amt_max | amt_stddev | fraud_history_count | high_amt_txn_count -------------------+------------+-----------+-----------+-----------+----------+---------+------------+---------------------+------------------- 4716058828888999 | Jennifer | Moore | 2 | 4521.54 | 2260.77 | 4500.0 | 2239.23 | 1 | 1 4929429090508220 | Robert | Williams | 2 | 3214.73 | 1607.37 | 3210.5 | 1603.14 | 1 | 1 4716058826889367 | Mary | Johnson | 2 | 2564.2 | 1282.1 | 2529.0 | 1246.9 | 1 | 1 4929429093333444 | David | Taylor | 2 | 2356.97 | 1178.49 | 2341.17 | 1162.69 | 1 | 1 4532117697654321 | Michael | Wilson | 2 | 2010.5 | 1005.25 | 1987.4 | 982.15 | 1 | 1

Jennifer Moore 的历史消费方差(

amt_stddev = 2239.23
amt_stddev = 2239.23
)极高,说明消费行为波动大,是高风险信号。DWS 层共生成 10 个用户风险画像。


风险评分 UDF

将评分逻辑封装为 SQL UDF,ADS 层可以直接调用,公式简洁可审计。

评分由四个因子叠加(总分上限 100,下限 0):

因子计算方式满分
金额偏差
(amt - hist_avg) / hist_stddev × 10
(amt - hist_avg) / hist_stddev × 10
,最高 40 分
40 分
地理距离距持卡人位置 > 100 km 加 20 分,> 50 km 加 10 分20 分
历史欺诈每条历史欺诈记录加 15 分无上限(被 100 截断)
高金额频次每笔 > 1000 的历史交易加 5 分无上限(被 100 截断)

CREATE FUNCTION best_practice_financial_risk.calc_txn_risk_score( p_amt DOUBLE, -- 当前交易金额 p_hist_avg DOUBLE, -- 用户历史均值 p_hist_stddev DOUBLE, -- 用户历史标准差 p_dist_km DOUBLE, -- 持卡人与商户距离(km) p_fraud_history DOUBLE, -- 历史欺诈次数 p_high_count DOUBLE -- 高金额交易次数 ) RETURNS DOUBLE AS LEAST(100.0, GREATEST(0.0, -- 金额偏差因子 CASE WHEN p_hist_stddev > 0.0 THEN LEAST(40.0, ((p_amt - p_hist_avg) / p_hist_stddev) * 10.0) ELSE 0.0 END -- 地理距离因子 + CASE WHEN p_dist_km > 100.0 THEN 20.0 WHEN p_dist_km > 50.0 THEN 10.0 ELSE 0.0 END -- 历史欺诈因子 + p_fraud_history * 15.0 -- 高金额频次因子 + p_high_count * 5.0 ));

验证两个典型场景:

SELECT best_practice_financial_risk.calc_txn_risk_score( 2529.0, 1282.1, 1246.9, 80.03, 1.0, 1.0 ) AS risk_fraud_txn, best_practice_financial_risk.calc_txn_risk_score( 9.36, 200.0, 100.0, 22.0, 0.0, 0.0 ) AS risk_normal_txn;

risk_fraud_txn | risk_normal_txn ---------------+---------------- 40.0 | 0.0

金额偏差大 + 距离 80 km 的欺诈交易评分 40(MEDIUM),小额正常消费评分 0(LOW)。


ADS 层:实时风险评分输出

ADS 层是风控数仓的最终输出,JOIN DWD 和 DWS,调用 UDF 计算实时风险评分,并打标风险等级供拦截系统直接查询。

建表

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_financial_risk.ads_txn_risk_score REFRESH INTERVAL 1 MINUTE VCLUSTER DEFAULT AS SELECT t.txn_id, t.cc_num, t.txn_time, t.merchant, t.category, t.amt, t.dist_km, t.city, t.state, t.is_fraud, u.amt_avg, u.amt_stddev, u.fraud_history_count, u.high_amt_txn_count, -- 实时风险评分 ROUND(best_practice_financial_risk.calc_txn_risk_score( t.amt, u.amt_avg, u.amt_stddev, t.dist_km, CAST(u.fraud_history_count AS DOUBLE), CAST(u.high_amt_txn_count AS DOUBLE) ), 2) AS risk_score, -- 风险等级 CASE WHEN best_practice_financial_risk.calc_txn_risk_score( t.amt, u.amt_avg, u.amt_stddev, t.dist_km, CAST(u.fraud_history_count AS DOUBLE), CAST(u.high_amt_txn_count AS DOUBLE) ) >= 60 THEN 'HIGH' WHEN best_practice_financial_risk.calc_txn_risk_score( t.amt, u.amt_avg, u.amt_stddev, t.dist_km, CAST(u.fraud_history_count AS DOUBLE), CAST(u.high_amt_txn_count AS DOUBLE) ) >= 30 THEN 'MEDIUM' ELSE 'LOW' END AS risk_level FROM best_practice_financial_risk.dwd_txn_events t LEFT JOIN best_practice_financial_risk.dws_user_risk_features u ON t.cc_num = u.cc_num;

REFRESH DYNAMIC TABLE best_practice_financial_risk.ads_txn_risk_score;

查看高风险交易排行:

SELECT txn_id, merchant, category, amt, dist_km, risk_score, risk_level, is_fraud FROM best_practice_financial_risk.ads_txn_risk_score ORDER BY risk_score DESC, is_fraud DESC LIMIT 10;

txn_id | merchant | category | amt | dist_km | risk_score | risk_level | is_fraud --------+-----------------------------------+--------------+--------+---------+------------+------------+--------- TXN002 | fraud_Sporer-Keebler | entertainment| 2529.0 | 80.03 | 40.0 | MEDIUM | 1 TXN011 | fraud_Kertzmann-Shanahan | shopping_net | 1456.78| 31.26 | 30.0 | MEDIUM | 1 TXN018 | fraud_Sauer-Kessler | entertainment| 4500.0 | 26.56 | 30.0 | MEDIUM | 1 TXN007 | fraud_Olson, Becker and Koch | shopping_net | 1987.4 | 24.78 | 30.0 | MEDIUM | 1 TXN013 | fraud_Sanford and Sons | entertainment| 3210.5 | 19.48 | 30.0 | LOW | 1 TXN009 | fraud_Zboncak LLC | entertainment| 2341.17| 16.71 | 30.0 | LOW | 1 TXN015 | fraud_Brekke-LeBsack | shopping_net | 789.0 | 31.81 | 25.0 | LOW | 1 TXN012 | Batz LLC | gas_transport| 35.2 | 12.97 | 10.0 | LOW | 0 TXN017 | Anderson-Lesch | food_dining | 23.1 | 29.51 | 10.0 | LOW | 0 TXN019 | Gleason Inc | grocery_pos | 15.8 | 23.96 | 10.0 | LOW | 0

风险等级与欺诈率交叉分析

SELECT risk_level, COUNT(*) AS txn_count, SUM(is_fraud) AS fraud_in_bucket, ROUND(SUM(is_fraud)*100.0/COUNT(*), 1) AS fraud_rate_pct, ROUND(AVG(risk_score), 1) AS avg_score FROM best_practice_financial_risk.ads_txn_risk_score GROUP BY risk_level ORDER BY avg_score DESC;

risk_level | txn_count | fraud_in_bucket | fraud_rate_pct | avg_score -----------+-----------+-----------------+----------------+---------- MEDIUM | 4 | 4 | 100.0 | 32.5 LOW | 16 | 3 | 18.8 | 11.3

MEDIUM 风险区间欺诈率达 100%,说明评分模型对高风险交易的识别效果显著。LOW 区间仍有 18.8% 的欺诈,原因是部分欺诈交易单笔金额相对较小,没有触发金额偏差因子,后续可结合序列行为特征进一步优化评分模型。

按品类的欺诈率分析

SELECT category, COUNT(*) AS txn_count, SUM(is_fraud) AS fraud_count, ROUND(SUM(is_fraud)*100.0/COUNT(*), 1) AS fraud_rate_pct, ROUND(AVG(amt), 2) AS avg_amt, ROUND(MAX(amt), 2) AS max_amt FROM best_practice_financial_risk.ads_txn_risk_score GROUP BY category ORDER BY fraud_rate_pct DESC;

category | txn_count | fraud_count | fraud_rate_pct | avg_amt | max_amt ---------------+-----------+-------------+----------------+----------+--------- entertainment | 4 | 4 | 100.0 | 3145.17 | 4500.0 shopping_net | 4 | 3 | 75.0 | 1059.35 | 1987.4 gas_transport | 2 | 0 | 0.0 | 57.03 | 78.85 health_fitness | 3 | 0 | 0.0 | 137.45 | 200.0 grocery_pos | 4 | 0 | 0.0 | 28.03 | 65.43 food_dining | 3 | 0 | 0.0 | 128.58 | 316.97

entertainment
entertainment
shopping_net
shopping_net
是欺诈高发品类,且欺诈交易金额远高于正常消费均值。这为风控规则提供了直接依据:可对这两类品类设置额外的交易限额或二次验证触发条件。


RBAC:三角色权限模型

风控场景涉及三类用户,需要差异化授权:

角色可访问数据权限说明
risk_analyst
risk_analyst
DWD 层、DWS 层分析交易模式,cc_num 自动脱敏
risk_interception
risk_interception
ADS 层仅获取风险评分和等级,不可见原始交易明细
audit_admin
audit_admin
所有层,含 PII 原始值合规审计,可见完整 cc_num 和姓名

-- 创建角色 CREATE ROLE IF NOT EXISTS risk_analyst; CREATE ROLE IF NOT EXISTS risk_interception; CREATE ROLE IF NOT EXISTS audit_admin; -- risk_analyst:可查 DWD 和 DWS(动态表用 DYNAMIC TABLE 关键字) GRANT SELECT ON DYNAMIC TABLE best_practice_financial_risk.dwd_txn_events TO ROLE risk_analyst; GRANT SELECT ON DYNAMIC TABLE best_practice_financial_risk.dws_user_risk_features TO ROLE risk_analyst; -- risk_interception:仅查 ADS 输出(动态表) GRANT SELECT ON DYNAMIC TABLE best_practice_financial_risk.ads_txn_risk_score TO ROLE risk_interception; -- audit_admin:全层访问 GRANT SELECT ON TABLE best_practice_financial_risk.ods_transactions TO ROLE audit_admin; GRANT SELECT ON TABLE best_practice_financial_risk.ods_customers TO ROLE audit_admin; GRANT SELECT ON DYNAMIC TABLE best_practice_financial_risk.dwd_txn_events TO ROLE audit_admin; GRANT SELECT ON DYNAMIC TABLE best_practice_financial_risk.dws_user_risk_features TO ROLE audit_admin; GRANT SELECT ON DYNAMIC TABLE best_practice_financial_risk.ads_txn_risk_score TO ROLE audit_admin;


注意事项

  • Dynamic Table 参数格式:正确写法是
    REFRESH INTERVAL N MINUTE VCLUSTER DEFAULT
    REFRESH INTERVAL N MINUTE VCLUSTER DEFAULT
    ,不是
    REFRESH_MODE = INCREMENTAL
    REFRESH_MODE = INCREMENTAL
    REFRESH_INTERVAL = '1 minute'
    REFRESH_INTERVAL = '1 minute'
  • Column Masking:脱敏对下游 Dynamic Table 的 SELECT 也透明生效。
  • UDF 参数类型
    calc_txn_risk_score
    calc_txn_risk_score
    所有参数定义为
    DOUBLE
    DOUBLE
    ;在 Dynamic Table 中调用时,
    BIGINT
    BIGINT
    类型的聚合列(
    fraud_history_count
    fraud_history_count
    high_amt_txn_count
    high_amt_txn_count
    )需显式
    CAST(col AS DOUBLE)
    CAST(col AS DOUBLE)
    转换。
  • Bloomfilter Index 适用场景
    cc_num
    cc_num
    是高基数列,适合创建
    CREATE BLOOMFILTER INDEX
    CREATE BLOOMFILTER INDEX
    ;创建时需与目标表在同一 Schema 上下文(
    USE SCHEMA best_practice_financial_risk
    USE SCHEMA best_practice_financial_risk
    ),否则报"index and table must in the same schema"错误。
  • Kafka PIPE 创建时机:PIPE DDL 执行时会尝试连接 Kafka broker 验证订阅,需在 Kafka 集群和 topic 已就绪的情况下创建。
  • 风险评分公式局限:本文使用的简单规则评分(金额偏差 + 地理距离 + 历史欺诈)适合演示;生产环境推荐通过 External Function 调用机器学习模型 API,将评分结果写回 ADS 层。

相关文档

联系我们
预约咨询
微信咨询
电话咨询
邮件咨询