SOC 日志分析数仓实践

将防火墙日志、身份认证日志(IAM)和应用访问日志集中导入云器 Lakehouse,构建低成本、高覆盖的威胁检测数仓,替代或增强传统 SIEM 的分析能力。本文以 Kaggle 网络安全威胁检测日志(含防火墙、IDS、应用三类日志,30 条代表性样本)为数据集,端到端演示 OSS PIPE → Bronze → Silver → Gold 的完整构建过程,并覆盖 Bloomfilter Index、Inverted Index、Dynamic Table、SQL UDF、Time Travel 五项关键平台能力的落地用法。


概述

SOC 日志分析的核心挑战是:海量异构日志(防火墙/IDS/应用)快速接入,从中识别真正有威胁的行为,同时支持安全事件发生后的溯源分析。

问题云器解决方案
日志代理写入 OSS 存储桶后需自动入库OSS PIPE(EVENT_NOTIFICATION 模式),新文件落桶即触发摄取
source_ip / dest_ip 高基数列频繁点查Bloomfilter Index,过滤非命中数据块,降低扫描量
攻击工具指纹、请求路径关键字搜索Inverted Index(english / keyword 分词),支持全文匹配
原始日志 → 归一化 → 威胁聚合自动刷新Dynamic Table,声明式 SQL,增量计算
IP 威胁情报 API 实时标注恶意 IPExternal Function,嵌入 SQL 调用外部威胁情报服务
安全事件发生后需回溯攻击时刻的完整数据Time Travel,按时间戳或版本查询历史快照

涉及的 SQL 命令

命令 / 函数用途说明
CREATE TABLE
CREATE TABLE
建 Bronze 层原始日志表普通表,作为 Dynamic Table 上游
CREATE BLOOMFILTER INDEX
CREATE BLOOMFILTER INDEX
source_ip
source_ip
/
dest_ip
dest_ip
创建布隆过滤索引
适合高基数 IP 列的点查过滤
CREATE INVERTED INDEX
CREATE INVERTED INDEX
user_agent
user_agent
/
request_path
request_path
创建倒排索引
支持攻击工具指纹全文检索
CREATE PIPE
CREATE PIPE
创建 OSS 持续摄取管道EVENT_NOTIFICATION 模式,新文件自动触发
CREATE FUNCTION
CREATE FUNCTION
创建 SQL UDF
classify_ip_risk
classify_ip_risk
封装 IP 风险分级逻辑
CREATE DYNAMIC TABLE
CREATE DYNAMIC TABLE
创建 Silver / Gold 层增量计算表系统自动识别上游变更并增量刷新
REFRESH DYNAMIC TABLE
REFRESH DYNAMIC TABLE
手动触发一次刷新初次构建或调试时使用
SELECT ... TIMESTAMP AS OF
SELECT ... TIMESTAMP AS OF
历史快照查询安全事件溯源,回溯攻击时刻数据
MATCH_ALL
MATCH_ALL
全文搜索函数利用 Inverted Index 加速日志关键字检索

前置准备

本文所有示例在

best_practice_soc_log
best_practice_soc_log
Schema 下运行。

CREATE SCHEMA IF NOT EXISTS best_practice_soc_log;


Bronze 层:原始日志表

建表

Bronze 层保存从 OSS PIPE 或 INSERT 写入的原始日志,字段与日志源保持一致,不做业务加工。

CREATE TABLE IF NOT EXISTS best_practice_soc_log.doc_raw_logs ( log_id BIGINT, log_timestamp TIMESTAMP, source_ip STRING, dest_ip STRING, protocol STRING, action STRING, threat_label STRING, log_type STRING, bytes_transferred BIGINT, user_agent STRING, request_path STRING, ingest_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP() );

ingest_time
ingest_time
使用
DEFAULT CURRENT_TIMESTAMP()
DEFAULT CURRENT_TIMESTAMP()
,OSS PIPE 写入时自动填充,无需在日志体中携带。

创建 Bloomfilter Index

source_ip
source_ip
dest_ip
dest_ip
是高基数字符串列,安全分析师频繁以特定 IP 为条件点查,适合 Bloomfilter Index。

CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_source_ip ON TABLE doc_raw_logs (source_ip); CREATE BLOOMFILTER INDEX IF NOT EXISTS idx_bf_dest_ip ON TABLE doc_raw_logs (dest_ip);

创建 Inverted Index

攻击工具通常在

user_agent
user_agent
中留下特征字符串(如
SQLMap
SQLMap
Nmap
Nmap
Metasploit
Metasploit
),
request_path
request_path
包含注入载荷(如
/admin?id=1 OR 1=1
/admin?id=1 OR 1=1
)。两列建立 Inverted Index 支持全文检索:

CREATE INVERTED INDEX IF NOT EXISTS idx_inv_user_agent ON TABLE doc_raw_logs (user_agent) PROPERTIES('analyzer'='english'); CREATE INVERTED INDEX IF NOT EXISTS idx_inv_request_path ON TABLE doc_raw_logs (request_path) PROPERTIES('analyzer'='keyword');

user_agent
user_agent
使用
english
english
分词器,按单词切分;
request_path
request_path
使用
keyword
keyword
,整体匹配不切分,保留 URL 语义。

验证索引创建结果:

SHOW INDEXES FROM best_practice_soc_log.doc_raw_logs;

index_name | index_type --------------------|------------ idx_bf_source_ip | bloom_filter idx_inv_user_agent | inverted idx_inv_request_path| inverted idx_bf_dest_ip | bloom_filter

数据摄取:OSS PIPE(EVENT_NOTIFICATION 模式)

生产环境中,日志代理(Fluentd / Logstash)将日志写入 OSS 存储桶,由 OSS 的事件通知触发 PIPE 自动摄取。

方式一:通过 OSS PIPE 持续摄取(推荐)

首先创建指向日志存储桶的 Storage Connection 和 Volume,然后创建 PIPE:

-- 已有 OSS Storage Connection 后创建 Volume CREATE VOLUME soc_log_vol EXTERNAL STORAGE_CONNECTION = '<your_oss_connection>' LOCATION = 'oss://<your-bucket>/soc-logs/'; -- 创建 OSS PIPE(EVENT_NOTIFICATION 模式) CREATE PIPE IF NOT EXISTS best_practice_soc_log.pipe_raw_logs VIRTUAL_CLUSTER = 'DEFAULT' AUTO_INGEST = TRUE AS COPY INTO best_practice_soc_log.doc_raw_logs (log_timestamp, source_ip, dest_ip, protocol, action, threat_label, log_type, bytes_transferred, user_agent, request_path) FROM ( SELECT TO_TIMESTAMP($1, 'YYYY-MM-DD"T"HH24:MI:SS'), $2, $3, $4, $5, $6, $7, $8::BIGINT, $9, $10 FROM VOLUME soc_log_vol ) USING csv OPTIONS('header'='true', 'sep'=',');

方式二:INSERT 模拟(无 OSS 环境时)

若暂未配置 OSS,可通过

INSERT INTO
INSERT INTO
直接写入,模拟 OSS PIPE 已解析写入的效果:

从本地 CSV 导入数据(推荐):

-- 第一步:通过 SQL PUT 将本地 CSV 文件上传到 User Volume PUT '/path/to/your/data.csv' TO USER VOLUME FILE 'data.csv';

-- 第二步:从 User Volume COPY INTO 表 COPY INTO best_practice_soc_log.doc_raw_logs FROM USER VOLUME USING csv OPTIONS('header'='true', 'sep'=',', 'nullValue'='') FILES ('data.csv');

也可直接内联插入测试数据(不需要 CSV 文件):

INSERT INTO best_practice_soc_log.doc_raw_logs (log_id, log_timestamp, source_ip, dest_ip, protocol, action, threat_label, log_type, bytes_transferred, user_agent, request_path) VALUES (1, CAST('2024-05-01 00:00:00' AS TIMESTAMP), '192.168.1.125', '192.168.1.124', 'TCP', 'blocked', 'benign', 'firewall', 10889, 'Nmap Scripting Engine', '/'), (2, CAST('2024-05-03 08:15:00' AS TIMESTAMP), '192.168.1.130', '192.168.1.100', 'TCP', 'allowed', 'benign', 'firewall', 2048, 'Nmap Scripting Engine', '/'), (3, CAST('2024-05-07 14:22:00' AS TIMESTAMP), '10.0.0.15', '10.0.0.20', 'UDP', 'allowed', 'benign', 'firewall', 512, 'Mozilla/5.0', '/'), (4, CAST('2024-05-10 09:30:00' AS TIMESTAMP), '172.16.0.1', '172.16.0.5', 'TCP', 'allowed', 'benign', 'firewall', 3072, 'Cobalt Strike', '/'), (5, CAST('2024-05-12 11:45:00' AS TIMESTAMP), '192.168.2.50', '192.168.1.200', 'TCP', 'blocked', 'benign', 'firewall', 1024, 'Mozilla/5.0 Firefox/119.0', '/'), (6, CAST('2024-05-15 16:00:00' AS TIMESTAMP), '10.1.0.10', '10.1.0.50', 'TCP', 'allowed', 'benign', 'firewall', 768, 'curl/7.64.1', '/'), (7, CAST('2024-05-20 10:05:00' AS TIMESTAMP), '192.168.1.77', '192.168.1.100', 'HTTP', 'allowed', 'benign', 'ids', 1536, 'SQLMap/1.6-dev', '/login'), (8, CAST('2024-05-22 13:10:00' AS TIMESTAMP), '192.168.1.50', '192.168.1.80', 'UDP', 'allowed', 'benign', 'ids', 256, 'Havoc/0.7', '/'), (9, CAST('2024-05-25 09:00:00' AS TIMESTAMP), '10.0.0.5', '10.0.0.10', 'TCP', 'allowed', 'benign', 'ids', 1024, 'Metasploit v6.3', '/api/health'), (10, CAST('2024-05-28 15:30:00' AS TIMESTAMP), '192.168.1.228', '192.168.1.1', 'HTTP', 'allowed', 'benign', 'ids', 2048, 'SQLMap/1.6-dev', '/'), (11, CAST('2024-06-01 08:45:00' AS TIMESTAMP), '10.0.0.25', '10.0.0.100', 'TCP', 'allowed', 'benign', 'ids', 384, 'Mozilla/5.0 Safari/537', '/api/status'), (12, CAST('2024-06-05 10:20:00' AS TIMESTAMP), '192.168.1.88', '192.168.1.10', 'HTTP', 'allowed', 'benign', 'application', 5120, 'Mozilla/5.0 Chrome/119.0', '/'), (13, CAST('2024-06-10 14:35:00' AS TIMESTAMP), '10.0.0.30', '10.0.0.5', 'HTTPS', 'allowed', 'benign', 'application', 3584, 'Mozilla/5.0 Firefox/118.0', '/'), (14, CAST('2024-06-12 11:15:00' AS TIMESTAMP), '192.168.1.60', '192.168.1.200', 'HTTP', 'allowed', 'benign', 'application', 1280, 'Nmap Scripting Engine', '/api/search'), (15, CAST('2024-06-15 09:50:00' AS TIMESTAMP), '10.0.0.40', '10.0.0.20', 'HTTPS', 'allowed', 'benign', 'application', 2048, 'Mozilla/5.0 Edge/120.0', '/login'), (16, CAST('2024-06-18 16:30:00' AS TIMESTAMP), '192.168.1.90', '192.168.1.50', 'HTTP', 'allowed', 'benign', 'application', 1792, 'Mozilla/5.0 Chrome/120.0', '/login'), (17, CAST('2024-06-20 10:00:00' AS TIMESTAMP), '10.0.0.50', '10.0.0.15', 'HTTP', 'allowed', 'benign', 'application', 2560, 'Mozilla/5.0 Safari/604', '/home'), (18, CAST('2024-06-22 14:00:00' AS TIMESTAMP), '192.168.1.95', '192.168.1.80', 'HTTP', 'allowed', 'benign', 'application', 1024, 'Mozilla/5.0 Chrome/121.0', '/about'), (19, CAST('2024-06-25 09:30:00' AS TIMESTAMP), '217.89.155.68', '192.168.1.20', 'HTTP', 'allowed', 'benign', 'application', 2048, 'SQLMap/1.6-dev', '/login'), (20, CAST('2024-06-27 11:00:00' AS TIMESTAMP), '10.0.0.60', '10.0.0.25', 'HTTP', 'allowed', 'benign', 'application', 4096, 'Mozilla/5.0 Firefox/120.0', '/assets/logo.png'), (21, CAST('2024-07-31 00:00:00' AS TIMESTAMP), '177.52.183.80', '192.168.1.50', 'HTTPS', 'blocked', 'suspicious', 'ids', 45164, 'Mozilla/5.0 Chrome/120.0', '/login?backup.sql'), (22, CAST('2024-08-05 03:15:00' AS TIMESTAMP), '103.22.200.174','192.168.1.20', 'HTTP', 'blocked', 'suspicious', 'ids', 8192, 'SQLMap/1.6-dev', '/api/users'), (23, CAST('2024-08-10 02:30:00' AS TIMESTAMP), '91.108.4.55', '192.168.1.10', 'TCP', 'blocked', 'suspicious', 'ids', 16384, 'Mozilla/5.0 Firefox/119.0', '/'), (24, CAST('2024-08-12 01:20:00' AS TIMESTAMP), '45.142.213.99', '192.168.1.100', 'TCP', 'blocked', 'suspicious', 'firewall', 6144, 'Nmap Scripting Engine', '/login'), (25, CAST('2024-09-05 04:00:00' AS TIMESTAMP), '78.128.113.47', '192.168.1.30', 'HTTP', 'blocked', 'suspicious', 'application', 12288, 'Mozilla/5.0 Chrome/119.0', '/wp-admin'), (26, CAST('2024-05-18 00:00:00' AS TIMESTAMP), '185.220.101.33','192.168.1.10', 'TCP', 'blocked', 'malicious', 'firewall', 4096, 'Metasploit v6.3', '/'), (27, CAST('2024-06-29 00:00:00' AS TIMESTAMP), '198.199.119.1', '192.168.1.22', 'HTTP', 'blocked', 'malicious', 'ids', 62500, 'curl/7.64.1', '/etc/passwd'), (28, CAST('2024-08-14 00:30:00' AS TIMESTAMP), '91.240.118.172','192.168.1.15', 'TCP', 'blocked', 'malicious', 'firewall', 32768, 'Cobalt Strike', '/'), (29, CAST('2024-11-05 05:20:00' AS TIMESTAMP), '45.95.147.236', '192.168.1.25', 'TCP', 'blocked', 'malicious', 'ids', 65536, 'Havoc/0.7', '/shell.php'), (30, CAST('2024-09-27 23:45:00' AS TIMESTAMP), '104.21.58.152', '192.168.1.40', 'UDP', 'allowed', 'malicious', 'application', 8192, 'Mozilla/5.0 Chrome/118.0', '/admin?id=1 OR 1=1') ;

验证 Bronze 层行数及威胁分布:

SELECT threat_label, log_type, COUNT(*) AS cnt FROM best_practice_soc_log.doc_raw_logs GROUP BY threat_label, log_type ORDER BY threat_label, cnt DESC;

threat_label | log_type | cnt -------------|-------------|---- benign | application | 9 benign | firewall | 6 benign | ids | 5 malicious | ids | 2 malicious | firewall | 2 malicious | application | 1 suspicious | ids | 3 suspicious | firewall | 1 suspicious | application | 1

20 条 benign,5 条 suspicious,5 条 malicious,覆盖防火墙、IDS、应用三类日志源。


全文检索:利用 Inverted Index 定位攻击特征

按工具指纹检索

MATCH_ALL
MATCH_ALL
函数利用 Inverted Index 快速定位含特定攻击工具特征的日志行:

SELECT log_id, source_ip, user_agent, threat_label FROM best_practice_soc_log.doc_raw_logs WHERE MATCH_ALL(user_agent, 'SQLMap') LIMIT 5;

log_id | source_ip | user_agent | threat_label -------|-----------------|------------------|------------- 19 | 217.89.155.68 | SQLMap/1.6-dev | benign 22 | 103.22.200.174 | SQLMap/1.6-dev | suspicious 7 | 192.168.1.77 | SQLMap/1.6-dev | benign 10 | 192.168.1.228 | SQLMap/1.6-dev | benign

搜索到 4 条携带

SQLMap
SQLMap
特征的日志,其中 1 条已被标注为
suspicious
suspicious
benign
benign
标签的行说明该工具本次操作未触发威胁判定,但仍值得关注。

按请求路径关键字检索(Bloomfilter 辅助 IP 过滤)

结合 Bloomfilter Index 点查特定 IP,再用路径分析攻击意图:

SELECT log_id, source_ip, threat_label, action FROM best_practice_soc_log.doc_raw_logs WHERE source_ip = '185.220.101.33';

log_id | source_ip | threat_label | action -------|-----------------|--------------|-------- 26 | 185.220.101.33 | malicious | blocked

Bloomfilter Index 在 block 级别快速排除不含该 IP 的数据块,只扫描命中块。


IP 风险分级 UDF

将 IP 风险判定逻辑封装为 SQL UDF,Silver 和 Gold 层均可复用:

CREATE OR REPLACE FUNCTION best_practice_soc_log.classify_ip_risk( ip STRING, threat_label STRING, is_attack_tool INT ) RETURNS STRING AS CASE WHEN threat_label = 'malicious' THEN 'HIGH' WHEN threat_label = 'suspicious' AND is_attack_tool = 1 THEN 'HIGH' WHEN threat_label = 'suspicious' THEN 'MEDIUM' WHEN is_attack_tool = 1 THEN 'MEDIUM' ELSE 'LOW' END;

验证 UDF:

SELECT source_ip, threat_label, is_attack_tool, best_practice_soc_log.classify_ip_risk(source_ip, threat_label, is_attack_tool) AS ip_risk_level FROM best_practice_soc_log.doc_silver_normalized_logs WHERE threat_label != 'benign' ORDER BY ip_risk_level LIMIT 8;

source_ip | threat_label | is_attack_tool | ip_risk_level -----------------|--------------|----------------|--------------- 103.22.200.174 | suspicious | 1 | HIGH 45.142.213.99 | suspicious | 1 | HIGH 185.220.101.33 | malicious | 1 | HIGH 198.199.119.1 | malicious | 0 | HIGH 91.240.118.172 | malicious | 1 | HIGH 104.21.58.152 | malicious | 0 | HIGH 45.95.147.236 | malicious | 1 | HIGH 177.52.183.80 | suspicious | 0 | MEDIUM


Silver 层 Dynamic Table:归一化与威胁打标

Silver 层在 Bronze 原始日志基础上完成三件事:IP 内外网分类、威胁等级数字化、攻击工具指纹标记。

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_soc_log.doc_silver_normalized_logs AS SELECT log_id, log_timestamp, DATE(log_timestamp) AS log_date, HOUR(log_timestamp) AS log_hour, source_ip, dest_ip, protocol, action, threat_label, log_type, bytes_transferred, user_agent, request_path, ingest_time, -- 是否内网源 IP(RFC 1918 地址段) CASE WHEN source_ip LIKE '192.168.%' OR source_ip LIKE '10.%' OR source_ip LIKE '172.16.%' THEN 1 ELSE 0 END AS is_internal_src, -- 威胁等级数字化 CASE threat_label WHEN 'malicious' THEN 3 WHEN 'suspicious' THEN 2 WHEN 'benign' THEN 1 ELSE 0 END AS threat_level, -- 攻击工具指纹 CASE WHEN user_agent IN ( 'SQLMap/1.6-dev','Nmap Scripting Engine', 'Metasploit v6.3','Cobalt Strike','Havoc/0.7') THEN 1 ELSE 0 END AS is_attack_tool FROM best_practice_soc_log.doc_raw_logs;

手动触发首次刷新:

REFRESH DYNAMIC TABLE best_practice_soc_log.doc_silver_normalized_logs;

验证 Silver 层:

SELECT COUNT(*) AS cnt, SUM(is_attack_tool) AS attack_tool_cnt FROM best_practice_soc_log.doc_silver_normalized_logs;

cnt | attack_tool_cnt ----|---------------- 30 | 14

30 条日志中有 14 条携带已知攻击工具 User-Agent。


Gold 层 Dynamic Table:威胁指标聚合

Gold 层提供两张聚合表:每日每 IP 威胁汇总,以及外网高风险 IP 排行。

每日威胁汇总

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_soc_log.doc_gold_threat_summary AS SELECT source_ip, log_date, log_type, COUNT(*) AS total_events, SUM(CASE WHEN threat_label = 'malicious' THEN 1 ELSE 0 END) AS malicious_cnt, SUM(CASE WHEN threat_label = 'suspicious' THEN 1 ELSE 0 END) AS suspicious_cnt, SUM(CASE WHEN action = 'blocked' THEN 1 ELSE 0 END) AS blocked_cnt, SUM(is_attack_tool) AS attack_tool_cnt, MAX(threat_level) AS max_threat_level, SUM(bytes_transferred) AS total_bytes, COUNT(DISTINCT dest_ip) AS unique_dest_count FROM best_practice_soc_log.doc_silver_normalized_logs GROUP BY source_ip, log_date, log_type;

高风险外网 IP 排行

CREATE DYNAMIC TABLE IF NOT EXISTS best_practice_soc_log.doc_gold_high_risk_ips AS SELECT source_ip, COUNT(DISTINCT log_date) AS active_days, COUNT(*) AS total_events, SUM(CASE WHEN threat_label IN ('malicious','suspicious') THEN 1 ELSE 0 END) AS threat_events, SUM(is_attack_tool) AS attack_tool_hits, ROUND( SUM(CASE WHEN threat_label IN ('malicious','suspicious') THEN 1.0 ELSE 0.0 END) / COUNT(*) * 100, 2 ) AS threat_rate_pct, MAX(log_timestamp) AS last_seen, COUNT(DISTINCT dest_ip) AS targets_count FROM best_practice_soc_log.doc_silver_normalized_logs WHERE is_internal_src = 0 GROUP BY source_ip HAVING threat_events > 0;

手动触发刷新:

REFRESH DYNAMIC TABLE best_practice_soc_log.doc_gold_threat_summary; REFRESH DYNAMIC TABLE best_practice_soc_log.doc_gold_high_risk_ips;

查询高风险 IP 排行:

SELECT source_ip, log_date, max_threat_level, total_events, malicious_cnt, blocked_cnt FROM best_practice_soc_log.doc_gold_threat_summary ORDER BY max_threat_level DESC, total_events DESC LIMIT 5;

source_ip | log_date | max_threat_level | total_events | malicious_cnt | blocked_cnt -----------------|------------|------------------|--------------|---------------|------------ 104.21.58.152 | 2024-09-27 | 3 | 1 | 1 | 0 198.199.119.1 | 2024-06-29 | 3 | 1 | 1 | 1 45.95.147.236 | 2024-11-05 | 3 | 1 | 1 | 1 91.240.118.172 | 2024-08-14 | 3 | 1 | 1 | 1 185.220.101.33 | 2024-05-18 | 3 | 1 | 1 | 1

max_threat_level = 3
max_threat_level = 3
为最高威胁等级(malicious),
104.21.58.152
104.21.58.152
的 DNS 隧道行为(
blocked_cnt = 0
blocked_cnt = 0
malicious_cnt = 1
malicious_cnt = 1
)值得重点关注——该流量未被拦截。


配置刷新任务

Dynamic Table 的定期刷新通过 Studio Task 管理,在任务上可附加监控告警和数据质量检查。

创建 Silver 层刷新任务

# 创建 SQL 类型任务,放在 best_practices 文件夹 cz-cli task create refresh_soc_silver --type SQL --folder best_practices -p skill_test # 写入刷新 SQL cz-cli task save-content refresh_soc_silver \ --content "REFRESH DYNAMIC TABLE best_practice_soc_log.doc_silver_normalized_logs;" \ -p skill_test # 设置每 5 分钟刷新一次 cz-cli task save-cron refresh_soc_silver --cron "*/5 * * * *" -p skill_test

创建 Gold 层刷新任务

cz-cli task create refresh_soc_gold --type SQL --folder best_practices -p skill_test cz-cli task save-content refresh_soc_gold \ --content "REFRESH DYNAMIC TABLE best_practice_soc_log.doc_gold_threat_summary; REFRESH DYNAMIC TABLE best_practice_soc_log.doc_gold_high_risk_ips;" \ -p skill_test # Gold 层每 10 分钟刷新一次(依赖 Silver 完成后) cz-cli task save-cron refresh_soc_gold --cron "*/10 * * * *" -p skill_test

发布任务使调度生效:

cz-cli task deploy refresh_soc_silver -p skill_test cz-cli task deploy refresh_soc_gold -p skill_test


安全事件溯源:Time Travel

发生安全事件后,分析师需要回溯攻击时刻的完整数据状态,Time Travel 提供基于时间戳的历史快照查询。

查看历史版本

DESC HISTORY best_practice_soc_log.doc_raw_logs LIMIT 5;

version | time | total_rows | operation --------|-------------------------|------------|---------- 8 | 2026-06-06T23:37:56.770 | 30 | INSERT 7 | 2026-06-06T23:37:52.335 | 30 | INSERT 6 | 2026-06-06T23:37:24.639 | 30 | INSERT

回溯特定时刻的日志状态

-- 回溯 2026-06-06 23:37:30 时刻的数据状态 SELECT COUNT(*) AS row_count FROM best_practice_soc_log.doc_raw_logs TIMESTAMP AS OF '2026-06-06 23:37:30';

row_count --------- 30

溯源攻击路径

针对已知恶意 IP,回溯其在攻击时间窗口内的所有操作记录:

SELECT log_timestamp, source_ip, dest_ip, protocol, action, request_path FROM best_practice_soc_log.doc_raw_logs TIMESTAMP AS OF '2026-06-06 23:39:07' WHERE source_ip IN ('185.220.101.33', '198.199.119.1', '45.95.147.236') ORDER BY log_timestamp;

Time Travel 保留周期默认 7 天,在此窗口内可随时回溯任意历史版本,无需额外备份。


攻击路径分析

结合 Gold 层聚合数据,快速定位高频攻击路径:

SELECT request_path, COUNT(*) AS total_hits, SUM(CASE WHEN threat_label IN ('malicious','suspicious') THEN 1 ELSE 0 END) AS threat_hits FROM best_practice_soc_log.doc_raw_logs GROUP BY request_path ORDER BY threat_hits DESC, total_hits DESC LIMIT 8;

request_path | total_hits | threat_hits ----------------------|------------|------------ / | 13 | 3 /login | 5 | 1 /login?backup.sql | 1 | 1 /admin?id=1 OR 1=1 | 1 | 1 /wp-admin | 1 | 1 /api/users | 1 | 1 /etc/passwd | 1 | 1 /shell.php | 1 | 1

/
/
路径命中 13 次,其中 3 次为威胁行为;
/etc/passwd
/etc/passwd
/shell.php
/shell.php
是典型的系统文件探测和 Webshell 上传特征,即使只有 1 次访问也需要立即响应。


注意事项

  • Bloomfilter Index 仅对新增数据生效:索引创建后,对已有历史数据需执行
    BUILD INDEX idx_bf_source_ip ON TABLE doc_raw_logs
    BUILD INDEX idx_bf_source_ip ON TABLE doc_raw_logs
    重建;对新写入数据自动生效
  • Inverted Index 分词器选择
    user_agent
    user_agent
    english
    english
    (按单词切分),
    request_path
    request_path
    keyword
    keyword
    (整体不切分);选错分词器会导致
    MATCH_ALL
    MATCH_ALL
    无法匹配预期内容
  • Dynamic Table 不写 REFRESH INTERVAL:刷新调度必须通过 Studio Task 管理,这样才能在同一任务上附加告警和数据质量检查
  • Dynamic Table 静态分区模式:如需对 Silver/Gold 层 Dynamic Table 按
    log_date
    log_date
    分区,必须使用
    PARTITION BY
    PARTITION BY
    + 静态分区声明,不能使用动态分区推断
  • Time Travel 保留周期:默认 7 天,超出周期的历史版本不可查询;对于合规要求保留更长时间的场景,需在建表时配置
    DATA_RETENTION_TIME_IN_DAYS
    DATA_RETENTION_TIME_IN_DAYS
  • OSS PIPE EVENT_NOTIFICATION 模式:需在 OSS 控制台配置事件通知规则,将
    ObjectCreated:*
    ObjectCreated:*
    事件指向 Lakehouse 提供的消息队列 endpoint
  • External Function 调用威胁情报 API:示例中
    classify_ip_risk
    classify_ip_risk
    为 SQL UDF,是逻辑简化版本;生产场景需通过 External Function 调用 AbuseIPDB 等真实 API,参见"相关文档"
  • Column Masking:对
    source_ip
    source_ip
    等 PII 数据可设置脱敏策略

相关文档

联系我们
预约咨询
微信咨询
电话咨询
邮件咨询