feat(skill): add ashare-pre-st-filter (A-share ST/*ST risk prediction framework) by shadowinlife · Pull Request #63 · HKUDS/Vibe-Trading

shadowinlife · 2026-04-30T08:08:17Z

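The following was typed by a human.

When, and in what situations, should this be used?

This is a foundational SKILL: it tells the Agent how to predict whether a stock will be designated ST in the next annual-report season.
For fully invested index-style strategies, positions are fixed, so this skill should be run as a priority throughout the year for risk control.
For short-term strategies (such as Chan theory (缠论) and smart-money strategies), users should run it between April 1 and April 28 each year to avoid risk.
For most ultra-short-term traders, run it from after the Spring Festival until one week after all annual reports have been published, and use it to adjust positions, for example:
- Example 1: combine it with candlestick-pattern strategies such as the double-needle bottom (双针探底): once a stock is predicted likely to be designated ST, reduce the position in advance, then re-enter after it has been designated ST and forms a second rounding-bottom pattern
- Example 2: use the strategy in reverse to estimate how likely an ST stock is to have its ST designation removed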

The quantitative part of the strategy

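| Check | Question asked | High-risk trigger | Medium-risk trigger |
| --- | --- | --- | --- |
| R1 Loss + revenue | Will the company both post a loss and fall short on revenue next year? | min(net profit attributable to parent, net profit excluding non-recurring items) < 0 and revenue < board threshold (Main Board CNY 300 million / others CNY 100 million) | Net profit is negative but revenue meets the threshold; or revenue < threshold × 1.2 |
| R2 Net assets | Will year-end net assets turn negative? | Current net assets < 0, or the year-end forecast (current + remaining profit − dividends already paid) < 0 | Between 0 and CNY 100 million |
| R3 Dividends | Has the three-year cumulative dividend requirement been met? | Cumulative dividends < 30% of average annual profit and < board threshold (Main Board CNY 50 million / others CNY 30 million), with both conditions triggered | Only one condition triggered; or this year's dividend has not been announced |
| R4 Loss chain | Has the company lost money for two consecutive years? | Loss last year + forecast loss this year, and revenue < threshold × 1.5 (a loss alone is not enough; the revenue condition must also hold) | Only this year's forecast is a loss; or consecutive losses but revenue far above the threshold |
| E1 Audit | Does the auditor endorse the company? | Qualified / adverse / disclaimer of opinion | Unqualified opinion with a going-concern emphasis-of-matter paragraph |
| E2 Regulatory penalties | How many times was it penalized in the past year? | Weighted count ≥ 5 (extreme) or ≥ 3 (high); or a single record involving financial fraud / improper guarantees | Weighted count 2 to 2.5 |
| E3 Trading thresholds | Are the share price and market cap still above the line? | Close < CNY 1 on ≥ 10 of the last 20 days; or market cap < delisting line (Main Board CNY 500 million / others CNY 300 million) | 1 to 9 such days; or market cap < delisting line × 1.5 |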
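The R1 row can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the skill's actual code: the function name `classify_r1`, the `board` keys, and the "low" fallback level are hypothetical; only the thresholds and the × 1.2 buffer come from the table.

```python
# Board revenue thresholds in CNY, taken from the R1 row of the table.
REVENUE_THRESHOLD = {"main": 300_000_000, "other": 100_000_000}


def classify_r1(net_profit_est: float, deducted_profit_est: float,
                revenue_est: float, board: str = "main") -> str:
    """Return 'high', 'medium' or 'low' for the R1 loss + revenue check.

    Hypothetical helper: names and the 'low' default are illustrative.
    """
    threshold = REVENUE_THRESHOLD[board]
    # Use the worse of the two profit figures, as the table's min(...) does.
    worst_profit = min(net_profit_est, deducted_profit_est)
    loss = worst_profit < 0

    if loss and revenue_est < threshold:
        return "high"      # both statutory conditions are met
    if loss or revenue_est < threshold * 1.2:
        return "medium"    # loss with adequate revenue, or revenue in the buffer
    return "low"


if __name__ == "__main__":
    print(classify_r1(-10_000_000, 5_000_000, 250_000_000, "main"))
    print(classify_r1(20_000_000, 10_000_000, 350_000_000, "main"))
```

The first call combines a negative worst-case profit with revenue under the Main Board threshold, so it lands in the high bucket; the second has no loss but revenue inside the 1.2× buffer, so it lands in medium.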

Claude:

Summary

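Adds the skill ashare-pre-st-filter: an A-share ST/*ST risk prediction framework.

Based on the latest interim or third-quarter report, or an earnings forecast or preliminary earnings release, it predicts whether the company will be placed under risk warning in the next fiscal year for failing the revenue / profit / net-asset / dividend criteria, and it folds Sina regulatory-penalty records into the risk level as an independent evidence dimension. It applies to A-shares only and does not predict financial fraud (fraud is attributed after the fact and cannot be predicted beforehand).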

Motivation

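The core statutory dual condition for A-share *ST designation, net profit < 0 and revenue < board threshold (Main Board CNY 300 million / ChiNext and STAR Market CNY 100 million), is highly predictable from the investor's side. There is currently no skill that unifies "hard threshold breach" + "two-year linkage" + "regulatory-penalty evidence" into a two-axis output (risk level × confidence).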

Design Highlights

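| Dimension | Decision |
| --- | --- |
| Output | Two axes: `risk_level ∈ {低,中,高,极高}` (low, medium, high, extreme) × `confidence ∈ {低,中,高}` (low, medium, high), to avoid a misleading single-dimension verdict |
| R1 threshold breach | `worst_profit = min(n_income_est, profit_dedt_est)` < 0 and revenue breaches the threshold → high; revenue < threshold × 1.2 → medium (forecast buffer) |
| R4 two-year linkage | Breach in the current year + next-year revenue < threshold × 1.5 → high (wider buffer, because the two-year linkage verdict requires stricter evidence) |
| E2 regulatory penalties | Two windows: A = the last full fiscal year (single-record severity) / B = the past 12 months (frequency) |
| Subject weighting | `subject_normalized` has three states: company × 1.0 / shareholder × 0.5 / officer × 0.5 (individual conduct is not equivalent to a breakdown in corporate governance) |
| Frequency thresholds | Weighted count ≥ 5.0 extreme / ≥ 3.0 high / ≥ 2.0 medium / < 2.0 low; if E2_freq and E2_single are both high → escalate to extreme |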
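To make the subject weighting and frequency thresholds concrete, here is a minimal sketch. The weights, the thresholds, and the "both high → extreme" escalation come from the table; the function names, the English level labels, and the rule of otherwise taking the more severe of the two levels are assumptions made for illustration.

```python
# Weights per subject_normalized value, from the "Subject weighting" row.
SUBJECT_WEIGHT = {"company": 1.0, "shareholder": 0.5, "officer": 0.5}
LEVELS = ["low", "medium", "high", "extreme"]  # ordered by severity


def weighted_count(records: list[dict]) -> float:
    """Sum the subject weights of the penalty records in window B.

    Unknown subjects are weighted as 'company' (assumed fallback).
    """
    return sum(SUBJECT_WEIGHT.get(r["subject_normalized"], 1.0) for r in records)


def frequency_level(count: float) -> str:
    """Map a weighted count to a level using the 5.0 / 3.0 / 2.0 cut-offs."""
    if count >= 5.0:
        return "extreme"
    if count >= 3.0:
        return "high"
    if count >= 2.0:
        return "medium"
    return "low"


def combine_e2(freq_level: str, single_level: str) -> str:
    """Escalate to 'extreme' when both are 'high'.

    Otherwise return the more severe of the two (an assumption, not from the PR).
    """
    if freq_level == "high" and single_level == "high":
        return "extreme"
    return max(freq_level, single_level, key=LEVELS.index)


if __name__ == "__main__":
    sample = [{"subject_normalized": "company"}] * 2 + \
             [{"subject_normalized": "shareholder"}] * 3
    count = weighted_count(sample)          # 2 × 1.0 + 3 × 0.5
    print(count, frequency_level(count))
```

With two company records and three shareholder records the weighted count is 3.5, which falls in the high band even though five raw records were found.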

Implementation

`agent/src/skills/ashare-pre-st-filter/scripts/fetch_sina_penalties.py`

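Zero external dependencies: pure stdlib (HTMLParser + urllib + gzip + unicodedata)
SSRF protection: host whitelist limited to vip.stock.finance.sina.com.cn
ReDoS resistance: the _RecordParser state machine replaces regexes with multiple negative lookaheads
NFKC normalization: _norm_text handles full-width/half-width variants (e.g. ５％以上股东, "shareholder holding 5% or more")
Robustness: HTTP and parse failures both fall back to {"source":"unavailable",...}, with 3 retries using exponential backoff
Priority: in SUBJECT_KEYWORDS, shareholder takes precedence over officer, to handle overlapping identities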
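The NFKC normalization and the shareholder-before-officer priority can be sketched as follows. This is an illustration only: the real `_norm_text` and `SUBJECT_KEYWORDS` are not shown in this PR text, so the keyword table below (especially the officer keywords) and the function names are hypothetical.

```python
import re
import unicodedata

# Hypothetical keyword table, ordered so that shareholder is checked first.
SUBJECT_KEYWORDS = [
    ("shareholder", ("控股股东", "5%以上股东", "一致行动人")),
    ("officer", ("董事", "监事", "高级管理人员")),
]


def norm_text(text: str) -> str:
    """Fold full-width forms to ASCII via NFKC, then drop all whitespace."""
    folded = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", "", folded)


def normalize_subject(text: str) -> str:
    """Return the first matching subject; fall back to 'company'."""
    cleaned = norm_text(text)
    for subject, keywords in SUBJECT_KEYWORDS:
        if any(kw in cleaned for kw in keywords):
            return subject
    return "company"


if __name__ == "__main__":
    print(norm_text("５％以上股东"))
    print(normalize_subject("公司５％以上股东、董事张三"))
```

Because the table is scanned in order, a person who is both a 5% shareholder and a director is classified as shareholder, which matches the priority described in the list above.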

`agent/src/skills/ashare-pre-st-filter/SKILL.md`

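E evidence dimensions (E1 CNINFO / E2 penalties / E3 financial reports / E4 …), each with its issuer, link, and fetch script
End-to-end pseudocode (including the Step 5 two-window Python, directly executable)
The difference between the R1 × 1.2 and R4 × 1.5 coefficients is explicitly annotated; unifying them is forbidden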

Tests

agent/tests/test_fetch_sina_penalties.py: all 49 cases pass:

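| Class | Cases | Coverage |
| --- | --- | --- |
| `TestValidateStockid` | 6 | `.SH/.SZ/.BJ` suffixes, bare 6-digit codes, empty/malformed input |
| `TestNormalizeSubject` | 11 | Subject recognition + shareholder > officer priority |
| `TestExtractEventType` | 4 | Blacklist that prevents "公告日期" (announcement date) from being misidentified |
| `TestParsePenaltyList` | 9 | HTMLParser end to end |
| `TestApplyDateFilter` | 5 | Date-filter boundaries + missing-date flagging |
| `TestFetchPenaltyListFallback` | 3 | Host validation failure / HTTP failure / parse failure |
| `TestNoCatastrophicBacktracking` | 1 | 200 rows of unclosed `<tr>` in < 1s |
| `TestNormalizeFullwidth` | 6 | NFKC normalization of full-width/half-width variants |
| `TestRecordParserEdgeCases` | 5 | Nested `<strong>` / multiple `<td>` / missing `<thead>` |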

Real-page smoke test

$ python agent/src/skills/ashare-pre-st-filter/scripts/fetch_sina_penalties.py \
    --ts-code 600745.SH --no-filter
source=sina  matched=40
{
  "ann_date":"2025-07-10",
  "event_type":"问讯",
  "title":"闻泰科技:关于2024年年度报告的信息披露监管问询函的回复公告",
  "reason_normalized":"信息披露违规",
  "issuer_normalized":"上交所",
  "subject_normalized":"company",
  ...
}

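CR / fix history

This PR has gone through three rounds of self-review (P0/P1/P2):

P0-1 Catastrophic-backtracking regex → HTMLParser state machine
P0-2 Completed the two-window pseudocode in SKILL.md
P0-3 event_type blacklist that prevents "公告日期" (announcement date) from being misidentified
P1-1 Expanded subject keywords (一致行动人 (persons acting in concert) / 5%以上股东 (shareholders holding 5% or more) / …)
P1-3 Annotated the R1/R4 coefficient difference
P1-4 pytest coverage of ≥ 5 core cases (49 in practice)
P1-5 Added the weighted count to the output template
P2 NFKC normalization + _RecordParser edge cases (one real nested-strong bug was caught during the fix)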

Files

agent/src/skills/ashare-pre-st-filter/SKILL.md (+511)
agent/src/skills/ashare-pre-st-filter/scripts/fetch_sina_penalties.py (+519)
agent/tests/test_fetch_sina_penalties.py (+374)

1,404 lines in total, 0 deletions.

Risks & Limitations

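The data source depends on the Sina page structure; if the page is redesigned, _RecordParser must be updated (it already resists ReDoS and degrades gracefully on errors)
Does not predict financial fraud (a design trade-off, stated explicitly in the SKILL document)
SUBJECT_KEYWORDS is derived from existing public regulatory documents, so new identity wording may be missed (it falls back to company; the only effect is losing the 0.5 weight discount)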

Checklist

49/49 unit tests pass
Real-page end-to-end smoke test passes
stdlib only, no new dependencies
SSRF protection + input validation + ReDoS resistance
CR P0/P1/P2 all closed out
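The SSRF and input-validation items might look roughly like the sketch below. It is not the PR's implementation: the function names, the regular expression, and the choice to accept both http and https are assumptions; only the whitelisted host, the accepted code shapes (`.SH/.SZ/.BJ` suffix or bare 6-digit code), and the use of ValueError for invalid input come from the PR text.

```python
import re
from urllib.parse import urlsplit

ALLOWED_HOST = "vip.stock.finance.sina.com.cn"
# Six digits, optionally followed by an exchange suffix.
_CODE_RE = re.compile(r"(\d{6})(?:\.(?:SH|SZ|BJ))?")


def validate_stockid(ts_code: str) -> str:
    """Return the bare 6-digit code, or raise ValueError on bad input."""
    match = _CODE_RE.fullmatch(ts_code.strip().upper()) if ts_code else None
    if match is None:
        raise ValueError(f"invalid stock code: {ts_code!r}")
    return match.group(1)


def is_allowed_url(url: str) -> bool:
    """Accept only http(s) URLs whose host is exactly the whitelisted one."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and parts.hostname == ALLOWED_HOST


if __name__ == "__main__":
    print(validate_stockid("600745.SH"))
    print(is_allowed_url("https://vip.stock.finance.sina.com.cn.evil.com/x"))
```

Comparing the parsed hostname for equality, rather than searching for the host string inside the URL, is what rejects look-alike hosts and URLs that smuggle the whitelisted name into the path, query, or userinfo.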

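New skill: based on the latest interim or third-quarter report, or an earnings forecast or preliminary earnings release, it predicts whether the company will be placed under risk warning in the next fiscal year for failing the revenue / profit / net-asset / dividend criteria, and it folds Sina regulatory-penalty records into the risk level as an independent evidence dimension. It applies to A-shares only and does not predict financial fraud.

Core design:
- Two-axis output (risk level + confidence), to avoid a misleading single-dimension verdict
- R1 threshold breach / R4 two-year linkage: the 1.2 / 1.5 buffer coefficients are deliberately different
- E2 regulatory-penalty two-window strategy (window A: the last full fiscal year + window B: the past 12 months)
- subject_normalized three-state weighting (company×1.0 / shareholder×0.5 / officer×0.5)
- Frequency-level thresholds based on the weighted count: ≥5.0 extreme / 3.0 high / 2.0 medium

Fetch script (agent/src/skills/.../fetch_sina_penalties.py):
- stdlib only (HTMLParser + urllib + gzip)
- SSRF protection: whitelisted host vip.stock.finance.sina.com.cn
- HTMLParser state machine replaces backtracking-prone regexes (ReDoS resistance)
- NFKC text normalization, handling full-width/half-width variants
- _RecordParser handles edge cases such as nested strong / multiple td / missing thead
- HTTP failures and parse failures both return a fallback JSON and are not raised to the caller
- Retries + exponential backoff (MAX_RETRIES=3)

Tests (agent/tests/test_fetch_sina_penalties.py):
- 49 cases covering _validate_stockid / _normalize_subject priority / the _extract_event_type blacklist / _RecordParser end to end + edge cases / _apply_date_filter / the fetch_penalty_list fallback / anti-ReDoS timing
- Real-page smoke test: 600745.SH returned 40 records with correct structured fields

Co-Authored-By: Claude Code <noreply@anthropic.com>
AI-Model: claude-opus-4-6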

Copilot

Pull request overview

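This PR adds the skill ashare-pre-st-filter, which uses the latest A-share financial disclosures/forecasts together with the "Sina regulatory penalties" evidence dimension to assess, ahead of time, a company's risk of triggering ST/*ST in the next fiscal year, and reports the result on two axes ("risk level × confidence") to avoid a misleading single-dimension conclusion.

Changes:

Added SKILL.md: defines the scope of application, data-acquisition rules (tushare first / akshare as fallback), the R1-R4 forecast red lines and E1-E3 evidence dimensions, the output template, and prohibited actions.
Added fetch_sina_penalties.py: a stdlib implementation that fetches the Sina penalties page and parses it with HTMLParser, normalizes subject / reason / issuer, applies date filtering, and falls back on failure.
Added test_fetch_sina_penalties.py: unit tests covering stockid validation, subject-recognition priority, event_type extraction, protection against hangs on dirty HTML, fallback behavior, and more.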

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

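| File | Description |
| --- | --- |
| agent/src/skills/ashare-pre-st-filter/SKILL.md | Defines the data rules, rule system, and output template of this risk-prediction skill, and explains how the E2 (Sina penalties) evidence dimension is used |
| agent/src/skills/ashare-pre-st-filter/scripts/fetch_sina_penalties.py | Implements fetching and parsing of Sina penalty records (including the SSRF host whitelist, NFKC normalization, failure fallback, and so on) |
| agent/tests/test_fetch_sina_penalties.py | Provides unit tests and edge-case coverage for the fetch/parse script |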


- main() now catches ValueError from _validate_stockid and emits fallback JSON to stdout with non-zero exit (preserves the 'always JSON' contract)
- Remove dead code: _strip_html() and unused 'from html import unescape'
- Refine failure-mode wording in SKILL.md (network/parse vs invalid input both produce {source:unavailable,error:...})
- Add 3 SSRF-defense tests: disallowed host / disallowed scheme / URL_TEMPLATE tampering -> http_failed fallback
- Reject Copilot C5 (incorrectly conflated tushare daily_basic.total_mv unit '万元' (CNY 10,000) with akshare stock_zh_a_spot_em '总市值' field, which is in '元' (CNY) per upstream eastmoney push2 protocol; akshare source confirms no scaling). Added an inline anti-confusion note in the data-mapping table so future readers / AI reviewers don't repeat the mistake.

Tests: 52/52 pass (49 original + 3 new SSRF cases).

Co-Authored-By: Claude Code <noreply@anthropic.com>
AI-Model: claude-opus-4-6
AI-Contributed/Feature: 35/35
AI-Contributed/UT: 17/17

Copilot

Pull request overview

This PR introduces a new skill ashare-pre-st-filter, providing an A-share ST/*ST risk forecasting framework and a companion stdlib-only scraper for Sina Finance regulatory penalty records (E2 evidence), with unit tests.

Changes:

Added ashare-pre-st-filter skill documentation (data acquisition rules, red-line logic R1–R4, evidence E1–E3, and output template).
Added fetch_sina_penalties.py (stdlib-only) to fetch + parse Sina penalty HTML into structured records with basic normalization.
Added pytest coverage for validation, parsing edge cases, date filtering, and failure fallbacks.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| agent/src/skills/ashare-pre-st-filter/SKILL.md | New skill spec detailing the ST/*ST risk framework and data-fetch workflow. |
| agent/src/skills/ashare-pre-st-filter/scripts/fetch_sina_penalties.py | New Sina penalties fetcher/parser used as E2 evidence source. |
| agent/tests/test_fetch_sina_penalties.py | New unit tests validating scraper behavior and robustness characteristics. |


+
+def _parse_penalty_list(html: str) -> list[dict[str, Any]]:
+    extractor = _TableExtractor()
+    extractor.feed(html)


+
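+**Threshold-breach rule** (dual condition: loss chain + revenue linkage):
+- High risk: `worst_last < 0` and `worst_profit_est < 0`, **and** (R1 hits high risk for the current year **or** `revenue_full_year_est < board threshold × 1.5`), i.e. losses in both years plus revenue inside the delisting-risk zone.
+- Medium risk: (a) losses in two consecutive years but revenue far above the threshold (≥ threshold × 1.5): the loss chain holds but will not trigger a financial-type *ST, although the company may still be moved to ST because of deteriorating operations; (b) only `worst_profit_est < 0` (forecast loss for the current year, profit last year, no consecutive chain formed); (c) only `worst_last < 0`, with the current-year forecast positive and close to 0 (< board threshold × 0.5).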
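The R4 rule in the SKILL.md excerpt above can be sketched as a single function. This is a hedged illustration, not the skill's code: the function name, the `r1_high` flag, and the "low" default are hypothetical, and branch (c) follows the literal wording by comparing the profit forecast against the board threshold × 0.5, which the excerpt does not define further.

```python
def classify_r4(worst_last: float, worst_profit_est: float,
                revenue_full_year_est: float, threshold: float,
                r1_high: bool = False) -> str:
    """Return 'high', 'medium' or 'low' for the R4 loss-chain check.

    threshold is the board revenue threshold in CNY; r1_high says whether
    R1 already hit high risk for the current year (hypothetical inputs).
    """
    loss_last = worst_last < 0
    loss_now = worst_profit_est < 0

    if loss_last and loss_now:
        if r1_high or revenue_full_year_est < threshold * 1.5:
            return "high"
        return "medium"   # (a) loss chain holds, revenue far above threshold
    if loss_now:
        return "medium"   # (b) forecast loss this year only
    if loss_last and worst_profit_est < threshold * 0.5:
        return "medium"   # (c) loss last year, thin forecast profit (literal reading)
    return "low"


if __name__ == "__main__":
    main_board = 300_000_000
    print(classify_r4(-10_000_000, -20_000_000, 400_000_000, main_board))
    print(classify_r4(-10_000_000, -20_000_000, 900_000_000, main_board))
```

The two calls differ only in revenue: CNY 400 million is inside the 1.5× zone (CNY 450 million) and yields high, while CNY 900 million is far above it and yields medium.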


+            with urlopen(req, timeout=timeout) as resp:  # noqa: S310 — host is whitelisted
+                raw = resp.read()
+                if resp.headers.get("Content-Encoding", "").lower() == "gzip":
+                    raw = gzip.decompress(raw)
+            try:


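+    """Text normalization: NFKC folding of full-width/half-width forms + removal of internal whitespace.
+
+    Use case (CR P2): the Sina page contains variants such as “５％以上股东” and “控股 股东”,
+    which the original “kw in text” check would miss. After normalization “５％” → “5%”, and spaces between Chinese characters are removed.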


warren618 · 2026-04-30T10:04:50Z

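Merged. Thanks for this very solid PR!

After the merge I added a follow-up hardening: Sina penalty records now require --stock-name to be passed, and each record is tagged with target_relevance / e2_countable, so that unrelated penalties where the target stock merely "appears in a securities-account trading list" are not miscounted toward the E2 frequency. I also ran a smoke test against the real page with 600745.SH: records like the 郭雪 (Guo Xue) one are excluded, while shareholder penalty records are still counted normally.

CI on main has passed. Thanks again for putting together such a complete A-share ST risk prediction framework; this skill is a good entry point for future DataProvider / fundamentals pre-screening capabilities.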

Copilot AI review requested due to automatic review settings April 30, 2026 08:08

Copilot started reviewing on behalf of shadowinlife April 30, 2026 08:08

Copilot AI reviewed Apr 30, 2026

shadowinlife requested a review from Copilot April 30, 2026 09:28

Copilot started reviewing on behalf of shadowinlife April 30, 2026 09:28

Copilot AI reviewed Apr 30, 2026

warren618 merged commit 968b649 into HKUDS:main Apr 30, 2026

warren618 mentioned this pull request Apr 30, 2026

[Feature] Tushare fundamental coverage is insufficient for pre-filter strategies (need DataProvider abstraction) #62

Closed

1 task

warren618 merged 2 commits into HKUDS:main from shadowinlife:feat/ashare-pre-st-filter-skill

4 participants
