feat(skill): add ashare-pre-st-filter, an A-share ST/*ST risk prediction framework #63
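Adds a new skill that uses the latest interim / third-quarter report, or an earnings forecast / preliminary earnings release, to predict whether a company will receive a risk warning in the next fiscal year for failing the revenue / profit / net-asset / dividend requirements. Sina regulatory penalty records feed into the risk level as an independent evidence surface. A-shares only; it does not predict financial fraud.

Core design:
- Dual-axis output (risk level + confidence) to avoid a misleading single-dimension conclusion
- R1 line-crossing / R4 two-year linkage: the 1.2 / 1.5 buffer coefficients are deliberately different
- E2 regulatory penalties use a two-window strategy (window A: previous full fiscal year; window B: past 12 months)
- subject_normalized three-state weighting (company×1.0 / shareholder×0.5 / officer×0.5)
- Frequency-level thresholds on the weighted record count: ≥5.0 very high / 3.0 high / 2.0 medium

Fetch script (agent/src/skills/.../fetch_sina_penalties.py):
- stdlib only (HTMLParser + urllib + gzip)
- SSRF defense: host whitelist vip.stock.finance.sina.com.cn
- HTMLParser state machine replaces backtracking-prone regexes (ReDoS-resistant)
- NFKC text normalization, handling full-width / half-width variants
- _RecordParser handles nested strong / multiple td / missing thead and similar edge cases
- HTTP failures and parse failures both fall back to fallback JSON and are never raised to the caller
- Retries with exponential backoff (MAX_RETRIES=3)

Tests (agent/tests/test_fetch_sina_penalties.py):
- 49 cases covering _validate_stockid / _normalize_subject priority / _extract_event_type blacklist / _RecordParser end-to-end + edge cases / _apply_date_filter / fetch_penalty_list fallback / anti-ReDoS timing
- Real-page smoke test: 600745.SH returned 40 records with correct structured fields

Co-Authored-By: Claude Code <noreply@anthropic.com>
AI-Model: claude-opus-4-6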
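The weighting and frequency thresholds listed in the commit message can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function names are hypothetical, the "low" label for weighted counts below 2.0 is an assumption, and treating an unknown subject as `company` is also an assumption.

```python
from typing import Iterable

# Weights quoted in the commit message for subject_normalized.
SUBJECT_WEIGHTS = {"company": 1.0, "shareholder": 0.5, "officer": 0.5}


def weighted_penalty_count(subjects: Iterable[str]) -> float:
    """Sum per-record weights; unknown subjects count as company (assumption)."""
    return sum(SUBJECT_WEIGHTS.get(subject, 1.0) for subject in subjects)


def frequency_level(weighted: float) -> str:
    """Map a weighted count to a level using the >=5.0 / 3.0 / 2.0 thresholds."""
    if weighted >= 5.0:
        return "very high"
    if weighted >= 3.0:
        return "high"
    if weighted >= 2.0:
        return "medium"
    return "low"  # assumption: the commit message does not name this band


if __name__ == "__main__":
    subjects = ["company", "company", "shareholder", "officer"]
    total = weighted_penalty_count(subjects)
    print(total, frequency_level(total))
```

Two company-level records plus one shareholder and one officer record weigh 3.0 in total, which lands in the "high" band even though there are four raw records.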
Pull request overview
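This PR adds the skill ashare-pre-st-filter, which uses the latest A-share financial disclosures / forecasts together with the "Sina regulatory penalties" evidence surface to assess, ahead of time, a company's risk of triggering ST/*ST in the next fiscal year. It reports the result on two axes (risk level × confidence) to avoid a misleading single-dimension conclusion.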
Changes:
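- Added SKILL.md: defines the scope, data-acquisition rules (tushare first, akshare as fallback), the R1-R4 predictive red lines and E1-E3 evidence surfaces, the output template, and prohibited uses.
- Added fetch_sina_penalties.py: stdlib-only fetching of the Sina penalty page with HTMLParser parsing, normalization of subject / reason / issuer, date filtering, and failure fallback.
- Added test_fetch_sina_penalties.py: unit tests covering stockid validation, subject-recognition priority, event_type extraction, protection against hanging on dirty HTML, and fallback behavior.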
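The failure fallback mentioned above, combined with the retry and exponential backoff described in the PR, might look roughly like the sketch below. The `{"source": "unavailable", "error": ...}` shape and `MAX_RETRIES=3` come from the PR text; `fetch_with_fallback`, the injected `fetch` and `sleep` callables, the base delay, and the success payload keys are assumptions made for illustration.

```python
import time
from typing import Any, Callable

MAX_RETRIES = 3  # value quoted in the PR description


def fetch_with_fallback(
    fetch: Callable[[], list[dict[str, Any]]],
    base_delay: float = 0.5,  # assumption: the PR does not state the delay
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch() up to MAX_RETRIES times and never raise to the caller."""
    last_error = "unknown"
    for attempt in range(MAX_RETRIES):
        try:
            records = fetch()
            # Hypothetical success shape, for illustration only.
            return {"source": "sina", "matched": len(records), "records": records}
        except Exception as exc:  # network or parse failure
            last_error = f"{type(exc).__name__}: {exc}"
            if attempt < MAX_RETRIES - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, then 1.0s
    return {"source": "unavailable", "error": last_error}
```

Injecting `sleep` keeps the backoff testable without real waiting; a caller only ever has to branch on the `source` field.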
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
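| agent/src/skills/ashare-pre-st-filter/SKILL.md | Defines the data rules, rule system, and output template for the risk prediction skill, and explains how the E2 (Sina penalties) evidence surface is used |
| agent/src/skills/ashare-pre-st-filter/scripts/fetch_sina_penalties.py | Implements fetching and parsing of Sina penalty records (including the SSRF host whitelist, NFKC normalization, failure fallback, etc.) |
| agent/tests/test_fetch_sina_penalties.py | Provides unit tests and edge-case coverage for the fetch / parse script |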
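As a rough illustration of the SSRF host whitelist named in the table, a URL guard can be built on `urllib.parse.urlsplit`. Only the host name comes from the PR; the function name, the allowed-scheme set, and the example path are hypothetical.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = frozenset({"vip.stock.finance.sina.com.cn"})  # host from the PR
ALLOWED_SCHEMES = frozenset({"http", "https"})  # assumption


def assert_url_allowed(url: str) -> str:
    """Return url unchanged if scheme and host are whitelisted, else raise."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"disallowed scheme: {parts.scheme!r}")
    # .hostname drops userinfo and port, so "good.host@evil.example" is
    # judged by its real host, evil.example.
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"disallowed host: {parts.hostname!r}")
    return url
```

Checking `hostname` rather than doing a substring match on the raw URL is what defeats the userinfo trick and look-alike prefixes.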
- main() now catches ValueError from _validate_stockid and emits fallback
JSON to stdout with non-zero exit (preserves the 'always JSON' contract)
- Remove dead code: _strip_html() and unused 'from html import unescape'
- Refine failure-mode wording in SKILL.md (network/parse vs invalid input
both produce {source:unavailable,error:...})
- Add 3 SSRF-defense tests: disallowed host / disallowed scheme /
URL_TEMPLATE tampering -> http_failed fallback
- Reject Copilot C5 (incorrectly conflated tushare daily_basic.total_mv
unit '万元' with akshare stock_zh_a_spot_em '总市值' field, which is
in '元' per upstream eastmoney push2 protocol; akshare source confirms
no scaling). Added an inline anti-confusion note in the data-mapping
table so future readers / AI reviewers don't repeat the mistake.
Tests: 52/52 pass (49 original + 3 new SSRF cases).
Co-Authored-By: Claude Code <noreply@anthropic.com>
AI-Model: claude-opus-4-6
AI-Contributed/Feature: 35/35
AI-Contributed/UT: 17/17
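The "always JSON" contract from this commit (invalid input produces fallback JSON on stdout plus a non-zero exit code) can be sketched as below. The accepted code formats follow the PR's test description (bare 6-digit codes and `.SH` / `.SZ` / `.BJ` suffixes); the regex, the function bodies, and the placeholder success payload are assumptions, not the script's real implementation.

```python
import json
import re

# Bare 6-digit code, optionally followed by an exchange suffix.
_CODE_RE = re.compile(r"([0-9]{6})(?:\.(SH|SZ|BJ))?")


def validate_stockid(raw: str) -> str:
    """Return the 6-digit code or raise ValueError for anything else."""
    match = _CODE_RE.fullmatch(raw.strip().upper())
    if match is None:
        raise ValueError(f"invalid stock code: {raw!r}")
    return match.group(1)


def main(argv: list[str]) -> int:
    """Always print one JSON object; return non-zero on invalid input."""
    try:
        stockid = validate_stockid(argv[0] if argv else "")
    except ValueError as exc:
        print(json.dumps({"source": "unavailable", "error": str(exc)}))
        return 1
    # Placeholder: the real script would fetch and parse here.
    print(json.dumps({"source": "sina", "stockid": stockid}))
    return 0
```

A caller would wire this up with `sys.exit(main(sys.argv[1:]))`, so downstream tooling can parse stdout unconditionally and use the exit code only as a secondary signal.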
Pull request overview
This PR introduces a new skill ashare-pre-st-filter, providing an A-share ST/*ST risk forecasting framework and a companion stdlib-only scraper for Sina Finance regulatory penalty records (E2 evidence), with unit tests.
Changes:
- Added ashare-pre-st-filter skill documentation (data acquisition rules, red-line logic R1–R4, evidence E1–E3, and output template).
- Added fetch_sina_penalties.py (stdlib-only) to fetch + parse Sina penalty HTML into structured records with basic normalization.
- Added pytest coverage for validation, parsing edge cases, date filtering, and failure fallbacks.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| agent/src/skills/ashare-pre-st-filter/SKILL.md | New skill spec detailing the ST/*ST risk framework and data-fetch workflow. |
| agent/src/skills/ashare-pre-st-filter/scripts/fetch_sina_penalties.py | New Sina penalties fetcher/parser used as E2 evidence source. |
| agent/tests/test_fetch_sina_penalties.py | New unit tests validating scraper behavior and robustness characteristics. |
    def _parse_penalty_list(html: str) -> list[dict[str, Any]]:
        extractor = _TableExtractor()
        extractor.feed(html)
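For readers unfamiliar with the approach, here is a minimal sketch of a table extractor built on the stdlib `html.parser.HTMLParser` state machine instead of regexes. It is not the PR's `_TableExtractor` or `_RecordParser`; the class name and behavior are assumptions, and it does not handle unclosed `<td>` tags.

```python
from html.parser import HTMLParser
from typing import Optional


class TableRowExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.rows: list = []
        self._row: Optional[list] = None   # cells of the open <tr>, if any
        self._cell: Optional[list] = None  # text chunks of the open cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        # Inline tags such as <strong> are ignored; their text still
        # reaches handle_data and lands in the open cell.

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html: str) -> list:
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Because the parser consumes input in a single forward pass, its running time grows linearly with the input, which is the property the PR relies on to avoid catastrophic regex backtracking.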
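**Trigger rule** (two conditions: loss chain + revenue linkage):
- High risk: `worst_last < 0` and `worst_profit_est < 0`, **and** (R1 hits high risk for the current year **or** `revenue_full_year_est < board threshold × 1.5`); that is, losses in both years plus revenue inside the delisting-risk zone.
- Medium risk: (a) losses in two consecutive years but revenue far above the threshold (≥ threshold × 1.5): the loss chain holds but will not trigger a financial-type *ST, although deteriorating operations could still lead to ST; (b) only `worst_profit_est < 0` (forecast loss this year, profit last year, no consecutive chain); (c) only `worst_last < 0`, with the current-year forecast positive and close to 0 (< board threshold × 0.5).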
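The rule above translates fairly directly into a small decision function. The sketch below reads the text literally (including comparing the profit forecast against board threshold × 0.5 in case (c)); the function name, the `r1_high` flag, and the "low" result for everything else are assumptions.

```python
def classify_r4(
    worst_last: float,
    worst_profit_est: float,
    revenue_full_year_est: float,
    board_threshold: float,
    r1_high: bool,
) -> str:
    """Two-year linkage: combine last year's and this year's worst profit."""
    loss_last = worst_last < 0
    loss_est = worst_profit_est < 0
    revenue_in_risk_zone = revenue_full_year_est < board_threshold * 1.5

    if loss_last and loss_est:
        # Losses in both years: high if revenue is also at risk, else case (a).
        return "high" if (r1_high or revenue_in_risk_zone) else "medium"
    if loss_est:
        return "medium"  # case (b): forecast loss only, no consecutive chain
    if loss_last and worst_profit_est < board_threshold * 0.5:
        return "medium"  # case (c): recovering, but forecast still near zero
    return "low"  # assumption: the excerpt does not name this outcome
```

For a Main Board company (threshold 3e8) with losses in both years and forecast revenue of 2e8, the function returns "high"; the same company with revenue of 1e9 drops to "medium" under case (a).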
    with urlopen(req, timeout=timeout) as resp:  # noqa: S310 - host is whitelisted
        raw = resp.read()
        if resp.headers.get("Content-Encoding", "").lower() == "gzip":
            raw = gzip.decompress(raw)
        try:
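    """Text normalization: NFKC folds full-width / half-width forms and removes internal whitespace.

    Use case (CR P2): Sina pages contain variants such as "5%以上股东" and "控股 股东",
    which the original `kw in text` check would miss. After normalization "5%" becomes "5%", and spaces between Chinese characters are removed.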
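A minimal sketch of the normalization this docstring describes, using `unicodedata.normalize`. The function name is illustrative and the PR's `_norm_text` may differ in detail.

```python
import unicodedata


def norm_text(text: str) -> str:
    """Fold full-width forms with NFKC, then drop all whitespace."""
    folded = unicodedata.normalize("NFKC", text)
    # str.split() with no arguments splits on any whitespace run, so
    # joining the pieces removes spaces between Chinese characters too.
    return "".join(folded.split())


if __name__ == "__main__":
    # Full-width "5%" (U+FF15, U+FF05) folds to ASCII "5%".
    print(norm_text("\uff15\uff05以上股东"))
    print(norm_text("控股 股东"))
```

After normalization a plain `keyword in text` check matches both variants, which is why the keyword table itself only needs the half-width, space-free spelling.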
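Merged, thanks for this very solid PR! After the merge I added a follow-up hardening: Sina penalty records now require passing … CI on main has passed. Thanks again for putting the A-share ST risk prediction framework together so thoroughly; this skill is a good entry point for later DataProvider / fundamentals pre-screening capabilities.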
The following was typed by a human:
When should this be used? In what situations?
The quantitative part of the strategy
Claude:
Summary
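Adds the skill ashare-pre-st-filter: an A-share ST/*ST risk prediction framework. Based on the latest interim / third-quarter report, or an earnings forecast / preliminary earnings release, it predicts whether a company will receive a risk warning in the next fiscal year for failing the revenue / profit / net-asset / dividend requirements, and it folds Sina regulatory penalty records into the risk level as an independent evidence surface. A-shares only; it does not predict financial fraud (fraud is attributed after the fact and cannot be predicted in advance).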
Motivation
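The core statutory dual condition for A-share *ST, net profit < 0 and revenue < the board threshold (Main Board CNY 300 million / ChiNext and STAR Market CNY 100 million), is highly predictable from the investor side. There is currently no skill that unifies "hard line-crossing", "two-year linkage", and "regulatory penalty evidence" into a dual-axis (risk level × confidence) output.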
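The dual condition can be sketched as a small check. The thresholds are the ones stated above; `worst_profit = min(n_income_est, profit_dedt_est)` and the 1.2 forecast buffer follow the PR's design notes. The function name, the board keys, the assumption that the "medium" band also requires a forecast loss, and the "low" fallback are all illustrative guesses.

```python
# Revenue thresholds in CNY, as stated in the PR description.
BOARD_REVENUE_THRESHOLD = {
    "main": 3e8,     # Main Board: 300 million
    "chinext": 1e8,  # ChiNext: 100 million
    "star": 1e8,     # STAR Market: 100 million
}


def classify_r1(
    n_income_est: float,
    profit_dedt_est: float,
    revenue_est: float,
    board: str,
    buffer: float = 1.2,
) -> str:
    """Hard line-crossing check on forecast full-year figures."""
    threshold = BOARD_REVENUE_THRESHOLD[board]
    worst_profit = min(n_income_est, profit_dedt_est)
    if worst_profit >= 0:
        return "low"  # assumption: no forecast loss means no R1 hit
    if revenue_est < threshold:
        return "high"  # both statutory conditions met
    if revenue_est < threshold * buffer:
        return "medium"  # inside the forecast buffer
    return "low"
```

The buffer exists because the inputs are forecasts: a Main Board company projected at CNY 330 million in revenue with a loss is not over the line yet, but it is close enough that forecast error alone could push it there.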
Design Highlights
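- `risk_level` ∈ {低, 中, 高, 极高} (low / medium / high / very high) × `confidence` ∈ {低, 中, 高} (low / medium / high), to avoid a misleading single-dimension conclusion
- `worst_profit = min(n_income_est, profit_dedt_est)` < 0 and revenue crosses the line → high; revenue < threshold × 1.2 → medium (forecast buffer)
- `subject_normalized` has three states: company×1.0 / shareholder×0.5 / officer×0.5 (an individual's conduct is not the same as a breakdown in corporate governance)

Implementation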
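- agent/src/skills/ashare-pre-st-filter/scripts/fetch_sina_penalties.py
  - stdlib only (`HTMLParser` + `urllib` + `gzip` + `unicodedata`)
  - SSRF defense: host whitelist `vip.stock.finance.sina.com.cn`
  - `_RecordParser` state machine replaces regexes with multiple negative lookaheads
  - `_norm_text` handles full-width / half-width variants (e.g. `5%以上股东`)
  - Failures fall back to `{"source":"unavailable",...}`, with 3 retries and exponential backoff
  - In `SUBJECT_KEYWORDS`, shareholder comes before officer, to handle overlapping identities
- agent/src/skills/ashare-pre-st-filter/SKILL.md

Tests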
agent/tests/test_fetch_sina_penalties.py: all 49 cases pass:
- TestValidateStockid: `.SH` / `.SZ` / `.BJ` suffixes, bare 6-digit codes, empty / malformed input
- TestNormalizeSubject
- TestExtractEventType
- TestParsePenaltyList
- TestApplyDateFilter
- TestFetchPenaltyListFallback
- TestNoCatastrophicBacktracking: … `<tr>` < 1s
- TestNormalizeFullwidth
- TestRecordParserEdgeCases: nested `<strong>` / multiple `<td>` / missing `<thead>`

Real-page smoke test
    $ python agent/src/skills/ashare-pre-st-filter/scripts/fetch_sina_penalties.py \
        --ts-code 600745.SH --no-filter
    source=sina matched=40
    {
      "ann_date": "2025-07-10",
      "event_type": "问讯",
      "title": "闻泰科技:关于2024年年度报告的信息披露监管问询函的回复公告",
      "reason_normalized": "信息披露违规",
      "issuer_normalized": "上交所",
      "subject_normalized": "company",
      ...
    }

CR / fix history
This PR has gone through three rounds of self-review (P0 / P1 / P2):
- … `_RecordParser` edge cases (one real nested-`<strong>` bug was caught during the fix)

Files
- agent/src/skills/ashare-pre-st-filter/SKILL.md (+511)
- agent/src/skills/ashare-pre-st-filter/scripts/fetch_sina_penalties.py (+519)
- agent/tests/test_fetch_sina_penalties.py (+374)

Total: 1404 lines added, 0 deleted.
Risks & Limitations
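- … `_RecordParser` (already ReDoS-resistant, degrades gracefully on errors)
- `SUBJECT_KEYWORDS` is derived from existing public regulatory documents, so new identity contexts may be missed (these fall back to `company`; the only effect is that the 0.5 weight discount is not applied)

Checklist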