audit-case-rag

Verified·Scanned 2/18/2026

This skill builds a local RAG index for audit case folders and supports page-level file:// citations, using ./scripts/audit_case_rag.py to index and query. It invokes soffice for Office→PDF conversion and instantiates fastembed.TextEmbedding (may contact a remote model); no secret-reading or explicit exfiltration is instructed.

from clawhub.ai·v911805e·14.6 KB·0 installs
Scanned from 0.1.0 at 911805e
$ vett add clawhub.ai/jack4world/audit-case-rag

This skill packages a local-only workflow to build a searchable evidence index for a single audit/investigation case and query it with page-level citations.

Workflow

0) Prepare a case folder (event-driven)

Create a case directory named:

  • <project_issue_id>__<title>

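Inside it, use stage folders (the stage is inferred from the folder name):

  • 01_policy_basis/ (basis): policies, procedures, authorizations
  • 02_process/ (process): tendering and procurement, bid award, process evidence
  • 03_contract/ (contract): contracts, supplementary agreements
  • 04_settlement_payment/ (payment): settlement, payment, invoices, acceptance
  • 05_comm/ (comm): email, meeting minutes, IM
  • 06_interviews/ (interview): interviews, written records, confirmation requests
  • 07_workpapers/ (workpaper): working papers, sampling, review sheets
  • 09_rectification/ (rectification): rectification, closure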

Full template: references/case-folder-template.md
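The folder layout above can be scaffolded with a few lines of Python. This is a minimal sketch, not part of the skill: the names STAGE_FOLDERS, create_case_dir, and infer_stage are hypothetical, and the lookup by top-level folder name is only an assumption about how the script infers the stage.

```python
from pathlib import Path
from typing import Optional

# Stage folders and their stage labels, copied from the list above.
STAGE_FOLDERS = {
    "01_policy_basis": "basis",
    "02_process": "process",
    "03_contract": "contract",
    "04_settlement_payment": "payment",
    "05_comm": "comm",
    "06_interviews": "interview",
    "07_workpapers": "workpaper",
    "09_rectification": "rectification",
}


def create_case_dir(root: Path, issue_id: str, title: str) -> Path:
    """Create <issue_id>__<title>/ with one empty folder per stage."""
    case_dir = root / f"{issue_id}__{title}"
    for folder in STAGE_FOLDERS:
        (case_dir / folder).mkdir(parents=True, exist_ok=True)
    return case_dir


def infer_stage(file_path: Path, case_dir: Path) -> Optional[str]:
    """Assumed rule: the stage is decided by the file's top-level folder."""
    top_level = file_path.relative_to(case_dir).parts[0]
    return STAGE_FOLDERS.get(top_level)
```

For example, create_case_dir(Path("cases"), "A001", "demo") would create cases/A001__demo/ with the eight stage folders, and a file under 04_settlement_payment/ would map to the payment stage.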

1) Install dependencies (local)

From the skill folder (or copy the script into your repo):

python3 -m venv .venv
source .venv/bin/activate
pip install -r scripts/requirements.txt

LibreOffice is recommended so that Office documents can be converted to PDF and cited by page:

  • soffice must be on PATH; otherwise pass --soffice /path/to/soffice.
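To check the LibreOffice requirement before indexing, something like the following may help. It is a sketch under assumptions: find_soffice and build_convert_cmd are hypothetical helpers, and while --headless, --convert-to, and --outdir are standard LibreOffice command-line options, whether audit_case_rag.py invokes soffice with exactly these flags is not confirmed here.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def find_soffice(explicit: Optional[str] = None) -> Optional[str]:
    """Return the explicit soffice path if given, else look it up on PATH."""
    return explicit or shutil.which("soffice")


def build_convert_cmd(soffice: str, src: Path, out_dir: Path) -> List[str]:
    """Build a headless LibreOffice command converting src to PDF in out_dir."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(src),
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True). If find_soffice() returns None, LibreOffice is not on PATH and the --soffice option is needed.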

2) Index the case

./scripts/audit_case_rag.py index \
  --case-dir "/path/to/<project_issue_id>__<title>" \
  --out-dir  "/path/to/audit_rag_db"

Outputs:

  • manifest.jsonl written into the case directory
  • audit_rag_db/<case_id>.joblib (persistent local index)
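Since manifest.jsonl is a JSON Lines file, it can be inspected with the standard library. The sketch below assumes only that each non-empty line is one JSON object. The record fields are not documented here, so the helper (read_manifest, a hypothetical name) returns plain dicts without interpreting them.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def read_manifest(case_dir: Path) -> Iterator[Dict]:
    """Yield one dict per non-empty line of <case_dir>/manifest.jsonl."""
    with (case_dir / "manifest.jsonl").open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

The .joblib index can presumably be opened with joblib.load, but joblib is pickle-based and loading a file can execute arbitrary code, so only load indices you built yourself.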

3) Query with event filters

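# Query text, in English: "Are the payment milestones inverted? Please cite the source page numbers."
./scripts/audit_case_rag.py query \
  --case "<project_issue_id>" \
  --stage payment \
  "付款节点是否倒挂?请给出处页码"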

Notes:

  • Evidence lines include clickable file://...#page=N citations when possible.
  • Retrieval is hybrid: embedding recall + TF‑IDF rerank (alpha configurable).
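The two notes above can be illustrated with a small sketch. It assumes that alpha is a linear blend weight between the embedding score and the TF-IDF score, which the notes do not confirm. The default values and the names cosine, hybrid_rank, and page_citation are hypothetical, and the TF-IDF scores are taken as precomputed inputs.

```python
import math
from pathlib import Path
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(
    query_emb: Sequence[float],
    doc_embs: Sequence[Sequence[float]],
    tfidf_scores: Sequence[float],
    alpha: float = 0.7,   # assumed meaning: weight of the embedding score
    recall_k: int = 20,   # assumed size of the embedding recall shortlist
) -> List[Tuple[int, float]]:
    """Recall the top recall_k chunks by embedding, then rerank by blended score."""
    by_embedding = [(i, cosine(query_emb, e)) for i, e in enumerate(doc_embs)]
    by_embedding.sort(key=lambda pair: pair[1], reverse=True)
    shortlist = by_embedding[:recall_k]
    blended = [
        (i, alpha * score + (1 - alpha) * tfidf_scores[i])
        for i, score in shortlist
    ]
    blended.sort(key=lambda pair: pair[1], reverse=True)
    return blended


def page_citation(pdf_path: Path, page: int) -> str:
    """Format a file:// URI with a #page=N fragment for a PDF page."""
    return f"{pdf_path.resolve().as_uri()}#page={page}"
```

With alpha close to 1 the ranking follows the embedding similarity, and with alpha close to 0 it follows the TF-IDF score within the recalled shortlist. Many PDF viewers and browsers honor the #page=N fragment, which is what makes the citation clickable.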

Safety/Privacy

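  • No cloud APIs are called for indexing or querying; both run locally. The one caveat, flagged in the scan summary above, is that fastembed.TextEmbedding may contact a remote model.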
  • Do not commit outputs (indices, converted PDFs) to git.