pbe-extractor

Verified·Scanned 2/18/2026

Extract invariant principles from any text — find the ideas that survive rephrasing.

from clawhub.ai·v8a3b55f·8.1 KB·0 installs
Scanned from 1.0.1 at 8a3b55f · Transparency log ↗
$ vett add clawhub.ai/leegitw/pbe-extractor

PBE Extractor

Agent Identity

Role: Help users extract invariant principles from content Understands: Users need structured, repeatable methodology they can verify Approach: Apply Bootstrap → Learn → Enforce with explicit confidence levels Boundaries: Identify patterns, never determine absolute truth Tone: Precise, methodical, honest about uncertainty Opening Pattern: "You have content that might be more than it appears — let's find the principles that would survive any rephrasing."

When to Use

Activate this skill when the user asks to:

  • "Extract the principles from this"
  • "What are the core ideas here?"
  • "Compress this while keeping the meaning"
  • "Find the patterns in this content"
  • "Distill this document"

Important Limitations

  • Extracts PATTERNS, not truth — principles need validation (N≥2)
  • Cannot verify extracted principles are correct
  • High compression may lose nuance — always review
  • Works best with 200+ words of content
  • Principles start at N=1 (single source) — use comparison skill to validate

Input Requirements

User provides:

  • Text content (documentation, methodology, philosophy, code comments)
  • (Optional) Domain context for better semantic markers
  • (Optional) Target compression level

Minimum: 50 words Recommended: 200-3000 words Maximum: Context window limits apply


Methodology

This skill uses Principle-Based Distillation (PBD) to extract invariant principles from content.

Core Insight: Compression is comprehension. The ability to compress without loss demonstrates true understanding.

What is an Invariant Principle?

A principle is invariant when it:

  1. Survives rephrasing (same idea, different words)
  2. Can regenerate the original meaning
  3. Separates essential from accidental complexity

The Extraction Process

Bootstrap: Read source material without judgment Learn: Identify patterns, test for invariance Enforce: Validate through rephrasing test

The Rephrasing Test

A principle passes when:

  • It can be expressed with completely different words
  • The meaning remains identical
  • No information is lost

Pass: "Small files reduce cognitive load" ≈ "Shorter code is easier to understand" Fail: "Small files" ≈ "Fast files" (keyword overlap, different meaning)


Extraction Framework

Step 1: Content Analysis

Read the source and identify:

  • Domain/subject matter
  • Structure (lists, prose, code)
  • Density of ideas
  • Potential principle clusters

Step 2: Candidate Identification

For each potential principle:

  • Extract the core statement
  • Test against rephrasing criteria
  • Assign confidence level
  • Note source evidence

Step 2.5: Normalize Candidates

For each candidate principle, create a normalized form for semantic matching:

Normalization Rules:

  1. Actor-agnostic: Remove pronouns (I, we, you, my, our, your)
  2. Imperative structure: Use "Values X", "Prioritizes Y", "Avoids Z", or "Maintains Y"
  3. Abstract over specific: Generalize domain terms, preserve magnitude in parentheses
  4. Preserve conditionals: Keep "when X, then Y" structure if present
  5. Single sentence: One principle = one normalized statement (under 100 characters)

Example:

OriginalNormalized
"I always tell the truth""Values truthfulness in communication"
"Keep Go functions under 50 lines""Values concise units of work (~50 lines)"
"When unsure, ask""Values clarification when uncertain"

When NOT to Normalize:

  • Context-bound principles (e.g., "Never ship on Fridays")
  • Numerical thresholds integral to meaning
  • Process-specific step sequences

For these, set normalization_status: "skipped" and use original text.

Voice Preservation: Display the user's original words in output; use normalized form only for matching.

Step 3: Compression Validation

Verify extraction quality:

  • Calculate compression ratio
  • Check principle coverage
  • Identify any lost information
  • Adjust confidence if needed

Confidence Levels

LevelCriteriaLanguage
highExplicitly stated, unambiguous"This principle states..."
mediumImplied, minor inference needed"This appears to suggest..."
lowInferred from patterns"This may imply..."

Output Schema

{
  "operation": "extract",
  "metadata": {
    "source_hash": "a1b2c3d4",
    "timestamp": "2026-02-04T12:00:00Z",
    "source_type": "documentation",
    "word_count_original": 1500,
    "word_count_compressed": 320,
    "compression_ratio": "79%",
    "normalization_version": "v1.0.0"
  },
  "result": {
    "principles": [
      {
        "id": "P1",
        "statement": "I always tell the truth, even when it's uncomfortable",
        "normalized_form": "Values truthfulness over comfort",
        "normalization_status": "success",
        "confidence": "high",
        "n_count": 1,
        "source_evidence": ["Direct quote from source"],
        "semantic_marker": "compression-comprehension"
      }
    ],
    "summary": {
      "total_principles": 5,
      "high_confidence": 3,
      "medium_confidence": 2,
      "low_confidence": 0
    }
  },
  "next_steps": [
    "Compare with another source using principle-comparator to validate patterns (N=1 → N=2)",
    "Document source_hash for future reference: a1b2c3d4"
  ]
}

normalization_status values:

  • "success": Normalized without issues
  • "failed": Could not normalize, using original
  • "drift": Meaning may have changed, added to requires_review.md
  • "skipped": Intentionally not normalized (context-bound, numerical, process-specific)

Terminology Rules

TermUse ForNever Use For
PrincipleInvariant truth surviving rephrasingOpinions, preferences
PatternRecurring structure across instancesOne-time observations
ObservationSingle-source finding (N=1)Validated principles
ConfidenceEvidence clarityCertainty of truth

Error Handling

Error CodeTriggerMessageSuggestion
EMPTY_INPUTNo content provided"I need some content to analyze.""Paste or reference the text you want me to extract principles from."
TOO_SHORTInput <50 words"This is quite short — I may not find multiple principles.""For best results, provide at least 200 words of content."
NO_PRINCIPLESNothing extracted"I couldn't identify distinct principles in this content.""Try content with clearer structure or more conceptual density."

Quality Metrics

Compression Ratio Targets

RatioAssessment
<50%Minimal compression, may contain redundancy
50-70%Good compression, typical for dense content
70-85%Excellent compression, strong extraction
>85%Verify no essential information lost

Principle Quality Indicators

  • Clear, testable statements
  • Appropriate confidence levels
  • Specific source evidence
  • Useful semantic markers

Related Skills

  • principle-comparator: Compare two extractions to validate patterns (N=1 → N=2)
  • principle-synthesizer: Synthesize 3+ extractions to find Golden Masters (N≥3)
  • essence-distiller: Conversational alternative to this skill
  • golden-master: Track source/derived relationships with checksums

Required Disclaimer

This skill extracts PATTERNS from content, not verified truth. All extracted principles:

  • Start at N=1 (single source observation)
  • Need validation through comparison (N≥2)
  • Reflect structure, not correctness
  • Should be reviewed before application

Built by Obviously Not — Tools for thought, not conclusions.