prompt-assemble

Verified·Scanned 2/17/2026

This skill assembles token-safe LLM prompts with optional memory retrieval and a memory safety valve; implementation is provided as scripts/prompt_assemble.py. No security-relevant behaviors detected.

from clawhub.ai·v6487a1d·23.2 KB·0 installs
Scanned from 1.0.4 at 6487a1d
$ vett add clawhub.ai/alexunitario-sketch/prompt-assemble

Prompt Assemble

Overview

A standardized, token-safe prompt assembly framework designed to keep LLM API calls from failing on context overflow. It implements Two-Phase Context Construction and a Memory Safety Valve to prevent token overflow while maximizing relevant context.

Design Goals:

  • ✅ Never fail due to memory-related token overflow
  • ✅ Memory is always a discardable enhancement, never a rigid dependency
  • ✅ Token budget decisions are centralized in the prompt assembly layer

When to Use

Use this skill when:

  1. Building or modifying any agent that constructs prompts
  2. Implementing memory retrieval systems
  3. Adding new prompt-related logic to existing agents
  4. Any scenario where token budget safety is required

Core Workflow

User Input
    ↓
Need-Memory Decision
    ↓
Minimal Context Build
    ↓
Memory Retrieval (Optional)
    ↓
Memory Summarization
    ↓
Token Estimation
    ↓
Safety Valve Decision
    ↓
Final Prompt → LLM Call

Phase Details

Phase 0: Base Configuration

# Model Context Windows (2026-02-04)
# - MiniMax-M2.1: 204,000 tokens (default)
# - Claude 3.5 Sonnet: 200,000 tokens
# - GPT-4o: 128,000 tokens

MAX_TOKENS = 204000  # Set to your model's context limit
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)  # Conservative: 75% threshold = 153,000 tokens
MEMORY_TOP_K = 3                     # Max 3 memories
MEMORY_SUMMARY_MAX = 3               # Max 3 lines per memory

Design Philosophy:

  • Leave 25% buffer for safety (model overhead, estimation errors, spikes)
  • Better to underutilize capacity than to overflow
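
The 75% threshold works out differently for each model. A minimal sketch of the arithmetic, using the context windows listed in the configuration comments; the `token_budget` helper and the `SAFETY_RATIO` name are hypothetical and not part of the skill:

```python
# Context windows as listed in the configuration comments.
CONTEXT_WINDOWS = {
    "MiniMax-M2.1": 204_000,
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
}

SAFETY_RATIO = 0.75  # use at most 75% of the window, leaving a 25% buffer


def token_budget(max_tokens: int, ratio: float = SAFETY_RATIO) -> int:
    """Return the largest prompt size (in tokens) allowed under the ratio."""
    return int(max_tokens * ratio)


for model, window in CONTEXT_WINDOWS.items():
    budget = token_budget(window)
    print(f"{model}: budget {budget:,} tokens, buffer {window - budget:,} tokens")
```

Switching models then only requires changing MAX_TOKENS; the threshold follows from it.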

Phase 1: Minimal Context

  • System prompt
  • Recent N messages (N=3, trimmed)
  • Current user input
  • No memory by default
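
A minimal sketch of the minimal-context build, assuming the context is a list of plain strings and that "trimmed" means capping each recent message at a character limit. The function name `build_minimal_context` and the `MAX_MESSAGE_CHARS` value are assumptions; the source does not specify them:

```python
RECENT_N = 3             # number of recent messages to keep (N=3 in the list above)
MAX_MESSAGE_CHARS = 500  # assumed trim length; not specified by the source


def build_minimal_context(system_prompt, history, user_input):
    """Return [system prompt, last N trimmed messages..., current user input]."""
    # Only historical messages are trimmed; the system prompt and the
    # current user input are passed through unchanged.
    recent = [msg[:MAX_MESSAGE_CHARS] for msg in history[-RECENT_N:]]
    return [system_prompt, *recent, user_input]


history = ["msg 0", "msg 1", "msg 2", "msg 3", "msg 4"]
context = build_minimal_context("You are a helpful assistant.", history, "What next?")
print(context)
```

No memory is touched at this stage, so this context is always available as a fallback.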

Phase 2: Memory Need Decision

def need_memory(user_input):
    """Return True if the input contains a phrase that refers to earlier context."""
    triggers = [
        "previously",  # also covers "previously mentioned"
        "earlier we discussed",
        "do you remember",
        "as I mentioned before",
        "continuing from",
        "before we",
        "last time",
    ]
    # Case-insensitive substring match
    text = user_input.lower()
    return any(trigger.lower() in text for trigger in triggers)
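
Plain substring matching fires on any input that contains a trigger phrase, including unrelated ones. A quick check of that behaviour, restating the function compactly so the snippet runs on its own (the example inputs are made up for illustration):

```python
TRIGGERS = (
    "previously", "earlier we discussed", "do you remember",
    "as I mentioned before", "continuing from", "before we", "last time",
)


def need_memory(user_input: str) -> bool:
    """Case-insensitive substring match against the trigger phrases."""
    text = user_input.lower()
    return any(trigger.lower() in text for trigger in TRIGGERS)


print(need_memory("Do you remember my preferred editor?"))  # True
print(need_memory("Summarize this article for me."))         # False
print(need_memory("Wash your hands before we eat."))         # True (false positive)
```

A false positive here costs only an extra retrieval, since the safety valve still bounds the final prompt size.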

Phase 3: Memory Retrieval (Optional)

summarized_memories = []  # stays empty when no memory is needed
if need_memory(user_input):
    memories = memory_search(query=user_input, top_k=MEMORY_TOP_K)
    for mem in memories:
        summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))
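
The source does not show `memory_search` or `summarize`. The following is a sketch with stand-ins for both, assuming memories are plain strings: `summarize` simply keeps the first non-empty lines, and `memory_search` ranks an in-memory list by shared words. A real system would likely use a vector store and an LLM summarizer; the `store` parameter is hypothetical.

```python
MEMORY_TOP_K = 3
MEMORY_SUMMARY_MAX = 3


def summarize(memory: str, max_lines: int = MEMORY_SUMMARY_MAX) -> str:
    """Keep the first `max_lines` non-empty lines (stand-in for real summarization)."""
    lines = [line.strip() for line in memory.splitlines() if line.strip()]
    return "\n".join(lines[:max_lines])


def memory_search(query: str, top_k: int = MEMORY_TOP_K, store=()):
    """Rank stored memories by words shared with the query; return up to top_k hits."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(m.lower().split())), m) for m in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for score, m in scored[:top_k] if score > 0]


store = [
    "User prefers dark mode\nSet in March\nApplies to all apps\nExtra detail",
    "Long-term goal: learn Rust",
]
hits = memory_search("what mode does the user prefer", store=store)
summaries = [summarize(m) for m in hits]
print(summaries)
```

Capping both the number of memories and the lines per memory keeps the memory layer small before any token estimate is made.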

Phase 4: Token Estimation

Calculate estimated tokens for base_context + summarized_memories.
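
One possible estimator, sketched under the assumption that roughly four characters make one token. This is a common rule of thumb for English text, not an exact tokenizer, and the strategies in references/token_estimation.md may differ. The helper names are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic for English text; an assumption


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_prompt_tokens(base_context, summarized_memories) -> int:
    """Sum the estimates for every piece of text that would enter the prompt."""
    return sum(estimate_tokens(part) for part in [*base_context, *summarized_memories])


base_context = ["You are a helpful assistant.", "What did we decide last time?"]
summarized_memories = ["User prefers concise answers."]
print(estimate_prompt_tokens(base_context, summarized_memories))
```

Because the estimate is approximate, the 25% buffer from Phase 0 is what absorbs the error.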

Phase 5: Safety Valve (Critical)

if estimated_tokens > SAFETY_MARGIN:
    base_context.append("[System Notice] Relevant memory skipped due to token budget.")
    return assemble(base_context)

Hard Rules:

  • ❌ Never downgrade system prompt
  • ❌ Never truncate user input
  • ❌ No "lucky splicing"
  • ✅ Only memory layer is expendable
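
The valve and the hard rules can be sketched as one all-or-nothing decision: either every summarized memory fits, or the whole memory layer is dropped and a notice is added. This takes the all-or-nothing reading of "no lucky splicing", which is an interpretation. The function name `apply_safety_valve`, the injected `estimate` callable, and the `limit` parameter are hypothetical.

```python
SAFETY_MARGIN = 153_000
NOTICE = "[System Notice] Relevant memory skipped due to token budget."


def apply_safety_valve(base_context, summarized_memories, estimate, limit=SAFETY_MARGIN):
    """Return the parts to assemble: all memory if it fits, otherwise none of it."""
    candidate = [*base_context, *summarized_memories]
    if not summarized_memories or estimate(candidate) <= limit:
        return candidate
    # Over budget: drop the entire memory layer. The system prompt and the
    # user input in base_context are returned untouched.
    return [*base_context, NOTICE]


def rough_estimate(parts):
    # Character-based stand-in for a real token estimator.
    return sum(len(p) for p in parts) // 4


base = ["system prompt", "user question"]
print(apply_safety_valve(base, ["short memory"], rough_estimate))
print(apply_safety_valve(base, ["x" * 1000], rough_estimate, limit=100))
```

The sketch builds new lists rather than mutating base_context, so a caller can retry with a different limit. What to do when the base context alone exceeds the limit is not covered by the source and is left open here.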

Phase 6: Final Assembly

final_prompt = assemble(base_context + summarized_memories)
return final_prompt
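
Putting the phases together, an end-to-end sketch might look like the following. It is not the code in scripts/prompt_assemble.py. The name `build_prompt_sketch`, its signature, the abbreviated trigger list, the character-based estimator, and joining parts with blank lines are all assumptions made for illustration.

```python
SAFETY_MARGIN = int(0.75 * 204_000)
NOTICE = "[System Notice] Relevant memory skipped due to token budget."
TRIGGERS = ("previously", "do you remember", "last time")  # abbreviated list


def estimate_tokens(parts):
    # Rough character-based estimate; an assumption, not a real tokenizer.
    return sum(len(p) for p in parts) // 4 + 1


def build_prompt_sketch(system_prompt, recent, user_input, memory_search, summarize,
                        limit=SAFETY_MARGIN):
    # Phase 1: minimal context, no memory.
    base_context = [system_prompt, *recent[-3:], user_input]
    # Phase 2: decide whether memory is needed at all.
    if not any(t in user_input.lower() for t in TRIGGERS):
        return "\n\n".join(base_context)
    # Phase 3: retrieve and summarize.
    memories = [summarize(m) for m in memory_search(user_input)]
    # Phases 4-5: estimate, then drop the memory layer if over budget.
    if estimate_tokens(base_context + memories) > limit:
        return "\n\n".join(base_context + [NOTICE])
    # Phase 6: final assembly.
    return "\n\n".join(base_context + memories)


prompt = build_prompt_sketch(
    "You are a helpful assistant.", [], "Do you remember my editor?",
    memory_search=lambda query: ["User's editor is Vim\nsince 2019"],
    summarize=lambda memory: memory.splitlines()[0],
)
print(prompt)
```

Every exit path returns a prompt built from the untouched base context, so memory can only add to the prompt, never break it.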

Memory Data Standards

Allowed in Long-Term Memory

  • ✅ User preferences / identity / long-term goals
  • ✅ Confirmed important conclusions
  • ✅ System-level settings and rules

Forbidden in Long-Term Memory

  • ❌ Raw conversation logs
  • ❌ Reasoning traces
  • ❌ Temporary discussions
  • ❌ Information recoverable from chat history
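
One way to enforce these standards at write time is to tag each candidate memory with a category and store only the allowed ones. A minimal sketch; the `MemoryKind` enum and the `may_store` helper are hypothetical, and how a memory gets its category is left open:

```python
from enum import Enum


class MemoryKind(Enum):
    PREFERENCE = "preference"            # user preferences / identity / long-term goals
    CONCLUSION = "conclusion"            # confirmed important conclusions
    SYSTEM_RULE = "system_rule"          # system-level settings and rules
    RAW_LOG = "raw_log"                  # raw conversation logs
    REASONING_TRACE = "reasoning_trace"  # reasoning traces
    TEMPORARY = "temporary"              # temporary discussions
    RECOVERABLE = "recoverable"          # information recoverable from chat history


ALLOWED = {MemoryKind.PREFERENCE, MemoryKind.CONCLUSION, MemoryKind.SYSTEM_RULE}


def may_store(kind: MemoryKind) -> bool:
    """True only for the categories allowed in long-term memory."""
    return kind in ALLOWED


print(may_store(MemoryKind.PREFERENCE))  # True
print(may_store(MemoryKind.RAW_LOG))     # False
```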

Quick Start

Copy scripts/prompt_assemble.py to your agent and use:

from prompt_assemble import build_prompt

# In your agent's prompt construction:
final_prompt = build_prompt(user_input, memory_search_fn, get_recent_dialog_fn)

Resources

scripts/

  • prompt_assemble.py - Complete implementation with all phases (PromptAssembler class)

references/

  • memory_standards.md - Detailed memory content guidelines
  • token_estimation.md - Token counting strategies