moss-docs
Documentation and capabilities reference for Moss semantic search. Use for understanding Moss APIs, SDKs, and integration patterns.
Moss Agent Skills
Capabilities
Moss is the real-time semantic search runtime for conversational AI. It delivers sub-10ms lookups and instant index updates that run in the browser, on-device, or in the cloud - wherever your agent lives. Agents can create indexes, embed documents, perform semantic/hybrid searches, and manage document lifecycles without managing infrastructure. The platform handles embedding generation, index persistence, and optional cloud sync - allowing agents to focus on retrieval logic rather than infrastructure.
Skills
Index Management
- Create Index: Build a new semantic index with documents and embedding model selection
- Load Index: Load an existing index from persistent storage for querying
- Get Index: Retrieve metadata about a specific index (document count, model, etc.)
- List Indexes: Enumerate all indexes under a project
- Delete Index: Remove an index and all associated data
Document Operations
- Add Documents: Insert or upsert documents into an existing index with optional metadata
- Get Documents: Retrieve stored documents by ID or fetch all documents
- Delete Documents: Remove specific documents from an index by their IDs
Search & Retrieval
- Semantic Search: Query using natural language with vector similarity matching
- Keyword Search: Use BM25-based keyword matching for exact term lookups
- Hybrid Search: Blend semantic and keyword search with configurable alpha weighting
- Metadata Filtering: Constrain results by document metadata (category, language, tags)
- Top-K Results: Return configurable number of best-matching documents with scores
Embedding Models
- moss-minilm: Fast, lightweight model optimized for edge/offline use (default)
- moss-mediumlm: Higher accuracy model with reasonable performance for precision-critical use cases
SDK Methods
| JavaScript | Python | Description |
|---|---|---|
createIndex() | create_index() | Create index with documents |
loadIndex() | load_index() | Load index from storage |
getIndex() | get_index() | Get index metadata |
listIndexes() | list_indexes() | List all indexes |
deleteIndex() | delete_index() | Delete an index |
addDocs() | add_docs() | Add/upsert documents |
getDocs() | get_docs() | Retrieve documents |
deleteDocs() | delete_docs() | Remove documents |
query() | query() | Semantic search |
API Actions
All REST API operations go through POST /manage with an action field:
createIndex- Create index with seed documentsgetIndex- Get metadata for single indexlistIndexes- List all project indexesdeleteIndex- Remove index and assetsaddDocs- Upsert documents into indexgetDocs- Retrieve stored documentsdeleteDocs- Remove documents by ID
Workflows
Basic Semantic Search Workflow
- Initialize MossClient with project credentials
- Call
createIndex()with documents and model (moss-minilmormoss-mediumlm) - Call
loadIndex()to prepare index for queries - Call
query()with search text and top_k parameter - Process returned documents with scores
Hybrid Search Workflow
- Create and load index as above
- Call
query()with alpha parameter to blend semantic and keyword alpha: 1.0= pure semantic,alpha: 0.0= pure keyword,alpha: 0.6= 60/40 blend- Default is semantic-heavy (~0.8) for conversational use cases
Document Update Workflow
- Initialize client and ensure index exists
- Call
addDocs()with new documents andupsert: trueoption - Existing documents with matching IDs are updated; new IDs are inserted
- Call
deleteDocs()to remove outdated documents by ID
Voice Agent Context Injection Workflow
- Initialize MossClient and load index at agent startup
- On each user message, automatically query Moss for relevant context
- Inject search results into LLM context before generating response
- Respond with knowledge-grounded answer (no tool-calling latency)
Offline-First Search Workflow
- Create index with documents using local embedding model
- Load index from local storage
- Query runs entirely on-device with sub-10ms latency
- Optionally sync to cloud for backup and sharing
Integration
Voice Agent Frameworks
- LiveKit: Context injection into voice agent pipeline with
inferedge-mossSDK - Pipecat: Pipeline processor via
pipecat-mosspackage that auto-injects retrieval results
Context
Authentication
SDK requires project credentials:
MOSS_PROJECT_ID: Project identifier from Moss PortalMOSS_PROJECT_KEY: Project access key from Moss Portal
export MOSS_PROJECT_ID=your_project_id
export MOSS_PROJECT_KEY=your_project_key
REST API requires headers:
x-project-key: Project access keyx-service-version: v1: API version headerprojectIdin JSON body
Package Installation
| Language | Package | Install Command |
|---|---|---|
| JavaScript/TypeScript | @inferedge/moss | npm install @inferedge/moss |
| Python | inferedge-moss | pip install inferedge-moss |
| Pipecat Integration | pipecat-moss | pip install pipecat-moss |
Document Schema
interface DocumentInfo {
id: string; // Required: unique identifier
text: string; // Required: content to embed and search
metadata?: object; // Optional: key-value pairs for filtering
}
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
indexName | string | - | Target index name (required) |
query | string | - | Natural language search text (required) |
top_k / topK | number | 5 | Max results to return |
alpha | float | ~0.8 | Hybrid weighting: 0.0=keyword, 1.0=semantic |
filters | object | - | Metadata constraints |
Model Selection
| Model | Use Case | Tradeoff |
|---|---|---|
moss-minilm | Edge, offline, browser, speed-first | Fast, lightweight |
moss-mediumlm | Precision-critical, higher accuracy | Slightly slower |
Performance Expectations
- Sub-10ms local queries (hardware-dependent)
- Instant index updates without reindexing entire corpus
- Sync is optional; compute stays on-device
- No infrastructure to manage
Chunking Best Practices
- Aim for ~200–500 tokens per chunk
- Overlap 10–20% to preserve context
- Normalize whitespace and strip boilerplate
Common Errors
| Error | Cause | Fix |
|---|---|---|
| Unauthorized | Missing credentials | Set MOSS_PROJECT_ID and MOSS_PROJECT_KEY |
| Index not found | Query before create | Call createIndex() first |
| Index not loaded | Query before load | Call loadIndex() before query() |
| Missing embeddings runtime | Invalid model | Use moss-minilm or moss-mediumlm |
Async Pattern
All SDK methods are async - always use await:
// JavaScript
await client.createIndex("faqs", docs, "moss-minilm");
await client.loadIndex("faqs");
const results = await client.query("faqs", "search text", 5);
# Python
await client.create_index("faqs", docs, "moss-minilm")
await client.load_index("faqs")
results = await client.query("faqs", "search text", top_k=5)
For additional documentation and navigation, see: https://docs.usemoss.dev/llms.txt