docstrange
Document extraction API by Nanonets. Convert PDFs and images to markdown, JSON, or CSV with confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data.
DocStrange by Nanonets
Document extraction API — convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring.
Get your API key: https://docstrange.nanonets.com/app
Quick Start
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Response:
{
"success": true,
"record_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"result": {
"markdown": {
"content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
}
}
}
Setup
1. Get Your API Key
# Visit the dashboard
https://docstrange.nanonets.com/app
Save your API key:
export DOCSTRANGE_API_KEY="your_api_key_here"
2. OpenClaw Configuration (Optional)
Add to your ~/.openclaw/openclaw.json:
{
skills: {
entries: {
"docstrange": {
enabled: true,
apiKey: "your_api_key_here",
env: {
DOCSTRANGE_API_KEY: "your_api_key_here",
},
},
},
},
}
Common Tasks
Extract to Markdown
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Access content: response["result"]["markdown"]["content"]
Extract JSON Fields
Simple field list:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@invoice.pdf" \
-F "output_format=json" \
-F 'json_options=["invoice_number", "date", "total_amount", "vendor"]' \
-F "include_metadata=confidence_score"
With JSON schema:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@invoice.pdf" \
-F "output_format=json" \
-F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'
Response with confidence scores:
{
"result": {
"json": {
"content": {
"invoice_number": "INV-2024-001",
"total_amount": 500.00
},
"metadata": {
"confidence_score": {
"invoice_number": 98,
"total_amount": 99
}
}
}
}
}
Extract Tables to CSV
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@table.pdf" \
-F "output_format=csv" \
-F "csv_options=table"
Async Extraction (Large Documents)
For documents >5 pages, use async and poll:
Queue the document:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/async" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@large-document.pdf" \
-F "output_format=markdown"
# Returns: {"record_id": "12345", "status": "processing"}
Poll for results:
curl -X GET "https://extraction-api.nanonets.com/api/v1/extract/results/12345" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY"
# Returns: {"status": "completed", "result": {...}}
Advanced Features
Bounding Boxes
Get element coordinates for layout analysis:
-F "include_metadata=bounding_boxes"
Hierarchy Output
Extract document structure (sections, tables, key-value pairs):
-F "json_options=hierarchy_output"
Financial Documents Mode
Enhanced table and number formatting:
-F "markdown_options=financial-docs"
Custom Instructions
Guide extraction with prompts:
-F "custom_instructions=Focus on financial data. Ignore headers."
-F "prompt_mode=append"
Multiple Formats
Request multiple formats in one call:
-F "output_format=markdown,json"
When to Use
Use DocStrange For:
- Invoice and receipt processing
- Contract text extraction
- Bank statement parsing
- Form digitization
- Image OCR (scanned documents)
Don't Use For:
- Documents >5 pages with sync (use async)
- Video/audio transcription
- Non-document images
Best Practices
| Document Size | Endpoint | Notes |
|---|---|---|
| <=5 pages | /extract/sync | Immediate response |
| >5 pages | /extract/async | Poll for results |
JSON Extraction:
- Field list:
["field1", "field2"]— quick extractions - JSON schema:
{"type": "object", ...}— strict typing, nested data
Confidence Scores:
- Add
include_metadata=confidence_score - Scores are 0-100 per field
- Review fields <80 manually
Schema Templates
Invoice
{
"type": "object",
"properties": {
"invoice_number": {"type": "string"},
"date": {"type": "string"},
"vendor": {"type": "string"},
"total": {"type": "number"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"price": {"type": "number"}
}
}
}
}
}
Receipt
{
"type": "object",
"properties": {
"merchant": {"type": "string"},
"date": {"type": "string"},
"total": {"type": "number"},
"items": {
"type": "array",
"items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
}
}
}
Troubleshooting
400 Bad Request:
- Provide exactly one input:
file,file_url, orfile_base64 - Verify API key is valid
Sync Timeout:
- Use async for documents >5 pages
- Poll
/extract/results/{record_id}
Missing Confidence Scores:
- Requires
json_options(field list or schema) - Add
include_metadata=confidence_score
References
- API Docs: https://docstrange.nanonets.com/docs
- Get API Key: https://docstrange.nanonets.com/app