docstrange

Verified·Scanned 2/18/2026

Document extraction API by Nanonets. Convert PDFs and images to markdown, JSON, or CSV with confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data.

from clawhub.ai·v191041a·6.5 KB·0 installs
Scanned from 1.0.1 at 191041a · Transparency log ↗
$ vett add clawhub.ai/shhdwi/docstrange

DocStrange by Nanonets

Document extraction API — convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring.

Get your API key: https://docstrange.nanonets.com/app

Quick Start

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"

Response:

{
  "success": true,
  "record_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "result": {
    "markdown": {
      "content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
    }
  }
}

Setup

1. Get Your API Key

# Visit the dashboard
https://docstrange.nanonets.com/app

Save your API key:

export DOCSTRANGE_API_KEY="your_api_key_here"

2. OpenClaw Configuration (Optional)

Add to your ~/.openclaw/openclaw.json:

{
  skills: {
    entries: {
      "docstrange": {
        enabled: true,
        apiKey: "your_api_key_here",
        env: {
          DOCSTRANGE_API_KEY: "your_api_key_here",
        },
      },
    },
  },
}

Common Tasks

Extract to Markdown

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"

Access content: response["result"]["markdown"]["content"]

Extract JSON Fields

Simple field list:

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "output_format=json" \
  -F 'json_options=["invoice_number", "date", "total_amount", "vendor"]' \
  -F "include_metadata=confidence_score"

With JSON schema:

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "output_format=json" \
  -F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'

Response with confidence scores:

{
  "result": {
    "json": {
      "content": {
        "invoice_number": "INV-2024-001",
        "total_amount": 500.00
      },
      "metadata": {
        "confidence_score": {
          "invoice_number": 98,
          "total_amount": 99
        }
      }
    }
  }
}

Extract Tables to CSV

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@table.pdf" \
  -F "output_format=csv" \
  -F "csv_options=table"

Async Extraction (Large Documents)

For documents >5 pages, use async and poll:

Queue the document:

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/async" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
  -F "file=@large-document.pdf" \
  -F "output_format=markdown"

# Returns: {"record_id": "12345", "status": "processing"}

Poll for results:

curl -X GET "https://extraction-api.nanonets.com/api/v1/extract/results/12345" \
  -H "Authorization: Bearer $DOCSTRANGE_API_KEY"

# Returns: {"status": "completed", "result": {...}}

Advanced Features

Bounding Boxes

Get element coordinates for layout analysis:

-F "include_metadata=bounding_boxes"

Hierarchy Output

Extract document structure (sections, tables, key-value pairs):

-F "json_options=hierarchy_output"

Financial Documents Mode

Enhanced table and number formatting:

-F "markdown_options=financial-docs"

Custom Instructions

Guide extraction with prompts:

-F "custom_instructions=Focus on financial data. Ignore headers."
-F "prompt_mode=append"

Multiple Formats

Request multiple formats in one call:

-F "output_format=markdown,json"

When to Use

Use DocStrange For:

  • Invoice and receipt processing
  • Contract text extraction
  • Bank statement parsing
  • Form digitization
  • Image OCR (scanned documents)

Don't Use For:

  • Documents >5 pages with sync (use async)
  • Video/audio transcription
  • Non-document images

Best Practices

Document SizeEndpointNotes
<=5 pages/extract/syncImmediate response
>5 pages/extract/asyncPoll for results

JSON Extraction:

  • Field list: ["field1", "field2"] — quick extractions
  • JSON schema: {"type": "object", ...} — strict typing, nested data

Confidence Scores:

  • Add include_metadata=confidence_score
  • Scores are 0-100 per field
  • Review fields <80 manually

Schema Templates

Invoice

{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string"},
    "date": {"type": "string"},
    "vendor": {"type": "string"},
    "total": {"type": "number"},
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": {"type": "string"},
          "quantity": {"type": "number"},
          "price": {"type": "number"}
        }
      }
    }
  }
}

Receipt

{
  "type": "object",
  "properties": {
    "merchant": {"type": "string"},
    "date": {"type": "string"},
    "total": {"type": "number"},
    "items": {
      "type": "array",
      "items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
    }
  }
}

Troubleshooting

400 Bad Request:

  • Provide exactly one input: file, file_url, or file_base64
  • Verify API key is valid

Sync Timeout:

  • Use async for documents >5 pages
  • Poll /extract/results/{record_id}

Missing Confidence Scores:

  • Requires json_options (field list or schema)
  • Add include_metadata=confidence_score

References