# Extractly API Documentation for LLMs

## Overview

Extractly is a document extraction API that uses AI to pull structured data from PDFs and other documents. You define the fields you want in a template, and the API processes each document and returns the extracted values as structured JSON.

**Base URL**: https://www.extractly.dev

**API Version**: V2 (current)

---

## Authentication

All API endpoints require authentication using an API key in the `Authorization` header:

```
Authorization: Bearer ext_your_api_key_here
```
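In Python, the header can be built from a key held outside the source code. This is a minimal sketch: the environment variable name `EXTRACTLY_API_KEY` and the helper `auth_headers` are illustrative assumptions, not part of the Extractly API.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header for Extractly requests.

    EXTRACTLY_API_KEY is a hypothetical environment variable name.
    """
    key = api_key or os.environ.get("EXTRACTLY_API_KEY", "")
    if not key.startswith("ext_"):
        # Keys are documented as starting with "ext_".
        raise ValueError("API key is missing or does not start with 'ext_'")
    return {"Authorization": f"Bearer {key}"}
```

The returned dictionary can be passed as the `headers` argument of an HTTP client such as `requests`.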

### Getting an API Key

1. Sign up at https://www.extractly.dev
2. Navigate to the dashboard
3. Go to the API Keys section
4. Create a new API key
5. Copy the key (it starts with `ext_`)

---

## Core Concepts

### Templates

Templates define the structure of data you want to extract from documents. Each template contains:

- **name**: Descriptive name for the template
- **description**: Optional description
- **fields**: List of field definitions (described below)

### Field Definitions

Each field in a template has:

- **name**: Field identifier (e.g., `invoice_number`, `person_name`)
- **type**: Data type - `string`, `number`, `date`, `boolean`, or `array`
- **description**: Optional description of what this field contains
- **required**: Boolean indicating if this field must be present
- **aiInstructions**: Optional specific instructions for AI extraction
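For client code that wants to reason about fields locally, a field definition could be modeled roughly as below. Templates themselves are managed in the dashboard, so this `FieldDefinition` class is a hypothetical local representation, and the `aiInstructions` text is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# The five field types listed above.
ALLOWED_TYPES = {"string", "number", "date", "boolean", "array"}


@dataclass
class FieldDefinition:
    name: str
    type: str
    required: bool = False
    description: Optional[str] = None
    aiInstructions: Optional[str] = None  # camelCase kept to match the docs

    def __post_init__(self):
        # Reject types the API does not document.
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"Unsupported field type: {self.type!r}")


invoice_number = FieldDefinition(
    name="invoice_number",
    type="string",
    required=True,
    description="Invoice number",
    aiInstructions="Usually printed near the top of the first page",
)
```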

### Jobs

Extraction is asynchronous. When you submit a document:

1. You receive a `jobId` immediately
2. Processing happens in the background (typically 30-60 seconds)
3. You poll the status endpoint to check progress
4. When complete, the result contains extracted data

### Credits

- Template extractions cost **10 credits** per document
- Check your credit balance before extracting
- Insufficient credits will result in a 402 error
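
The credit arithmetic for a batch can be checked up front. The sketch below assumes the caller already knows the available balance (for example from the dashboard); the function names are illustrative.

```python
CREDITS_PER_EXTRACTION = 10  # documented cost of one template extraction


def credits_needed(document_count):
    """Total credits required to extract the given number of documents."""
    return document_count * CREDITS_PER_EXTRACTION


def max_documents(credits_available):
    """How many documents the available balance can cover."""
    return credits_available // CREDITS_PER_EXTRACTION
```

With 5 credits available, `max_documents(5)` is 0, which matches the 402 example later in this document (`creditsRequired: 10`, `creditsAvailable: 5`).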

---

## API Endpoints

### 1. Get Template Details

**GET** `/api/v2/templates/{templateId}`

Retrieve details of a specific template including its field definitions.

**Headers:**
```
Authorization: Bearer ext_your_api_key_here
```

**cURL Example:**
```bash
curl -X GET \
  https://www.extractly.dev/api/v2/templates/template_123 \
  -H "Authorization: Bearer ext_your_api_key_here"
```

**Response:**
```json
{
  "template": {
    "id": "template_123",
    "name": "Invoice Extractor",
    "description": "Extract key information from invoices",
    "extractionMethod": "GOOGLE_DOCUMENT_AI",
    "isDraft": false,
    "customerId": "customer_abc",
    "fields": [
      {
        "name": "invoice_number",
        "type": "string",
        "description": "Invoice number",
        "required": true
      },
      {
        "name": "total_amount",
        "type": "number",
        "description": "Total amount",
        "required": true
      },
      {
        "name": "invoice_date",
        "type": "date",
        "description": "Invoice date",
        "required": false
      }
    ],
    "createdAt": "2025-10-01T12:00:00.000Z",
    "updatedAt": "2025-10-01T12:00:00.000Z"
  },
  "version": "v2"
}
```

**Rate Limit:** 100 requests per minute

**Important:**
- Use this endpoint to inspect template structure before extraction
- The `fields` array defines what data will be extracted from documents
- Template management (create/update/delete) is done via the dashboard UI at https://www.extractly.dev/dashboard/templates
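
Once the response above has been parsed into a dictionary, the required fields can be listed before any document is submitted. A small sketch, where `required_field_names` is a hypothetical helper and `sample` is a trimmed copy of the response shown above:

```python
def required_field_names(template_response):
    """Return the names of fields marked required in a Get Template response."""
    fields = template_response["template"]["fields"]
    return [field["name"] for field in fields if field.get("required")]


# Trimmed copy of the example response.
sample = {
    "template": {
        "id": "template_123",
        "fields": [
            {"name": "invoice_number", "type": "string", "required": True},
            {"name": "total_amount", "type": "number", "required": True},
            {"name": "invoice_date", "type": "date", "required": False},
        ],
    },
    "version": "v2",
}

print(required_field_names(sample))  # ['invoice_number', 'total_amount']
```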

---

### 2. Extract Data from Document

**POST** `/api/v2/templates/{templateId}/extract`

Submit a document for extraction using a specific template.

**Headers:**
```
Authorization: Bearer ext_your_api_key_here
Content-Type: multipart/form-data
```

**Form Data:**
- `file`: The document to process (max 10MB; PDF only when the template uses Document AI)

**cURL Example:**
```bash
curl -X POST \
  https://www.extractly.dev/api/v2/templates/template_123/extract \
  -H "Authorization: Bearer ext_your_api_key_here" \
  -F "file=@invoice.pdf"
```

**Node.js Example:**
```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// Wrapped in an async function because CommonJS modules
// (which use require) do not support top-level await.
async function main() {
  const form = new FormData();
  form.append('file', fs.createReadStream('invoice.pdf'));

  const response = await axios.post(
    'https://www.extractly.dev/api/v2/templates/template_123/extract',
    form,
    {
      headers: {
        'Authorization': 'Bearer ext_your_api_key_here',
        ...form.getHeaders()
      }
    }
  );

  console.log(response.data);
}

main().catch(console.error);
```

**Python Example:**
```python
import requests

url = "https://www.extractly.dev/api/v2/templates/template_123/extract"
headers = {"Authorization": "Bearer ext_your_api_key_here"}

with open("invoice.pdf", "rb") as file:
    files = {"file": file}
    response = requests.post(url, headers=headers, files=files)

print(response.json())
```

**Alternative: JSON with Base64**

You can also send the file as base64-encoded JSON:

**Headers:**
```
Authorization: Bearer ext_your_api_key_here
Content-Type: application/json
```

**Request Body:**
```json
{
  "file": "base64_encoded_file_content",
  "fileName": "document.pdf",
  "mimeType": "application/pdf"
}
```
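
A body of this shape could be assembled in Python as sketched below. The helper name `build_base64_payload` is hypothetical, and the sketch assumes the `file` value is plain standard base64 with no data-URL prefix, which the documentation does not state explicitly.

```python
import base64
import mimetypes


def build_base64_payload(data, file_name):
    """Build the JSON body for the base64 upload variant.

    data is the raw file content as bytes; file_name is used for the
    fileName field and to guess the MIME type.
    """
    mime_type, _ = mimetypes.guess_type(file_name)
    return {
        "file": base64.b64encode(data).decode("ascii"),
        "fileName": file_name,
        "mimeType": mime_type or "application/octet-stream",
    }
```

The resulting dictionary can be sent with `requests.post(url, headers=headers, json=payload)`, which also sets `Content-Type: application/json`.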

**Response (202 Accepted):**
```json
{
  "message": "Template extraction is processing...",
  "jobId": "job_abc123",
  "fileName": "invoice.pdf",
  "templateId": "template_123",
  "templateName": "Invoice Extractor",
  "customerId": "customer_abc",
  "version": "v2"
}
```

**Important:**
- Save the `jobId` to check extraction status
- Processing typically takes 30-60 seconds
- **Supported file formats depend on extraction method**:
  - **Document AI**: PDF only
  - **Raw Text Extraction**: PDF, Office files (Word, Excel, PowerPoint), images (PNG, JPG), HTML, and more
- Maximum file size: 10MB
- Costs 10 credits per extraction

**Error Responses:**

**400 Bad Request - No file:**
```json
{
  "error": "No file provided"
}
```

**400 Bad Request - Invalid file type:**
```json
{
  "error": "Only PDF files are allowed"
}
```

**400 Bad Request - File too large:**
```json
{
  "error": "File size too large. Maximum size is 10MB."
}
```

**402 Payment Required - Insufficient credits:**
```json
{
  "error": "Insufficient credits",
  "creditsRequired": 10,
  "creditsAvailable": 5
}
```

**404 Not Found - Template not found:**
```json
{
  "error": "Template not found"
}
```

**429 Too Many Requests - Rate limit exceeded:**
```json
{
  "error": "Rate limit exceeded"
}
```

**Rate Limit:** 50 requests per minute

---

### 3. Check Extraction Status

**GET** `/api/v2/status/{jobId}`

Check the processing status of an extraction job.

**Headers:**
```
Authorization: Bearer ext_your_api_key_here
```

**cURL Example:**
```bash
curl -X GET \
  https://www.extractly.dev/api/v2/status/job_abc123 \
  -H "Authorization: Bearer ext_your_api_key_here"
```

**Response - Processing:**
```json
{
  "jobId": "job_abc123",
  "status": "processing",
  "message": "Job is currently being processed.",
  "version": "v2",
  "createdAt": "2025-10-03T14:06:55.230Z",
  "updatedAt": "2025-10-03T14:07:30.123Z",
  "document": null,
  "template": {
    "id": "template_123",
    "name": "Invoice Extractor"
  }
}
```

**Response - Completed:**
```json
{
  "jobId": "job_abc123",
  "status": "completed",
  "message": "Job has completed successfully.",
  "version": "v2",
  "createdAt": "2025-10-03T14:06:55.230Z",
  "updatedAt": "2025-10-03T14:09:19.844Z",
  "document": {
    "id": "doc_xyz789",
    "fileName": "invoice.pdf"
  },
  "template": {
    "id": "template_123",
    "name": "Invoice Extractor"
  },
  "result": {
    "invoice_number": "INV-2025-001",
    "total_amount": "1250.50",
    "invoice_date": "2025-10-01"
  },
  "completedAt": "2025-10-03T14:09:19.843Z"
}
```

**Response - Failed:**
```json
{
  "jobId": "job_abc123",
  "status": "failed",
  "message": "Extraction failed: Invalid PDF format",
  "version": "v2",
  "createdAt": "2025-10-03T14:06:55.230Z",
  "updatedAt": "2025-10-03T14:07:00.123Z",
  "document": null,
  "template": {
    "id": "template_123",
    "name": "Invoice Extractor"
  },
  "error": "Invalid PDF format"
}
```

**Job Status Values:**
- `pending`: Job is queued for processing
- `processing`: Job is currently being processed
- `completed`: Job completed successfully (includes `result` field)
- `failed`: Job failed (includes `error` field)
- `cancelled`: Job was cancelled

**Important:**
- The `result` field structure depends on the template's field definitions
- Poll this endpoint every 2-5 seconds until status is `completed`, `failed`, or `cancelled`
- Results are stored and can be retrieved later using the same `jobId`
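
A polling loop along these lines might look as follows in Python. It is a sketch under stated assumptions: the helper names, the 3-second interval, and the 120-second timeout are illustrative choices, and `cancelled` is treated as a terminal state based on the status list above.

```python
import json
import time
import urllib.request

BASE_URL = "https://www.extractly.dev"
TERMINAL_FAILURES = {"failed", "cancelled"}


def fetch_status(job_id, api_key):
    """GET /api/v2/status/{jobId} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/api/v2/status/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(get_status, job_id, interval=3.0, timeout=120.0, sleep=time.sleep):
    """Poll get_status(job_id) until the job finishes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status(job_id)
        status = job["status"]
        if status == "completed":
            return job["result"]
        if status in TERMINAL_FAILURES:
            detail = job.get("error", "no details")
            raise RuntimeError(f"Job {job_id} {status}: {detail}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status} after {timeout} seconds")
        sleep(interval)
```

A call could look like `wait_for_job(lambda job_id: fetch_status(job_id, api_key), "job_abc123")`. Passing the status function in as an argument keeps the loop testable without network access.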

**Rate Limit:** No explicit rate limit (reasonable polling recommended)

---

## Field Types Reference

When defining template fields, use these types:

| Type | Description | Example Values |
|------|-------------|----------------|
| `string` | Text data | "John Doe", "Invoice #12345" |
| `number` | Numeric data | 42, 3.14, 1000.50 |
| `date` | Date strings | "2025-10-03", "January 1, 2025" |
| `boolean` | True/false values | true, false |
| `array` | List of strings | ["item1", "item2", "item3"] |
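
In the completed-job example earlier, `total_amount` is returned as the string `"1250.50"` even though the field type is `number`. Assuming values may arrive as strings like this, a client could normalize a result against the template's field types. The helpers below are an illustrative sketch, not part of the API:

```python
from datetime import date


def coerce_value(field_type, value):
    """Convert one extracted value to a Python type based on the field type."""
    if value is None:
        return None
    if field_type == "number":
        return float(value)
    if field_type == "boolean":
        if isinstance(value, bool):
            return value
        return str(value).strip().lower() == "true"
    if field_type == "date" and isinstance(value, str):
        try:
            return date.fromisoformat(value)
        except ValueError:
            return value  # non-ISO text such as "January 1, 2025" is left as-is
    if field_type == "array":
        return value if isinstance(value, list) else [value]
    return value


def coerce_result(fields, result):
    """Apply coerce_value to each key in result using the template's fields."""
    types = {field["name"]: field["type"] for field in fields}
    return {name: coerce_value(types.get(name, "string"), value)
            for name, value in result.items()}
```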

---

## Error Handling

### HTTP Status Codes

- `200 OK`: Request successful
- `202 Accepted`: Extraction job accepted and processing
- `400 Bad Request`: Invalid request (missing file, unsupported file type, file too large)
- `401 Unauthorized`: Invalid or missing API key
- `402 Payment Required`: Insufficient credits
- `404 Not Found`: Template or job not found
- `409 Conflict`: Template name already exists or template in use
- `429 Too Many Requests`: Rate limit exceeded
- `500 Internal Server Error`: Server error

### Common Error Messages

```json
{
  "error": "Template not found"
}
```

```json
{
  "error": "Insufficient credits",
  "creditsRequired": 10,
  "creditsAvailable": 5
}
```

```json
{
  "error": "Rate limit exceeded"
}
```

---

## Rate Limits

| Endpoint | Limit |
|----------|-------|
| Get Template (GET /api/v2/templates/{id}) | 100 requests/minute |
| Extract Document (POST /api/v2/templates/{id}/extract) | 50 requests/minute |
| Check Status (GET /api/v2/status/{jobId}) | No explicit limit (reasonable polling) |

Rate limit information is included in response headers:
- `X-RateLimit-Limit`: Maximum requests allowed
- `X-RateLimit-Remaining`: Remaining requests in window
- `X-RateLimit-Reset`: Unix timestamp when limit resets
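
These headers can be used to decide how long to pause after a 429. A minimal sketch, assuming `X-RateLimit-Reset` holds Unix seconds as described above; `seconds_until_reset` is a hypothetical helper:

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds to wait until the rate limit window resets.

    Returns None when the X-RateLimit-Reset header is absent.
    """
    reset = headers.get("X-RateLimit-Reset")
    if reset is None:
        return None
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, float(reset) - current)
```

With the `requests` library, `response.headers` can be passed directly, since its header lookup is case-insensitive. A plain `dict` needs the exact header spelling.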

---

## Best Practices

### 1. Template Design

- **Be specific with field names**: Use descriptive names like `invoice_number` instead of `number`
- **Provide AI instructions**: Help the AI understand context with `aiInstructions`
- **Mark required fields**: Set `required: true` for critical fields
- **Use appropriate types**: Choose the correct field type for better validation

### 2. Error Handling

- Always check credit balance before bulk operations
- Implement retry logic for 429 (rate limit) errors
- Handle 402 (insufficient credits) by prompting for credit purchase
- Parse error messages for user-friendly feedback
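
The points above could be combined into one wrapper, sketched here. It assumes the caller supplies a `send` function that returns a `(status_code, json_body)` pair; the names, the attempt count, and the delays are illustrative.

```python
import time


class InsufficientCreditsError(Exception):
    """Raised on 402 so the caller can prompt for a credit purchase."""


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send(), retrying 429 responses with exponentially growing delays."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        if status == 402:
            raise InsufficientCreditsError(
                f"Need {body.get('creditsRequired')} credits, "
                f"have {body.get('creditsAvailable')}"
            )
        if status >= 400:
            # Surface the API's own error message where one is present.
            raise RuntimeError(body.get("error", f"HTTP {status}"))
        return body
```

With `requests`, `send` could be `lambda: (lambda r: (r.status_code, r.json()))(requests.post(url, headers=headers, files=files))`, though a named function is usually easier to read.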

### 3. Polling

- Poll status endpoint every 2-5 seconds
- Implement exponential backoff for long-running jobs
- Set a reasonable timeout (e.g., 2 minutes)
- Cache completed results to avoid re-polling
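
One way to combine a short initial interval, backoff, and an overall timeout is to generate the wait times up front. The starting delay, growth factor, and cap below are illustrative values, not documented recommendations:

```python
def polling_delays(initial=2.0, factor=1.5, max_delay=10.0, timeout=120.0):
    """Yield successive wait times that grow up to max_delay.

    The yielded waits add up to exactly `timeout` seconds.
    """
    elapsed = 0.0
    delay = initial
    while elapsed < timeout:
        wait = min(delay, timeout - elapsed)  # do not overshoot the timeout
        yield wait
        elapsed += wait
        delay = min(delay * factor, max_delay)
```

A polling loop can iterate with `for wait in polling_delays():`, check the job status, and sleep for `wait` seconds if the job is not yet finished. When the generator is exhausted, the timeout has been reached.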

### 4. File Preparation

- Ensure PDFs are not password-protected
- Keep file sizes under 10MB
- Use high-quality scans (300 DPI recommended)
- Avoid heavily redacted or corrupted files
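
Some of these checks can be run locally before spending credits. The sketch below is heuristic and makes two assumptions: that "10MB" means 10 × 1024 × 1024 bytes, and that an `/Encrypt` entry anywhere in the file indicates password protection. `check_pdf_bytes` is a hypothetical helper.

```python
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: 10MB interpreted as 10 MiB


def check_pdf_bytes(data):
    """Return a list of likely problems with a PDF given as bytes."""
    problems = []
    if len(data) > MAX_FILE_BYTES:
        problems.append("file is larger than 10MB")
    if not data.startswith(b"%PDF-"):
        problems.append("file does not start with a %PDF- header")
    if b"/Encrypt" in data:
        # Heuristic only: encrypted PDFs reference an /Encrypt dictionary.
        problems.append("file appears to be encrypted")
    return problems
```

An empty list means none of these checks failed. It does not guarantee that extraction will succeed.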

### 5. Security

- Store API keys securely (environment variables, secrets manager)
- Never expose API keys in client-side code
- Use HTTPS for all requests
- Rotate API keys periodically

---

## Example Workflows

### Complete Extraction Workflow

```javascript
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const API_KEY = 'ext_your_api_key_here';
const BASE_URL = 'https://www.extractly.dev';
const templateId = 'template_123';

async function extractDocument(filePath, timeoutMs = 120000) {
  // 1. Get template details (optional)
  const templateResponse = await axios.get(
    `${BASE_URL}/api/v2/templates/${templateId}`,
    { headers: { Authorization: `Bearer ${API_KEY}` } }
  );
  console.log('Fields to extract:', templateResponse.data.template.fields);

  // 2. Submit document for extraction
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));

  const extractResponse = await axios.post(
    `${BASE_URL}/api/v2/templates/${templateId}/extract`,
    form,
    {
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        ...form.getHeaders()
      }
    }
  );

  const jobId = extractResponse.data.jobId;
  console.log('Job submitted:', jobId);

  // 3. Poll until the job reaches a terminal state or the timeout expires
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const statusResponse = await axios.get(
      `${BASE_URL}/api/v2/status/${jobId}`,
      { headers: { Authorization: `Bearer ${API_KEY}` } }
    );
    const { status, result, error } = statusResponse.data;

    if (status === 'completed') {
      return result;
    } else if (status === 'failed' || status === 'cancelled') {
      throw new Error(error || `Job ${status}`);
    }

    console.log('Status:', status);
    await new Promise(resolve => setTimeout(resolve, 3000));
  }
  throw new Error(`Timed out waiting for job ${jobId}`);
}

// Use the function (promise chain, since CommonJS has no top-level await)
extractDocument('invoice.pdf')
  .then(result => console.log('Extracted data:', result))
  .catch(err => console.error('Extraction failed:', err.message));
// Output: { invoice_number: 'INV-2025-001', total_amount: '1250.50', invoice_date: '2025-10-01' }
```

### Batch Processing

```javascript
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const API_KEY = 'ext_your_api_key_here';
const BASE_URL = 'https://www.extractly.dev';
const templateId = 'template_123';

async function submitDocument(filePath) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));

  const response = await axios.post(
    `${BASE_URL}/api/v2/templates/${templateId}/extract`,
    form,
    {
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        ...form.getHeaders()
      }
    }
  );

  return response.data.jobId;
}

async function waitForJob(jobId, timeoutMs = 120000) {
  // Stop polling on any terminal status or once the timeout expires
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const response = await axios.get(
      `${BASE_URL}/api/v2/status/${jobId}`,
      { headers: { Authorization: `Bearer ${API_KEY}` } }
    );
    const { status, result, error } = response.data;

    if (status === 'completed') {
      return result;
    } else if (status === 'failed' || status === 'cancelled') {
      throw new Error(error || `Job ${status}`);
    }

    await new Promise(resolve => setTimeout(resolve, 3000));
  }
  throw new Error(`Timed out waiting for job ${jobId}`);
}

// Wrapped in an async function because CommonJS has no top-level await
async function main() {
  // Process multiple documents
  const files = ['invoice1.pdf', 'invoice2.pdf', 'invoice3.pdf'];

  // Submit all documents
  const jobIds = await Promise.all(files.map(file => submitDocument(file)));
  console.log('Submitted jobs:', jobIds);

  // Wait for all results
  const results = await Promise.all(jobIds.map(jobId => waitForJob(jobId)));
  console.log('All results:', results);
}

main().catch(err => console.error('Batch failed:', err.message));
```

---

## Support

For API support and questions:
- Documentation: https://www.extractly.dev/dashboard/
- Email: hamed+extractly@finna.ai

---

## Changelog

### V2 (Current)
- Asynchronous job-based extraction
- Improved error handling