docprocess

Server path: /docprocess | Type: Embedded | PCID required: No

Tools

Tool	Description
`docprocess_csv_to_xlsx`	Convert CSV files to Excel format (.XLSX). Returns the converted Excel file with download URL.
`docprocess_html_to_pdf`	Convert HTML files to PDF format. Provide one or more URLs to HTML files (must be direct links to .html files, not web pages). Returns PDF download URLs.
`docprocess_md_to_docx`	Convert Markdown files (.MD) to Word document format (.DOCX). Returns the converted Word document with download URL.
`docprocess_md_to_pdf`	Convert Markdown files (.MD) to PDF format. Returns the converted PDF file with download URL.
`docprocess_xlsx_to_csv`	Convert Excel files (.XLSX) to CSV format. Supports both single-sheet and multi-sheet workbooks. Multi-sheet files create separate CSV files for each sheet.
`docprocess_docx_to_txt`	Extract text content from Word documents (.DOCX) to plain text format. Returns the extracted text file with download URL.
`docprocess_pdf_to_txt`	Extract text content from PDF files (.PDF) to plain text format. Returns the extracted text file with download URL.
`docprocess_pptx_to_txt`	Extract text content from PowerPoint presentations (.PPTX) to plain text format. Returns the extracted text file with download URL.
`docprocess_ocr`	Extract text from images and PDFs using OCR (Optical Character Recognition). Supports PNG, JPEG, GIF, WebP, BMP, TIFF, SVG images and PDF documents. Returns extracted text or detailed OCR data with bounding boxes. For large files, returns a responseId - use docprocess_ocr_poll to check status.
`docprocess_ocr_poll`	Check the status of an OCR processing job. Call this after docprocess_ocr with async=true to check if processing is complete. Poll every 5-10 seconds until status is “completed” or “failed”.
`docprocess_fill_pdf`	Fill a PDF form with provided data using automatic field matching. Supports text fields, checkboxes, and dropdowns. Uses multiple fallback strategies: (1) exact field name match, (2) case-insensitive partial name match. For checkboxes, use boolean values or “true”/“checked”. For dropdowns, matches option text case-insensitively. Returns the URL to the filled PDF with a summary of fields filled.
`docprocess_create_word`	Create a Word document (.docx) from scratch using a JSON specification. Structure: sections contain children (headings, paragraphs, bullets, tables). Paragraphs can contain simple text or a children array of formatted text runs. Tables contain rows (arrays of cell content). Supports headings (levels 1-6), paragraphs with text formatting (bold, italic, underline, strike, doubleStrike, highlight, superScript, subScript, allCaps, smallCaps, color, font, size), bullet lists, and tables. All text content must be in paragraph elements (including table cells). Returns the URL to the created Word document.
`docprocess_word_ai`	Process Word documents with AI while preserving ALL formatting (bold, italic, fonts, colors, tables, lists, headers, images). Supports translation, grammar correction, rewriting, summarization, and any text transformation. Returns a responseId - use docprocess_word_ai_poll to check status.
`docprocess_word_ai_poll`	Check the status of a Word document AI processing job. Call this after docprocess_word_ai to check if processing is complete. Poll every 5-10 seconds until status is “completed” or “failed”.
`docprocess_fill_word_tpl`	Fill a Word document template (.docx) with provided data. Supports: simple placeholders {name}, nested objects {user.firstName}, loops {#items}{name}{/items}, conditionals {#condition}…{/condition}, inverted conditionals {^condition}…{/condition}, and expressions {price * quantity}. All formatting from the original template is preserved. Loop tags must be closed: {#items}…{/items}. Placeholders are case-sensitive. Returns the URL to the filled Word document.
`docprocess_validate_csv`	Validate CSV file structure, data quality, and consistency. Can validate CSV files directly or CSV files extracted from Excel. Returns validation results with errors and warnings.
`docprocess_xml_to_json`	Convert XML files to JSON format. By default returns the full JSON data in the response. Set store_xml_json=true to store as a file and get a download URL instead.
`docprocess_invoice_extract`	Extract structured line items from invoices (PDF or image). Uses AI to discover columns dynamically from the document and extract all line items. Supports multi-page PDFs with automatic sharding and parallel processing. Returns a jobId - use docprocess_invoice_extract_poll to check status and retrieve results (CSV, JSON, and summary artifacts).
`docprocess_invoice_extract_poll`	Check the status of an invoice line-item extraction job. Call this after docprocess_invoice_extract to check if processing is complete. Poll every 10-15 seconds until status is “completed” or “failed”. When completed, returns artifact URLs for the extracted CSV, JSON, and summary files.

docprocess_csv_to_xlsx

Convert CSV files to Excel format (.XLSX). Returns the converted Excel file with download URL.

Parameters:

Parameter	Type	Required	Default	Description
`file_urls`	string[]	Yes	—	Required: Array of URLs to CSV files to convert to Excel
`file_links_expire_in_days`	number	No	`7`	Number of days before the download links expire (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "file_urls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to CSV files to convert to Excel"
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download links expire (default: 7)"
    }
  },
  "required": [
    "file_urls"
  ]
}
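
A minimal sketch of how a client might package these parameters, assuming the server is reached through a standard MCP `tools/call` JSON-RPC request. The helper name `build_tool_call` and the CSV URL are illustrative and do not come from this reference.

```python
import json
from typing import Any, Dict


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Wrap tool arguments in a JSON-RPC 2.0 `tools/call` envelope."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "docprocess_csv_to_xlsx",
    {
        "file_urls": ["https://example.com/data/report.csv"],  # hypothetical URL
        "file_links_expire_in_days": 3,  # optional; the schema default is 7
    },
)
print(json.dumps(request, indent=2))
```

The same envelope applies to every tool on this page; only `name` and `arguments` change.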

docprocess_html_to_pdf

Convert HTML files to PDF format. Provide one or more URLs to HTML files (must be direct links to .html files, not web pages). Returns PDF download URLs.

Parameters:

Parameter	Type	Required	Default	Description
`file_urls`	string[]	Yes	—	Required: Array of URLs to HTML files to convert (e.g., [“https://example.com/document.html”]). Must be direct links to .html files.
`file_links_expire_in_days`	number	No	`7`	Number of days before the PDF download links expire (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "file_urls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to HTML files to convert (e.g., [\"https://example.com/document.html\"]). Must be direct links to .html files."
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the PDF download links expire (default: 7)"
    }
  },
  "required": [
    "file_urls"
  ]
}
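
Because the tool expects direct links to .html files, a client may want to screen URLs before sending them. The check below is only a local heuristic based on the URL path; how the server itself decides what counts as a direct link is not documented here, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def is_direct_html_link(url: str) -> bool:
    """Heuristic: http(s) URL whose path ends in .html (query strings are ignored)."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(".html")


candidates = [
    "https://example.com/document.html",
    "https://example.com/docs/",           # a web page, not a file link
    "https://example.com/a.html?rev=2",
]
file_urls = [u for u in candidates if is_direct_html_link(u)]
print(file_urls)
```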

docprocess_md_to_docx

Convert Markdown files (.MD) to Word document format (.DOCX). Returns the converted Word document with download URL.

Parameters:

Parameter	Type	Required	Default	Description
`file_urls`	string[]	Yes	—	Required: Array of URLs to Markdown files (.MD) to convert to Word
`file_links_expire_in_days`	number	No	`7`	Number of days before the download links expire (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "file_urls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to Markdown files (.MD) to convert to Word"
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download links expire (default: 7)"
    }
  },
  "required": [
    "file_urls"
  ]
}

docprocess_md_to_pdf

Convert Markdown files (.MD) to PDF format. Returns the converted PDF file with download URL.

Parameters:

Parameter	Type	Required	Default	Description
`file_urls`	string[]	Yes	—	Required: Array of URLs to Markdown files (.MD) to convert to PDF
`pdf_format`	string	No	`"a4"`	Page format: “a4” (default) or “letter”
`pdf_orientation`	string	No	`"portrait"`	Page orientation: “portrait” (default) or “landscape”
`file_links_expire_in_days`	number	No	`7`	Number of days before the download links expire (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "file_urls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to Markdown files (.MD) to convert to PDF"
    },
    "pdf_format": {
      "type": "string",
      "enum": [
        "a4",
        "letter"
      ],
      "default": "a4",
      "description": "Page format: \"a4\" (default) or \"letter\""
    },
    "pdf_orientation": {
      "type": "string",
      "enum": [
        "portrait",
        "landscape"
      ],
      "default": "portrait",
      "description": "Page orientation: \"portrait\" (default) or \"landscape\""
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download links expire (default: 7)"
    }
  },
  "required": [
    "file_urls"
  ]
}
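
The two enum parameters can be validated on the client before the call is made. This is a small sketch under the schema above; the function name `md_to_pdf_args` and the example URL are assumptions made for illustration.

```python
from typing import Any, Dict, List

PDF_FORMATS = ("a4", "letter")                # enum values from the schema
PDF_ORIENTATIONS = ("portrait", "landscape")  # enum values from the schema


def md_to_pdf_args(
    file_urls: List[str],
    pdf_format: str = "a4",
    pdf_orientation: str = "portrait",
    expire_days: int = 7,
) -> Dict[str, Any]:
    """Build the arguments object for docprocess_md_to_pdf, rejecting unknown enum values."""
    if pdf_format not in PDF_FORMATS:
        raise ValueError(f"pdf_format must be one of {PDF_FORMATS}")
    if pdf_orientation not in PDF_ORIENTATIONS:
        raise ValueError(f"pdf_orientation must be one of {PDF_ORIENTATIONS}")
    return {
        "file_urls": list(file_urls),
        "pdf_format": pdf_format,
        "pdf_orientation": pdf_orientation,
        "file_links_expire_in_days": expire_days,
    }


args = md_to_pdf_args(
    ["https://example.com/notes.md"],  # hypothetical URL
    pdf_format="letter",
    pdf_orientation="landscape",
)
print(args)
```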

docprocess_xlsx_to_csv

Convert Excel files (.XLSX) to CSV format. Supports both single-sheet and multi-sheet workbooks. Multi-sheet files create separate CSV files for each sheet.

Parameters:

Parameter	Type	Required	Default	Description
`file_urls`	string[]	Yes	—	Required: Array of URLs to Excel files (.XLSX) to convert to CSV
`file_links_expire_in_days`	number	No	`7`	Number of days before the download links expire (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "file_urls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to Excel files (.XLSX) to convert to CSV"
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download links expire (default: 7)"
    }
  },
  "required": [
    "file_urls"
  ]
}

docprocess_docx_to_txt

Extract text content from Word documents (.DOCX) to plain text format. Returns the extracted text file with download URL.

Parameters:

Parameter	Type	Required	Default	Description
`file_urls`	string[]	Yes	—	Required: Array of URLs to Word documents (.DOCX) to extract text from
`file_links_expire_in_days`	number	No	`7`	Number of days before the download links expire (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "file_urls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to Word documents (.DOCX) to extract text from"
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download links expire (default: 7)"
    }
  },
  "required": [
    "file_urls"
  ]
}

docprocess_pdf_to_txt

Extract text content from PDF files (.PDF) to plain text format. Returns the extracted text file with download URL.

Parameters:

Parameter	Type	Required	Default	Description
`file_urls`	string[]	Yes	—	Required: Array of URLs to PDF files to extract text from
`file_links_expire_in_days`	number	No	`7`	Number of days before the download links expire (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "file_urls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to PDF files to extract text from"
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download links expire (default: 7)"
    }
  },
  "required": [
    "file_urls"
  ]
}

docprocess_pptx_to_txt

Extract text content from PowerPoint presentations (.PPTX) to plain text format. Returns the extracted text file with download URL.

Parameters:

Parameter	Type	Required	Default	Description
`file_urls`	string[]	Yes	—	Required: Array of URLs to PowerPoint files (.PPTX) to extract text from
`file_links_expire_in_days`	number	No	`7`	Number of days before the download links expire (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "file_urls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to PowerPoint files (.PPTX) to extract text from"
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download links expire (default: 7)"
    }
  },
  "required": [
    "file_urls"
  ]
}

docprocess_ocr

Extract text from images and PDFs using OCR (Optical Character Recognition). Supports PNG, JPEG, GIF, WebP, BMP, TIFF, SVG images and PDF documents. Returns extracted text or detailed OCR data with bounding boxes. For large files, returns a responseId - use docprocess_ocr_poll to check status.

Parameters:

Parameter	Type	Required	Default	Description
`fileUrls`	string[]	Yes	—	Required: Array of URLs to image files (PNG, JPEG, GIF, WebP, BMP, TIFF, SVG) or PDF documents to process
`languageHints`	string[]	No	—	Optional: Array of language codes for better OCR accuracy (e.g., [“en”, “es”]). Defaults to [“en”]
`extractTextOnly`	boolean	No	`true`	When true (default), returns only extracted text. When false, includes detailed OCR data like bounding boxes and word positions for layout analysis.
`collectionId`	string	No	—	Optional: Filestorage collection ID to store OCR results. If not provided, uses default collection.
`async`	boolean	No	`false`	When true, processes files asynchronously in background. Automatically enabled for large files. Returns responseId for polling.

inputSchema:

{
  "type": "object",
  "properties": {
    "fileUrls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to image files (PNG, JPEG, GIF, WebP, BMP, TIFF, SVG) or PDF documents to process"
    },
    "languageHints": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Optional: Array of language codes for better OCR accuracy (e.g., [\"en\", \"es\"]). Defaults to [\"en\"]"
    },
    "extractTextOnly": {
      "type": "boolean",
      "default": true,
      "description": "When true (default), returns only extracted text. When false, includes detailed OCR data like bounding boxes and word positions for layout analysis."
    },
    "collectionId": {
      "type": "string",
      "description": "Optional: Filestorage collection ID to store OCR results. If not provided, uses default collection."
    },
    "async": {
      "type": "boolean",
      "default": false,
      "description": "When true, processes files asynchronously in background. Automatically enabled for large files. Returns responseId for polling."
    }
  },
  "required": [
    "fileUrls"
  ]
}
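
Note that this tool uses camelCase parameter names (`fileUrls`), unlike the `file_urls` of the conversion tools. The sketch below builds the arguments and decides whether a response needs polling. The response shape (a dictionary carrying a `responseId` key when the job runs in the background) is an assumption, as are the helper names.

```python
from typing import Any, Dict, List, Optional


def ocr_args(
    file_urls: List[str],
    language_hints: Optional[List[str]] = None,
    text_only: bool = True,
    run_async: bool = False,
    collection_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the arguments object for docprocess_ocr; optional keys are omitted when unset."""
    args: Dict[str, Any] = {
        "fileUrls": list(file_urls),
        "extractTextOnly": text_only,
        "async": run_async,
    }
    if language_hints:
        args["languageHints"] = list(language_hints)
    if collection_id is not None:
        args["collectionId"] = collection_id
    return args


def needs_polling(response: Dict[str, Any]) -> bool:
    """Assumed convention: a background job is signalled by a responseId in the response."""
    return "responseId" in response


args = ocr_args(
    ["https://example.com/scan.pdf"],  # hypothetical URL
    language_hints=["en", "es"],
    text_only=False,                   # request bounding boxes and word positions
)
print(args)
```

Since async mode is enabled automatically for large files, checking the response for a `responseId` is safer than relying on the `async` value that was sent.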

docprocess_ocr_poll

Check the status of an OCR processing job. Call this after docprocess_ocr with async=true to check if processing is complete. Poll every 5-10 seconds until status is “completed” or “failed”.

Parameters:

Parameter	Type	Required	Default	Description
`responseId`	string	Yes	—	Required: The responseId returned by docprocess_ocr when async=true

inputSchema:

{
  "type": "object",
  "properties": {
    "responseId": {
      "type": "string",
      "description": "Required: The responseId returned by docprocess_ocr when async=true"
    }
  },
  "required": [
    "responseId"
  ]
}
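
The polling guidance above could be implemented roughly as follows. The `call_tool` callable stands in for whatever MCP client is in use. The `status` key of the response and the intermediate `"processing"` value are assumptions; only `"completed"` and `"failed"` are documented.

```python
import time
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def poll_ocr(
    call_tool: ToolCaller,
    response_id: str,
    interval_s: float = 5.0,   # documented guidance: poll every 5-10 seconds
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll docprocess_ocr_poll until the job completes; raise on failure or timeout."""
    for _ in range(max_attempts):
        result = call_tool("docprocess_ocr_poll", {"responseId": response_id})
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"OCR job {response_id} failed")
        sleep(interval_s)
    raise TimeoutError(f"OCR job {response_id} still running after {max_attempts} polls")


# Stand-in client for demonstration only: reports completion on the third poll.
_states = iter(["processing", "processing", "completed"])


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    return {"responseId": arguments["responseId"], "status": next(_states)}


final = poll_ocr(fake_call_tool, "resp-123", sleep=lambda _seconds: None)
print(final["status"])
```

Injecting `sleep` keeps the loop testable without real delays.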

docprocess_fill_pdf

Fill a PDF form with provided data using automatic field matching. Supports text fields, checkboxes, and dropdowns. Uses multiple fallback strategies: (1) exact field name match, (2) case-insensitive partial name match. For checkboxes, use boolean values or “true”/“checked”. For dropdowns, matches option text case-insensitively. Returns the URL to the filled PDF with a summary of fields filled.

Parameters:

Parameter	Type	Required	Default	Description
`pdf_url`	string	Yes	—	Required: URL to the PDF form to fill. The PDF must have fillable form fields (not just a static document).
`form_data`	any	Yes	—	Required: JSON object with field names and values. For text fields: {“fieldName”: “value”}. For checkboxes: {“checkboxName”: true} or {“checkboxName”: “checked”}. For dropdowns: {“dropdownName”: “optionText”}. Field matching is case-insensitive and supports partial matches.
`output_filename`	string	No	`"filled_form.pdf"`	Optional: Name for the output file (default: “filled_form.pdf”)
`file_links_expire_in_days`	number	No	`7`	Number of days before the download link expires (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "pdf_url": {
      "type": "string",
      "description": "Required: URL to the PDF form to fill. The PDF must have fillable form fields (not just a static document)."
    },
    "form_data": {
      "type": "effects",
      "description": "Required: JSON object with field names and values. For text fields: {\"fieldName\": \"value\"}. For checkboxes: {\"checkboxName\": true} or {\"checkboxName\": \"checked\"}. For dropdowns: {\"dropdownName\": \"optionText\"}. Field matching is case-insensitive and supports partial matches."
    },
    "output_filename": {
      "type": "string",
      "default": "filled_form.pdf",
      "description": "Optional: Name for the output file (default: \"filled_form.pdf\")"
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download link expires (default: 7)"
    }
  },
  "required": [
    "pdf_url",
    "form_data"
  ]
}
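
To predict which keys of `form_data` will land on which PDF field, the two documented strategies can be approximated locally. This is not the server's code: treating "partial match" as a substring test in either direction is an assumption, and the field names are hypothetical.

```python
from typing import Iterable, Optional


def match_field(key: str, pdf_fields: Iterable[str]) -> Optional[str]:
    """Return the PDF field a form_data key would map to, or None if nothing matches."""
    fields = list(pdf_fields)
    # Strategy 1: exact field name match.
    if key in fields:
        return key
    # Strategy 2: case-insensitive partial match (substring in either direction; assumed).
    needle = key.lower()
    for name in fields:
        candidate = name.lower()
        if needle in candidate or candidate in needle:
            return name
    return None


pdf_fields = ["applicant_name", "Email", "agree_terms"]  # hypothetical form fields
form_data = {
    "Email": "jane@example.com",  # text field, exact match
    "NAME": "Jane Doe",           # text field, partial match
    "agree_terms": True,          # checkbox, boolean value
    "phone": "555-0100",          # no corresponding field
}
resolved = {key: match_field(key, pdf_fields) for key in form_data}
print(resolved)
```

Keys that resolve to `None` here would presumably be absent from the summary of fields filled, so the summary is worth checking after each call.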

docprocess_create_word

Create a Word document (.docx) from scratch using a JSON specification. Structure: sections contain children (headings, paragraphs, bullets, tables). Paragraphs can contain simple text or a children array of formatted text runs. Tables contain rows (arrays of cell content). Supports headings (levels 1-6), paragraphs with text formatting (bold, italic, underline, strike, doubleStrike, highlight, superScript, subScript, allCaps, smallCaps, color, font, size), bullet lists, and tables. All text content must be in paragraph elements (including table cells). Returns the URL to the created Word document.

Parameters:

Parameter	Type	Required	Default	Description
`document_spec`	any	Yes	—	Document specification JSON. Structure: {“sections”: [{“children”: [elements]}]}. Element types: “heading” (requires level 1-6, text), “paragraph” (text or children array for formatting), “bullet” (text, optional level), “table” (rows array). Example: {“sections”: [{“children”: [{“type”: “heading”, “level”: 1, “text”: “Report”, “alignment”: “center”}, {“type”: “paragraph”, “children”: [{“text”: “Bold ”, “bold”: true}, {“text”: “normal text”}]}, {“type”: “bullet”, “text”: “Item 1”}, {“type”: “table”, “rows”: [[“Header 1”, “Header 2”], [“Value 1”, “Value 2”]]}]}]}
`output_filename`	string	No	`"created_document.docx"`	Optional: Name for the output file (default: “created_document.docx”)
`file_links_expire_in_days`	number	No	`7`	Number of days before the download link expires (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "document_spec": {
      "type": "effects",
      "description": "Document specification JSON. Structure: {\"sections\": [{\"children\": [elements]}]}. Element types: \"heading\" (requires level 1-6, text), \"paragraph\" (text or children array for formatting), \"bullet\" (text, optional level), \"table\" (rows array). Example: {\"sections\": [{\"children\": [{\"type\": \"heading\", \"level\": 1, \"text\": \"Report\", \"alignment\": \"center\"}, {\"type\": \"paragraph\", \"children\": [{\"text\": \"Bold \", \"bold\": true}, {\"text\": \"normal text\"}]}, {\"type\": \"bullet\", \"text\": \"Item 1\"}, {\"type\": \"table\", \"rows\": [[\"Header 1\", \"Header 2\"], [\"Value 1\", \"Value 2\"]]}]}]}"
    },
    "output_filename": {
      "type": "string",
      "default": "created_document.docx",
      "description": "Optional: Name for the output file (default: \"created_document.docx\")"
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download link expires (default: 7)"
    }
  },
  "required": [
    "document_spec"
  ]
}
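
Writing `document_spec` by hand is error-prone, so small builder functions can help. The sketch below reproduces the example from the parameter description. The builder names are hypothetical, and table cells are passed as plain strings because that is what the documented example does.

```python
import json
from typing import Any, Dict, Optional, Sequence, Union

Run = Union[str, Dict[str, Any]]


def heading(text: str, level: int = 1, alignment: Optional[str] = None) -> Dict[str, Any]:
    if not 1 <= level <= 6:
        raise ValueError("heading level must be between 1 and 6")
    element: Dict[str, Any] = {"type": "heading", "level": level, "text": text}
    if alignment is not None:
        element["alignment"] = alignment
    return element


def paragraph(*runs: Run) -> Dict[str, Any]:
    """One plain string gives a simple paragraph; otherwise a children array of text runs."""
    if len(runs) == 1 and isinstance(runs[0], str):
        return {"type": "paragraph", "text": runs[0]}
    children = [{"text": run} if isinstance(run, str) else run for run in runs]
    return {"type": "paragraph", "children": children}


def bullet(text: str, level: Optional[int] = None) -> Dict[str, Any]:
    element: Dict[str, Any] = {"type": "bullet", "text": text}
    if level is not None:
        element["level"] = level
    return element


def table(rows: Sequence[Sequence[str]]) -> Dict[str, Any]:
    return {"type": "table", "rows": [list(row) for row in rows]}


def document(*children: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap elements in a single section."""
    return {"sections": [{"children": list(children)}]}


spec = document(
    heading("Report", level=1, alignment="center"),
    paragraph({"text": "Bold ", "bold": True}, "normal text"),
    bullet("Item 1"),
    table([["Header 1", "Header 2"], ["Value 1", "Value 2"]]),
)
arguments = {"document_spec": spec, "output_filename": "report.docx"}
print(json.dumps(arguments, indent=2))
```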

docprocess_word_ai

Process Word documents with AI while preserving ALL formatting (bold, italic, fonts, colors, tables, lists, headers, images). Supports translation, grammar correction, rewriting, summarization, and any text transformation. Returns a responseId - use docprocess_word_ai_poll to check status.

Parameters:

Parameter	Type	Required	Default	Description
`documentUrl`	string	Yes	—	Required: URL to the Word document (.DOCX) to process
`task`	string	Yes	—	Required: Natural language description of what to do (e.g., “translate to Spanish”, “fix grammar errors”, “rewrite in formal tone”, “summarize to 2 paragraphs”)
`model`	string	No	—	Optional: LLM model to use. Options: claude-sonnet-4-5-20250929 (default), gpt-4.1, gpt-4o, gemini-2.5-flash
`strategy`	string	No	—	Optional: SPARSE_CHANGES for minor edits (grammar, spelling), DENSE_CHANGES for major changes (translation, rewriting). Auto-detected if omitted.

inputSchema:

{
  "type": "object",
  "properties": {
    "documentUrl": {
      "type": "string",
      "description": "Required: URL to the Word document (.DOCX) to process"
    },
    "task": {
      "type": "string",
      "description": "Required: Natural language description of what to do (e.g., \"translate to Spanish\", \"fix grammar errors\", \"rewrite in formal tone\", \"summarize to 2 paragraphs\")"
    },
    "model": {
      "type": "string",
      "description": "Optional: LLM model to use. Options: claude-sonnet-4-5-20250929 (default), gpt-4.1, gpt-4o, gemini-2.5-flash"
    },
    "strategy": {
      "type": "string",
      "enum": [
        "SPARSE_CHANGES",
        "DENSE_CHANGES"
      ],
      "description": "Optional: SPARSE_CHANGES for minor edits (grammar, spelling), DENSE_CHANGES for major changes (translation, rewriting). Auto-detected if omitted."
    }
  },
  "required": [
    "documentUrl",
    "task"
  ]
}
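
The following sketch shows how the optional `model` and `strategy` parameters might be combined with a task. The helper name and document URL are illustrative. Leaving `strategy` out relies on the documented auto-detection.

```python
from typing import Any, Dict, Optional

STRATEGIES = ("SPARSE_CHANGES", "DENSE_CHANGES")  # enum values from the schema


def word_ai_args(
    document_url: str,
    task: str,
    model: Optional[str] = None,
    strategy: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the arguments object for docprocess_word_ai; unset options are omitted."""
    if strategy is not None and strategy not in STRATEGIES:
        raise ValueError(f"strategy must be one of {STRATEGIES}")
    args: Dict[str, Any] = {"documentUrl": document_url, "task": task}
    if model is not None:
        args["model"] = model
    if strategy is not None:
        args["strategy"] = strategy
    return args


url = "https://example.com/contract.docx"  # hypothetical URL

# Minor edits: sparse strategy.
grammar = word_ai_args(url, "fix grammar errors", strategy="SPARSE_CHANGES")
# Major changes: dense strategy, with an explicit model from the documented options.
translate = word_ai_args(url, "translate to Spanish", model="gpt-4.1", strategy="DENSE_CHANGES")
# Let the server pick both the model and the strategy.
auto = word_ai_args(url, "rewrite in formal tone")
print(grammar, translate, auto, sep="\n")
```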

docprocess_word_ai_poll

Check the status of a Word document AI processing job. Call this after docprocess_word_ai to check if processing is complete. Poll every 5-10 seconds until status is “completed” or “failed”.

Parameters:

Parameter	Type	Required	Default	Description
`responseId`	string	Yes	—	Required: The responseId returned by docprocess_word_ai

inputSchema:

{
  "type": "object",
  "properties": {
    "responseId": {
      "type": "string",
      "description": "Required: The responseId returned by docprocess_word_ai"
    }
  },
  "required": [
    "responseId"
  ]
}

docprocess_fill_word_tpl

Fill a Word document template (.docx) with provided data. Supports: simple placeholders {name}, nested objects {user.firstName}, loops {#items}{name}{/items}, conditionals {#condition}…{/condition}, inverted conditionals {^condition}…{/condition}, and expressions {price * quantity}. All formatting from the original template is preserved. Loop tags must be closed: {#items}…{/items}. Placeholders are case-sensitive. Returns the URL to the filled Word document.

Parameters:

Parameter	Type	Required	Default	Description
`template_url`	string	Yes	—	Required: URL to the Word document template (.docx). Template should contain placeholders like {name}, {user.email}, {#items}{name}{/items} for loops (must close with {/items}), {#condition}text{/condition} for conditionals. Templates must be .docx format (not older .doc).
`data`	any	Yes	—	Required: JSON object with data to fill placeholders. Example: {“name”: “John”, “items”: [{“name”: “Widget”, “price”: 10}], “isPremium”: true}. Keys must match placeholder names exactly (case-sensitive). For loops, provide arrays: {“items”: [{“name”: “A”}, {“name”: “B”}]}. For conditionals, provide booleans: {“isPremium”: true}.
`output_filename`	string	No	`"filled_template.docx"`	Optional: Name for the output file (default: “filled_template.docx”)
`file_links_expire_in_days`	number	No	`7`	Number of days before the download link expires (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "template_url": {
      "type": "string",
      "description": "Required: URL to the Word document template (.docx). Template should contain placeholders like {name}, {user.email}, {#items}{name}{/items} for loops (must close with {/items}), {#condition}text{/condition} for conditionals. Templates must be .docx format (not older .doc)."
    },
    "data": {
      "type": "effects",
      "description": "Required: JSON object with data to fill placeholders. Example: {\"name\": \"John\", \"items\": [{\"name\": \"Widget\", \"price\": 10}], \"isPremium\": true}. Keys must match placeholder names exactly (case-sensitive). For loops, provide arrays: {\"items\": [{\"name\": \"A\"}, {\"name\": \"B\"}]}. For conditionals, provide booleans: {\"isPremium\": true}."
    },
    "output_filename": {
      "type": "string",
      "default": "filled_template.docx",
      "description": "Optional: Name for the output file (default: \"filled_template.docx\")"
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download link expires (default: 7)"
    }
  },
  "required": [
    "template_url",
    "data"
  ]
}
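
Since loop and conditional tags must be closed, a quick local check of the template text can catch mistakes before the call. The sketch assumes the template's text has already been extracted to a string (for example with `docprocess_docx_to_txt`) and only understands the `{#name}`, `{^name}`, and `{/name}` forms listed above. It does not reflect how the server parses templates.

```python
import re
from typing import List

# Matches section tags only: {#name}, {^name}, {/name}
SECTION_TAG = re.compile(r"\{([#^/])([\w.]+)\}")


def check_sections(template_text: str) -> List[str]:
    """Return a list of problems with unbalanced section tags (empty if none)."""
    open_tags: List[str] = []
    errors: List[str] = []
    for kind, name in SECTION_TAG.findall(template_text):
        if kind in ("#", "^"):
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()
        else:
            errors.append(f"closing tag for '{name}' has no matching opener")
    errors.extend(f"section '{name}' is never closed" for name in open_tags)
    return errors


good = "Dear {name},{#items} {name}: {price}{/items}{^isPremium} Upgrade today.{/isPremium}"
bad = "Order:{#items} {name}"

# Matching data for the good template; keys are case-sensitive.
data = {"name": "John", "items": [{"name": "Widget", "price": 10}], "isPremium": True}

print(check_sections(good))
print(check_sections(bad))
```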

docprocess_validate_csv

Validate CSV file structure, data quality, and consistency. Can validate CSV files directly or CSV files extracted from Excel. Returns validation results with errors and warnings.

Parameters:

Parameter	Type	Required	Default	Description
`file_urls`	string[]	Yes	—	Required: Array of URLs to CSV files to validate
`file_links_expire_in_days`	number	No	`7`	Number of days before the download links expire (default: 7)

inputSchema:

{
  "type": "object",
  "properties": {
    "file_urls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to CSV files to validate"
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download links expire (default: 7)"
    }
  },
  "required": [
    "file_urls"
  ]
}

docprocess_xml_to_json

Convert XML files to JSON format. By default returns the full JSON data in the response. Set store_xml_json=true to store as a file and get a download URL instead.

Parameters:

Parameter	Type	Required	Default	Description
`file_urls`	string[]	Yes	—	Required: Array of URLs to XML files to convert to JSON
`store_xml_json`	boolean	No	`false`	Optional: If true, stores the converted JSON as a file and returns download URL. If false (default), returns the full JSON data directly in the response.
`file_links_expire_in_days`	number	No	`7`	Number of days before the download links expire (default: 7, only applies when store_xml_json=true)

inputSchema:

{
  "type": "object",
  "properties": {
    "file_urls": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Required: Array of URLs to XML files to convert to JSON"
    },
    "store_xml_json": {
      "type": "boolean",
      "default": false,
      "description": "Optional: If true, stores the converted JSON as a file and returns download URL. If false (default), returns the full JSON data directly in the response."
    },
    "file_links_expire_in_days": {
      "type": "number",
      "default": 7,
      "description": "Number of days before the download links expire (default: 7, only applies when store_xml_json=true)"
    }
  },
  "required": [
    "file_urls"
  ]
}

docprocess_invoice_extract

Extract structured line items from invoices (PDF or image). Uses AI to discover columns dynamically from the document and extract all line items. Supports multi-page PDFs with automatic sharding and parallel processing. Returns a jobId - use docprocess_invoice_extract_poll to check status and retrieve results (CSV, JSON, and summary artifacts).

Parameters:

Parameter	Type	Required	Default	Description
`fileUrl`	string	Yes	—	Required: URL to the invoice file. Supported formats: PDF (.pdf), JPEG (.jpg/.jpeg), PNG (.png)
`pagesPerShard`	number	No	`3`	Optional: Number of pages per processing shard for PDFs (default: 3). Smaller values may improve accuracy for dense invoices.

inputSchema:

{
  "type": "object",
  "properties": {
    "fileUrl": {
      "type": "string",
      "description": "Required: URL to the invoice file. Supported formats: PDF (.pdf), JPEG (.jpg/.jpeg), PNG (.png)"
    },
    "pagesPerShard": {
      "type": "number",
      "default": 3,
      "description": "Optional: Number of pages per processing shard for PDFs (default: 3). Smaller values may improve accuracy for dense invoices."
    }
  },
  "required": [
    "fileUrl"
  ]
}
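
To get a feel for how `pagesPerShard` affects the amount of parallel work, the page grouping can be estimated as below. Splitting into contiguous page ranges is an assumption; the reference only states the number of pages per shard.

```python
from typing import List, Tuple


def shard_pages(total_pages: int, pages_per_shard: int = 3) -> List[Tuple[int, int]]:
    """Return assumed (first_page, last_page) ranges, 1-indexed and inclusive."""
    if pages_per_shard < 1:
        raise ValueError("pages_per_shard must be at least 1")
    return [
        (start, min(start + pages_per_shard - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_shard)
    ]


default_shards = shard_pages(8)     # default of 3 pages per shard
dense_shards = shard_pages(8, 2)    # smaller shards for a dense invoice
print(len(default_shards), default_shards)
print(len(dense_shards), dense_shards)
```

Under this assumption an 8-page invoice yields 3 shards at the default setting and 4 shards at `pagesPerShard=2`.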

docprocess_invoice_extract_poll

Check the status of an invoice line-item extraction job. Call this after docprocess_invoice_extract to check if processing is complete. Poll every 10-15 seconds until status is “completed” or “failed”. When completed, returns artifact URLs for the extracted CSV, JSON, and summary files.

Parameters:

Parameter	Type	Required	Default	Description
`jobId`	string	Yes	—	Required: The jobId returned by docprocess_invoice_extract

inputSchema:

{
  "type": "object",
  "properties": {
    "jobId": {
      "type": "string",
      "description": "Required: The jobId returned by docprocess_invoice_extract"
    }
  },
  "required": [
    "jobId"
  ]
}
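
The submit-then-poll flow for invoices might look like the sketch below. `call_tool` stands in for a real MCP client. The `jobId` and `status` response keys follow the descriptions above, but the exact response shape and the intermediate `"processing"` status are assumptions.

```python
import time
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def extract_invoice(
    call_tool: ToolCaller,
    file_url: str,
    pages_per_shard: int = 3,
    interval_s: float = 10.0,  # documented guidance: poll every 10-15 seconds
    max_polls: int = 90,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start an extraction job, then poll until it reports completed or failed."""
    job = call_tool(
        "docprocess_invoice_extract",
        {"fileUrl": file_url, "pagesPerShard": pages_per_shard},
    )
    job_id = job["jobId"]
    for _ in range(max_polls):
        result = call_tool("docprocess_invoice_extract_poll", {"jobId": job_id})
        if result.get("status") in ("completed", "failed"):
            return result
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


def make_fake_caller(polls_until_done: int = 3) -> ToolCaller:
    """Stand-in client for demonstration only."""
    state = {"polls": 0}

    def call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        if name == "docprocess_invoice_extract":
            return {"jobId": "job-demo"}
        state["polls"] += 1
        done = state["polls"] >= polls_until_done
        return {"jobId": arguments["jobId"], "status": "completed" if done else "processing"}

    return call_tool


result = extract_invoice(
    make_fake_caller(),
    "https://example.com/invoice.pdf",  # hypothetical URL
    sleep=lambda _seconds: None,
)
print(result["status"])
```

The function returns the final poll result for both terminal states, so the caller should check `status` before reading any artifact URLs.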
