/docprocess | Type: Embedded | PCID required: No
Tools
| Tool | Description |
|---|---|
| docprocess_csv_to_xlsx | Convert CSV files to Excel format (.XLSX). Returns the converted Excel file with download URL. |
| docprocess_html_to_pdf | Convert HTML files to PDF format. Provide one or more URLs to HTML files (must be direct links to .html files, not web pages). Returns PDF download URLs. |
| docprocess_md_to_docx | Convert Markdown files (.MD) to Word document format (.DOCX). Returns the converted Word document with download URL. |
| docprocess_md_to_pdf | Convert Markdown files (.MD) to PDF format. Returns the converted PDF file with download URL. |
| docprocess_xlsx_to_csv | Convert Excel files (.XLSX) to CSV format. Supports both single-sheet and multi-sheet workbooks. Multi-sheet files create separate CSV files for each sheet. |
| docprocess_docx_to_txt | Extract text content from Word documents (.DOCX) to plain text format. Returns the extracted text file with download URL. |
| docprocess_pdf_to_txt | Extract text content from PDF files (.PDF) to plain text format. Returns the extracted text file with download URL. |
| docprocess_pptx_to_txt | Extract text content from PowerPoint presentations (.PPTX) to plain text format. Returns the extracted text file with download URL. |
| docprocess_ocr | Extract text from images and PDFs using OCR (Optical Character Recognition). Supports PNG, JPEG, GIF, WebP, BMP, TIFF, SVG images and PDF documents. Returns extracted text or detailed OCR data with bounding boxes. For large files, returns a responseId; use docprocess_ocr_poll to check status. |
| docprocess_ocr_poll | Check the status of an OCR processing job. Call this after docprocess_ocr with async=true to check if processing is complete. Poll every 5-10 seconds until status is "completed" or "failed". |
| docprocess_fill_pdf | Fill a PDF form with provided data using automatic field matching. Supports text fields, checkboxes, and dropdowns. Uses multiple fallback strategies: (1) exact field name match, (2) case-insensitive partial name match. For checkboxes, use boolean values or "true"/"checked". For dropdowns, matches option text case-insensitively. Returns the URL to the filled PDF with a summary of fields filled. |
| docprocess_create_word | Create a Word document (.docx) from scratch using a JSON specification. Structure: sections contain children (headings, paragraphs, bullets, tables). Paragraphs can contain simple text or a children array of formatted text runs. Tables contain rows (arrays of cell content). Supports headings (levels 1-6), paragraphs with text formatting (bold, italic, underline, strike, doubleStrike, highlight, superScript, subScript, allCaps, smallCaps, color, font, size), bullet lists, and tables. All text content must be in paragraph elements (including table cells). Returns the URL to the created Word document. |
| docprocess_word_ai | Process Word documents with AI while preserving ALL formatting (bold, italic, fonts, colors, tables, lists, headers, images). Supports translation, grammar correction, rewriting, summarization, and any text transformation. Returns a responseId; use docprocess_word_ai_poll to check status. |
| docprocess_word_ai_poll | Check the status of a Word document AI processing job. Call this after docprocess_word_ai to check if processing is complete. Poll every 5-10 seconds until status is "completed" or "failed". |
| docprocess_fill_word_tpl | Fill a Word document template (.docx) with provided data. Supports: simple placeholders `{name}`, nested objects `{user.firstName}`, loops `{#items}{name}{/items}`, conditionals `{#condition}…{/condition}`, inverted conditionals `{^condition}…{/condition}`, and expressions `{price * quantity}`. All formatting from the original template is preserved. Loop tags must be closed: `{#items}…{/items}`. Placeholders are case-sensitive. Returns the URL to the filled Word document. |
| docprocess_validate_csv | Validate CSV file structure, data quality, and consistency. Can validate CSV files directly or CSV files extracted from Excel. Returns validation results with errors and warnings. |
| docprocess_xml_to_json | Convert XML files to JSON format. By default returns the full JSON data in the response. Set store_xml_json=true to store as a file and get a download URL instead. |
| docprocess_invoice_extract | Extract structured line items from invoices (PDF or image). Uses AI to discover columns dynamically from the document and extract all line items. Supports multi-page PDFs with automatic sharding and parallel processing. Returns a jobId; use docprocess_invoice_extract_poll to check status and retrieve results (CSV, JSON, and summary artifacts). |
| docprocess_invoice_extract_poll | Check the status of an invoice line-item extraction job. Call this after docprocess_invoice_extract to check if processing is complete. Poll every 10-15 seconds until status is "completed" or "failed". When completed, returns artifact URLs for the extracted CSV, JSON, and summary files. |

docprocess_csv_to_xlsx
Convert CSV files to Excel format (.XLSX). Returns the converted Excel file with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | — | Required: Array of URLs to CSV files to convert to Excel |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_html_to_pdf
Convert HTML files to PDF format. Provide one or more URLs to HTML files (must be direct links to .html files, not web pages). Returns PDF download URLs.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | — | Required: Array of URLs to HTML files to convert (e.g., `["https://example.com/document.html"]`). Must be direct links to .html files. |
| file_links_expire_in_days | number | No | 7 | Number of days before the PDF download links expire (default: 7) |

docprocess_md_to_docx
Convert Markdown files (.MD) to Word document format (.DOCX). Returns the converted Word document with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | — | Required: Array of URLs to Markdown files (.MD) to convert to Word |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_md_to_pdf
Convert Markdown files (.MD) to PDF format. Returns the converted PDF file with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | — | Required: Array of URLs to Markdown files (.MD) to convert to PDF |
| pdf_format | string | No | "a4" | Page format: "a4" (default) or "letter" |
| pdf_orientation | string | No | "portrait" | Page orientation: "portrait" (default) or "landscape" |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

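
The parameter table above can be turned into a small argument builder. This is a minimal sketch, not part of the tool itself: the helper name `build_md_to_pdf_args` is hypothetical, and how the resulting dictionary is sent to `docprocess_md_to_pdf` depends on the client you use, which this page does not specify.

```python
from typing import Any, Dict, List

# Allowed values taken from the parameter table above.
ALLOWED_FORMATS = {"a4", "letter"}
ALLOWED_ORIENTATIONS = {"portrait", "landscape"}


def build_md_to_pdf_args(
    file_urls: List[str],
    pdf_format: str = "a4",
    pdf_orientation: str = "portrait",
    file_links_expire_in_days: int = 7,
) -> Dict[str, Any]:
    """Build the arguments for docprocess_md_to_pdf (hypothetical helper)."""
    if not file_urls:
        raise ValueError("file_urls must contain at least one URL")
    if pdf_format not in ALLOWED_FORMATS:
        raise ValueError(f"pdf_format must be one of {sorted(ALLOWED_FORMATS)}")
    if pdf_orientation not in ALLOWED_ORIENTATIONS:
        raise ValueError(
            f"pdf_orientation must be one of {sorted(ALLOWED_ORIENTATIONS)}"
        )
    return {
        "file_urls": list(file_urls),
        "pdf_format": pdf_format,
        "pdf_orientation": pdf_orientation,
        "file_links_expire_in_days": file_links_expire_in_days,
    }
```

Checking the two enumerated parameters locally catches a typo such as "A4" or "a3" before any request is made.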
docprocess_xlsx_to_csv
Convert Excel files (.XLSX) to CSV format. Supports both single-sheet and multi-sheet workbooks. Multi-sheet files create separate CSV files for each sheet.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | — | Required: Array of URLs to Excel files (.XLSX) to convert to CSV |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_docx_to_txt
Extract text content from Word documents (.DOCX) to plain text format. Returns the extracted text file with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | — | Required: Array of URLs to Word documents (.DOCX) to extract text from |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_pdf_to_txt
Extract text content from PDF files (.PDF) to plain text format. Returns the extracted text file with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | — | Required: Array of URLs to PDF files to extract text from |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_pptx_to_txt
Extract text content from PowerPoint presentations (.PPTX) to plain text format. Returns the extracted text file with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | — | Required: Array of URLs to PowerPoint files (.PPTX) to extract text from |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_ocr
Extract text from images and PDFs using OCR (Optical Character Recognition). Supports PNG, JPEG, GIF, WebP, BMP, TIFF, SVG images and PDF documents. Returns extracted text or detailed OCR data with bounding boxes. For large files, returns a responseId; use docprocess_ocr_poll to check status.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| fileUrls | string[] | Yes | — | Required: Array of URLs to image files (PNG, JPEG, GIF, WebP, BMP, TIFF, SVG) or PDF documents to process |
| languageHints | string[] | No | — | Optional: Array of language codes for better OCR accuracy (e.g., `["en", "es"]`). Defaults to `["en"]` |
| extractTextOnly | boolean | No | true | When true (default), returns only extracted text. When false, includes detailed OCR data like bounding boxes and word positions for layout analysis. |
| collectionId | string | No | — | Optional: Filestorage collection ID to store OCR results. If not provided, uses default collection. |
| async | boolean | No | false | When true, processes files asynchronously in background. Automatically enabled for large files. Returns responseId for polling. |

docprocess_ocr_poll
Check the status of an OCR processing job. Call this after docprocess_ocr with async=true to check if processing is complete. Poll every 5-10 seconds until status is "completed" or "failed".

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| responseId | string | Yes | — | Required: The responseId returned by docprocess_ocr when async=true |

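
The polling guidance above can be sketched as a small loop. This is an illustration under assumptions: `call_tool` is a hypothetical callable supplied by whatever client invokes the tools, and the response is assumed to be a dictionary with a `status` key, which this page implies but does not spell out.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def poll_ocr(
    call_tool: ToolCaller,
    response_id: str,
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll docprocess_ocr_poll until the job is completed or failed."""
    for _ in range(max_attempts):
        result = call_tool("docprocess_ocr_poll", {"responseId": response_id})
        # Assumed response shape: a dict carrying a "status" field.
        if result.get("status") in ("completed", "failed"):
            return result
        # Wait within the recommended 5-10 second window before retrying.
        sleep(interval_seconds)
    raise TimeoutError(f"OCR job {response_id} did not finish in time")
```

The `max_attempts` cap is a design choice added here so a stuck job cannot loop forever; the source only specifies the polling interval. Passing `sleep` as a parameter keeps the loop testable without real delays.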
docprocess_fill_pdf
Fill a PDF form with provided data using automatic field matching. Supports text fields, checkboxes, and dropdowns. Uses multiple fallback strategies: (1) exact field name match, (2) case-insensitive partial name match. For checkboxes, use boolean values or "true"/"checked". For dropdowns, matches option text case-insensitively. Returns the URL to the filled PDF with a summary of fields filled.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| pdf_url | string | Yes | — | Required: URL to the PDF form to fill. The PDF must have fillable form fields (not just a static document). |
| form_data | any | Yes | — | Required: JSON object with field names and values. For text fields: `{"fieldName": "value"}`. For checkboxes: `{"checkboxName": true}` or `{"checkboxName": "checked"}`. For dropdowns: `{"dropdownName": "optionText"}`. Field matching is case-insensitive and supports partial matches. |
| output_filename | string | No | "filled_form.pdf" | Optional: Name for the output file (default: "filled_form.pdf") |
| file_links_expire_in_days | number | No | 7 | Number of days before the download link expires (default: 7) |

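
To make the two fallback strategies concrete, here is a rough sketch of how such matching could work. It is not the tool's actual implementation: the function name `match_field` is hypothetical, and treating "partial" as a substring test in either direction is an assumption, since the source does not define it.

```python
from typing import List, Optional


def match_field(key: str, field_names: List[str]) -> Optional[str]:
    """Return the PDF field name that a form_data key would map to, if any."""
    # Strategy 1: exact field name match.
    if key in field_names:
        return key
    # Strategy 2: case-insensitive partial match (assumed: substring either way).
    needle = key.lower()
    for name in field_names:
        candidate = name.lower()
        if needle in candidate or candidate in needle:
            return name
    return None
```

Because partial matching can pick up an unintended field when names overlap (for example `name` against `FirstName` and `LastName`), using exact field names in `form_data` is the safer choice whenever they are known.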
docprocess_create_word
Create a Word document (.docx) from scratch using a JSON specification. Structure: sections contain children (headings, paragraphs, bullets, tables). Paragraphs can contain simple text or a children array of formatted text runs. Tables contain rows (arrays of cell content). Supports headings (levels 1-6), paragraphs with text formatting (bold, italic, underline, strike, doubleStrike, highlight, superScript, subScript, allCaps, smallCaps, color, font, size), bullet lists, and tables. All text content must be in paragraph elements (including table cells). Returns the URL to the created Word document.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| document_spec | any | Yes | — | Document specification JSON. Structure: `{"sections": [{"children": [elements]}]}`. Element types: "heading" (requires level 1-6, text), "paragraph" (text or children array for formatting), "bullet" (text, optional level), "table" (rows array). Example: `{"sections": [{"children": [{"type": "heading", "level": 1, "text": "Report", "alignment": "center"}, {"type": "paragraph", "children": [{"text": "Bold ", "bold": true}, {"text": "normal text"}]}, {"type": "bullet", "text": "Item 1"}, {"type": "table", "rows": [["Header 1", "Header 2"], ["Value 1", "Value 2"]]}]}]}` |
| output_filename | string | No | "created_document.docx" | Optional: Name for the output file (default: "created_document.docx") |
| file_links_expire_in_days | number | No | 7 | Number of days before the download link expires (default: 7) |

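
Writing the nested `document_spec` JSON by hand is error-prone, so a few small builders can help. The sketch below mirrors the element shapes from the example in the parameter description; the helper names (`heading`, `paragraph`, `bullet`, `table`, `document_spec`) are hypothetical, and table cells are passed as plain strings exactly as that example does.

```python
from typing import Any, Dict, List, Union

Element = Dict[str, Any]
Run = Union[str, Dict[str, Any]]


def heading(text: str, level: int = 1) -> Element:
    if not 1 <= level <= 6:
        raise ValueError("heading level must be between 1 and 6")
    return {"type": "heading", "level": level, "text": text}


def paragraph(*runs: Run) -> Element:
    # A single plain string becomes a simple-text paragraph.
    if len(runs) == 1 and isinstance(runs[0], str):
        return {"type": "paragraph", "text": runs[0]}
    # Otherwise each run becomes a formatted child, e.g. {"text": "x", "bold": True}.
    children = [{"text": r} if isinstance(r, str) else dict(r) for r in runs]
    return {"type": "paragraph", "children": children}


def bullet(text: str) -> Element:
    return {"type": "bullet", "text": text}


def table(rows: List[List[str]]) -> Element:
    return {"type": "table", "rows": [list(row) for row in rows]}


def document_spec(*elements: Element) -> Dict[str, Any]:
    # One section holding all elements, matching the documented structure.
    return {"sections": [{"children": list(elements)}]}
```

For example, `document_spec(heading("Report"), paragraph({"text": "Bold ", "bold": True}, "normal text"), bullet("Item 1"))` yields a dictionary that can be serialized with `json.dumps` and passed as the `document_spec` argument.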
docprocess_word_ai
Process Word documents with AI while preserving ALL formatting (bold, italic, fonts, colors, tables, lists, headers, images). Supports translation, grammar correction, rewriting, summarization, and any text transformation. Returns a responseId; use docprocess_word_ai_poll to check status.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| documentUrl | string | Yes | — | Required: URL to the Word document (.DOCX) to process |
| task | string | Yes | — | Required: Natural language description of what to do (e.g., "translate to Spanish", "fix grammar errors", "rewrite in formal tone", "summarize to 2 paragraphs") |
| model | string | No | — | Optional: LLM model to use. Options: claude-sonnet-4-5-20250929 (default), gpt-4.1, gpt-4o, gemini-2.5-flash |
| strategy | string | No | — | Optional: SPARSE_CHANGES for minor edits (grammar, spelling), DENSE_CHANGES for major changes (translation, rewriting). Auto-detected if omitted. |

docprocess_word_ai_poll
Check the status of a Word document AI processing job. Call this after docprocess_word_ai to check if processing is complete. Poll every 5-10 seconds until status is "completed" or "failed".

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| responseId | string | Yes | — | Required: The responseId returned by docprocess_word_ai |

docprocess_fill_word_tpl
Fill a Word document template (.docx) with provided data. Supports: simple placeholders `{name}`, nested objects `{user.firstName}`, loops `{#items}{name}{/items}`, conditionals `{#condition}…{/condition}`, inverted conditionals `{^condition}…{/condition}`, and expressions `{price * quantity}`. All formatting from the original template is preserved. Loop tags must be closed: `{#items}…{/items}`. Placeholders are case-sensitive. Returns the URL to the filled Word document.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| template_url | string | Yes | — | Required: URL to the Word document template (.docx). Template should contain placeholders like `{name}`, `{user.email}`, `{#items}{name}{/items}` for loops (must close with `{/items}`), `{#condition}text{/condition}` for conditionals. Templates must be .docx format (not older .doc). |
| data | any | Yes | — | Required: JSON object with data to fill placeholders. Example: `{"name": "John", "items": [{"name": "Widget", "price": 10}], "isPremium": true}`. Keys must match placeholder names exactly (case-sensitive). For loops, provide arrays: `{"items": [{"name": "A"}, {"name": "B"}]}`. For conditionals, provide booleans: `{"isPremium": true}`. |
| output_filename | string | No | "filled_template.docx" | Optional: Name for the output file (default: "filled_template.docx") |
| file_links_expire_in_days | number | No | 7 | Number of days before the download link expires (default: 7) |

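
Since loop and conditional tags must be closed, it can be useful to check a template's text before submitting it. The following is a minimal sketch, assuming you already have the template's plain text (for instance from `docprocess_docx_to_txt`); the function name `unclosed_sections` is hypothetical, and this check is not something the tool is documented to perform.

```python
import re
from typing import List, Tuple

# Matches section tags only: {#name}, {^name}, {/name}.
# Plain placeholders such as {name} or {price * quantity} are ignored.
SECTION_TAG = re.compile(r"\{([#^/])([^{}]+)\}")


def unclosed_sections(template_text: str) -> List[str]:
    """Report section tags that are opened but not closed, or closed out of order."""
    stack: List[Tuple[str, str]] = []
    problems: List[str] = []
    for kind, raw_name in SECTION_TAG.findall(template_text):
        name = raw_name.strip()
        if kind in "#^":
            stack.append((kind, name))
        elif stack and stack[-1][1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected closing tag {{/{name}}}")
    # Anything left on the stack was never closed.
    problems.extend(f"unclosed tag {{{kind}{name}}}" for kind, name in stack)
    return problems
```

An empty list means every `{#...}` or `{^...}` tag has a matching `{/...}` in the right order. Tag names are compared case-sensitively, in line with the placeholder rules above.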
docprocess_validate_csv
Validate CSV file structure, data quality, and consistency. Can validate CSV files directly or CSV files extracted from Excel. Returns validation results with errors and warnings.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | — | Required: Array of URLs to CSV files to validate |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_xml_to_json
Convert XML files to JSON format. By default returns the full JSON data in the response. Set store_xml_json=true to store as a file and get a download URL instead.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_urls | string[] | Yes | — | Required: Array of URLs to XML files to convert to JSON |
| store_xml_json | boolean | No | false | Optional: If true, stores the converted JSON as a file and returns download URL. If false (default), returns the full JSON data directly in the response. |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7, only applies when store_xml_json=true) |

docprocess_invoice_extract
Extract structured line items from invoices (PDF or image). Uses AI to discover columns dynamically from the document and extract all line items. Supports multi-page PDFs with automatic sharding and parallel processing. Returns a jobId; use docprocess_invoice_extract_poll to check status and retrieve results (CSV, JSON, and summary artifacts).

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| fileUrl | string | Yes | — | Required: URL to the invoice file. Supported formats: PDF (.pdf), JPEG (.jpg/.jpeg), PNG (.png) |
| pagesPerShard | number | No | 3 | Optional: Number of pages per processing shard for PDFs (default: 3). Smaller values may improve accuracy for dense invoices. |

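
To get a feel for what `pagesPerShard` controls, the sketch below computes how a PDF might be split. It assumes shards are consecutive, non-overlapping page ranges, which the source does not confirm; the function name `shard_page_ranges` is hypothetical and the real service may split pages differently.

```python
from typing import List, Tuple


def shard_page_ranges(total_pages: int, pages_per_shard: int = 3) -> List[Tuple[int, int]]:
    """Return assumed (first_page, last_page) ranges, 1-indexed and inclusive."""
    if total_pages < 1 or pages_per_shard < 1:
        raise ValueError("total_pages and pages_per_shard must be positive")
    return [
        (start, min(start + pages_per_shard - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_shard)
    ]
```

Under this assumption, a 7-page invoice with the default of 3 yields three shards, `(1, 3)`, `(4, 6)`, and `(7, 7)`, while `pagesPerShard=2` yields four smaller shards, trading more parallel work for less content per shard.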
docprocess_invoice_extract_poll
Check the status of an invoice line-item extraction job. Call this after docprocess_invoice_extract to check if processing is complete. Poll every 10-15 seconds until status is "completed" or "failed". When completed, returns artifact URLs for the extracted CSV, JSON, and summary files.

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| jobId | string | Yes | — | Required: The jobId returned by docprocess_invoice_extract |

