Server path: /docprocess | Type: Embedded | PCID required: No

Tools

| Tool | Description |
| --- | --- |
| docprocess_csv_to_xlsx | Convert CSV files to Excel format (.XLSX). Returns the converted Excel file with download URL. |
| docprocess_html_to_pdf | Convert HTML files to PDF format. Provide one or more URLs to HTML files (must be direct links to .html files, not web pages). Returns PDF download URLs. |
| docprocess_md_to_docx | Convert Markdown files (.MD) to Word document format (.DOCX). Returns the converted Word document with download URL. |
| docprocess_md_to_pdf | Convert Markdown files (.MD) to PDF format. Returns the converted PDF file with download URL. |
| docprocess_xlsx_to_csv | Convert Excel files (.XLSX) to CSV format. Supports both single-sheet and multi-sheet workbooks. Multi-sheet files create separate CSV files for each sheet. |
| docprocess_docx_to_txt | Extract text content from Word documents (.DOCX) to plain text format. Returns the extracted text file with download URL. |
| docprocess_pdf_to_txt | Extract text content from PDF files (.PDF) to plain text format. Returns the extracted text file with download URL. |
| docprocess_pptx_to_txt | Extract text content from PowerPoint presentations (.PPTX) to plain text format. Returns the extracted text file with download URL. |
| docprocess_ocr | Extract text from images and PDFs using OCR (Optical Character Recognition). Supports PNG, JPEG, GIF, WebP, BMP, TIFF, SVG images and PDF documents. Returns extracted text or detailed OCR data with bounding boxes. For large files, returns a responseId - use docprocess_ocr_poll to check status. |
| docprocess_ocr_poll | Check the status of an OCR processing job. Call this after docprocess_ocr with async=true to check if processing is complete. Poll every 5-10 seconds until status is "completed" or "failed". |
| docprocess_fill_pdf | Fill a PDF form with provided data using automatic field matching. Supports text fields, checkboxes, and dropdowns. Uses multiple fallback strategies: (1) exact field name match, (2) case-insensitive partial name match. For checkboxes, use boolean values or "true"/"checked". For dropdowns, matches option text case-insensitively. Returns the URL to the filled PDF with a summary of fields filled. |
| docprocess_create_word | Create a Word document (.docx) from scratch using a JSON specification. Structure: sections contain children (headings, paragraphs, bullets, tables). Paragraphs can contain simple text or a children array of formatted text runs. Tables contain rows (arrays of cell content). Supports headings (levels 1-6), paragraphs with text formatting (bold, italic, underline, strike, doubleStrike, highlight, superScript, subScript, allCaps, smallCaps, color, font, size), bullet lists, and tables. All text content must be in paragraph elements (including table cells). Returns the URL to the created Word document. |
| docprocess_word_ai | Process Word documents with AI while preserving ALL formatting (bold, italic, fonts, colors, tables, lists, headers, images). Supports translation, grammar correction, rewriting, summarization, and any text transformation. Returns a responseId - use docprocess_word_ai_poll to check status. |
| docprocess_word_ai_poll | Check the status of a Word document AI processing job. Call this after docprocess_word_ai to check if processing is complete. Poll every 5-10 seconds until status is "completed" or "failed". |
| docprocess_fill_word_tpl | Fill a Word document template (.docx) with provided data. Supports: simple placeholders {name}, nested objects {user.firstName}, loops {#items}{name}{/items}, conditionals {#condition}…{/condition}, inverted conditionals {^condition}…{/condition}, and expressions {price * quantity}. All formatting from the original template is preserved. Loop tags must be closed: {#items}…{/items}. Placeholders are case-sensitive. Returns the URL to the filled Word document. |
| docprocess_validate_csv | Validate CSV file structure, data quality, and consistency. Can validate CSV files directly or CSV files extracted from Excel. Returns validation results with errors and warnings. |
| docprocess_xml_to_json | Convert XML files to JSON format. By default returns the full JSON data in the response. Set store_xml_json=true to store as a file and get a download URL instead. |
| docprocess_invoice_extract | Extract structured line items from invoices (PDF or image). Uses AI to discover columns dynamically from the document and extract all line items. Supports multi-page PDFs with automatic sharding and parallel processing. Returns a jobId - use docprocess_invoice_extract_poll to check status and retrieve results (CSV, JSON, and summary artifacts). |
| docprocess_invoice_extract_poll | Check the status of an invoice line-item extraction job. Call this after docprocess_invoice_extract to check if processing is complete. Poll every 10-15 seconds until status is "completed" or "failed". When completed, returns artifact URLs for the extracted CSV, JSON, and summary files. |

docprocess_csv_to_xlsx

Convert CSV files to Excel format (.XLSX). Returns the converted Excel file with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_urls | string[] | Yes | | Required: Array of URLs to CSV files to convert to Excel |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_html_to_pdf

Convert HTML files to PDF format. Provide one or more URLs to HTML files (must be direct links to .html files, not web pages). Returns PDF download URLs.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_urls | string[] | Yes | | Required: Array of URLs to HTML files to convert (e.g., ["https://example.com/document.html"]). Must be direct links to .html files. |
| file_links_expire_in_days | number | No | 7 | Number of days before the PDF download links expire (default: 7) |

docprocess_md_to_docx

Convert Markdown files (.MD) to Word document format (.DOCX). Returns the converted Word document with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_urls | string[] | Yes | | Required: Array of URLs to Markdown files (.MD) to convert to Word |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_md_to_pdf

Convert Markdown files (.MD) to PDF format. Returns the converted PDF file with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_urls | string[] | Yes | | Required: Array of URLs to Markdown files (.MD) to convert to PDF |
| pdf_format | string | No | "a4" | Page format: "a4" (default) or "letter" |
| pdf_orientation | string | No | "portrait" | Page orientation: "portrait" (default) or "landscape" |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |
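
As an illustration of the parameters above, the following sketch assembles an argument object for docprocess_md_to_pdf and rejects values outside the documented options before any request is made. The helper name `md_to_pdf_arguments` and the example URL are hypothetical; only the parameter names, allowed values, and defaults come from the table.

```python
from typing import Any, Dict, List

# Allowed values as listed in the parameter table.
ALLOWED_FORMATS = ("a4", "letter")
ALLOWED_ORIENTATIONS = ("portrait", "landscape")


def md_to_pdf_arguments(
    file_urls: List[str],
    pdf_format: str = "a4",
    pdf_orientation: str = "portrait",
    file_links_expire_in_days: int = 7,
) -> Dict[str, Any]:
    """Build the arguments for docprocess_md_to_pdf (hypothetical helper)."""
    if not file_urls:
        raise ValueError("file_urls must contain at least one URL")
    if pdf_format not in ALLOWED_FORMATS:
        raise ValueError(f"pdf_format must be one of {ALLOWED_FORMATS}")
    if pdf_orientation not in ALLOWED_ORIENTATIONS:
        raise ValueError(f"pdf_orientation must be one of {ALLOWED_ORIENTATIONS}")
    return {
        "file_urls": list(file_urls),
        "pdf_format": pdf_format,
        "pdf_orientation": pdf_orientation,
        "file_links_expire_in_days": file_links_expire_in_days,
    }


# Example: a landscape US-letter conversion with the default link expiry.
arguments = md_to_pdf_arguments(
    ["https://example.com/notes.md"],  # placeholder URL
    pdf_format="letter",
    pdf_orientation="landscape",
)
```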

docprocess_xlsx_to_csv

Convert Excel files (.XLSX) to CSV format. Supports both single-sheet and multi-sheet workbooks. Multi-sheet files create separate CSV files for each sheet.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_urls | string[] | Yes | | Required: Array of URLs to Excel files (.XLSX) to convert to CSV |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_docx_to_txt

Extract text content from Word documents (.DOCX) to plain text format. Returns the extracted text file with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_urls | string[] | Yes | | Required: Array of URLs to Word documents (.DOCX) to extract text from |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_pdf_to_txt

Extract text content from PDF files (.PDF) to plain text format. Returns the extracted text file with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_urls | string[] | Yes | | Required: Array of URLs to PDF files to extract text from |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_pptx_to_txt

Extract text content from PowerPoint presentations (.PPTX) to plain text format. Returns the extracted text file with download URL.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_urls | string[] | Yes | | Required: Array of URLs to PowerPoint files (.PPTX) to extract text from |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_ocr

Extract text from images and PDFs using OCR (Optical Character Recognition). Supports PNG, JPEG, GIF, WebP, BMP, TIFF, SVG images and PDF documents. Returns extracted text or detailed OCR data with bounding boxes. For large files, returns a responseId - use docprocess_ocr_poll to check status.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| fileUrls | string[] | Yes | | Required: Array of URLs to image files (PNG, JPEG, GIF, WebP, BMP, TIFF, SVG) or PDF documents to process |
| languageHints | string[] | No | | Optional: Array of language codes for better OCR accuracy (e.g., ["en", "es"]). Defaults to ["en"] |
| extractTextOnly | boolean | No | true | When true (default), returns only extracted text. When false, includes detailed OCR data like bounding boxes and word positions for layout analysis. |
| collectionId | string | No | | Optional: Filestorage collection ID to store OCR results. If not provided, uses default collection. |
| async | boolean | No | false | When true, processes files asynchronously in background. Automatically enabled for large files. Returns responseId for polling. |

docprocess_ocr_poll

Check the status of an OCR processing job. Call this after docprocess_ocr with async=true to check if processing is complete. Poll every 5-10 seconds until status is "completed" or "failed".

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| responseId | string | Yes | | Required: The responseId returned by docprocess_ocr when async=true |
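
The polling pattern described above can be sketched as a small loop. This is a minimal illustration, not a client for this server: `call_tool` is a hypothetical callable standing in for whatever mechanism your environment uses to invoke a tool, and the assumption that the result is a dictionary with a top-level `status` key is ours. The same loop would apply to docprocess_word_ai_poll (with `responseId`) and docprocess_invoice_extract_poll (with `jobId` and a 10-15 second interval).

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],  # hypothetical tool invoker
    tool_name: str,
    arguments: Dict[str, Any],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call a poll tool repeatedly until its status is "completed" or "failed"."""
    for _ in range(max_attempts):
        result = call_tool(tool_name, arguments)
        # Assumed response shape: a dict carrying a "status" string.
        if result.get("status") in ("completed", "failed"):
            return result
        sleep(interval_seconds)
    raise TimeoutError(f"{tool_name} did not finish after {max_attempts} attempts")
```

A call might look like `poll_until_done(call_tool, "docprocess_ocr_poll", {"responseId": response_id})`, where `response_id` is the value returned by docprocess_ocr. The attempt cap is a design choice of this sketch so that a stuck job cannot loop forever.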

docprocess_fill_pdf

Fill a PDF form with provided data using automatic field matching. Supports text fields, checkboxes, and dropdowns. Uses multiple fallback strategies: (1) exact field name match, (2) case-insensitive partial name match. For checkboxes, use boolean values or "true"/"checked". For dropdowns, matches option text case-insensitively. Returns the URL to the filled PDF with a summary of fields filled.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| pdf_url | string | Yes | | Required: URL to the PDF form to fill. The PDF must have fillable form fields (not just a static document). |
| form_data | any | Yes | | Required: JSON object with field names and values. For text fields: {"fieldName": "value"}. For checkboxes: {"checkboxName": true} or {"checkboxName": "checked"}. For dropdowns: {"dropdownName": "optionText"}. Field matching is case-insensitive and supports partial matches. |
| output_filename | string | No | "filled_form.pdf" | Optional: Name for the output file (default: "filled_form.pdf") |
| file_links_expire_in_days | number | No | 7 | Number of days before the download link expires (default: 7) |
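
To make the two fallback strategies concrete, here is a rough sketch of how such matching could work. It is an illustration of the documented behaviour only, not the server's implementation: the function name `match_field` is hypothetical, and treating "partial" as a substring test in either direction is an assumption, since the source does not define it.

```python
from typing import Optional, Sequence


def match_field(requested: str, pdf_fields: Sequence[str]) -> Optional[str]:
    """Resolve a form_data key to a PDF field name (illustrative only)."""
    # Strategy 1: exact field name match.
    if requested in pdf_fields:
        return requested
    # Strategy 2: case-insensitive partial match.
    # Assumption: "partial" means one name contains the other.
    needle = requested.lower()
    for name in pdf_fields:
        candidate = name.lower()
        if needle in candidate or candidate in needle:
            return name
    return None


# Example form_data using the value shapes from the parameter table.
form_data = {
    "email": "jane@example.com",   # text field
    "subscribe": True,             # checkbox
    "country": "Canada",           # dropdown, matched by option text
}
```

Because partial matching can pick an unintended field when names overlap, using the exact field names from the PDF is the safer choice when they are known.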

docprocess_create_word

Create a Word document (.docx) from scratch using a JSON specification. Structure: sections contain children (headings, paragraphs, bullets, tables). Paragraphs can contain simple text or a children array of formatted text runs. Tables contain rows (arrays of cell content). Supports headings (levels 1-6), paragraphs with text formatting (bold, italic, underline, strike, doubleStrike, highlight, superScript, subScript, allCaps, smallCaps, color, font, size), bullet lists, and tables. All text content must be in paragraph elements (including table cells). Returns the URL to the created Word document.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| document_spec | any | Yes | | Document specification JSON. Structure: {"sections": [{"children": [elements]}]}. Element types: "heading" (requires level 1-6, text), "paragraph" (text or children array for formatting), "bullet" (text, optional level), "table" (rows array). Example: {"sections": [{"children": [{"type": "heading", "level": 1, "text": "Report", "alignment": "center"}, {"type": "paragraph", "children": [{"text": "Bold ", "bold": true}, {"text": "normal text"}]}, {"type": "bullet", "text": "Item 1"}, {"type": "table", "rows": [["Header 1", "Header 2"], ["Value 1", "Value 2"]]}]}]} |
| output_filename | string | No | "created_document.docx" | Optional: Name for the output file (default: "created_document.docx") |
| file_links_expire_in_days | number | No | 7 | Number of days before the download link expires (default: 7) |
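
Writing the specification by hand is error-prone once documents grow, so the following sketch builds the same example spec from small helper functions. The helpers (`heading`, `paragraph`, `bullet`, `table`) and the output filename are hypothetical conveniences; the element shapes they emit are copied from the example in the parameter table.

```python
import json
from typing import Any, Dict, List


def heading(text: str, level: int = 1, **extra: Any) -> Dict[str, Any]:
    """Heading element; the level must be between 1 and 6."""
    if not 1 <= level <= 6:
        raise ValueError("heading level must be between 1 and 6")
    return {"type": "heading", "level": level, "text": text, **extra}


def paragraph(*runs: Dict[str, Any]) -> Dict[str, Any]:
    """Paragraph built from formatted text runs, e.g. {"text": "Bold ", "bold": True}."""
    return {"type": "paragraph", "children": list(runs)}


def bullet(text: str) -> Dict[str, Any]:
    return {"type": "bullet", "text": text}


def table(rows: List[List[str]]) -> Dict[str, Any]:
    return {"type": "table", "rows": rows}


document_spec = {
    "sections": [
        {
            "children": [
                heading("Report", 1, alignment="center"),
                paragraph({"text": "Bold ", "bold": True}, {"text": "normal text"}),
                bullet("Item 1"),
                table([["Header 1", "Header 2"], ["Value 1", "Value 2"]]),
            ]
        }
    ]
}

# Arguments for docprocess_create_word, serialized as JSON.
arguments = {"document_spec": document_spec, "output_filename": "report.docx"}
payload = json.dumps(arguments)
```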

docprocess_word_ai

Process Word documents with AI while preserving ALL formatting (bold, italic, fonts, colors, tables, lists, headers, images). Supports translation, grammar correction, rewriting, summarization, and any text transformation. Returns a responseId - use docprocess_word_ai_poll to check status.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| documentUrl | string | Yes | | Required: URL to the Word document (.DOCX) to process |
| task | string | Yes | | Required: Natural language description of what to do (e.g., "translate to Spanish", "fix grammar errors", "rewrite in formal tone", "summarize to 2 paragraphs") |
| model | string | No | | Optional: LLM model to use. Options: claude-sonnet-4-5-20250929 (default), gpt-4.1, gpt-4o, gemini-2.5-flash |
| strategy | string | No | | Optional: SPARSE_CHANGES for minor edits (grammar, spelling), DENSE_CHANGES for major changes (translation, rewriting). Auto-detected if omitted. |

docprocess_word_ai_poll

Check the status of a Word document AI processing job. Call this after docprocess_word_ai to check if processing is complete. Poll every 5-10 seconds until status is "completed" or "failed".

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| responseId | string | Yes | | Required: The responseId returned by docprocess_word_ai |

docprocess_fill_word_tpl

Fill a Word document template (.docx) with provided data. Supports: simple placeholders {name}, nested objects {user.firstName}, loops {#items}{name}{/items}, conditionals {#condition}…{/condition}, inverted conditionals {^condition}…{/condition}, and expressions {price * quantity}. All formatting from the original template is preserved. Loop tags must be closed: {#items}…{/items}. Placeholders are case-sensitive. Returns the URL to the filled Word document.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| template_url | string | Yes | | Required: URL to the Word document template (.docx). Template should contain placeholders like {name}, {user.email}, {#items}{name}{/items} for loops (must close with {/items}), {#condition}text{/condition} for conditionals. Templates must be .docx format (not older .doc). |
| data | any | Yes | | Required: JSON object with data to fill placeholders. Example: {"name": "John", "items": [{"name": "Widget", "price": 10}], "isPremium": true}. Keys must match placeholder names exactly (case-sensitive). For loops, provide arrays: {"items": [{"name": "A"}, {"name": "B"}]}. For conditionals, provide booleans: {"isPremium": true}. |
| output_filename | string | No | "filled_template.docx" | Optional: Name for the output file (default: "filled_template.docx") |
| file_links_expire_in_days | number | No | 7 | Number of days before the download link expires (default: 7) |
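
Since loop and conditional tags must be closed, it can help to check the template text before submitting it. The sketch below is a hypothetical pre-flight check, not part of the tool: it assumes you already have the template's text as a plain string (a real .docx stores text inside XML) and it only inspects the {#…}, {^…}, and {/…} tags, ignoring simple placeholders and expressions.

```python
import re
from typing import List, Tuple

# Matches opening ({#name}, {^name}) and closing ({/name}) section tags.
_SECTION_TAG = re.compile(r"\{([#^/])\s*([^{}]+?)\s*\}")


def find_tag_problems(template_text: str) -> List[str]:
    """Report section tags that are unclosed or closed out of order."""
    open_tags: List[Tuple[str, str]] = []
    problems: List[str] = []
    for kind, name in _SECTION_TAG.findall(template_text):
        if kind in "#^":
            open_tags.append((kind, name))
        elif open_tags and open_tags[-1][1] == name:
            open_tags.pop()
        else:
            problems.append("unexpected closing tag {/" + name + "}")
    for kind, name in open_tags:
        problems.append("unclosed tag {" + kind + name + "}")
    return problems


template_text = (
    "Hello {name}. {#items}{name}: {price}{/items} "
    "{#isPremium}Thanks for subscribing.{/isPremium}"
    "{^isPremium}Consider upgrading.{/isPremium}"
)

# Data whose keys match the placeholders exactly (case-sensitive).
data = {
    "name": "John",
    "items": [{"name": "Widget", "price": 10}],
    "isPremium": True,
}

problems = find_tag_problems(template_text)
```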

docprocess_validate_csv

Validate CSV file structure, data quality, and consistency. Can validate CSV files directly or CSV files extracted from Excel. Returns validation results with errors and warnings.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_urls | string[] | Yes | | Required: Array of URLs to CSV files to validate |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7) |

docprocess_xml_to_json

Convert XML files to JSON format. By default returns the full JSON data in the response. Set store_xml_json=true to store as a file and get a download URL instead.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_urls | string[] | Yes | | Required: Array of URLs to XML files to convert to JSON |
| store_xml_json | boolean | No | false | Optional: If true, stores the converted JSON as a file and returns download URL. If false (default), returns the full JSON data directly in the response. |
| file_links_expire_in_days | number | No | 7 | Number of days before the download links expire (default: 7, only applies when store_xml_json=true) |

docprocess_invoice_extract

Extract structured line items from invoices (PDF or image). Uses AI to discover columns dynamically from the document and extract all line items. Supports multi-page PDFs with automatic sharding and parallel processing. Returns a jobId - use docprocess_invoice_extract_poll to check status and retrieve results (CSV, JSON, and summary artifacts).

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| fileUrl | string | Yes | | Required: URL to the invoice file. Supported formats: PDF (.pdf), JPEG (.jpg/.jpeg), PNG (.png) |
| pagesPerShard | number | No | 3 | Optional: Number of pages per processing shard for PDFs (default: 3). Smaller values may improve accuracy for dense invoices. |
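
To get a feel for what pagesPerShard implies, the sketch below computes how a PDF of a given length would split into shards. It assumes shards are contiguous runs of pages, which the source does not state, and the function name `shard_page_ranges` is hypothetical; treat it as arithmetic for planning, not a description of the server's internals.

```python
import math
from typing import List, Tuple


def shard_page_ranges(total_pages: int, pages_per_shard: int = 3) -> List[Tuple[int, int]]:
    """Return 1-based (first_page, last_page) ranges, assuming contiguous shards."""
    if total_pages < 1 or pages_per_shard < 1:
        raise ValueError("total_pages and pages_per_shard must be positive")
    shard_count = math.ceil(total_pages / pages_per_shard)
    return [
        (i * pages_per_shard + 1, min((i + 1) * pages_per_shard, total_pages))
        for i in range(shard_count)
    ]


# A 7-page invoice with the default of 3 pages per shard.
ranges = shard_page_ranges(7)  # [(1, 3), (4, 6), (7, 7)]
```

Under this assumption, lowering pagesPerShard to 1 for the same invoice would yield seven single-page shards, trading more parallel work for less content per shard.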

docprocess_invoice_extract_poll

Check the status of an invoice line-item extraction job. Call this after docprocess_invoice_extract to check if processing is complete. Poll every 10-15 seconds until status is "completed" or "failed". When completed, returns artifact URLs for the extracted CSV, JSON, and summary files.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| jobId | string | Yes | | Required: The jobId returned by docprocess_invoice_extract |