Universal Document Text Extractor

Extract text from documents, images, and ZIP archives using AI-powered processing

✨ Now supports large files with chunked upload technology

Upload Document, Image, or ZIP Archive
Select a file to extract text content. Large files are automatically uploaded in chunks for reliability.
Supported Formats
File types supported by our text extraction API
PDF: Portable Document Format
DOCX: Microsoft Word Document
XLSX: Microsoft Excel Spreadsheet
XLS: Legacy Excel Format
PNG: PNG Image (OCR)
JPEG: JPEG Image (OCR)
GIF: GIF Image (OCR)
ZIP: ZIP Archive (Multiple Files)

Large File Support: Files larger than 4 MB (measured after base64 encoding) are automatically uploaded in 3 MB chunks for maximum reliability on Vercel's platform. No file size limits!
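As a rough illustration of the arithmetic behind that rule, the sketch below estimates how many chunks a file would need. It is not the service's actual code: it assumes that "MB" means 2**20 bytes, that the 4 MB threshold applies to the base64 length, and that the 3 MB chunk size is counted in base64 characters. The names `base64_length` and `chunk_count` are hypothetical.

```python
MB = 1024 * 1024        # assumption: "MB" means 2**20 bytes
DIRECT_LIMIT = 4 * MB   # threshold from the text, applied to the base64 length
CHUNK_SIZE = 3 * MB     # chunk size from the text, assumed to be base64 characters


def base64_length(raw_size: int) -> int:
    """Length of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def chunk_count(raw_size: int) -> int:
    """Return 0 when a direct upload suffices, else the number of chunks."""
    encoded = base64_length(raw_size)
    if encoded <= DIRECT_LIMIT:
        return 0
    # Ceiling division using integers only.
    return -(-encoded // CHUNK_SIZE)
```

Under these assumptions a 3 MB file encodes to exactly 4 MB of base64 and still goes through the direct route, while a 9 MB file encodes to 12 MB and needs four chunks.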

API Usage
How to use the API endpoints directly
# One-time setup (initialize database tables)
curl -X GET https://agent-doc-tool.vercel.app/api/init-db
# Small files (Direct upload)
curl -X POST https://agent-doc-tool.vercel.app/api/extract-docx \
-H "Content-Type: application/json" \
-d '{"base64": "UEsDBBQABgAIAAAAIQ...", "fileName": "doc.pdf", "fileType": "application/pdf"}'
# Large files (Chunked upload)
# Use the same sessionId for all chunks of one file
# chunkIndex starts at 0 and increments by 1
# 1. Upload chunks
curl -X POST https://agent-doc-tool.vercel.app/api/upload-chunk \
-H "Content-Type: application/json" \
-d '{"sessionId": "session_123456", "chunk": "base64_chunk_0", "chunkIndex": 0, "totalChunks": 3, "fileName": "big.pdf", "fileType": "application/pdf"}'
curl -X POST https://agent-doc-tool.vercel.app/api/upload-chunk \
-H "Content-Type: application/json" \
-d '{"sessionId": "session_123456", "chunk": "base64_chunk_1", "chunkIndex": 1, "totalChunks": 3, "fileName": "big.pdf", "fileType": "application/pdf"}'
curl -X POST https://agent-doc-tool.vercel.app/api/upload-chunk \
-H "Content-Type: application/json" \
-d '{"sessionId": "session_123456", "chunk": "base64_chunk_2", "chunkIndex": 2, "totalChunks": 3, "fileName": "big.pdf", "fileType": "application/pdf"}'
# 2. Process assembled chunks
curl -X POST https://agent-doc-tool.vercel.app/api/process-chunks \
-H "Content-Type: application/json" \
-d '{"sessionId": "session_123456"}'
# 3. Optional: clean up the session
curl -X POST https://agent-doc-tool.vercel.app/api/cleanup \
-H "Content-Type: application/json" \
-d '{"sessionId": "session_123456"}'
# Automatic chunking for files > 4MB base64
# Supports: PDF, DOCX, XLSX, XLS, PNG, JPEG, GIF, ZIP
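The chunked flow above can also be scripted. The following is a minimal sketch, not an official client: the endpoint paths and JSON field names are taken from the curl examples, while `build_chunk_payloads`, `post`, `upload_in_chunks`, `CHUNK_CHARS`, and the UUID-based session ID format are hypothetical. It assumes each chunk is a slice of the file's base64 string, and the response format is not documented here, so the raw response body is returned as text.

```python
import base64
import json
import urllib.request
import uuid

BASE_URL = "https://agent-doc-tool.vercel.app/api"
CHUNK_CHARS = 3 * 1024 * 1024  # assumed: 3 MB of base64 text per chunk


def build_chunk_payloads(data, file_name, file_type, session_id,
                         chunk_chars=CHUNK_CHARS):
    """Split the base64 form of data into upload-chunk request bodies."""
    if chunk_chars <= 0 or chunk_chars % 4 != 0:
        # A multiple of 4 keeps every slice independently decodable.
        raise ValueError("chunk_chars must be a positive multiple of 4")
    encoded = base64.b64encode(data).decode("ascii")
    pieces = [encoded[i:i + chunk_chars]
              for i in range(0, len(encoded), chunk_chars)]
    return [
        {
            "sessionId": session_id,
            "chunk": piece,
            "chunkIndex": index,  # starts at 0 and increments by 1
            "totalChunks": len(pieces),
            "fileName": file_name,
            "fileType": file_type,
        }
        for index, piece in enumerate(pieces)
    ]


def post(path, payload):
    """POST a JSON body to one API endpoint and return the response text."""
    request = urllib.request.Request(
        f"{BASE_URL}/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def upload_in_chunks(local_path, file_name, file_type):
    """Upload every chunk, process the session, then clean it up."""
    session_id = f"session_{uuid.uuid4().hex}"  # hypothetical ID format
    with open(local_path, "rb") as handle:
        data = handle.read()
    for payload in build_chunk_payloads(data, file_name, file_type, session_id):
        post("upload-chunk", payload)
    result = post("process-chunks", {"sessionId": session_id})
    post("cleanup", {"sessionId": session_id})
    return result
```

Cutting the base64 string at multiples of four characters means the chunks reassemble correctly whether the server concatenates the strings before decoding or decodes each chunk separately. A call such as `upload_in_chunks("big.pdf", "big.pdf", "application/pdf")` would run the three steps in order.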