Universal Document Text Extractor

Extract text from documents and images using AI-powered processing

✨ Now supports large files with chunked upload technology

Upload Document or Image
Select a file to extract text content. Large files are automatically uploaded in chunks for reliability.
Supported Formats
File types supported by our text extraction API
PDF: Portable Document Format
DOCX: Microsoft Word Document
RTF: Rich Text Format
XLSX: Microsoft Excel Spreadsheet
XLS: Legacy Excel Format
PNG: PNG Image (OCR)
JPEG: JPEG Image (OCR)
GIF: GIF Image (OCR)
ZIP: ZIP Archive (extracts all files)

Large File Support: Files larger than 4 MB (measured after base64 encoding) are automatically uploaded in 3 MB chunks for reliability. ZIP archives are unpacked automatically, and every file they contain is processed.
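The chunking rule above can be sketched in a few lines of Python. This is only an illustration, not the service's actual client code: the helper names `needs_chunking` and `split_into_chunks` are hypothetical, and it assumes that the 4 MB threshold applies to the base64-encoded size and that each chunk is 3 MB of raw bytes encoded on its own.

```python
import base64

CHUNK_SIZE = 3 * 1024 * 1024       # 3 MB of raw bytes per chunk (figure from the text above)
CHUNK_THRESHOLD = 4 * 1024 * 1024  # 4 MB limit; assumed to apply to the base64 size


def needs_chunking(data: bytes, threshold: int = CHUNK_THRESHOLD) -> bool:
    """Return True when the base64-encoded size of data exceeds the threshold."""
    encoded_len = 4 * ((len(data) + 2) // 3)  # base64 emits 4 chars per 3 input bytes
    return encoded_len > threshold


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split raw bytes into chunk_size pieces and base64-encode each piece."""
    if chunk_size <= 0 or chunk_size % 3 != 0:
        # A multiple of 3 keeps padding out of every chunk except the last one.
        raise ValueError("chunk_size must be a positive multiple of 3")
    return [
        base64.b64encode(data[i:i + chunk_size]).decode("ascii")
        for i in range(0, len(data), chunk_size)
    ]
```

Keeping the chunk size a multiple of 3 bytes means only the final chunk can carry base64 padding. The chunk strings joined in order are then identical to the base64 encoding of the whole file, so the sketch works whether the server decodes each chunk separately or concatenates the strings first.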

API Usage
How to use the API endpoints directly
# One-time setup (initialize database tables)
curl -X GET https://agent-doc-tool.vercel.app/api/init-db

# Small files (direct upload)
curl -X POST https://agent-doc-tool.vercel.app/api/extract-docx \
  -H "Content-Type: application/json" \
  -d '{"base64": "UEsDBBQABgAIAAAAIQ...", "fileName": "doc.pdf", "fileType": "application/pdf"}'

# Large files (chunked upload)
# Use the same sessionId for all chunks of one file
# chunkIndex starts at 0 and increments by 1
# 1. Upload chunks
curl -X POST https://agent-doc-tool.vercel.app/api/upload-chunk \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session_123456", "chunk": "base64_chunk_0", "chunkIndex": 0, "totalChunks": 3, "fileName": "big.pdf", "fileType": "application/pdf"}'
curl -X POST https://agent-doc-tool.vercel.app/api/upload-chunk \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session_123456", "chunk": "base64_chunk_1", "chunkIndex": 1, "totalChunks": 3, "fileName": "big.pdf", "fileType": "application/pdf"}'
curl -X POST https://agent-doc-tool.vercel.app/api/upload-chunk \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session_123456", "chunk": "base64_chunk_2", "chunkIndex": 2, "totalChunks": 3, "fileName": "big.pdf", "fileType": "application/pdf"}'

# 2. Process assembled chunks
curl -X POST https://agent-doc-tool.vercel.app/api/process-chunks \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session_123456"}'

# 3. Optional: clean up the session
curl -X POST https://agent-doc-tool.vercel.app/api/cleanup \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session_123456"}'
# Automatic chunking for files > 4MB base64
# Supports: PDF, DOCX, RTF, XLSX, XLS, PNG, JPEG, GIF, ZIP
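For scripted use, the chunked flow above (upload every chunk, process, then optionally clean up) could be driven from Python roughly as follows. This is a sketch under stated assumptions: the endpoint paths and JSON field names are copied from the curl examples, while the helper names `build_upload_plan`, `post_json`, and `run_plan` are hypothetical. The response format is not documented here, so responses are returned as raw text.

```python
import json
import urllib.request

BASE_URL = "https://agent-doc-tool.vercel.app"


def build_upload_plan(session_id, chunks, file_name, file_type, cleanup=True):
    """Return the ordered (url, payload) pairs for one chunked upload.

    chunks is a list of base64 strings, one per chunk, already in order.
    """
    total = len(chunks)
    plan = [
        (
            f"{BASE_URL}/api/upload-chunk",
            {
                "sessionId": session_id,  # same sessionId for every chunk of the file
                "chunk": chunk,
                "chunkIndex": index,      # starts at 0 and increments by 1
                "totalChunks": total,
                "fileName": file_name,
                "fileType": file_type,
            },
        )
        for index, chunk in enumerate(chunks)
    ]
    plan.append((f"{BASE_URL}/api/process-chunks", {"sessionId": session_id}))
    if cleanup:
        plan.append((f"{BASE_URL}/api/cleanup", {"sessionId": session_id}))
    return plan


def post_json(url, payload, timeout=60):
    """POST payload as JSON and return the response body as text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def run_plan(plan):
    """Send each request in order, so processing starts only after all uploads."""
    return [post_json(url, payload) for url, payload in plan]
```

Calling `run_plan(build_upload_plan("session_123456", chunks, "big.pdf", "application/pdf"))` would issue the same requests as the curl commands above. Building the plan separately from sending it makes the payloads easy to inspect or log before any network traffic happens.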