Universal Document Text Extractor

Extract text from documents and images using AI-powered processing

✨ Now supports large files with chunked upload technology

Upload Document or Image

Select a file to extract text content. Large files are automatically uploaded in chunks for reliability.

Supported Formats

File types supported by our text extraction API

PDF: Portable Document Format
DOCX: Microsoft Word Document
RTF: Rich Text Format
XLSX: Microsoft Excel Spreadsheet
XLS: Legacy Excel Format
PNG: PNG Image (OCR)
JPEG: JPEG Image (OCR)
GIF: GIF Image (OCR)
ZIP: ZIP Archive (extracts all files)

Large File Support: Files larger than 4MB (measured after base64 encoding) are automatically uploaded in 3MB chunks for reliability. ZIP archives are extracted automatically, and every file they contain is processed.
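The thresholds above come down to simple arithmetic. Here is a minimal sketch in shell, assuming that 1MB means 1024 * 1024 bytes and that sizes are measured on the base64 string; this page confirms neither. `needs_chunking` and `chunk_count` are hypothetical helper names, not part of the API.

```shell
#!/usr/bin/env bash
# Assumption: 1MB = 1024 * 1024; the page does not say which convention it uses.
CHUNK_THRESHOLD=$((4 * 1024 * 1024))  # payloads larger than this are chunked
CHUNK_SIZE=$((3 * 1024 * 1024))       # size of each chunk

# Succeeds (exit status 0) when a payload of the given length needs chunking.
needs_chunking() {
  [ "$1" -gt "$CHUNK_THRESHOLD" ]
}

# Prints how many chunks a payload of the given length splits into
# (ceiling division by CHUNK_SIZE).
chunk_count() {
  echo $(( ($1 + CHUNK_SIZE - 1) / CHUNK_SIZE ))
}
```

For example, `chunk_count 10485760` (a 10MB base64 string) prints 4, which is the value that would go into `totalChunks`.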

API Usage

How to use the API endpoints directly

# One-time setup (initialize database tables)

curl -X GET https://agent-doc-tool.vercel.app/api/init-db

# Small files (Direct upload)

curl -X POST https://agent-doc-tool.vercel.app/api/extract-docx \
  -H "Content-Type: application/json" \
  -d '{"base64": "UEsDBBQABgAIAAAAIQ...", "fileName": "doc.pdf", "fileType": "application/pdf"}'
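In practice the base64 string comes from a local file. The sketch below shows one way to build the request body and post it, assuming the file name and MIME type contain no characters that need JSON escaping. `build_direct_payload` and `extract_small_file` are hypothetical helper names; only the endpoint and field names come from the example above.

```shell
#!/usr/bin/env bash
# Build the JSON body for a direct upload from a local file.
# Assumption: file name and MIME type need no JSON escaping.
build_direct_payload() {
  local file=$1 mime=$2
  local b64
  # Strip newlines so the result is one continuous base64 string
  # regardless of how the local base64 tool wraps its output.
  b64=$(base64 < "$file" | tr -d '\n')
  printf '{"base64": "%s", "fileName": "%s", "fileType": "%s"}' \
    "$b64" "$(basename "$file")" "$mime"
}

# POST the payload. Reading the body from stdin (@-) avoids
# command-line length limits for larger files.
extract_small_file() {
  build_direct_payload "$1" "$2" |
    curl -sS -X POST https://agent-doc-tool.vercel.app/api/extract-docx \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```

A call such as `extract_small_file ./doc.pdf application/pdf` would then send the file in a single request.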

# Large files (Chunked upload)

# Use the same sessionId for all chunks of one file

# chunkIndex starts at 0 and increments by 1

# 1. Upload chunks

curl -X POST https://agent-doc-tool.vercel.app/api/upload-chunk \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session_123456", "chunk": "base64_chunk_0", "chunkIndex": 0, "totalChunks": 3, "fileName": "big.pdf", "fileType": "application/pdf"}'

curl -X POST https://agent-doc-tool.vercel.app/api/upload-chunk \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session_123456", "chunk": "base64_chunk_1", "chunkIndex": 1, "totalChunks": 3, "fileName": "big.pdf", "fileType": "application/pdf"}'

curl -X POST https://agent-doc-tool.vercel.app/api/upload-chunk \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session_123456", "chunk": "base64_chunk_2", "chunkIndex": 2, "totalChunks": 3, "fileName": "big.pdf", "fileType": "application/pdf"}'

# 2. Process assembled chunks

curl -X POST https://agent-doc-tool.vercel.app/api/process-chunks \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session_123456"}'

# 3. Optional: clean up the session

curl -X POST https://agent-doc-tool.vercel.app/api/cleanup \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "session_123456"}'

# Chunking is applied automatically to files over 4MB (base64-encoded size)

# Supports: PDF, DOCX, RTF, XLSX, XLS, PNG, JPEG, GIF, ZIP
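The chunked flow above can also be scripted end to end. The following is a rough sketch, not the service's own client. It assumes the server concatenates the chunk strings in `chunkIndex` order before decoding, and that 3MB means 3 * 1024 * 1024 characters. That size is a multiple of 4, so each chunk is also valid base64 on its own. `split_base64`, `chunk_payload` and `upload_chunked` are hypothetical names.

```shell
#!/usr/bin/env bash
API_BASE="https://agent-doc-tool.vercel.app/api"
CHUNK_SIZE=$((3 * 1024 * 1024))  # assumption: 3MB = 3 * 1024 * 1024 characters

# Base64-encode a file and split the string into pieces under an (empty)
# output directory. Prints the number of pieces written.
split_base64() {
  local file=$1 outdir=$2 size=${3:-$CHUNK_SIZE}
  base64 < "$file" | tr -d '\n' | split -b "$size" - "$outdir/chunk_"
  ls "$outdir" | wc -l | tr -d ' '
}

# Build the JSON body for one chunk.
# Assumption: session ID, file name and MIME type need no JSON escaping.
chunk_payload() {
  local session=$1 piece=$2 index=$3 total=$4 name=$5 mime=$6
  printf '{"sessionId": "%s", "chunk": "%s", "chunkIndex": %d, "totalChunks": %d, "fileName": "%s", "fileType": "%s"}' \
    "$session" "$(cat "$piece")" "$index" "$total" "$name" "$mime"
}

# Upload every chunk in order, then ask the server to process the session.
upload_chunked() {
  local file=$1 mime=$2 session=$3
  local dir total piece index=0
  dir=$(mktemp -d)
  total=$(split_base64 "$file" "$dir")
  # split names pieces chunk_aa, chunk_ab, ... so the glob yields them in order.
  for piece in "$dir"/chunk_*; do
    chunk_payload "$session" "$piece" "$index" "$total" "$(basename "$file")" "$mime" |
      curl -sS --fail -X POST "$API_BASE/upload-chunk" \
        -H "Content-Type: application/json" --data-binary @- ||
      { rm -rf "$dir"; return 1; }
    index=$((index + 1))
  done
  rm -rf "$dir"
  curl -sS --fail -X POST "$API_BASE/process-chunks" \
    -H "Content-Type: application/json" \
    -d "{\"sessionId\": \"$session\"}"
}
```

A call such as `upload_chunked ./big.pdf application/pdf session_123456` would replace the manual upload and process steps. The optional cleanup request is left out here and can be sent afterwards with the same `sessionId`.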