Document processing reimagined with LLMs

Make any file LLM-ready with our API

Stop writing parsers for every format. Define your Pydantic model or JSON schema once, then extract from PDFs, Excel files, or emails using your favorite LLM with your own API keys.

The painful, current solution:
import pandas as pd

def parse_earnings_report(file_path):
    if file_path.endswith('.pdf'):
        # PDFs: deal with non-standard layouts, scanned documents.
        # Some are text-based; others are just scanned images.
        # And don't forget that sometimes the image is vertical, sometimes horizontal.
        messages = extract_pdf_text(file_path)
    elif file_path.endswith('.xlsx'):
        # Excel files: manage inconsistent formatting, merged cells, and broken sheets.
        # Watch out for hidden rows, weird formulas, or bizarrely nested tables.
        df = pd.read_excel(file_path)
        messages = convert_df_to_text(df)
    elif file_path.endswith('.eml'):
        # Emails: unstructured chaos hidden inside multipart MIME formats.
        # Extract attachments, inline content, and encoded text—each a potential rabbit hole.
        messages = extract_email(file_path)
    else:
        # Add support for Word, images, CSVs, ZIPs... because someone always sends those.
        raise ValueError("Unsupported file format")
    
    # And now the *real* pain begins:
    # - Manage 10+ dependencies—each with its quirks, bugs, and breaking updates.
    # - Extract fields using brittle regex patterns or costly, slow LLM calls.
    # - Handle endless format variations: Excel tables, PDF layouts, freeform emails...
    # - Vertical vs. horizontal image orientations? Of course, those matter too.
    # - Deal with edge cases like:
    #     * Dates in every possible format known to man.
    #     * Fields that move to entirely new places between files.
    #     * Corrupted files that somehow still open but break your pipeline.
    # - Carefully validate JSON output against schemas—debugging mismatches is *so much fun.*

    # Oh, and don't forget scalability:
    # Multiply all of this by thousands of files a day, each with its own quirks and surprises.
    # And the kicker? You'll only notice the worst issues after deploying to production.

    pass  # This "pass" represents hours of debugging and countless cups of coffee.
    
With UiForm:
from uiform import UiForm
from openai import OpenAI

uiclient = UiForm()
doc_msg = uiclient.documents.create_messages(
    document="earnings_report.xlsx"
)

# Now you can use your favorite model to analyze your document
client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=doc_msg.openai_messages + [
        {
            "role": "user",
            "content": "Summarize the document"
        }
    ]
)
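
The "define your schema once" idea can be sketched in plain Python. The example below is a minimal illustration, not UiForm's implementation: the schema fields (`company_name`, `fiscal_quarter`, `revenue`, `net_income`) and the helper `check_against_schema` are hypothetical, and the check covers only required fields and primitive types rather than the full JSON Schema specification.

```python
import json

# Hypothetical schema for an earnings report; the field names are illustrative.
EARNINGS_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "fiscal_quarter": {"type": "string"},
        "revenue": {"type": "number"},
        "net_income": {"type": "number"},
    },
    "required": ["company_name", "revenue"],
}

# Maps JSON Schema primitive type names to Python types.
_JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def check_against_schema(raw_reply, schema):
    """Return a list of problems found in a JSON reply (empty if none)."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        value = data[name]
        # bool is a subclass of int in Python, so reject it for numbers.
        is_bool_as_number = spec.get("type") == "number" and isinstance(value, bool)
        if expected and (not isinstance(value, expected) or is_bool_as_number):
            problems.append(f"wrong type for field: {name}")
    return problems
```

A model reply could then be checked with `check_against_schema(reply_text, EARNINGS_SCHEMA)`, where `reply_text` is whatever JSON string the model returned. An empty list means the reply passed these basic checks.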


Don't waste time reinventing document pre-processing

Our API supports all major file formats, so you can extract data from any source: documents, emails, Excel spreadsheets, and more. We automatically handle the tedious parts, such as correcting page rotation, running your Excel files in a separate container, and normalizing formatting, so you can process any file format your business works with.

Images

  • jpg
  • png
  • gif
  • tiff
  • webp

Office docs

  • docx
  • xlsx
  • pptx
  • odt, ods, odp
  • doc, xls, ppt

Text and Emails

  • pdf
  • txt
  • rtf
  • eml
  • msg

Code

  • json
  • xml
  • html
  • csv
  • yaml
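
If you want to check on your side which of these groups a file falls into before sending it, a small lookup is enough. This is only a sketch: the helper name `format_category` is hypothetical, it knows only the extensions listed above, and a `None` result does not necessarily mean the API rejects the file.

```python
from pathlib import Path

# Extension groups copied from the list above.
SUPPORTED_FORMATS = {
    "Images": {"jpg", "png", "gif", "tiff", "webp"},
    "Office docs": {"docx", "xlsx", "pptx", "odt", "ods", "odp", "doc", "xls", "ppt"},
    "Text and Emails": {"pdf", "txt", "rtf", "eml", "msg"},
    "Code": {"json", "xml", "html", "csv", "yaml"},
}


def format_category(file_name):
    """Return the group a file belongs to, or None if its extension is not listed."""
    ext = Path(file_name).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return category
    return None
```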

Not a black box API

UiForm is a set of building blocks for quickly adding document processing features to your app, built on foundation models. We are compatible with the major AI providers: plug in your API key and start processing documents. You'll ship quickly with a market-proven solution for your customers.

UIFORM_API_KEY=sk-xxxxxxxxxx
OPENAI_API_KEY=YOUR_API_KEY
CLAUDE_API_KEY=YOUR_API_KEY
XAI_API_KEY=YOUR_API_KEY
GEMINI_API_KEY=YOUR_API_KEY
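
Before making any calls, it can help to confirm which of these keys are actually set. The sketch below reads the variable names shown above from the environment. The helper `configured_providers` is hypothetical, and whether each SDK picks these variables up automatically is an assumption to verify against that SDK's documentation.

```python
import os

# Environment variable names taken from the snippet above.
PROVIDER_KEYS = {
    "UiForm": "UIFORM_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Claude": "CLAUDE_API_KEY",
    "xAI": "XAI_API_KEY",
    "Gemini": "GEMINI_API_KEY",
}


def configured_providers(env=None):
    """List the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to test.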

Built for developers

Built with care to be compatible with the modern tooling that you already use and love

Natively compatible with OpenAI structures and formats

Normalized responses. Pydantic objects are first-class citizens

Our JSON Schema follows the OpenAPI specification

Linter-friendly with native types. Your IDE will love it

Compatible with 140+ document formats

Integrates easily in your application

Unified API for text, image, audio, emails and more

Compatibility with OpenAI, Anthropic, Gemini, xAI

Extractions Over Time

[Dashboard chart: overall accuracy, average processing time, daily extractions, system uptime, and file-type distribution across PDF, Images, Word, and Other]

Processing Speed

  • Preprocessing: 0.3s
  • Time to first token: 0.5s
  • Inference: 0.4s

Field-level Accuracy

  • Company Name: 98.5%
  • Invoice Number: 99.2%
  • VAT Number: 97.8%
  • Total Amount: 96.9%
  • Date: 99.5%
  • Address: 95.8%
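
To track a comparable metric on your own documents, field-level accuracy can be computed from extracted records and hand-labeled references. The page does not say how the figures above were measured, so the exact-match rule and the helper name `field_level_accuracy` below are assumptions made for illustration.

```python
def field_level_accuracy(predictions, references):
    """Per-field share of documents whose extracted value equals the reference.

    Both arguments are lists of dicts, one dict per document, in the same order.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    fields = {name for ref in references for name in ref}
    accuracy = {}
    for name in sorted(fields):
        # Only score documents whose reference actually contains this field.
        pairs = [
            (pred.get(name), ref[name])
            for pred, ref in zip(predictions, references)
            if name in ref
        ]
        correct = sum(1 for predicted, expected in pairs if predicted == expected)
        accuracy[name] = correct / len(pairs)
    return accuracy
```

Exact matching is strict: a date extracted as "31/01/2024" counts as wrong against a reference of "2024-01-31", so normalize values first if formatting differences should not be penalized.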

Pro

Most popular

A perfect user experience powered by our infrastructure.

$20/month

  • Document API: Unlimited
  • Dataset storage: Forever
  • All Starter features

To put UiForm in production