UiForm
Make any file LLM-ready with our API
Stop writing parsers for every format. Define your Pydantic model or JSON schema once,
extract from PDFs, Excel, Emails, and more using your favorite LLM with your own API keys.
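As a rough illustration of the "define your schema once" step, the sketch below writes a small JSON schema to disk using only the standard library. The schema title and field names (company_name, fiscal_quarter, revenue, net_income) are hypothetical and chosen for the earnings-report example; they are not taken from UiForm's documentation. A Pydantic model could stand in for the hand-written dictionary.

```python
import json

# Hypothetical schema for an earnings report; field names are illustrative only.
EARNINGS_REPORT_SCHEMA = {
    "title": "EarningsReport",
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "fiscal_quarter": {"type": "string", "description": "For example 'Q3 2024'"},
        "revenue": {"type": "number"},
        "net_income": {"type": "number"},
    },
    "required": ["company_name", "fiscal_quarter", "revenue"],
}


def save_schema(schema: dict, path: str = "json_schema.json") -> None:
    """Write the schema to disk so it can be loaded later with json.load."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(schema, f, indent=2)
```

Calling save_schema(EARNINGS_REPORT_SCHEMA) produces a json_schema.json file that can be reused for every input format.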
import pandas as pd

def parse_earnings_report(file_path):
    if file_path.endswith('.pdf'):
        # PDFs: deal with non-standard layouts, scanned documents.
        # Some are text-based; others are just scanned images.
        # And don't forget that sometimes the image is vertical, sometimes horizontal.
        messages = extract_pdf_text(file_path)
    elif file_path.endswith('.xlsx'):
        # Excel files: manage inconsistent formatting, merged cells, and broken sheets.
        # Watch out for hidden rows, weird formulas, or bizarrely nested tables.
        df = pd.read_excel(file_path)
        messages = convert_df_to_text(df)
    elif file_path.endswith('.eml'):
        # Emails: unstructured chaos hidden inside multipart MIME formats.
        # Extract attachments, inline content, and encoded text; each is a potential rabbit hole.
        messages = extract_email(file_path)
    else:
        # Add support for Word, images, CSVs, ZIPs... because someone always sends those.
        raise ValueError("Unsupported file format")

    # And now the *real* pain begins:
    # - Manage 10+ dependencies, each with its quirks, bugs, and breaking updates.
    # - Extract fields using brittle regex patterns or costly, slow LLM calls.
    # - Handle endless format variations: Excel tables, PDF layouts, freeform emails...
    # - Vertical vs. horizontal image orientations? Of course, those matter too.
    # - Deal with edge cases like:
    #   * Dates in every possible format known to man.
    #   * Fields that move to entirely new places between files.
    #   * Corrupted files that somehow still open but break your pipeline.
    # - Carefully validate JSON output against schemas; debugging mismatches is *so much fun.*
    # Oh, and don't forget scalability:
    # Multiply all of this by thousands of files a day, each with its own quirks and surprises.
    # And the kicker? You'll only notice the worst issues after deploying to production.
    pass  # This "pass" represents hours of debugging and countless cups of coffee.
from uiform import UiForm
from openai import OpenAI

uiclient = UiForm()
doc_msg = uiclient.documents.create_messages(
    document="earnings_report.xlsx"
)

# Now you can use your favorite model to analyze your document
client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=doc_msg.openai_messages + [
        {
            "role": "user",
            "content": "Summarize the document"
        }
    ]
)
Don't waste time reinventing document pre-processing
Our API supports all major file formats, enabling seamless data extraction from any source (documents, emails, Excel spreadsheets, and more). We automatically handle the tedious parts, such as correcting page rotations, running your Excel file in a separate container, and normalizing formatting, so you can process any file format your business works with.
Images
- jpg
- png
- gif
- tiff
- webp
Office docs
- docx
- xlsx
- pptx
- odt, ods, odp
- doc, xls, ppt
Text and Emails
- pdf
- txt
- rtf
- eml
- msg
Code
- json
- xml
- html
- csv
- yaml
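The format list above can be sketched as a small lookup table, for example to pre-check a file before sending it for processing. This is a minimal illustration, not part of UiForm's API: the helper name format_category is hypothetical, and the table covers only the extensions listed above, which may not be the full set the service accepts.

```python
from pathlib import Path
from typing import Optional

# Categories and extensions copied from the list above.
SUPPORTED_FORMATS = {
    "Images": {"jpg", "png", "gif", "tiff", "webp"},
    "Office docs": {"docx", "xlsx", "pptx", "odt", "ods", "odp", "doc", "xls", "ppt"},
    "Text and Emails": {"pdf", "txt", "rtf", "eml", "msg"},
    "Code": {"json", "xml", "html", "csv", "yaml"},
}


def format_category(filename: str) -> Optional[str]:
    """Return the listed category for a file's extension, or None if unlisted."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return category
    return None
```

For instance, format_category("earnings_report.xlsx") falls under "Office docs", while an extension missing from the table returns None.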
Not a black box API
UiForm is a set of building blocks for quickly adding document processing features to your app, leveraging foundation models. We are compatible with major AI providers: all you have to do is plug in your API key and start processing documents. You'll ship quickly with a market-proven solution for your customers.
UIFORM_API_KEY=sk-xxxxxxxxxx
OPENAI_API_KEY=YOUR_API_KEY
CLAUDE_API_KEY=YOUR_API_KEY
XAI_API_KEY=YOUR_API_KEY
GEMINI_API_KEY=YOUR_API_KEY
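A quick way to catch a missing key before making any calls is to check the environment at startup. The sketch below is an assumption-laden illustration: the helper missing_api_keys is hypothetical, and it presumes the keys are supplied as environment variables with the names shown above.

```python
import os
from typing import Iterable, List, Mapping


def missing_api_keys(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Only check the providers you actually use.
    missing = missing_api_keys(["UIFORM_API_KEY", "OPENAI_API_KEY"])
    if missing:
        print("Missing API keys: " + ", ".join(missing))
```

Passing the environment as a parameter keeps the check easy to test without touching real credentials.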
Built for developers
Built with care to be compatible with the modern tooling that you already use and love
Natively compatible with OpenAI structures and formats
Normalized responses. Pydantic objects are first-class citizens
Our JSON Schema follows the OpenAPI specification
Linter-friendly with native types. Your IDE will love it
Compatible with 140+ document formats
Integrates easily in your application
Unified API for text, image, audio, emails and more
Compatibility with OpenAI, Anthropic, Gemini, xAI
import json
from uiform.client import UiForm

client = UiForm()

with open("json_schema.json", 'r') as f:
    json_schema = json.load(f)

text_operations = {
    'regex_instructions': [
        {
            "name": "vat_number",
            "pattern": r"[Ff][Rr]\s*(\d\s*){11}",
            "description": "VAT number in the format XX999999999"
        }
    ]
}

client.documents.extractions.parse(json_schema=json_schema,
                                   document="example.pdf",
                                   text_operations=text_operations,
                                   model="gpt-4o-mini",
                                   temperature=0)
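Before passing a regex instruction to the API, it can help to try the pattern locally. The sketch below uses the pattern in its escaped form, `[Ff][Rr]\s*(\d\s*){11}`, with Python's re module. The helper find_vat_numbers and the sample text are hypothetical, and this says nothing about how UiForm applies the pattern internally.

```python
import re
from typing import List

# "FR" in any case, then 11 digits, with optional whitespace between characters.
VAT_PATTERN = re.compile(r"[Ff][Rr]\s*(\d\s*){11}")


def find_vat_numbers(text: str) -> List[str]:
    """Return VAT-like matches, upper-cased and with whitespace removed."""
    return [
        re.sub(r"\s+", "", match.group(0)).upper()
        for match in VAT_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    print(find_vat_numbers("VAT: FR 12 345 678 901 due on receipt"))
```

Normalizing the match makes spaced and unspaced spellings of the same number compare equal.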
Extractions Over Time
Starter
Free forever
For developers who want to get started quickly.
Free
Pro
Most popular
A perfect user experience powered by our infrastructure.
$20/month