Skip to content
Regolo Logo

Chandra OCR 2

chandra‑ocr‑2 is a 4B‑parameter, layout‑aware OCR model from Datalab that converts complex documents into structured Markdown/HTML/JSON across 90+ languages, achieving SOTA olmOCR scores with 2× the throughput of Chandra 1.
Custom Model
OCR

How to Get Started

Step 1

Sign Up and get your Api Key and use with UNLIMITED tokens for 30 days.

Step 2

Paste the URL from Huggingface repository. (Ex: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4)

Step 3

Choose the GPU machine to deploy.

That’s all! You’re ready to use the model in few minutes without infrastructure complexity in few minutes.


Applications & Use Cases

  • High‑accuracy OCR for PDFs, scans, and textbooks with complex layouts, achieving 85.9 overall on olmOCR and strong results across ArXiv, math, tables, and multi‑column text.
  • Structured document extraction that converts invoices, financial filings, and reports into Markdown/HTML/JSON with preserved tables (including merged cells) and layout metadata.
  • Math- and STEM-heavy workflows that require robust LaTeX extraction for inline and block equations from research papers, worksheets, and technical books.
  • Handwriting and form digitization for notes, medical forms, and checklists, including accurate reconstruction of checkboxes and form elements.
  • Multilingual OCR (90+ languages) for legal, governmental, and enterprise archives, where chandra‑ocr‑2 improves multilingual benchmark scores by 12% over the previous generation.
  • RAG and AI pipeline preprocessing, where chandra‑ocr‑2 acts as the ingestion layer that turns messy real‑world documents into clean, structured inputs for downstream LLMs and search indexes.