BharatParse API

Extract structured JSON from any Indian invoice, bill, or receipt with a single POST request. No cloud setup, no IAM config, no per-metric billing.

13 Document Types | GST Validated | PDFs & Images | Handwriting Support | Gemini 2.5 Flash | India-first

Overview

BharatParse is a document extraction API built specifically for Indian businesses. Send any invoice, bill, or receipt as a base64-encoded file and receive clean, validated JSON in return - with confidence scores, field-level warnings, and GST-specific validation built in.

Auto-detect mode: Not sure which schema to use? Set schema: "auto" and BharatParse will identify the document type and extract accordingly.

Why BharatParse vs AWS Textract / Google Vision

Feature	BharatParse	AWS Textract	Google Vision
Setup time	5 minutes	Days (IAM, VPC, SDK)	Days
Handwritten receipts	Yes (LLM-powered)	Limited	Limited
GST/GSTIN validation	Built-in	No	No
India-specific schemas	13 schemas	Generic only	Generic only
Pricing	Flat monthly	Per page + per query	Per 1000 units
Confidence scores	Document-level + warnings	Per word	Per word

Quick Start

Get your first extraction in under 2 minutes.

1. Get your API key

Subscribe to any plan on RapidAPI. Your API key will be shown in the dashboard.

2. Make your first request

# Python example
import base64

import requests

# Load your document and base64-encode it
with open("invoice.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

# Call the API
response = requests.post(
    "https://bharatparse-indian-invoice-bill.p.rapidapi.com/v1/extract",
    headers={
        "X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY",
        "X-RapidAPI-Host": "bharatparse-indian-invoice-bill.p.rapidapi.com",
        "Content-Type": "application/json"
    },
    json={
        "file_b64": b64,
        "file_type": "pdf",
        "schema": "auto",
        "country": "IN"
    },
    timeout=60,  # multi-page PDFs can take tens of seconds
)
response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())

3. That's it

You'll receive structured JSON with all extracted fields, a confidence score, and any warnings about low-quality fields.

Authentication

BharatParse uses RapidAPI's standard authentication. Include your RapidAPI key in every request header:

Header	Value
`X-RapidAPI-Key`	Your RapidAPI subscription key
`X-RapidAPI-Host`	`bharatparse-indian-invoice-bill.p.rapidapi.com`
`Content-Type`	`application/json`

Keep your key secret. Never expose it in client-side code or public repositories. Rotate it immediately from your RapidAPI dashboard if compromised.
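One way to keep the key out of source code is to read it from an environment variable at runtime. The sketch below assumes a variable named `RAPIDAPI_KEY` and a helper called `build_headers`; both names are illustrative and not part of the API.

```python
import os

API_HOST = "bharatparse-indian-invoice-bill.p.rapidapi.com"


def build_headers(env_var: str = "RAPIDAPI_KEY") -> dict[str, str]:
    """Build the three required headers, reading the key from the environment.

    The variable name RAPIDAPI_KEY is an assumption for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your RapidAPI key")
    return {
        "X-RapidAPI-Key": key,
        "X-RapidAPI-Host": API_HOST,
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as `headers=` to any HTTP client, so the key never appears in a committed file.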

Supported File Formats

BharatParse accepts both PDFs and images. You can send a scanned document, a phone photograph, a screenshot, or a downloaded PDF - all are handled by the same endpoint.

Format	file_type value	Best for
PDF	`pdf`	Downloaded invoices, e-tickets, multi-page bills (Jio, bank statements)
JPEG / JPG	`jpeg` or `jpg`	Phone photos of receipts, restaurant bills, fuel pump memos
PNG	`png`	Screenshots of digital invoices, e-commerce order pages
WebP	`webp`	WhatsApp-shared images, modern browser screenshots
TIFF / TIF	`tiff` or `tif`	High-resolution scanner output, archival document scans

Tip for best results: For phone photos, ensure good lighting and hold the camera parallel to the document. The API handles tilt, blur, and partial shadows well - but extreme angles or very dark images reduce confidence scores.

File size limit

Maximum 20MB per request. Most invoices and bills are well under 1MB. Bank statements and multi-page PDFs are typically 2–5MB. If your file exceeds 20MB, compress the PDF or reduce image resolution before sending.
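A client-side check can catch unsupported formats and oversized files before a request is spent. The sketch below is one possible approach: `load_document` is a hypothetical helper, and it assumes the 20MB limit applies to the raw file bytes (20 × 1024 × 1024), which this page does not spell out.

```python
import base64
from pathlib import Path

# Assumption: "20MB" is measured on the raw file, as 20 MiB.
MAX_BYTES = 20 * 1024 * 1024

# file_type values from the format table above.
SUPPORTED_TYPES = {"pdf", "jpeg", "jpg", "png", "webp", "tiff", "tif"}


def load_document(path: str) -> tuple[str, str]:
    """Return (file_b64, file_type) for a local file, or raise ValueError."""
    p = Path(path)
    # Derive file_type from the extension, e.g. "Receipt.JPG" -> "jpg".
    file_type = p.suffix.lower().lstrip(".")
    if file_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported extension {p.suffix!r}")
    raw = p.read_bytes()
    if len(raw) > MAX_BYTES:
        raise ValueError(f"{p.name} is {len(raw)} bytes; compress it below 20MB first")
    return base64.b64encode(raw).decode("ascii"), file_type
```

The two returned values map directly onto the `file_b64` and `file_type` request fields.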

Endpoint

POST /v1/extract

Extract structured JSON from any Indian invoice, bill, or receipt. Accepts PDFs and images (JPEG, PNG, WebP, TIFF) up to 20MB.

Base URL: https://bharatparse-indian-invoice-bill.p.rapidapi.com

Request Parameters

All parameters are sent as a JSON body.

Parameter	Type	Required	Description
`file_b64`	string	Required	Base64-encoded file content. Max 20MB.
`file_type`	string	Required	File format: `pdf`, `jpeg`, `jpg`, `png`, `webp`, `tiff`, `tif`
`schema`	string	Optional	Document type. Default: `auto`. See schemas for all options.
`country`	string	Optional	ISO country code. Default: `IN`. Set another code (e.g. `AE`, `SG`) for non-Indian documents.
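The parameter table can be turned into a small body builder that applies the documented defaults and rejects values the API would answer with a 400. This is a sketch only: `build_request_body` is a hypothetical helper, and the two-letter country check is an assumption based on the `IN`, `AE` and `SG` examples on this page.

```python
VALID_FILE_TYPES = {"pdf", "jpeg", "jpg", "png", "webp", "tiff", "tif"}
VALID_SCHEMAS = {
    "auto", "gst_invoice", "restaurant", "fuel", "telecom", "travel", "utility",
    "medical", "ecommerce", "rent", "bank_statement", "credit_card", "generic",
}


def build_request_body(file_b64: str, file_type: str,
                       schema: str = "auto", country: str = "IN") -> dict[str, str]:
    """Validate the four documented parameters and return the JSON body."""
    if not file_b64:
        raise ValueError("file_b64 is required")
    file_type = file_type.lower()
    if file_type not in VALID_FILE_TYPES:
        raise ValueError(f"Unsupported file_type {file_type!r}")
    if schema not in VALID_SCHEMAS:
        raise ValueError(f"Unknown schema {schema!r}")
    country = country.upper()
    # Assumption: country codes are two-letter ISO codes such as IN, AE, SG.
    if len(country) != 2 or not country.isalpha():
        raise ValueError(f"Expected a two-letter country code, got {country!r}")
    return {"file_b64": file_b64, "file_type": file_type,
            "schema": schema, "country": country}
```

Calling `build_request_body(b64, "pdf")` yields the same body as the Quick Start example, with `schema` and `country` filled in from their defaults.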

Response Format

All successful responses (HTTP 200) follow this structure:

{
  "schema_detected": "restaurant",   // Document type identified
  "confidence": 0.92,                // 0.0–1.0 extraction quality score
  "data": { ... },                   // Extracted fields (schema-specific)
  "warnings": [                      // Field-level issues (empty if none)
    "Invoice date not visible in scan"
  ],
  "processing_ms": 1843              // Processing time in milliseconds
}
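On the client side, the five top-level fields can be loaded into a typed object so that downstream code does not index raw dictionaries. The sketch below assumes the response has already been decoded from JSON; `ExtractionResult` and `parse_result` are illustrative names, not part of the API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExtractionResult:
    schema_detected: str
    confidence: float
    data: dict[str, Any]
    warnings: list[str] = field(default_factory=list)
    processing_ms: int = 0


def parse_result(payload: dict[str, Any]) -> ExtractionResult:
    """Map a decoded HTTP 200 response body onto ExtractionResult."""
    return ExtractionResult(
        schema_detected=payload["schema_detected"],
        confidence=float(payload["confidence"]),
        data=payload.get("data") or {},
        warnings=list(payload.get("warnings") or []),
        processing_ms=int(payload.get("processing_ms", 0)),
    )
```

With `requests`, this would typically be called as `parse_result(response.json())` after checking the status code.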

Confidence Score Guide

Score	Meaning	Recommended action
0.90 – 1.00	Excellent - clean document	Use directly
0.70 – 0.89	Good - minor issues	Review warnings
0.50 – 0.69	Fair - poor scan quality	Human review recommended
Below 0.50	Low - heavily degraded	Re-scan or manual entry
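The bands above translate into a simple threshold function for automated pipelines. In this sketch the lower bound of each band is treated as inclusive, and the returned labels are made up for the example.

```python
def recommended_action(confidence: float) -> str:
    """Map a confidence score to the action suggested in the guide above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within 0.0-1.0, got {confidence}")
    if confidence >= 0.90:
        return "use_directly"
    if confidence >= 0.70:
        return "review_warnings"
    if confidence >= 0.50:
        return "human_review"
    return "rescan_or_manual_entry"
```

A pipeline might auto-post documents that return `use_directly` and queue everything else for a person to check.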

Document Schemas

Use the schema parameter to specify the document type, or use auto to let BharatParse detect it automatically.

auto

Auto-detects document type. Best for mixed document pipelines.

gst_invoice

B2B GST tax invoices with GSTIN, HSN/SAC codes, and tax breakdowns.

restaurant

Restaurant and cafe bills including Starbucks, Zomato, Swiggy printouts.

fuel

Petrol/diesel/CNG receipts - including handwritten pump memos (BPCL, HPCL, IOC).

telecom

Mobile and broadband bills - Jio, Airtel, BSNL, Vi, ACT Fibernet.

travel

IRCTC e-tickets (ERS), train bookings, bus tickets. Extracts PNR, GST invoice and passenger details.

utility

Electricity bills (BESCOM, MSEDCL, CESC), gas (Mahanagar Gas, IGL), water bills.

medical

Pharmacy/chemist bills, hospital invoices, diagnostic lab reports. Insurance claim ready.

ecommerce

Amazon, Flipkart, Meesho, Myntra, Nykaa order invoices including seller GSTIN.

rent

Rent receipts with landlord PAN extraction. HRA claim ready for Section 10(13A).

bank_statement

Bank account statements - HDFC, SBI, ICICI, Axis, Kotak. Full transaction extraction.

credit_card

Credit card monthly statements with transaction categorization and reward points.

generic

Fallback for any other Indian bill or receipt not covered by specific schemas.
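In a mixed pipeline that sends everything with `schema: "auto"`, the `schema_detected` field can drive which downstream handler runs. The sketch below shows one way to do that; the handler functions and their return strings are hypothetical, and the field names are taken from the example responses on this page.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], str]


def handle_gst_invoice(data: dict[str, Any]) -> str:
    return f"book invoice {data.get('invoice_number')}"


def handle_fuel(data: dict[str, Any]) -> str:
    return f"log fuel expense {data.get('total_amount')}"


def handle_other(data: dict[str, Any]) -> str:
    # Catch-all for schemas without a dedicated handler.
    return "send to manual review"


HANDLERS: dict[str, Handler] = {
    "gst_invoice": handle_gst_invoice,
    "fuel": handle_fuel,
}


def route(result: dict[str, Any]) -> str:
    """Pick a handler from schema_detected and run it on the extracted data."""
    handler = HANDLERS.get(result["schema_detected"], handle_other)
    return handler(result.get("data") or {})
```

New document types can be supported by adding an entry to `HANDLERS` without touching `route`.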

Example: Restaurant Bill

Input: a blurry, angled phone scan of a Starbucks receipt. The scanned text was heavily degraded, yet the key fields were extracted correctly. The one field that could not be read, the invoice date, is reported in warnings.

Input document

Restaurant_020525.pdf - page 1

Extracted JSON

{
  "schema_detected": "restaurant",
  "confidence": 0.92,
  "data": {
    "restaurant_name": "Starbucks",
    "hsn_code": "996331",
    "line_items": [
      { "name": "Tall Cold Coffee", "quantity": 1, "total": 320.00 }
    ],
    "taxable_value": 320.00,
    "cgst_rate": 2.5, "cgst_amount": 8.00,
    "sgst_rate": 2.5, "sgst_amount": 8.00,
    "grand_total": 336.00,
    "payment": { "mode": "starbucks_card", "card_last4": "1821" }
  },
  "warnings": ["Invoice date not visible in scan"],
  "processing_ms": 1843
}
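The figures in this response reconcile: 320.00 × 2.5% = 8.00 for each of CGST and SGST, and 320.00 + 8.00 + 8.00 = 336.00. A caller who wants to re-check that arithmetic before posting to a ledger could use something like the sketch below. `restaurant_totals_ok` is a hypothetical helper, and the 0.01 tolerance is an assumption; real bills with rounding or service charges may need a looser rule.

```python
from decimal import Decimal
from typing import Any


def _dec(value: Any) -> Decimal:
    """Convert a JSON number to Decimal via str to avoid float artefacts."""
    return Decimal(str(value))


def restaurant_totals_ok(data: dict[str, Any],
                         tolerance: Decimal = Decimal("0.01")) -> bool:
    taxable = _dec(data["taxable_value"])
    cgst = _dec(data["cgst_amount"])
    sgst = _dec(data["sgst_amount"])
    # Each tax amount should equal taxable value x rate / 100.
    if abs(taxable * _dec(data["cgst_rate"]) / 100 - cgst) > tolerance:
        return False
    if abs(taxable * _dec(data["sgst_rate"]) / 100 - sgst) > tolerance:
        return False
    # Grand total should equal taxable value plus both taxes.
    return abs(taxable + cgst + sgst - _dec(data["grand_total"])) <= tolerance
```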

Example: Handwritten Fuel Receipt

Input: a handwritten BPCL cash memo where the quantity "94:14" was written with a colon instead of a decimal point. BharatParse interpreted the context and extracted the correct value.

Input document

050625fuel.pdf - handwritten pump memo

Extracted JSON

# Response for handwritten BPCL receipt
{
  "schema_detected": "fuel",
  "confidence": 0.90,
  "data": {
    "dealer_name": "N. M. Shamsuddin & Sons",
    "oil_company": "BPCL",
    "invoice_date": "2025-06-05",
    "fuel_items": [
      {
        "fuel_type": "Speed",
        "litres": 94.14,
        "rate_per_litre": 21.24,
        "amount": 2000.00
      }
    ],
    "total_amount": 2000.00
  },
  "warnings": ["Litres value '94:14' is handwritten and interpreted as 94.14"],
  "processing_ms": 8598
}
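When a handwritten value has been interpreted, as flagged in the warning above, a quantity × rate cross-check gives an independent sanity test: 94.14 × 21.24 = 1999.53, which is within a rupee of the 2000.00 amount. The sketch below assumes a tolerance of ₹1.00 to allow for pump rounding; `fuel_item_ok` is an illustrative name.

```python
from decimal import Decimal
from typing import Any


def fuel_item_ok(item: dict[str, Any], tolerance: Decimal = Decimal("1.00")) -> bool:
    """Check that litres x rate_per_litre is close to the stated amount."""
    litres = Decimal(str(item["litres"]))
    rate = Decimal(str(item["rate_per_litre"]))
    amount = Decimal(str(item["amount"]))
    return abs(litres * rate - amount) <= tolerance
```

A misreading such as 9414 litres would fail this check immediately, which makes it a cheap guard for handwritten memos.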

Example: Telecom Bill

Input: a 7-page Jio Fiber PDF bill. BharatParse extracted only the billing summary - ignoring 80 rows of itemised data usage - and returned the fields that actually matter for accounting.

Input document

SEPTEMBER2025.pdf - Jio Fiber bill (page 1 of 7)

Extracted JSON

# Response for Jio Fiber bill
{
  "schema_detected": "telecom",
  "confidence": 1.0,
  "data": {
    "provider": "Jio",
    "customer_name": "Mr. Rajesh Kumar",
    "account_number": "XXXXXXXXXXXX9305",
    "due_date": "2025-09-30",
    "plan_name": "Postpaid_399_6M: Unlimited Data @ 30 Mbps",
    "vendor_gstin": "24AABCI6363G1ZP",
    "charges": {
      "current_taxable_charges": 399.00,
      "cgst_rate": 9.0, "cgst_amount": 35.91,
      "sgst_rate": 9.0, "sgst_amount": 35.91
    },
    "total_payable": 470.82
  },
  "warnings": [],
  "processing_ms": 24406
}

Example: IRCTC Train Ticket

Input: an IRCTC ERS PDF with the GST invoice on page 2. BharatParse extracted passenger details from page 1 and the full GST breakdown from page 2 in a single API call.

Input document

24th_Jan.pdf - IRCTC ERS ticket

Extracted JSON

# Response for IRCTC Tejas Express ticket
{
  "schema_detected": "travel",
  "confidence": 0.95,
  "data": {
    "pnr": "8543381796",
    "train_number": "82902",
    "train_name": "IRCTC TEJAS EXP",
    "journey_date": "2026-01-24",
    "from_station": "AHMEDABAD JN (ADI)",
    "boarding_station": "VADODARA JN (BRC)",
    "to_station": "BORIVALI (BVI)",
    "passengers": [
      { "name": "RAJESH KUMAR", "age": 67, "current_status": "WL/44" }
    ],
    "fare": { "ticket_fare": 1680.00, "total_fare": 1715.40 },
    "gst": { "igst_rate": 5.0, "igst_amount": 80.00, "total_tax": 80.00 }
  },
  "warnings": [],
  "processing_ms": 11990
}

Example: GST Invoice

Full B2B tax invoice extraction with automatic GSTIN checksum validation.

# Response for B2B GST invoice
{
  "schema_detected": "gst_invoice",
  "confidence": 0.97,
  "data": {
    "invoice_number": "INV-2024-009182",
    "invoice_date": "2024-11-15",
    "vendor": {
      "name": "Tata Power Ltd",
      "gstin": "27AAACT2727Q1ZW",
      "pan": "AAACT2727Q"
    },
    "line_items": [
      {
        "description": "IT Services",
        "hsn_sac": "998313",
        "taxable_amount": 50000.00,
        "cgst_rate": 9, "cgst_amount": 4500.00,
        "sgst_rate": 9, "sgst_amount": 4500.00
      }
    ],
    "totals": { "taxable_value": 50000, "grand_total": 59000 }
  },
  "warnings": [],
  "processing_ms": 2100
}
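For readers who want to re-verify a GSTIN on their own side, the sketch below uses the commonly described scheme: a 15-character pattern (2-digit state code, 10-character PAN, entity code, the letter Z, check character) plus a Luhn-style mod-36 check character. It is not taken from BharatParse's implementation, which this page does not describe, and the function names are illustrative.

```python
import re

_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# State code (2 digits) + PAN (5 letters, 4 digits, 1 letter) + entity + 'Z' + check.
_GSTIN_RE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][0-9A-Z]Z[0-9A-Z]")


def gstin_check_char(first14: str) -> str:
    """Compute the check character from the first 14 characters."""
    total = 0
    for i, ch in enumerate(first14):
        # Weights alternate 1, 2, 1, 2, ... from the left.
        product = _ALPHABET.index(ch) * (1 if i % 2 == 0 else 2)
        # Add the base-36 "digits" of the product.
        total += product // 36 + product % 36
    return _ALPHABET[(36 - total % 36) % 36]


def is_valid_gstin(gstin: str) -> bool:
    gstin = gstin.strip().upper()
    if not _GSTIN_RE.fullmatch(gstin):
        return False
    return gstin_check_char(gstin[:14]) == gstin[14]
```

Under this scheme the vendor GSTIN in the example above, `27AAACT2727Q1ZW`, passes, and characters 3 to 12 match the extracted PAN `AAACT2727Q`.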

Error Codes

HTTP Code	Error	Cause & Fix
400	Invalid file_type	Use one of: pdf, jpeg, jpg, png, webp, tiff, tif
400	Invalid base64	Ensure file_b64 is valid base64 encoded content
400	File too large	Max file size is 20MB
429	Rate limit exceeded	You've hit your monthly quota. Upgrade your plan.
502	Gemini API error	Upstream model error. Retry after a few seconds.

Error Response Format

{
  "detail": "Unsupported file_type 'bmp'. Use: pdf, jpeg, jpg, png, webp, tiff, tif"
}
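Based on the error table, only the 502 upstream error is worth retrying; 400 errors need a corrected request and 429 needs a plan change. A minimal retry wrapper might look like the sketch below. `call_with_retry` and its `send` callback (returning a status code and decoded body) are hypothetical, and the backoff values are assumptions rather than documented guidance.

```python
import time
from typing import Any, Callable

RETRYABLE_STATUSES = {502}  # upstream model error, per the error table


def error_detail(body: Any) -> str:
    """Pull the message out of the documented {"detail": "..."} error shape."""
    if isinstance(body, dict):
        return str(body.get("detail", "unknown error"))
    return str(body)


def call_with_retry(send: Callable[[], tuple[int, Any]], attempts: int = 3,
                    base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() until it returns HTTP 200, retrying only retryable errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        status, body = send()
        if status == 200:
            return body
        if status in RETRYABLE_STATUSES and attempt < attempts:
            # Exponential backoff: 2s, 4s, ... between retries.
            sleep(base_delay * 2 ** (attempt - 1))
            continue
        raise RuntimeError(f"HTTP {status}: {error_detail(body)}")
```

With `requests`, `send` would wrap the `requests.post(...)` call and return `(response.status_code, response.json())`.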

Plans & Pricing

Basic

Free forever

50 requests/month

Pro

$29

per month

500 requests/month

Ultra ★

$79

per month

2,500 requests/month

Mega

$199

per month

10,000 requests/month

Subscribe at rapidapi.com. All plans include all 13 document schemas, confidence scores, and GST validation.
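To compare plans against an expected monthly volume, the listed prices and quotas can be put in a small lookup. This is only a sketch using the figures above; `cheapest_plan` and `cost_per_request` are illustrative helpers, and overage handling is not covered here.

```python
# (name, USD per month, requests per month), in ascending price order.
PLANS = [
    ("Basic", 0, 50),
    ("Pro", 29, 500),
    ("Ultra", 79, 2500),
    ("Mega", 199, 10000),
]


def cheapest_plan(monthly_requests: int) -> str:
    """Return the lowest-priced plan whose quota covers the given volume."""
    for name, _price, quota in PLANS:
        if monthly_requests <= quota:
            return name
    raise ValueError("Volume exceeds the largest listed plan")


def cost_per_request(plan_name: str) -> float:
    """Effective USD per request when the full quota is used."""
    for name, price, quota in PLANS:
        if name == plan_name:
            return price / quota
    raise KeyError(plan_name)
```

For example, at full utilisation Pro works out to $0.058 per request and Mega to about $0.02.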

FAQ

What file formats are supported?

BharatParse accepts both PDFs and images. Supported formats: PDF, JPEG (.jpeg or .jpg), PNG, WebP, and TIFF (.tiff or .tif). Maximum file size is 20MB per request. This means you can send a phone photo of a receipt (JPEG), a scanned document (PDF or TIFF), or a screenshot (PNG) through the same endpoint. For best accuracy, use PDF or high-resolution JPEG at 150 DPI or above.

Does it work with handwritten receipts?

Yes. BharatParse uses an LLM rather than traditional OCR. This means it reads documents the way a person would - understanding context rather than just recognising characters. The fuel schema is specifically tuned for handwritten pump memos. Confidence scores will reflect extraction uncertainty.

How accurate is GSTIN validation?

Every GSTIN goes through full 15-character format validation and checksum verification automatically. Invalid GSTINs are flagged in the warnings array rather than silently passed through.

What happens when the document type is unclear?

Use schema: "auto". The model will identify the document type and apply the appropriate extraction schema. The detected type is returned in schema_detected.

Can I use it for non-Indian documents?

Yes - set the country parameter to your ISO country code (e.g., AE for UAE, SG for Singapore). The extraction adapts tax field names for that country, though India-specific checks like GSTIN validation won't apply.

Is my data stored?

No. Documents are processed in memory and discarded immediately after extraction. Nothing is logged or stored.

What's the average response time?

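Typically 2–10 seconds depending on document complexity and size. Simple single-page bills average around 2 seconds. Multi-page PDFs take longer: in the examples above, the IRCTC ticket took about 12 seconds and the 7-page Jio Fiber bill about 24 seconds, so set client timeouts accordingly.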