Docling: Open‑Source Document Intelligence for PDFs, Docs, Audio & More

Docling turns any file—PDFs, Word docs, presentations, spreadsheets, images, and even audio—into structured, searchable data. It offers advanced PDF layout understanding, OCR, VLM integration, and local‑only execution, giving developers a privacy‑first alternative to costly proprietary services.

Written by

Open Source Daily

August 27, 2025

7 min read

Turning Unstructured Files into Actionable Knowledge

Every organization that works with documents—whether legal contracts, research papers, or multimedia reports—spends countless hours extracting, cleaning, and structuring data before it can be used. Commercial AI‑powered document services promise to automate this, but they often come with hefty subscription fees, vendor lock‑in, and the risk of sending sensitive information to the cloud.

Docling flips the script. It is a fully open‑source library that parses a massive variety of formats (PDF, DOCX, PPTX, XLSX, HTML, images, audio, and more) locally, delivering rich, hierarchical representations that can be fed directly into LLM pipelines, vector stores, or traditional analytics tools. No data leaves your environment, and you keep full control over costs, customization, and deployment.

GitHub: docling-project/docling (Get your documents ready for gen AI)

Documentation: docling-project.github.io

Why Choose Docling?

Privacy‑First Execution – All processing runs on‑premise or in an air‑gapped environment, eliminating the need to upload confidential files to third‑party clouds.
Broad Format Coverage – From classic office files to images and audio (ASR), Docling handles them all under a single API; understanding of complex visual content such as charts and chemical structures is listed by the project as upcoming work.
Advanced PDF Understanding – Beyond simple text extraction, Docling detects page layout, reading order, tables, code blocks, formulas, and classifies images, delivering a faithful reconstruction of the original document.
Plug‑and‑Play AI Integration – Ready‑made integrations for LangChain, LlamaIndex, Crew AI, and Haystack connect Docling to agent and RAG frameworks, while built‑in options let you enable VLMs (SmolDocling) or OCR engines (Tesseract, RapidOCR) with a few lines of configuration.
Extensible Export Options – Export to Markdown, HTML, DocTags, or lossless JSON, all derived from the same DoclingDocument, which keeps downstream processing simple.
Zero Licensing Costs – Docling is MIT‑licensed, so you can use it in commercial products without per‑page fees or usage quotas.

Spotlight on Key Features

Feature	What It Does	Why It Matters
Unified `DoclingDocument` Model	Provides a single, expressive representation for any input type, with metadata, confidence scores, and chunking information.	Simplifies downstream pipelines—no need to write format‑specific parsers.
Multimodal OCR & VLM Pipelines	Integrated OCR (Tesseract, RapidOCR) and visual language models (SmolDocling) for scanned PDFs and images.	Turns pixel‑only content into searchable text that downstream tools can index or embed.
Audio Transcription (ASR)	Built‑in Whisper‑compatible pipelines convert WAV/MP3 files into text, then into the same `DoclingDocument` structure.	Enables unified processing of meeting recordings, podcasts, and webinars.
Rich Export Formats	Markdown, HTML, DocTags, and lossless JSON that preserves layout and hierarchy.	Choose the format that best fits your downstream system, with no extra conversion step.
CLI & Python SDK	`docling <source>` for quick one‑off conversions; full Python API for programmatic use.	Supports both ad‑hoc tasks and large‑scale batch pipelines.
Local‑Only & Air‑Gapped Support	No external service calls required; all models can be run on CPU/GPU in your own environment.	Keeps documents inside your environment, which simplifies meeting GDPR, HIPAA, or internal data‑handling requirements.
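
To make the unified model a little more concrete, here is a minimal sketch that tallies item labels in a document exported as a dictionary. The sample dictionary is hand-written and only mimics the general shape of a Docling JSON export (a top-level `texts` list whose items carry `label` and `text`); those field names and the helper `count_labels` are assumptions to check against a real export.

```python
from collections import Counter


def count_labels(doc_dict: dict) -> Counter:
    """Count how many text items carry each label (e.g. section_header, text)."""
    # Assumption: text items live under a top-level "texts" list and each
    # item has a "label" key; items without one are grouped under "unknown".
    return Counter(item.get("label", "unknown") for item in doc_dict.get("texts", []))


# Hand-written stand-in for an exported document, not real Docling output.
sample = {
    "texts": [
        {"label": "section_header", "text": "Introduction"},
        {"label": "text", "text": "Docling converts documents."},
        {"label": "text", "text": "It runs locally."},
    ]
}

label_counts = count_labels(sample)
print(label_counts.most_common())
```

With a real conversion, the dictionary would presumably come from `result.document.export_to_dict()`, and the same helper would work regardless of whether the input was a PDF, a DOCX file, or a transcript.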

Getting Started

Installation

Docling is distributed via PyPI and works on macOS, Linux, and Windows (both x86_64 and arm64). Install with a single command:

pip install docling

For GPU‑accelerated OCR or VLM models, consult the installation guide for optional dependencies.

Quick Python Example

from docling.document_converter import DocumentConverter

# You can pass a local path or a URL (wrap in-memory bytes in a DocumentStream)
source = "https://arxiv.org/pdf/2408.09869.pdf"

converter = DocumentConverter()
result = converter.convert(source)

# Export to Markdown for easy reading or further processing
print(result.document.export_to_markdown())

The snippet above fetches a PDF from arXiv, parses its full layout (including tables and figures), and prints a clean Markdown version.
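
For more than one file, the same two calls can be wrapped in a small loop. The sketch below is one possible arrangement, not a Docling API: `make_docling_to_markdown` and `convert_many` are hypothetical helper names, and the conversion function is passed in as a parameter so the file-handling logic can be exercised without Docling installed.

```python
from pathlib import Path
from typing import Callable, Iterable


def make_docling_to_markdown() -> Callable[[str], str]:
    """Build a source -> Markdown function backed by one shared converter."""
    # Imported lazily so the rest of this sketch loads without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()  # reused so its setup is not repeated per file
    return lambda source: converter.convert(source).document.export_to_markdown()


def convert_many(
    sources: Iterable[str],
    out_dir: str,
    to_markdown: Callable[[str], str],
) -> list[Path]:
    """Convert each source and write <stem>.md files into out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sources:
        # Use the last path or URL segment, minus its extension, as the file name.
        stem = Path(source.rstrip("/").split("/")[-1]).stem
        out_path = target / f"{stem}.md"
        out_path.write_text(to_markdown(source), encoding="utf-8")
        written.append(out_path)
    return written
```

A typical call would be `convert_many(["https://arxiv.org/pdf/2408.09869.pdf"], "out", make_docling_to_markdown())`. Sources that share a file name would overwrite each other in this sketch, so a real pipeline would need a collision-safe naming scheme.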

Using the CLI

docling https://arxiv.org/pdf/2206.01062.pdf

By default, the command writes a Markdown file to the current directory; the `--to` and `--output` options select other export formats (such as JSON) and a different output folder.
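
When the CLI is driven from a script, it can help to build the argument list in one place. The sketch below assumes the `--to` and `--output` options behave as described in the Docling CLI reference (worth confirming with `docling --help`); `build_docling_command` is a hypothetical helper, not part of Docling.

```python
import shlex


def build_docling_command(
    source: str, export_format: str = "md", output_dir: str = "."
) -> list[str]:
    """Return the argv list for a single docling CLI conversion."""
    # Assumption: --to selects the export format and --output the target folder.
    return ["docling", source, "--to", export_format, "--output", output_dir]


cmd = build_docling_command("https://arxiv.org/pdf/2206.01062.pdf", "json", "out")
print(shlex.join(cmd))
# To run it for real (requires Docling installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string avoids quoting problems when file names contain spaces.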

Comparison Table: Open‑Source vs. Proprietary Document AI

Capability	Docling (Open‑Source)	Adobe Acrobat Pro DC	AWS Textract	Google Document AI
Local‑Only Processing	✅ (runs on your hardware)	❌ (cloud‑based add‑ons)	❌ (cloud service)	❌ (cloud service)
Supported Formats	PDF, DOCX, PPTX, XLSX, HTML, images (PNG, JPEG, TIFF...), audio (WAV, MP3), CSV, XML, custom	PDF, limited image types	PDF, images (PNG, JPG)	PDF, images, handwritten forms
Advanced PDF Layout	Page layout, reading order, tables, code, formulas, image classification	Basic text & table extraction	Table extraction, limited layout	Form extraction, limited layout
OCR Quality	Tesseract, RapidOCR, custom models	Built‑in OCR (moderate)	High‑quality OCR (AWS)	High‑quality OCR (Google)
Audio Transcription	Whisper‑compatible ASR pipeline	❌	❌	❌
VLM Integration	SmolDocling (local) & remote VLM adapters	❌	❌	❌
Export Flexibility	Markdown, HTML, JSON, DocTags, lossless JSON	PDF, Word, Excel	JSON, CSV	JSON, TXT
Cost	Free (MIT)	Subscription (~$15/user/mo)	Pay‑per‑page (from about $0.0015/page, depending on the API)	Pay‑per‑page (varies)
Vendor Lock‑In	None	High	Medium	High
Community & Extensibility	Active GitHub, plugins for LangChain, LlamaIndex, Haystack	Proprietary ecosystem	AWS SDKs	Google Cloud SDKs

Bottom line: If you need privacy, multimodal support, or want to avoid per‑page fees, Docling is a compelling alternative with no licensing costs that still covers the core capabilities.
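
To put the per-page figure in perspective, here is a rough worked example. It uses only the $0.0015/page rate quoted in the table above as an illustrative input; real cloud pricing depends on the API and volume tier, and the page volumes are made up for the sake of the arithmetic.

```python
from decimal import Decimal

# Illustrative rate taken from the comparison table; actual pricing varies.
QUOTED_PRICE_PER_PAGE = Decimal("0.0015")


def monthly_per_page_fee(
    pages_per_month: int, price_per_page: Decimal = QUOTED_PRICE_PER_PAGE
) -> Decimal:
    """Return the monthly bill for a pay-per-page service at a flat rate."""
    return pages_per_month * price_per_page


# Hypothetical monthly volumes, chosen only to show how the fee scales.
for pages in (10_000, 100_000, 1_000_000):
    print(f"{pages:>9,} pages -> ${monthly_per_page_fee(pages):,.2f}")
```

Decimal is used instead of float so that small per-page rates multiply out exactly.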

Advanced Use Cases

RAG Pipelines – Combine Docling’s chunked output with vector stores (Milvus, Weaviate, Qdrant) and LLMs for enterprise search over internal knowledge bases.
Compliance Audits – Run Docling on confidential contracts in an air‑gapped environment, extract clauses, and feed them into rule‑based engines.
Multimedia Summarization – Convert meeting recordings to text, enrich with VLM‑generated image captions, and generate concise summaries for stakeholders.
Scientific Data Extraction – Parse PDFs containing charts, formulas, and chemical structures, then export to JSON for downstream analytics.
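
As a rough illustration of the RAG use case, the sketch below splits a Markdown export into heading-scoped chunks that could then be embedded and stored in a vector database. It is a deliberately simple stand-in written for this article, not Docling's own chunking; `chunk_markdown` is a hypothetical helper and the sample text is invented.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "## Methods".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def _append_chunk(chunks: list, heading: str, body: list) -> None:
    """Store the accumulated body under its heading, skipping empty sections."""
    text = "\n".join(body).strip()
    if text:
        chunks.append({"heading": heading, "text": text})


def chunk_markdown(markdown: str) -> list:
    """Split Markdown into one chunk per heading-delimited section."""
    chunks, heading, body = [], "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the previous section.
            _append_chunk(chunks, heading, body)
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    _append_chunk(chunks, heading, body)
    return chunks


sample_md = "# Title\n\nIntro text.\n\n## Methods\n\nStep one.\nStep two.\n\n## Empty\n"
for chunk in chunk_markdown(sample_md):
    print(chunk["heading"], "->", len(chunk["text"]), "chars")
```

This splitter treats any line starting with `#` as a heading, so comment lines inside code blocks would be misread; a production pipeline would more likely rely on the chunkers described in Docling's documentation, which work from the document structure rather than from rendered Markdown.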

Contributing & Community

Docling welcomes contributions ranging from bug fixes to new model integrations. The repository includes a detailed contribution guide, issue templates, and a vibrant Discord channel where developers share pipelines and custom plugins.

Report Issues: GitHub Issues
Join the Discussion: Discord community (link on the homepage)
Submit Pull Requests: Follow the dev branch workflow outlined in the repo’s CONTRIBUTING.md.

Call to Action

Docling proves that you don’t need to sacrifice privacy or spend a fortune to unlock the full potential of your documents. Give it a spin, integrate it into your AI stack, and help shape the future of open‑source document intelligence.

Try the project: https://docling-project.github.io/docling
Star on GitHub: https://github.com/docling-project/docling
Read the full docs: Installation Guide

Empower your data workflows with a tool that puts you—and your data—first.
