OpenDataLoader PDF

Safe, Open, High-Performance: PDF for AI

OpenDataLoader PDF converts PDFs into JSON, Markdown, or HTML, ready to feed into modern AI stacks (LLMs, vector search, and RAG). It reconstructs document layout (headings, lists, tables, and reading order) so the content is easier to chunk, index, and query. Powered by fast, heuristic, rule-based inference, it runs entirely on your local machine and delivers high-throughput processing for large document sets. AI safety is enabled by default: the loader automatically filters likely prompt-injection content embedded in PDFs to reduce downstream risk.

Overview

Integration details

Class	Package	Local	Serializable	JS support
OpenDataLoader PDF	langchain-opendataloader-pdf	✅	❌	❌

Loader features

Source	Document Lazy Loading	Native Async Support
OpenDataLoaderPDFLoader	✅	❌

The OpenDataLoaderPDFLoader component enables you to parse PDFs into structured Document objects.

Requirements

Python >= 3.9
Java 11 or newer available on the system PATH
opendataloader-pdf >= 1.1.1
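
Because the loader depends on a Java runtime found on the PATH, it can help to check for one before loading anything. The following is a minimal sketch using only the Python standard library; `parse_java_major` and `java_is_usable` are hypothetical helper names, not part of the package, and the parsing assumes the usual `version "..."` banner printed by `java -version`.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output, if present."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 and earlier report themselves as "1.8", "1.7", ...
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def java_is_usable(minimum: int = 11) -> bool:
    """Return True if a `java` executable on PATH reports at least `minimum`."""
    java = shutil.which("java")
    if java is None:
        return False
    # The version banner usually goes to stderr; inspect both streams to be safe.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    major = parse_java_major(result.stderr + result.stdout)
    return major is not None and major >= minimum


if __name__ == "__main__":
    print("Java 11+ found on PATH:", java_is_usable())
```

If the check returns `False`, installing a JDK 11 or newer and reopening the shell so the PATH is refreshed is the usual remedy.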

Installation

pip install -U langchain-opendataloader-pdf

Quick start

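from langchain_opendataloader_pdf import OpenDataLoaderPDFLoader

# file_path accepts individual PDF files and directories containing PDFs.
loader = OpenDataLoaderPDFLoader(
    file_path=["path/to/document.pdf", "path/to/folder"],
    format="text"
)
documents = loader.load()

# Print each document's metadata and the first 80 characters of its content.
for doc in documents:
    print(doc.metadata, doc.page_content[:80])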
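
Loaded documents are typically split into smaller pieces before indexing. As a rough illustration of that step, here is a sketch of fixed-size chunking with overlap over a plain string such as `doc.page_content`. The `chunk_text` helper is hypothetical and is not provided by the loader; a production pipeline would more likely use a structure-aware splitter.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters so that content cut at
    a boundary still appears intact in one of the neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij"
pieces = chunk_text(sample, chunk_size=4, overlap=1)
```

With the quick start above, the same helper could be applied per document, for example `chunk_text(doc.page_content)` inside the loop.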

Parameters

Parameter	Type	Required	Default	Description
`file_path`	`List[str]`	✅ Yes	—	One or more PDF file paths or directories to process.
`format`	`str`	No	`None`	Output format (e.g. `"json"`, `"html"`, `"markdown"`, `"text"`).
`quiet`	`bool`	No	`False`	Suppresses CLI logging output when `True`.
`content_safety_off`	`Optional[List[str]]`	No	`None`	List of content safety filters to disable (e.g. `"all"`, `"hidden-text"`, `"off-page"`, `"tiny"`, `"hidden-ocg"`).
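
To show how these parameters fit together, the sketch below assembles keyword arguments and checks them against the example values listed in the table. `build_loader_kwargs` and the two constant sets are hypothetical. The table introduces its values with "e.g.", so the library may accept options that this check would reject.

```python
from typing import Any, Dict, List, Optional

# Values copied from the examples in the parameter table; possibly not exhaustive.
EXAMPLE_FORMATS = {"json", "html", "markdown", "text"}
EXAMPLE_SAFETY_FILTERS = {"all", "hidden-text", "off-page", "tiny", "hidden-ocg"}


def build_loader_kwargs(
    file_path: List[str],
    format: Optional[str] = None,
    quiet: bool = False,
    content_safety_off: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Validate options against the documented examples and return loader kwargs."""
    if not file_path:
        raise ValueError("file_path needs at least one PDF file or directory")
    if format is not None and format not in EXAMPLE_FORMATS:
        raise ValueError(f"unrecognized format: {format!r}")
    unknown = set(content_safety_off or []) - EXAMPLE_SAFETY_FILTERS
    if unknown:
        raise ValueError(f"unrecognized safety filters: {sorted(unknown)}")
    return {
        "file_path": list(file_path),
        "format": format,
        "quiet": quiet,
        "content_safety_off": content_safety_off,
    }


kwargs = build_loader_kwargs(
    ["path/to/document.pdf"],
    format="markdown",
    quiet=True,
    content_safety_off=["hidden-text"],
)
# With the package installed: loader = OpenDataLoaderPDFLoader(**kwargs)
```

Here Markdown output is requested, CLI logging is suppressed, and only the hidden-text filter is disabled. Leaving `content_safety_off` as `None` keeps every safety filter active, which is the default.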
