Local document toolkit · macOS & Windows

Get your documents ready for AI. Without handing them over.

AI DocPrep turns PDFs, Word docs, decks, and spreadsheets into clean Markdown, and redacts the private parts. All of it happens on your computer. Nothing gets uploaded.

open source · 100% offline · free to build from source
your computer · offline
board-deck.pdf · 18,412 tokens
…</w:rPr><w:t> Q3 revenue</w:t> contact j.rivera@acme.com · SSN 402-19-8823
converted + redacted here · −83% tokens
board-deck.md · 3,120 tokens
## Q3 revenue contact [EMAIL] · SSN [SSN]
network requests during conversion: 0

the short version

Many “convert for AI” tools upload your file to their server first.

AI DocPrep doesn't have a server. The work happens on your computer, which keeps the privacy question short. And since the code is open source, you can go read exactly what it does.

what it does

Convert, shrink, and scrub. One pass.

01

Drag, drop, done

Drop in files or whole folders, click once, get Markdown. That's the entire workflow.

02

Fully offline

No account, no upload, no analytics. Works exactly the same with Wi‑Fi turned off.

03

Fewer tokens

Office files carry invisible formatting bloat. AI DocPrep strips it, so models read more and you pay less.

04

Redaction built in

Emails, SSNs, card numbers, names, API keys: removed before the text goes anywhere.

how it works

A messy folder in, clean Markdown out.

STEP 01

Drop your files

Drag any mix of PDFs, Word docs, decks, spreadsheets, web pages, or transcripts onto the window. Folders welcome.

STEP 02

Convert & redact

AI DocPrep parses each file into clean Markdown and, if you want, removes personal details along the way. All on your own processor.

STEP 03

Use it anywhere

Paste into ChatGPT or Claude, drop into your Obsidian vault, or keep the combined master file. It's plain Markdown.

who uses it

Four kinds of people, one habit: convert first.

Individuals · Your private records

Ask AI about your medical records. Keep your SSN out of it.

Want a chatbot to explain a lab result or summarize a bank statement? AI DocPrep strips the account numbers, addresses, and IDs first, on your laptop, so you get the help without the exposure.

medical · financial · tax

Legal & compliance · Client-confidential work

Use AI on client documents and keep your duty of confidentiality.

Guidance like ABA Formal Opinion 512 expects lawyers to protect client data before it reaches an AI tool. AI DocPrep redacts privileged details locally. Files stay inside the firm, and no vendor logs a copy.

ABA 512 · privilege · runs on-prem

AI power users · Cheaper, sharper prompts

Stop paying to send invisible XML to the model.

Raw office files are stuffed with formatting metadata that burns tokens and muddies answers. Convert first and the same document costs a fraction of the context, with structure the model can actually follow.

tokens · context window · RAG prep

Obsidian & PKM · A cleaner knowledge base

Turn a folder of documents into linkable Markdown notes.

Bulk-convert PDFs and slides into tidy notes with YAML frontmatter and a generated table of contents. Ready for Obsidian, Notion, or Logseq, and much friendlier to your vault's search and AI plugins.

frontmatter · table of contents · vault-ready
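
For the curious, here is roughly what "YAML frontmatter and a generated table of contents" can look like in practice. This is a minimal sketch, not AI DocPrep's actual code: the function names, the frontmatter keys (title, source, converted), and the "Contents" heading are all assumptions made for illustration.

```python
import json
import re

# ATX headings only ("# Title" through "###### Title"); a space after the
# hashes is required, so "#tag" is not treated as a heading.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # marks the start or end of a fenced code block


def build_toc(markdown: str) -> str:
    """Return an indented bullet list with one entry per heading."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # skip '#' lines inside code fences
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            depth = len(match.group(1)) - 1
            entries.append("  " * depth + "- " + match.group(2))
    return "\n".join(entries)


def with_frontmatter(markdown: str, title: str, source: str, converted: str) -> str:
    """Prepend hypothetical YAML frontmatter and a contents list to a note."""
    frontmatter = "\n".join([
        "---",
        # json.dumps yields a double-quoted string that is also valid YAML.
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"source: {json.dumps(source, ensure_ascii=False)}",
        f"converted: {converted}",
        "---",
    ])
    return f"{frontmatter}\n\n## Contents\n\n{build_toc(markdown)}\n\n{markdown}"
```

Calling with_frontmatter(text, "Board deck", "board-deck.pdf", "2024-01-01") returns the note with a frontmatter block and a contents list on top, which is the shape Obsidian and similar tools read as note properties.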
privacy

It runs locally. The code is public. You can check both.

No server involved

AI DocPrep makes zero network calls during conversion. No account, no sync, no telemetry. Turn off Wi‑Fi and run it on a plane; it behaves exactly the same. Your documents stay wherever they already live.

$ network requests during conversion → 0

Read the source

The full code is public under the MIT license. You, your IT team, or anyone on the internet can read it, build it, and confirm what it does. A privacy page asks for trust. Source code settles it.

$ license → MIT · every line on GitHub
tokens

The same document, a fraction of the tokens.

Upload a raw PDF and you pay for every page image and layout artifact the model has to process. Convert it first and you send only the words. More of your document fits in the context window, and it costs less to put it there.

88% fewer tokens on a typical 40-page report, versus uploading it raw.
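
The arithmetic behind a figure like this is easy to check yourself. The sketch below uses the token counts from the demo at the top of this page (18,412 before, 3,120 after), which is a different document from the 40-page report. The function names are made up for illustration, and any per-token price you pass in is your own assumption, since pricing varies by model.

```python
def percent_saved(tokens_before: int, tokens_after: int) -> float:
    """Percentage of tokens removed by converting before sending."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return 100 * (tokens_before - tokens_after) / tokens_before


def input_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of sending `tokens` input tokens at a price you supply."""
    return tokens * usd_per_million_tokens / 1_000_000


saved = percent_saved(18_412, 3_120)
print(f"{saved:.1f}% fewer tokens")  # 83.1% fewer tokens
```

Run the same calculation on your own files to see what converting first saves for the documents you actually use.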
redaction

Scrub the private parts before anything reaches a chatbot.

Three levels of thoroughness, all local:

  • Instant patterns. Emails, phone numbers, SSNs, credit cards, API keys and secrets.
  • On-device AI. A bundled model catches names, organizations, and places.
  • Local LLM. The deepest, context-aware pass, through your own Ollama server.
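
To make the first level concrete, here is a minimal sketch of pattern-based redaction. It is an illustration under stated assumptions, not AI DocPrep's actual rule set: the regular expressions, the [EMAIL], [SSN], and [CARD] placeholders, and the function names are all hypothetical. The input line is the sample from the demo at the top of this page.

```python
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# Matches any 3-2-4 digit group, so it can also catch non-SSN identifiers.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used here to cut false positives on long digit runs."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    """Replace emails, SSN-shaped numbers, and Luhn-valid card numbers."""
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    return CARD.sub(
        lambda m: "[CARD]" if luhn_valid(m.group()) else m.group(), text
    )


print(redact("contact j.rivera@acme.com · SSN 402-19-8823"))
# contact [EMAIL] · SSN [SSN]
```

Patterns like these are fast and predictable, but they only catch text with a fixed shape. Names, organizations, and places have no such shape, which is why the second and third levels exist.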
formats

Built on Microsoft's MarkItDown engine.

PDF · reports, scans
DOCX · Word
PPTX · slides
XLSX · spreadsheets
HTML · web pages
VTT · transcripts

Each format gets a purpose-built converter, so tables, slides, and spreadsheets survive the trip into Markdown.

pricing

Free if you build it. Pay what you want if you'd rather not.

Same code either way. The paid download is the signed, ready-to-run build that installs in one click. It also keeps a solo developer shipping.

Build from source
Free
For tinkerers and teams who'd rather compile it themselves.
  • Full source under the MIT license
  • The complete app and command-line tool
  • No features held back
View the source ↗

The app
Pay what you want
Ready to run on macOS and Windows.
  • One-click install
  • Right-click “Convert to Markdown” in Finder & Explorer
  • Priority fixes and support
Get it on Gumroad

Direct downloads on GitHub Releases · coming to the Mac App Store and Microsoft Store

questions

Fair things to ask before you trust it.

Why not just upload the file straight to ChatGPT?

For a coffee-shop menu, go ahead. But a raw upload sends the whole file, private details included, to a vendor's servers, where it may be retained or used for training. It also wastes context on formatting the model ignores. With AI DocPrep, you send only clean text, and only the parts you choose to keep.

Is it actually private?

There's no server behind AI DocPrep and no account to sign in to. Conversion makes zero network requests; turn off Wi‑Fi and see for yourself. And because the code is public, your security team can read it instead of taking a policy page's word.

Do I need a vector database or RAG setup?

Usually no. Modern models hold hundreds of pages in context, so for personal and project-sized document sets, one clean Markdown file often beats chunked retrieval, with zero infrastructure. If you do run RAG at scale, clean Markdown gives your chunker cleaner text to split, which tends to make retrieval more accurate.

Which formats and platforms are supported?

PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx), HTML, and VTT transcripts today, on macOS and Windows. Conversion runs in parallel across your CPU, and Office temp files are skipped automatically.
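
A rough sketch of what "skips Office temp files" and "runs in parallel" can mean in code, for anyone scripting something similar. It assumes the temp files in question are the lock files Office creates with a "~$" prefix while a document is open. The function names and the convert_one callable are hypothetical stand-ins, not AI DocPrep's own API.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Callable, Optional

# The formats listed above.
SUPPORTED = {".pdf", ".docx", ".pptx", ".xlsx", ".html", ".vtt"}


def convertible_files(folder: Path) -> list[Path]:
    """Walk a folder and keep supported files, skipping Office temp files."""
    files = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        if path.name.startswith("~$"):  # e.g. "~$report.docx"
            continue
        if path.suffix.lower() in SUPPORTED:
            files.append(path)
    return files


def convert_all(
    files: list[Path],
    convert_one: Callable[[Path], str],
    workers: Optional[int] = None,
) -> list[str]:
    """Run a caller-supplied converter across worker processes.

    With workers=None the pool sizes itself to the machine's CPU count.
    convert_one must be a top-level function so it can be sent to workers.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_one, files))
```

On Windows and macOS, call convert_all from under an `if __name__ == "__main__":` guard, since worker processes re-import the script when they start.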

What happens to files I've already converted?

Nothing you didn't ask for. AI DocPrep writes new Markdown next to your originals and keeps both by default, so it never overwrites your own notes. When combining a folder, it merges only the files it just converted; an existing Obsidian vault is never swept in.