Why Does Formatting Keep Breaking?
Honestly, every time I convert Word or PDF to Markdown, I hold my breath. Last week I helped a colleague convert a 50-page technical doc. Thought it'd be one click and done. Opened the output: tables turned to garbage, flowcharts vanished, even code block indentation was gone. Frustrating, right?
The thing is, Markdown was designed to be "lightweight." It has no native syntax for much of Word's formatting, such as merged table cells, embedded drawings, or page layout. But don't panic: after years of painful lessons, I've found some solid fixes.
Problem 1: Tables Turn Into Gibberish
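This is the most common issue. Complex Word tables (merged cells, multi-row headers) often come out completely broken. GitHub-Flavored Markdown tables allow exactly one header row and have no syntax for cells that span rows or columns.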
The Fix
If you're using Pandoc, try these parameters:
pandoc input.docx -f docx -t gfm --extract-media=./media -o output.md
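Think of `-t gfm` as giving your converter an "HD lens": GitHub-Flavored Markdown supports pipe tables. The `--extract-media=./media` option writes the embedded images out to a folder so the links have something to point to. If tables are still messy, here's my go-to hack: screenshot complex tables. Instead of spending an hour tweaking Markdown table syntax, take a minute to screenshot. It preserves the layout exactly and saves time, though readers can no longer search or copy the table's text.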
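If you have a whole folder of .docx files, you can script the pandoc command above. Here's a minimal Python sketch. It assumes pandoc is installed and on your PATH. The function names and the per-document `<name>_media` folders are my own choices, not anything pandoc requires. I give each document its own media folder because images pulled out of different .docx files can share names like image1.png.

```python
import subprocess
from pathlib import Path


def pandoc_command(src: Path) -> list[str]:
    """Build the pandoc argv for one .docx file.

    The output paths are relative on purpose: the command is meant to run
    with the output folder as the working directory.
    """
    stem = src.stem
    return [
        "pandoc", str(src.resolve()),   # absolute input path, since cwd changes
        "-f", "docx",
        "-t", "gfm",                     # GitHub-Flavored Markdown (pipe tables)
        f"--extract-media={stem}_media", # hypothetical per-document folder name
        "-o", f"{stem}.md",
    ]


def convert_folder(in_dir: Path, out_dir: Path) -> None:
    """Convert every .docx in in_dir into out_dir (requires pandoc on PATH)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.docx")):
        # Running inside out_dir keeps the image links pandoc writes
        # relative to the .md file, so the folder can be shared as a unit.
        subprocess.run(pandoc_command(src), cwd=out_dir, check=True)
```

For a docs/report.docx, calling `convert_folder(Path("docs"), Path("out"))` would produce out/report.md plus an out/report_media folder. Setting the working directory is deliberate: pandoc writes image links using the `--extract-media` path as given. A relative folder name therefore yields links relative to the .md file.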
Also, doc2markdown.com recently improved its table engine. It automatically splits merged cells so the table fits standard Markdown table syntax. Not 100% perfect, but at least readable.
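Here's a minimal Python sketch of what splitting merged cells into a standard Markdown table can look like. It assumes a hypothetical input format: a rectangular grid of strings where the sentinels `MERGED_LEFT` and `MERGED_UP` mark cells that a merge swallowed. This is not how doc2markdown or Pandoc work internally. It copies each merged cell's text into the cells it covered, then emits a pipe table.

```python
# Sentinels for cells swallowed by a merge (a hypothetical input format;
# real converters expose merges differently, e.g. as colspan/rowspan).
MERGED_LEFT = object()  # part of a horizontal merge with the cell to the left
MERGED_UP = object()    # part of a vertical merge with the cell above


def expand_merged_cells(grid):
    """Copy each merged cell's text into every cell it covered.

    Assumes a rectangular grid (every row has the same number of columns).
    """
    out = []
    for r, row in enumerate(grid):
        new_row = []
        for c, cell in enumerate(row):
            if cell is MERGED_LEFT and c > 0:
                cell = new_row[c - 1]
            elif cell is MERGED_UP and r > 0:
                cell = out[r - 1][c]
            elif cell is MERGED_LEFT or cell is MERGED_UP:
                cell = ""  # marker at the grid edge: nothing to copy from
            new_row.append(cell)
        out.append(new_row)
    return out


def to_gfm_table(grid):
    """Render a grid (first row = header) as a GitHub-Flavored Markdown table."""
    def fmt(cell):
        # A bare pipe would end the cell early, and GFM cells must fit on one line.
        return str(cell).replace("|", r"\|").replace("\n", " ")

    header, *body = expand_merged_cells(grid)
    lines = [
        "| " + " | ".join(fmt(c) for c in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in body:
        lines.append("| " + " | ".join(fmt(c) for c in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    grid = [
        ["Team", "Mon", "Tue"],
        ["Docs", "Review", "Publish"],
        [MERGED_UP, "On leave", MERGED_LEFT],
    ]
    print(to_gfm_table(grid))
```

Repeating the text is one design choice; leaving covered cells blank is the other. Repetition keeps every row readable on its own. For merges across value columns, though, it can look like two separate values, so compare the output against the original table.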
Problem 2: Images Go Missing
"The Word doc had images, but after conversion there's just filenames?"
That's because Markdown doesn't embed images; it only stores links to image files. During conversion, the images have to be extracted to a folder that those links can point to.
How to Recover Images
- Use extraction flags: Pandoc's `--extract-media` option, mentioned above, automatically pulls embedded images into a folder and points the links at them.
- Check absolute vs. relative paths: Often images aren't lost, just wrongly linked. Check the paths in your `![]()` links. Using absolute paths that break when the file is shared? Switch to relative paths like `images/pic1.png`.
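Both checks are easy to script. The Python sketch below is minimal, and its function names are hypothetical. `relativize_image_paths` rewrites absolute image paths relative to the Markdown file's folder. `missing_images` lists local images that don't exist where their links point. The regular expression only covers the plain `![alt](path)` and `![alt](path "title")` forms. It does not handle paths containing spaces or parentheses, reference-style images, or HTML `<img>` tags.

```python
import os
import re

# Matches ![alt](path) and ![alt](path "title") in their simplest forms.
IMAGE_LINK = re.compile(r'(!\[[^\]]*\]\()([^)\s]+)((?:\s+"[^"]*")?\))')


def relativize_image_paths(markdown: str, md_dir: str) -> str:
    """Rewrite absolute image paths so they are relative to md_dir."""
    def fix(match: re.Match) -> str:
        prefix, path, suffix = match.groups()
        if os.path.isabs(path):
            # relpath is pure string arithmetic; it does not touch the disk.
            path = os.path.relpath(path, start=md_dir).replace(os.sep, "/")
        return prefix + path + suffix
    return IMAGE_LINK.sub(fix, markdown)


def missing_images(markdown: str, md_dir: str) -> list[str]:
    """List local image paths that do not exist relative to md_dir."""
    missing = []
    for match in IMAGE_LINK.finditer(markdown):
        path = match.group(2)
        if "://" in path:
            continue  # web URLs are not checked
        if not os.path.exists(os.path.join(md_dir, path)):
            missing.append(path)
    return missing
```

On Linux or macOS, `relativize_image_paths("![chart](/home/ana/report/images/pic1.png)", "/home/ana/report")` returns `![chart](images/pic1.png)`. The sketch assumes the absolute paths follow the conventions of the operating system running the script.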
Problem 3: Special Characters Become Garbage
Ever seen © turn into Â©? Or math formulas like $\alpha$ showing up as raw code?
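Usually that's an encoding issue: make sure both the source file and your editor use UTF-8. For math, a raw $\alpha$ usually means the renderer has no math support enabled, so turn on MathJax or KaTeX in your editor or site generator if it offers them. If formulas came through as images or plain text instead, rewrite them in LaTeX syntax ($...$ inline, $$...$$ for display); most modern Markdown editors can render it.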
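You get the © → Â© pattern when UTF-8 bytes are read as Windows-1252, so the damage can often be reversed. Here's a minimal Python sketch that assumes exactly that mix-up. cp1252 is my assumption; files from other systems may need a different source encoding. `fix_mojibake` reverses the damage on a string, and `resave_as_utf8` converts a file that isn't valid UTF-8. It's a heuristic, and it works all-or-nothing per string. The ftfy library's `fix_text` handles many more cases.

```python
from pathlib import Path


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Windows-1252 ('Â©' -> '©')."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text can't map back to cp1252 bytes, or those bytes
        # aren't valid UTF-8: not this kind of damage, so leave it alone.
        return text


def resave_as_utf8(path: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode a file to UTF-8 if it isn't valid UTF-8 already."""
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")  # already UTF-8: nothing to do
    except UnicodeDecodeError:
        # Assumed source encoding; undecodable bytes become U+FFFD.
        text = raw.decode(source_encoding, errors="replace")
        path.write_text(text, encoding="utf-8")
```

Run `fix_mojibake` only on text you know is damaged. Clean text that happens to contain sequences like "Ã©" would also be "repaired."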
Pro Tip: Right Tool, Half the Effort
Manually fixing formatting is exhausting. If your doc is full of complex formatting:
- Simplify at source: Strip unnecessary styles in Word first, and use Word's built-in Heading styles so the converter can turn them into Markdown headings.
- Use specialized tools: Don't trust "universal" converters. Converting academic papers? Use LaTeX-optimized tools. Converting blog posts? Use doc2markdown—it's optimized for web.
There's no silver bullet for format conversion, but with these tricks, you'll save hours. Next time formatting goes haywire, try these before rewriting from scratch!