
hwpkit vs pyhwp, pyhwpx & olefile

If you're trying to read, parse, or edit .hwp / .hwpx files in Python, these are the options. Here's how they actually differ.

Libraries compared: pyhwp, pyhwpx, olefile, hwpkit

Capabilities compared:
Pure Python, no Hancom or Windows (pyhwpx needs Hancom + COM)
Extract text from .hwp
Extract text from .hwpx
Edit text without corrupting the file (pyhwpx: via Hancom)
Rewrite a stream that grew or shrank (n/a for one library)
Insert an image such as a seal or signature (pyhwpx: via Hancom)
One API across .hwp and .hwpx
Runs in CI / Linux / containers

When to use which

hwpkit

Use it when you need to read or edit HWP/HWPX anywhere — Linux servers, CI, containers, serverless — with no Hancom license and no Windows. It extracts clean text (great for RAG), fills forms, ticks checkboxes, and stamps seal/signature images, then writes a file Hancom opens cleanly. One API (open_document) covers both formats.
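To make the "one API, two formats" idea concrete, here is a minimal sketch of how a caller could tell the two containers apart before dispatching to a parser. It is not hwpkit's implementation: the function name sniff_container and the helper make_zip_bytes are hypothetical, and the only facts relied on are that .hwp files are OLE compound (CFB) files and .hwpx files are ZIP archives, each with a well-known magic number.

```python
import io
import zipfile

# Well-known container signatures (these identify the container type only,
# not that the content is really a Hancom document).
CFB_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE compound file, used by .hwp
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header, used by .hwpx


def sniff_container(head: bytes) -> str:
    """Guess the document family from the first bytes of a file.

    Hypothetical helper: returns "hwp", "hwpx", or "unknown".
    """
    if head.startswith(CFB_MAGIC):
        return "hwp"
    if head.startswith(ZIP_MAGIC):
        return "hwpx"
    return "unknown"


def make_zip_bytes() -> bytes:
    """Build a tiny in-memory ZIP so the sketch can be exercised without a real file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("placeholder.txt", "placeholder")
    return buf.getvalue()


if __name__ == "__main__":
    print(sniff_container(CFB_MAGIC + b"\x00" * 8))
    print(sniff_container(make_zip_bytes()))
    print(sniff_container(b"plain text"))
```

Sniffing the bytes rather than trusting the file extension means a misnamed upload still reaches the right parser. A real dispatcher would also verify the content, for example by checking for the HWP FileHeader stream or the expected parts inside the ZIP.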

pyhwp

The reference implementation for the binary HWP 5.0 record format and for semantic HWP → XML (OWPML) conversion. It is read-oriented: it doesn't edit files in place or handle .hwpx. hwpkit's handling of the binary record format was learned from reading pyhwp's source. If you need a full structural HWP → XML conversion, reach for pyhwp.

pyhwpx

Automates the Hancom Office application over Windows COM. Extremely capable because it drives the real editor — but that's also the catch: it requires Windows and a Hancom installation, so it can't run on a Linux server, in CI, or in a container. Great for desktop RPA; wrong tool for a backend pipeline.

olefile

A low-level reader/writer for the MS-CFB (OLE compound file) container. It can read raw HWP streams, but it can rewrite a stream only if the byte length is unchanged, which it almost never is once you insert Korean text. hwpkit uses olefile on the read side and adds the full container rewrite on top.
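A small stdlib-only sketch shows why a same-length overwrite rarely works once text changes. It assumes, following the published HWP 5.0 format, that body text is stored as UTF-16LE and that body streams are usually raw-deflate compressed. The sample strings and the helper names raw_deflate and fits_in_place are hypothetical, and no olefile call is made here.

```python
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Compress with raw deflate (no zlib header), the framing assumed for HWP body streams."""
    comp = zlib.compressobj(level=9, wbits=-15)
    return comp.compress(data) + comp.flush()


def fits_in_place(old_stream: bytes, new_stream: bytes) -> bool:
    """A same-size overwrite is only possible when the byte lengths match exactly."""
    return len(old_stream) == len(new_stream)


# Hypothetical form field before and after filling it in.
before_text = "성명: ____"
after_text = "성명: 홍길동"

# Assumption: text is encoded as UTF-16LE, two bytes per character here.
before_raw = before_text.encode("utf-16-le")
after_raw = after_text.encode("utf-16-le")

# The uncompressed payload already changes length (8 characters vs 7).
same_size_raw = fits_in_place(before_raw, after_raw)

# After compression the sizes are even less predictable, so compare those too.
same_size_compressed = fits_in_place(raw_deflate(before_raw), raw_deflate(after_raw))

if __name__ == "__main__":
    print(len(before_raw), len(after_raw), same_size_raw)
    print(same_size_compressed)
```

When the lengths differ, the container's sector allocation has to be rebuilt around the new stream, which is the gap the paragraph above says hwpkit fills.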

The short version

Of the four, hwpkit is the only one that reads, edits, and stamps images into both .hwp and .hwpx in portable, pure Python: no Hancom, no Windows, no COM bridge. If your code runs anywhere other than a Windows desktop, that's the difference that matters.
