Changelog
All notable changes to hwpkit are documented here. The format is based
on Keep a Changelog, and this
project adheres to Semantic Versioning.
1.0.0 - 2026-06-09
First stable release. The public API is now committed to Semantic Versioning: no breaking changes without a 2.0.
Added
- `open_document(path)`: opens a `.hwp` or `.hwpx` (detected by container content) and returns a uniform editor. One calling pattern across both formats; downstream code never branches on format.
- `HwpFile`: an object-oriented editor for binary `.hwp` that mirrors `HwpxFile`, with `open` / `paragraphs` / `describe` / `paragraph_text` / `inject_text` / `replace_text` / `swap_in_para_text` / `place_image` / `save`. Paragraphs are indexed across all `BodyText` sections.
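Detection "by container content" can be illustrated with a minimal sketch. It assumes only the well-known container signatures (binary `.hwp` is an OLE2/CFB file, `.hwpx` is a ZIP archive); `sniff_format` is a hypothetical helper written for this example, not hwpkit's actual implementation, which may inspect the container more deeply.

```python
# Well-known container signatures (general file-format facts, not hwpkit code).
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 / Compound File Binary header
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header


def sniff_format(path):
    """Return "hwp", "hwpx", or None, judging only by the file's first bytes.

    Hypothetical helper: the file extension is never consulted.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(CFB_MAGIC):
        return "hwp"
    if head.startswith(ZIP_MAGIC):
        return "hwpx"
    return None
```

A signature check like this only narrows the container type; a stricter check would also confirm format-specific contents (for example, the `mimetype` entry inside the ZIP) before treating the file as a Hancom document.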
Notes
- Backward compatible: the functional helpers (`fill_hwp`, the `records`-list editors, the file-to-file `place_image`, and all extraction functions) are unchanged and remain supported.
- Development status promoted to Production/Stable.
0.2.1 - 2026-06-09
Docs & discoverability — no code changes; fully compatible with 0.2.0.
Changed
- Rewrote the README around both formats and the pure-Python / no-Hancom story, with a clearer feature comparison and use-case framing.
- Expanded PyPI keywords and classifiers (HWPX, OWPML, RAG/LLM, NLP, RPA, Python 3.13); bumped Development Status to Beta; sharpened the summary.
0.2.0 - 2026-06-09
The dual-format release: hwpkit now spans both Hancom serializations (binary `.hwp` and XML `.hwpx`) through one API, and gains image
insertion on both. `.hwp` and `.hwpx` are two encodings of one document
model, now documented in `docs/OBJECT_MODEL.md`.
Added
- `.hwpx` (OWPML) support via a new `hwpkit.hwpx` module:
  - `extract_text_from_hwpx(path)`: clean text from every section in document order, table-cell content included.
  - `HwpxFile`: an editor mirroring the binary verbs (`inject_text`, `replace_text`, `swap_in_para_text`), plus `describe()` / `paragraphs()` for locating fields. `save()` re-serializes only the parts that changed and copies the rest byte-for-byte (`mimetype` kept stored and first).
  - `fill_hwpx(in, out, edit_fn)`: the one-call analogue of `fill_hwp`.
- Image insertion on both formats:
  - `place_image(in, out, img, paragraph_index, width_mm=…)` for binary `.hwp`: adds a `BinData` stream, registers a `BIN_DATA` record, bumps the `ID_MAPPINGS` count, and anchors a `gso` picture object.
  - `HwpxFile.place_image(paragraph_index, img, width_mm=…)` for `.hwpx`: adds the `BinData/` part, registers it in `content.hpf` (`isEmbeded="1"`), and anchors an inline `<hp:pic>`.
  - Image extents follow the shared HWPUNIT rule: extent = pixels × 75, where 75 = 7200/96 (7200 HWPUNIT per inch at 96 pixels per inch).
- CFB entry insertion: `cfb.add_stream`, `cfb.add_storage`, and `cfb.find_entry`, built on a correct red-black-tree insert (so Hancom's on-open directory-tree validation passes). This is what makes adding a new `BinData` stream possible without corrupting the container.
- Unified text extraction: `extract_text_from_file(path)` detects `.hwp` vs `.hwpx` by container content (not extension); the `hwpkit-text` CLI dispatches the same way.
- Docs: `docs/OBJECT_MODEL.md` (binary-record ↔ OWPML map and the shared HWPUNIT unit); `docs/RECORD_FORMAT.md` gains an "Embedded images" section; `docs/GOTCHAS.md` gains §5 on image-extent sizing.
- Optional extras: `hwpkit[hwpx]` (lxml), `hwpkit[image]` (Pillow), `hwpkit[full]`. Both heavy deps are imported lazily, so the core install stays at just `olefile`.
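The HWPUNIT rule for image extents is plain arithmetic, and a short sketch may make it concrete. The function names below are hypothetical, and the way a `width_mm` target is turned into a width/height pair (scale to the requested width, keep the aspect ratio) is an assumption for illustration rather than a description of what `place_image` does internally.

```python
HWPUNIT_PER_INCH = 7200  # 1 HWPUNIT = 1/7200 inch
PIXELS_PER_INCH = 96     # resolution behind the "pixels x 75" rule
MM_PER_INCH = 25.4


def pixels_to_hwpunit(pixels):
    """Convert a pixel count to HWPUNIT (7200 / 96 = 75 units per pixel)."""
    return pixels * HWPUNIT_PER_INCH // PIXELS_PER_INCH


def mm_to_hwpunit(mm):
    """Convert millimetres to HWPUNIT, rounded to the nearest unit."""
    return round(mm * HWPUNIT_PER_INCH / MM_PER_INCH)


def scaled_extent(px_width, px_height, width_mm):
    """Return (width, height) in HWPUNIT for a target width in mm.

    Assumption: the height is scaled to preserve the image's aspect ratio.
    """
    width = mm_to_hwpunit(width_mm)
    height = round(width * px_height / px_width)
    return width, height


print(pixels_to_hwpunit(100))         # 7500
print(scaled_extent(200, 100, 25.4))  # (7200, 3600)
```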
Notes
- Backward compatible: every existing import and function is unchanged. `lxml` / `Pillow` are only required for the features that use them; a binary-only `.hwp` workflow needs neither.
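For readers unfamiliar with the pattern, lazily importing an optional dependency might look roughly like the sketch below. `require` is a hypothetical helper invented for this example and is not claimed to exist in hwpkit; only the extras names (`hwpx`, `image`) come from the notes above.

```python
import importlib


def require(module_name, extra):
    """Import module_name on first use (hypothetical helper).

    If the module is missing, raise an ImportError that names the
    optional extra which would install it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is needed for this feature; "
            f"install it with: pip install 'hwpkit[{extra}]'"
        ) from exc


def parse_xml_part(data):
    """Example feature that pulls in lxml only when it is actually called."""
    etree = require("lxml.etree", "hwpx")
    return etree.fromstring(data)
```

Because the import happens inside the function that needs it, importing the package itself never touches `lxml` or `Pillow`, which is what keeps a binary-only workflow free of both.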
0.1.4 - 2026-05-19
- Credit Ebenworks in the README footer and PyPI sidebar.
[0.1.1] – [0.1.3]
- Packaging and presentation polish: logo and GitHub social preview, absolute logo URL for PyPI rendering, expanded metadata/keywords, PyPI trusted-publishing workflow, badge cleanup.
0.1.0
- Initial release (renamed from `hwpedit`): binary HWP 5.0 plain-text extraction, form filling (`inject_text` / `replace_text` / `swap_in_para_text`), the CFB rewriter that preserves directory-tree topology, per-script `CharShape` font control, and the `hwpkit-inspect` / `hwpkit-text` CLIs.