The .lex & .lexis file format.
A small, human-readable, citation-aware file format for sharing passages and reading-states between people and across decades. Open, plain text, future-proof.
1 · Why a new format
Existing annotation formats (Highlights APIs, proprietary databases, "exports" in JSON) all share a flaw: they belong to a vendor. When the vendor disappears, the marginalia become unreadable. We wanted a format that would still open in 2074 with a text editor, and that any tool — not only Lexis — could read and write.
.lex is a single passage. .lexis is a whole reading: the work, the version, the marks. Both are plain text with a small front-matter envelope; both are citation-aware; both render legibly even if the reader has no software at all.
2 · The .lex envelope
%LEX 1.0
%CITATION Eliot, George. Middlemarch. Edinburgh: Blackwood, 1872.
%LOCATION Book III, Chapter 28, page 287, line 14
%READ-AT 2026-03-14 21:08 UTC+00
%READER Mira Aldon <m@example.net>
%TAGS marriage, casaubon, dorothea
---
> To know intense joy without a strong bodily frame,
> one must have an enthusiastic soul.
A note to myself: this is the sentence that
explains the whole of book III, in retrospect.
Not Casaubon's failure but Dorothea's hunger.
⤷ see also: Woolf, The Common Reader, p.166
%END
Six fields, one passage, one note, one cross-reference, one signature line. Anything starting with % is metadata; anything starting with > is the quoted source; everything else is the reader's own writing. A .lex file is, at heart, a letter about a passage.
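The three line rules above are enough to write a reader. What follows is a minimal sketch in Python, not the reference parser: the names `LexEntry` and `parse_lex` are hypothetical, and treating the first `---` as a pure separator is an assumption drawn from the sample rather than from a stated rule.

```python
from dataclasses import dataclass, field


@dataclass
class LexEntry:
    version: str = ""
    fields: dict = field(default_factory=dict)   # %KEY value pairs
    quote: list = field(default_factory=list)    # lines that began with ">"
    note: list = field(default_factory=list)     # the reader's own lines


def parse_lex(text: str) -> LexEntry:
    """Split a .lex file into metadata, quoted source and reader's note."""
    entry = LexEntry()
    seen_separator = False
    for raw in text.splitlines():
        line = raw.rstrip()
        if line == "%END":
            break
        if line.startswith("%"):
            key, _, value = line[1:].partition(" ")
            if key == "LEX":
                entry.version = value.strip()
            else:
                entry.fields[key] = value.strip()
        elif line.startswith(">"):
            entry.quote.append(line[1:].strip())
        elif line == "---" and not seen_separator:
            seen_separator = True  # assumption: only the first --- is structural
        elif line:
            entry.note.append(line)
    return entry


SAMPLE = """\
%LEX 1.0
%CITATION Eliot, George. Middlemarch. Edinburgh: Blackwood, 1872.
%LOCATION Book III, Chapter 28, page 287, line 14
%READ-AT 2026-03-14 21:08 UTC+00
%READER Mira Aldon <m@example.net>
%TAGS marriage, casaubon, dorothea
---
> To know intense joy without a strong bodily frame,
> one must have an enthusiastic soul.
A note to myself: this is the sentence that
explains the whole of book III, in retrospect.
Not Casaubon's failure but Dorothea's hunger.
⤷ see also: Woolf, The Common Reader, p.166
%END
"""

entry = parse_lex(SAMPLE)
print(entry.fields["CITATION"])
print(" ".join(entry.quote))
```

The cross-reference line is kept with the note here, since the rule says everything that is not metadata or quotation belongs to the reader.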
3 · The .lexis envelope
A .lexis file is a small archive of .lex entries plus a manifest:
%LEXIS 1.0
%TITLE Middlemarch
%AUTHOR Eliot, George
%EDITION Penguin Classics, 2003 (ISBN 978-0-14-143954-9)
%READER Mira Aldon
%STARTED 2025-12-22
%FINISHED 2026-03-14
%MARKS 128
%PAGES 808
---
[entries/0001.lex]
[entries/0002.lex]
…
[entries/0128.lex]
%END
Each entry follows the .lex spec. The container is a tar-like ordered list — readable with any text editor; mountable as a folder by Lexis and other tools.
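A manifest of this shape can be read with a few lines of Python. This is a sketch under assumptions: `parse_lexis` is a hypothetical name, the bracketed lines are taken to be relative paths in reading order, and the check that %MARKS equals the number of listed entries is our guess at the intended meaning, not a rule the format states.

```python
import re

ENTRY_RE = re.compile(r"^\[(.+)\]$")  # one bracketed path per line


def parse_lexis(text: str):
    """Return (manifest fields, ordered list of entry paths)."""
    manifest = {}
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "%END":
            break
        if line.startswith("%"):
            key, _, value = line[1:].partition(" ")
            manifest[key] = value.strip()
            continue
        match = ENTRY_RE.match(line)
        if match:
            entries.append(match.group(1))
    return manifest, entries


# A shortened, hypothetical manifest with three marks instead of 128.
SAMPLE = """\
%LEXIS 1.0
%TITLE Middlemarch
%AUTHOR Eliot, George
%READER Mira Aldon
%MARKS 3
---
[entries/0001.lex]
[entries/0002.lex]
[entries/0003.lex]
%END
"""

manifest, entries = parse_lexis(SAMPLE)
# Assumption: %MARKS counts the listed entries.
marks_match = int(manifest["MARKS"]) == len(entries)
print(manifest["TITLE"], len(entries), marks_match)
```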
4 · Citations
Citations are stored as a single line of plain text in the format used by the work's authoritative bibliography (Chicago for humanities, APA for sciences, MLA for everything else). Lexis parses common forms, but the canonical store is the line as written — not a fragile structured record. A citation that a human can read is a citation a future tool can also read.
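One way to honour that rule in code is to treat parsing as an optional convenience layered over the raw line. The sketch below is illustrative only: `Citation`, `read_citation` and the single book pattern are hypothetical, and the pattern covers just the "Author. Title. Place: Publisher, Year." shape of the sample citation. Anything it cannot parse is kept, untouched, as text.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    raw: str                          # canonical: the line as written
    author: Optional[str] = None      # best-effort extras, may stay None
    title: Optional[str] = None
    place: Optional[str] = None
    publisher: Optional[str] = None
    year: Optional[str] = None


# Assumption: one simple book form; real citations vary far more.
BOOK_RE = re.compile(
    r"(?P<author>[^.]+)\.\s+(?P<title>[^.]+)\.\s+"
    r"(?P<place>[^:]+):\s+(?P<publisher>[^,]+),\s+(?P<year>\d{4})\."
)


def read_citation(line: str) -> Citation:
    raw = line.strip()
    match = BOOK_RE.fullmatch(raw)
    if match is None:
        return Citation(raw=raw)      # unparsed, but nothing is lost
    return Citation(raw=raw, **match.groupdict())


book = read_citation("Eliot, George. Middlemarch. Edinburgh: Blackwood, 1872.")
odd = read_citation("Woolf, The Common Reader, p.166")
print(book.author, book.year)
print(odd.raw)
```

A failed parse is not an error in this design; the tool simply knows less about the line than a human reader would.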
5 · Locations
Locations are durable, not numeric. A page number and line offset may be noted (as in "page 287, line 14"), but the canonical address is a short quoted phrase from the surrounding text, distinctive enough to survive re-typesetting, scanning errors, and edition changes.
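How a tool resolves such a phrase is left open here. A possible approach, sketched in Python with the standard library's `difflib`, is to ignore case, punctuation and line breaks, then slide the phrase across the text and keep the closest window. The function names and the 0.85 threshold are assumptions for illustration, and the returned position is a word index into the normalised text, not a page or line.

```python
import re
from difflib import SequenceMatcher


def words(text: str) -> list:
    """Lowercase word list; drops punctuation, spacing and line breaks."""
    return re.findall(r"[a-z0-9']+", text.lower())


def locate(anchor: str, text: str, threshold: float = 0.85):
    """Return (word_index, score) of the best match, or None if too weak."""
    needle = words(anchor)
    haystack = words(text)
    n = len(needle)
    if n == 0 or len(haystack) < n:
        return None
    target = " ".join(needle)
    best_index, best_score = -1, 0.0
    for i in range(len(haystack) - n + 1):
        window = " ".join(haystack[i:i + n])
        score = SequenceMatcher(None, target, window).ratio()
        if score > best_score:
            best_index, best_score = i, score
    if best_score < threshold:
        return None
    return best_index, best_score


# A hypothetical rescan: new line breaks, a page header, and one OCR slip.
SCANNED = """PAGE 301
To know intense joy with0ut a strong bodily
frame, one must have an enthusiastic soul."""

hit = locate("intense joy without a strong bodily frame", SCANNED)
miss = locate("completely unrelated phrase here", SCANNED)
print(hit, miss)
```

The scan is quadratic in the worst case, which is acceptable for a chapter and too slow for a library; an index over word n-grams would be the natural next step.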
6 · Versioning
Backward compatibility is a hard promise. Files with %LEX 1.0 will be readable by every Lexis version that ships, ever. New fields may be added under future minor versions; readers must skip unknown fields silently rather than refuse the file.
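In a reader, the skip rule might look like the following Python sketch. The names are hypothetical, `%MOOD` is an invented field used only to exercise the unknown-field path, and refusing a different major version is our assumption: the promise above covers 1.0 and future minor versions, and says nothing about a 2.0.

```python
KNOWN_FIELDS = {"CITATION", "LOCATION", "READ-AT", "READER", "TAGS"}
SUPPORTED_MAJOR = 1  # assumption: only the major number may block a file


def read_header(lines):
    """Return (version, known fields, names of skipped unknown fields)."""
    it = iter(lines)
    magic, _, version = next(it, "").strip().partition(" ")
    if magic != "%LEX":
        raise ValueError("not a .lex file")
    major = int(version.split(".")[0])
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported major version: {version}")
    known, skipped = {}, []
    for raw in it:
        line = raw.strip()
        if line == "---":
            break
        if not line.startswith("%"):
            continue
        key, _, value = line[1:].partition(" ")
        if key in KNOWN_FIELDS:
            known[key] = value.strip()
        else:
            skipped.append(key)  # skipped silently: no warning, no refusal
    return version, known, skipped


header = [
    "%LEX 1.1",
    "%CITATION Eliot, George. Middlemarch. Edinburgh: Blackwood, 1872.",
    "%MOOD wistful",  # hypothetical field from some future minor version
    "---",
]
version, known, skipped = read_header(header)
print(version, sorted(known), skipped)
```

A tool that rewrites files would also need to carry the skipped lines through unchanged, so that an older reader does not erase what a newer one wrote.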
7 · Reference implementations
- lexis-rs — the canonical parser, in Rust.
- lexis.py — a 200-line Python parser; sufficient for any scripting need.
- cat / less (yes, that cat). The format renders fine.
8 · A note on the dot-prefix
The format is to be registered as application/vnd.lex+text with IANA. The extensions .lex and .lexis are reserved for this purpose. We humbly ask that no one use them for anything else.