Little eBook Format

tl;dr. There should be an ebook format that is significantly simpler than ePub. It should be human-readable (no XML or LaTeX or other fancy tags), and to create an ebook in this format you should need nothing more than a text editor and a zip utility. At the same time it should be slightly more advanced than the plain text files that Project Gutenberg is famous for: there should be a standard way to indicate emphasis and block quotes, there should not be hard line breaks, there should be a navigable table of contents, and so forth. To this end I propose a format called “Little eBook,” the lovechild of Comic Book Archive (.cbr and .cbz files) and MultiMarkDown. A .le file is a zipped archive of sequentially named text files and images. Ideally a dedicated application would be used to read .le books and take care of certain formatting niceties (indenting block quotes, bolding text that is marked for emphasis with asterisks, and so forth). However, it should be possible to simply read the raw text as-is in a text editor: there’s no distinction between the source and the “compiled” book (apart from zip compression).

The name is a homage to Little Blue Books, which I am a fan of.

This is not an original idea; many people have had ideas for lightweight ebook markup systems, or creative uses of Markdown. See Fountain and CriticMarkup. Also, the hard part would be actually coding apps that display .le files nicely, not describing the idea. But this has been bugging me for a while so I thought I’d write it down.

A few things I’ve seen recently have led me to believe that most ebook formats are too complicated for many uses. There’s a lot of overhead in them that is simply not necessary for, say, a Jane Austen ebook.

For example, I have been noticing bad-looking and weirdly over-complex ebooks on the iBookstore, created with Apple’s iBooks Author, and reading horror stories about creating ePubs and .mobis. There’s got to be a better way!

The workflow for creating ebooks is complicated and if you’re creating a layout-heavy book, with a lot of graphics and tables, good luck to you. You need complicated. But I think there’s probably a better solution for electronic versions of regular books that just present sequential text.

For these text-oriented books, what’s needed is a format that is only slightly more sophisticated than the text files that Project Gutenberg has been producing since the 1970s. ePub has a lot of overhead that is overkill for most traditional books. Your average book in a bookstore has zero illustrations, zero tables, and a very simple structure. It’s text with a little bit of metadata and a cover. Plain text files, by contrast, do not provide support for consistent marking of emphasis, internal structure, metadata, and the like.

My initial thought was to just use Markdown. The design philosophy of Markdown is perfect:

> The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. While Markdown’s syntax has been influenced by several existing text-to-HTML filters, the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.

But Markdown does not support a few features that many simple books use, such as footnotes. MultiMarkDown, an extension of Markdown, does, and does so in a way that does not sacrifice readability. At first I thought that even the MultiMarkDown format would need to be modified to support tables of contents or other “book” things. But tables of contents can be generated by having a reader application look at the headers and metadata that are already there.

It would be possible to just require a reader to scan the file, look for headers, and build a ToC out of that. Or, you can require the ebook creator to manually create a table of contents using MultiMarkDown’s support for tables and automatic cross-referencing. But there’s a simpler way that I think has added benefits.

Take what would otherwise be a big long MultiMarkDown file, break it up into smaller text files, and number them sequentially (the basic way Comic Book Archive files work). (The files might be given .md extensions instead of .txt.) The beginning of each of these files has any headers that the reader app looks for to build the ToC. Any header info not at the beginning of a file is ignored for ToC purposes. If a file has no header info it’s just treated as a continuation of the previous file.

Breaking up the text into different files makes editing easier since you can just load up a particular part of a book, and processing the headers in this way means that not all headers have to be part of the ToC. Also, it means that a reader app can just load up (and process the MultiMarkDown of) smaller portions of a larger work. There is no need to chug through a file looking for headers; they’re always going to be in a specific place. (The header at the beginning of a file would use normal Markdown conventions and could translate to H1, H2, H3 etc; the ToC would reflect that hierarchy. Also, there could be multiple headers at the beginning of a file–so there could be a chapter then a subchapter, and then a new file for the next subchapter.)
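As a rough illustration of the ToC rule described above, here is a minimal Python sketch. It assumes things the proposal does not pin down: headers use the `#` style of Markdown, file names are zero-padded so that plain sorting gives reading order, and any metadata block has already been stripped from the first file. The names `leading_headers` and `build_toc` are made up for this example.

```python
import re

# Matches "#"-style headers such as "# Part One" or "## Chapter 1".
HEADER = re.compile(r"^(#{1,6})\s+(.+?)\s*$")

def leading_headers(text):
    """Return (level, title) pairs for the headers at the very start of a file."""
    headers = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines between the leading headers are fine
        match = HEADER.match(line)
        if match is None:
            break  # the first line of body text ends the header block
        headers.append((len(match.group(1)), match.group(2)))
    return headers

def build_toc(files):
    """files maps file name -> text; returns (level, title, file name) entries."""
    toc = []
    for name in sorted(files):  # assumes zero-padded names like 001.txt
        for level, title in leading_headers(files[name]):
            toc.append((level, title, name))
    return toc

book = {
    "001.txt": "# Part One\n\n## Chapter 1\n\nSome text.\n\n## Aside\n\nMore text.\n",
    "002.txt": "This file has no header, so it continues Chapter 1.\n",
    "003.txt": "## Chapter 2\n\nEven more text.\n",
}
for level, title, name in build_toc(book):
    print("  " * (level - 1) + title, "->", name)
```

Note that the “Aside” header in the first file never reaches the ToC, because it is not at the beginning of a file, and the second file adds no entry at all.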

Since all of the text files are in MultiMarkdown format a reader should display them accordingly (use actual bold and not asterisks, don’t actually display metadata, etc). Breaking up the book into different files makes it so this work doesn’t all need to be done up front.

Other than this business of splitting the single book into multiple text files, the book is treated as one big MultiMarkDown document; e.g. metadata at the beginning of the first file applies to the work as a whole.
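To make the metadata point concrete, here is a small Python sketch of how a reader might peel the metadata off the first file. It is a simplification, not a full MultiMarkDown parser: it assumes the metadata is a run of `Key: value` lines at the very top, ended by the first blank line, with indented lines continuing the previous value. The function name `split_metadata` is invented for this example.

```python
def split_metadata(text):
    """Split the first file into (metadata dict, remaining body text).

    Simplified assumption: metadata is "Key: value" lines at the top of the
    file, terminated by a blank line. Keys are lowercased for lookup.
    """
    lines = text.splitlines()
    # No metadata if the file is empty, opens with a header, or has no colon.
    if not lines or lines[0].startswith("#") or ":" not in lines[0]:
        return {}, text
    metadata = {}
    key = None
    for index, line in enumerate(lines):
        if not line.strip():
            # The blank line ends the metadata; everything after it is the book.
            return metadata, "\n".join(lines[index + 1:])
        if ":" in line and not line[0].isspace():
            name, value = line.split(":", 1)
            key = name.strip().lower()
            metadata[key] = value.strip()
        elif key is not None:
            # An indented (or colon-free) line continues the previous value.
            metadata[key] += " " + line.strip()
    return metadata, ""

first_file = "Title: Emma\nAuthor: Jane Austen\n\n# Volume I\n\nEmma Woodhouse, handsome, clever, and rich\n"
metadata, body = split_metadata(first_file)
print(metadata["title"], "by", metadata["author"])
```

A reader app would call something like this on the first file only, and treat every later file as pure body text.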

One final point on breaking up the book into multiple files: There are already going to be multiple files if you have any graphics. The idea here is that you take these files, zip them all up together and give it an extension of .le and you’re done. (Many modern “file formats” are just compressed archives of folders.)
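The packaging step really is just zipping a folder. As a sketch, here is how it might look with Python’s standard `zipfile` module. The helper names `pack_le` and `read_le_texts` are hypothetical, the flat (no subfolders) layout is an assumption, and so is the choice of UTF-8 for the text files.

```python
import zipfile
from pathlib import Path

TEXT_SUFFIXES = {".txt", ".md"}

def pack_le(source_dir, out_path):
    """Zip every file in source_dir (flat layout assumed) into a .le archive."""
    source = Path(source_dir)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.iterdir()):
            if path.is_file():
                # Store only the bare file name so the archive has no folders.
                archive.write(path, arcname=path.name)
    return out_path

def read_le_texts(le_path):
    """Return (name, text) pairs for the text files, in file-name order."""
    with zipfile.ZipFile(le_path) as archive:
        names = sorted(
            name for name in archive.namelist()
            if Path(name).suffix.lower() in TEXT_SUFFIXES
        )
        # UTF-8 is an assumption; the proposal does not name an encoding.
        return [(name, archive.read(name).decode("utf-8")) for name in names]
```

Usage would be something like `pack_le("emma", "emma.le")` followed by `read_le_texts("emma.le")`. Keep the output file outside the source folder so the archive does not try to include itself.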

Add to the archive a cover.jpg or .png that reader applications may choose to display. This cover file can be linked to like any other image but it needs a special name if readers are to know where to look for cover art.
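Looking up the cover is then a matter of checking the archive’s file list for the special name. A tiny Python sketch follows. Matching the name case-insensitively, and preferring the .jpg when both files exist, are my own assumptions rather than part of the proposal, and `find_cover` is a made-up name.

```python
COVER_NAMES = ("cover.jpg", "cover.png")

def find_cover(names):
    """Return the archive member to use as cover art, or None if there is none."""
    for wanted in COVER_NAMES:  # earlier entries win if both exist
        for name in names:
            if name.lower() == wanted:
                return name
    return None

print(find_cover(["001.txt", "002.txt", "Cover.PNG", "map.png"]))
```

A reader that finds no cover could simply fall back to showing the title from the metadata.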

This is unlikely to go anywhere: the ebook community would have to embrace the format just as the comic community embraced Comic Book Archives, and frankly, there’s a benefit to just using ePub for everything even when it’s overkill. A format like this does not, and probably could not ever, support pinpoint citations, and it could also be hard to support niceties such as keeping track of reading positions when syncing between devices.

On the other hand, John Gruber’s Markdown (which I have been using since it was announced in 2004) and Fletcher Penney’s MultiMarkdown have been more successful than I ever would have thought. There is a cottage industry of cool Markdown-related software like Marked and MultiMarkdown Composer (which supports ePub export).

It may be possible to just do all of this with a single text file. Maybe the days of readers and editors chugging away at giant blocks of text are over and it would be easy to automatically generate a navigable table of contents. In which case all of this reduces to “just use MultiMarkDown.” However, there’s still good reason to compress a text file and package it in an archive along with any graphic resources.

Just as Comic Book Archives complement PDFs I think there’s a role to play for a simple, human-readable ebook format like this, and I very much want there to be one.