ishani.org v.2016

Kindlmatic

Update 2014

I had assumed that this tool would be obsolete by now. However, nicely formatting Kindle books in Microsoft Word still seems to be a fiddly, much-discussed process, to the degree that people are even offering paid services to fix and streamline the conversion.

I have just tried Kindlmatic with the latest Kindlegen (2.9), exporting my example content from Word 2013, and it all still seems to work just fine!


Original Project

The Kindlmatic is a little HTML conditioning tool, designed to make writing eBooks using Microsoft Word a little easier. It was designed to streamline working with Word 2010 and the Amazon Kindle, but should help with any reader and conversion tool that accepts OPF/NCX/HTML inputs. Underneath, it’s just a big pile of regular expressions.

Word exports to HTML and tries to retain all the styling and layout present in the original document. Because Kindle doesn’t support the full gamut of CSS and HTML layout options, many things may end up being ignored and what you see on device may not match up with your original document. By using some constrained styling and pushing the exported result through Kindlmatic, hopefully most of the differences can be sifted away. Plus you should find your eBooks are smaller too!

The app does the following:

  • Strips superfluous tags, inline styling, metadata and font-face specifications (Kindle doesn’t care that you want your titles in Comic Sans, sorry).

  • Provides a mechanism for retaining original artwork on export. Word seems to relentlessly compress, resize or generally screw about with embedded images, and I can find no reliable way of stopping it, so Kindlmatic offers to patch exported images back to their source files, ensuring the best-quality artwork in the resulting eBook.

  • Auto-slices a single export into separate Front Matter, TOC and Content files, as required for use with OPF/NCX manifest files to take advantage of all the Go To... features on Kindle.
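To give a feel for the first item, here is a minimal Python sketch of regex-based stripping. It is not Kindlmatic's actual code: the patterns and the `strip_word_cruft` name are illustrative assumptions, and the real rule set is certainly larger.

```python
import re

# Hypothetical patterns for illustration; not taken from Kindlmatic itself.
STYLE_ATTR = re.compile(r'\s+style\s*=\s*(?:"[^"]*"|\'[^\']*\')', re.IGNORECASE)
META_TAG = re.compile(r'<meta\b[^>]*>\s*', re.IGNORECASE)
FONT_FACE = re.compile(r'@font-face\s*\{[^}]*\}\s*', re.IGNORECASE)
# Only attribute-less spans with no nested tags, so nesting is never disturbed.
BARE_SPAN = re.compile(r'<span>([^<]*)</span>', re.IGNORECASE)


def strip_word_cruft(html: str) -> str:
    """Remove <meta> tags, @font-face rules, inline styles and bare spans."""
    html = META_TAG.sub('', html)
    html = FONT_FACE.sub('', html)
    html = STYLE_ATTR.sub('', html)
    # Once its style attribute is gone, a bare span carries no information.
    html = BARE_SPAN.sub(r'\1', html)
    return html
```

Class attributes are left alone in this sketch, on the assumption that the book's stylesheet still needs them.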

An example book is available alongside the source code. It contains a set of styles and configurations based on Aaron Shepard’s excellent guide to Kindle publishing (available archived), and demonstrates how to configure a book to take advantage of the artwork patching and auto-slicing (see the how-to below).

Kindlmatic has been built and tested against Microsoft Word 2010’s “Filtered HTML” export (as recommended by Amazon).

The source code is available on GitHub.
A built executable and example content can be downloaded here.

From Word 2010 to Kindle

Here is a quick end-to-end guide to using Word 2010 and Kindlmatic to produce an eBook with as little fuss as possible.

Use Simple Styling

The example book (.docx) in the Kindlmatic package has a set of styles set up to closely match what the Kindle will accept and display. The key to getting your content to look the same on-device is to use as little complex styling as possible. Don’t rely on fancy fonts: the Kindle lets the reader choose a font to render with, so just stick with something simple. Stay within the 10pt to 18pt font size range. Don’t use styling options like ‘all caps’, ‘small caps’, and so on, as they won’t be exported to HTML.

Keep it simple and you’ll have less to fix later on.

Manual Indenting

By default, the Kindle renderer will add indenting to new paragraphs. If you have added paragraph indents yourself, they will double up. The approach I’ve used in the example book is to set Normal text to have a tiny 0.01cm first-line indent, as described in the Shepard article. This forces the Kindle to render paragraphs as left-aligned, so you can manually control the indenting.

I can also recommend turning off automatic indenting when using Tab. Word’s Auto-Formatting will change the first-line indent using inline styling in the exported HTML, which will then get stripped out by Kindlmatic. I prefer to let the tabs be converted to non-breaking spaces.

In Word, navigate to Options -> Proofing -> AutoCorrect Options… -> AutoFormat As You Type and ensure the option that sets the left and first-line indent with tabs and backspaces is disabled.

Keeping Original Artwork

I could not figure out how to make Word just emit the original images I was embedding in my document. Turning off the image compression options made no difference. PNG artwork would be recompressed into JPEG and variably resized, both of which can damage the quality of the displayed image on Kindle. Amazon recommend authoring artwork at 300DPI for future-proofing, so I wanted to be able to store this high-quality, high-resolution artwork in my eBook without Word getting in the way.

Bafflingly, if I added an Alt Text tag to an image, it would be emitted as a PNG when exporting. Without the tag, it was written out as a JPG. I have no idea why Word chooses to do that.

Anyway, to activate the patching, set any inserted images to link to their original source files using a hyperlink. Right-click on an image, choose Hyperlink… from the menu and select the file you originally inserted into the document. Kindlmatic will find these linked images and modify the HTML image references so they point back to the original file as chosen. The hyperlink will be removed in the process.

Kindlmatic will also report the image paths that it has patched as it runs, so you can make sure it has found them correctly.
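The patching step can be sketched roughly as below. This is a hedged illustration, not the tool's own implementation: it assumes Word exports a hyperlinked image as an `<a href="...">` wrapped directly around an `<img>`, and the `patch_images` name is made up for the example.

```python
import re

# Assumed export shape: <a href="ORIGINAL"><img ... src="RECOMPRESSED" ...></a>
LINKED_IMG = re.compile(
    r'<a\s+href="(?P<orig>[^"]+)"[^>]*>\s*'
    r'(?P<img><img\b[^>]*?)\bsrc="[^"]*"(?P<rest>[^>]*>)\s*</a>',
    re.IGNORECASE,
)


def patch_images(html: str) -> tuple[str, list[str]]:
    """Point each hyperlinked <img> at its link target and drop the link."""
    patched: list[str] = []

    def _swap(m: re.Match) -> str:
        # Remember the path so it can be reported back to the user.
        patched.append(m.group('orig'))
        return f'{m.group("img")}src="{m.group("orig")}"{m.group("rest")}'

    return LINKED_IMG.sub(_swap, html), patched
```

The second return value is the list of original paths that were patched in, which mirrors the reporting described above. Images without a hyperlink are left untouched.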

Table Of Contents and “Auto-Slicing”

Although Amazon’s Kindlegen tool can convert from a single HTML page, to produce a “proper” .MOBI file with all the appropriate metadata embedded in it you need to use an OPF file. This is an XML manifest file, describing the book metadata (title, description, ISBN, author, etc.) as well as its hierarchy and high-level layout. To support the Go To... features on the Kindle (as well as embedded cover art in the .MOBI), you need to have an OPF (as far as I can tell, at least).

An OPF can bind several HTML files together to form a final eBook package. Things like the TOC need to be held in a completely separate page, it seems, otherwise the Kindle doesn’t know how to jump to it.
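For orientation, here is a small Python sketch that fills in a bare-bones OPF 2.0 skeleton binding three HTML files together. It is an assumption-laden illustration rather than a copy of the OPF shipped with Kindlmatic: the item ids, the `.ncx` filename, the media types and the `build_opf` helper are all hypothetical, and the HTML filenames follow the naming Kindlmatic uses for its sliced output.

```python
from xml.sax.saxutils import escape

# Minimal OPF 2.0 skeleton. The ids and the .ncx name are hypothetical.
OPF_TEMPLATE = '''<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">{book_id}</dc:identifier>
  </metadata>
  <manifest>
    <item id="welcome" href="{stem}.welcome.html" media-type="application/xhtml+xml"/>
    <item id="toc" href="{stem}.toc.html" media-type="application/xhtml+xml"/>
    <item id="content" href="{stem}.content.html" media-type="application/xhtml+xml"/>
    <item id="ncx" href="{stem}.ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="welcome"/>
    <itemref idref="toc"/>
    <itemref idref="content"/>
  </spine>
  <guide>
    <reference type="toc" title="Table of Contents" href="{stem}.toc.html"/>
    <reference type="text" title="Beginning" href="{stem}.content.html"/>
  </guide>
</package>
'''


def build_opf(title: str, book_id: str, stem: str) -> str:
    """Fill the skeleton; title and id are escaped for use as XML text."""
    return OPF_TEMPLATE.format(title=escape(title), book_id=escape(book_id), stem=stem)
```

The `guide` entries are what point the reader at the table of contents and the start of the book; the manifest and spine list the files and their reading order.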

Kindlmatic can be used to automatically slice up an exported HTML file into 3 sections:

  • Front Matter – everything up to the TOC, e.g. title, dedication, prologue.
  • TOC – a list of hyperlinks to chapters in the main book content.
  • Content – the main book content itself, e.g. starting at Chapter 1.

To do this, create a Table Of Contents page using Word’s References tab. Turn off page numbers (as they make no sense when the book is on a Kindle) and keep it simple.

Once generated, add a title to the page (e.g. “Contents”), select the title and use Insert -> Bookmark, naming the bookmark “TOC”. This anchor will let Kindlmatic know where the TOC begins so it can cut it out of the exported HTML.
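The slicing idea can be sketched in Python as follows. Treat this as a guess at the approach, not a description of Kindlmatic's internals: it assumes the bookmark is exported as an `<a name=TOC>` anchor inside a paragraph or heading, and it assumes (hypothetically) that the first `<h1>` after the anchor marks the start of the main content. The `slice_body` name is invented for the example.

```python
import re

# A Word bookmark is assumed to export as <a name=TOC>; quotes are optional.
TOC_ANCHOR = re.compile(r'<a\s+name="?TOC"?\s*>', re.IGNORECASE)
# Assumption: the anchor sits inside a paragraph or heading opened just before it.
BLOCK_OPEN = re.compile(r'<(?:p|h[1-6])\b', re.IGNORECASE)
# Assumption: the first <h1> after the TOC marks the start of Chapter 1.
FIRST_CHAPTER = re.compile(r'<h1\b', re.IGNORECASE)


def slice_body(body: str) -> tuple[str, str, str]:
    """Split body HTML into (front matter, TOC, content) pieces."""
    anchor = TOC_ANCHOR.search(body)
    if anchor is None:
        raise ValueError('no TOC bookmark found')
    # Cut at the start of the block element that contains the anchor.
    opens = [m.start() for m in BLOCK_OPEN.finditer(body, 0, anchor.start())]
    toc_start = opens[-1] if opens else anchor.start()
    chapter = FIRST_CHAPTER.search(body, anchor.end())
    toc_end = chapter.start() if chapter else len(body)
    return body[:toc_start], body[toc_start:toc_end], body[toc_end:]
```

Each piece would still need wrapping in its own `<html>`/`<head>`/`<body>` before being written out as a separate file; that part is omitted here.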

Check the OPF and NCX files included with Kindlmatic for a simple example; they collect the split content together, set up links to the TOC and the beginning of the book, and so on.

Export to ASCII

When exporting to HTML, open the Tools drop-down and choose Web Options... -> Encoding. Set the document encoding to US-ASCII, which ensures any non-ASCII characters are written out as HTML character entities.
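To illustrate what that setting achieves (this shows the effect only, not how Word does it), Python's built-in `xmlcharrefreplace` error handler performs a similar conversion, replacing every non-ASCII character with a numeric character reference. The `to_ascii_html` name is just for this example.

```python
def to_ascii_html(text: str) -> str:
    """Return text as pure ASCII, with other characters as &#NNNN; references."""
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


print(to_ascii_html('Café – “quoted”'))
# Caf&#233; &#8211; &#8220;quoted&#8221;
```

Word may well choose named entities for some characters instead of numeric ones; either form survives an ASCII-only pipeline.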

Run Kindlmatic

Give Kindlmatic the exported HTML to process.

kindlmatic.exe book.htm

If it finds a TOC anchor (and everything else goes to plan) it will emit 3 files:

  • book.welcome.html
  • book.toc.html
  • book.content.html

These names are referenced in the OPF file. When you run the tool against your own exported files, make sure to change the filenames in the corresponding OPF. Then pass the OPF to Kindlegen

kindlegen.exe book.opf

to generate the final .MOBI file, ready for testing on-device or in Kindle Previewer.
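Forgetting to update the OPF after renaming the export is an easy mistake, so a small pre-flight check can help. The Python sketch below assumes the output names always follow the `book.htm` example above (same stem, plus `.welcome.html`, `.toc.html` and `.content.html`); both helper names are hypothetical.

```python
from pathlib import Path

PARTS = ('welcome', 'toc', 'content')


def sliced_names(exported: str) -> list[str]:
    """Names of the three files expected for an exported file, by naming pattern."""
    stem = Path(exported).stem
    return [f'{stem}.{part}.html' for part in PARTS]


def missing_from_opf(opf_text: str, exported: str) -> list[str]:
    """Sliced filenames that the OPF text does not mention anywhere."""
    return [name for name in sliced_names(exported) if name not in opf_text]
```

Running `missing_from_opf` over the OPF contents before invoking Kindlegen, and stopping if the list is non-empty, catches stale filenames early.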

Example