HTML, EPUB, MOBI, PDF, WTF – creating an ebook

Yesterday I launched my new book, The Profitable Side Project Handbook. The launch went really well and I’ve already had some lovely feedback. Over the next few weeks I’ll be writing about some of the things I learned while self-publishing and marketing my own book.

This post is about the baffling world of ebook formats, and what I learned while wrangling my book into shape. Partly so I remember next time I need to build a revision, but also because this stuff took me some time to figure out, and if it helps someone else that would be excellent. If you have also been through this, or know about converting from other formats, please add your advice in the comments.

Writing the book

I wrote the first draft in Scrivener. I really like Scrivener, in particular for work like this that consists of lots of different sections. I could work on whichever section took my fancy each day. This is a great way to prevent writer’s block, and to avoid stalling on sections that needed more research before I could write them. I made the decision not to get distracted by formatting issues while I was writing, and just to focus on the words.

Having created a first draft I used the compilation functionality in Scrivener to send a first draft to contributors and people who were reviewing the book, and then made any changes in Scrivener. It was at this point that all the fun started.

Formats for ebooks

The formats that you would ideally create for ebooks are PDF, EPUB (used by devices such as the iPad) and MOBI (used by the Kindle). While it is possible to just read a PDF on various devices, it isn’t as nice an experience as a proper ebook format. As my book is text heavy, I wanted to give people a good experience on each device.

Format once, publish everywhere

Something that I see as a huge benefit of self-publishing in a digital-only format is the ability to make corrections. If I spot issues, or the landscape for something I mention in the book changes dramatically, then I want to be able to create an update and send it out to purchasers, rather than maintaining an errata list as you would do with a physical book. I wanted to ensure that in creating multiple formats I didn’t make it impossible to easily rebuild an updated book.

Scrivener can output to HTML, and the EPUB and MOBI formats are HTML-based, so it made sense to compile my book as an HTML document and work from there, finding a way to create all three formats out of the HTML. I could then keep my HTML copy in source control, make any changes as I needed to and rebuild the book formats when I felt an update was required.

Getting to EPUB

I tackled EPUB first. I had read a post on the Puppetlabs site about how they created the formats for their ebook, which led me to pandoc, an excellent document converter that can transform a number of formats into other formats.

Pandoc is a commandline tool. It is very simple to use: download the package for your system and install it.

Pandoc will use your HTML (or Markdown) heading levels to create the chapters and outline for your book. Level 1 headings will become chapters. Pandoc can then create you a table of contents for the epub, displaying your tree as deep as you like. This means that you need to make sure that your document is well structured in your HTML or Markdown.
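Since the chapter list depends entirely on the heading structure, it can help to look at the outline before building. The following is a minimal sketch, not part of pandoc, that lists the headings in an HTML string using only the Python standard library. The names heading_outline and skipped_levels are made up for this example.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []    # finished headings, in document order
        self._level = None   # level of the heading currently being read
        self._text = []      # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            # Collapse runs of whitespace so wrapped headings read cleanly.
            text = " ".join("".join(self._text).split())
            self.outline.append((self._level, text))
            self._level = None


def heading_outline(html):
    """Return the headings of an HTML string as a list of (level, text)."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


def skipped_levels(outline):
    """Return headings that sit more than one level below the previous heading."""
    problems = []
    previous = 0
    for level, text in outline:
        if level > previous + 1:
            problems.append((level, text))
        previous = level
    return problems


# Example with an inline snippet; for a real book, read book.html instead.
sample = "<h1>Introduction</h1><h2>Who this is for</h2><h1>Getting started</h1>"
for level, text in heading_outline(sample):
    print("  " * (level - 1) + text)
```

Each level 1 entry in the printed outline corresponds to a chapter, and anything reported by skipped_levels is a heading worth a second look before you build.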

Inside the folder where your book HTML is you can also add a cover image, CSS to style the book and a metadata XML file. This file needs to include the following.

<dc:title id="t1">The Profitable Side Project Handbook</dc:title>
  <dc:language>en-GB</dc:language>
  <dc:creator opf:file-as="Andrew, Rachel" opf:role="aut">Rachel Andrew</dc:creator>
  <dc:publisher>Rachel Andrew</dc:publisher>
  <dc:date opf:event="publication">2014-01-06</dc:date>
  <dc:rights>Copyright ©2004 by Rachel Andrew</dc:rights>

You can then build your book by opening a terminal window, changing to your book directory and running the following command.

pandoc -o book.epub book.html --epub-metadata=metadata.xml --toc --toc-depth=2 --epub-stylesheet=book.css --epub-cover-image=cover.png
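If you expect to rebuild often, the same invocation can live in a small script that you keep in source control next to the HTML. This is only a sketch: it uses the file names and options from the command above (each option starts with two plain hyphens), the function names are made up for this example, and the options are the ones that worked for me, so check them against the documentation for your own pandoc version.

```python
import subprocess


def pandoc_epub_command(source="book.html", output="book.epub",
                        metadata="metadata.xml", stylesheet="book.css",
                        cover="cover.png", toc_depth=2):
    """Return the pandoc invocation from the post as an argument list."""
    return [
        "pandoc", "-o", output, source,
        f"--epub-metadata={metadata}",
        "--toc", f"--toc-depth={toc_depth}",
        f"--epub-stylesheet={stylesheet}",
        f"--epub-cover-image={cover}",
    ]


def build_epub(**options):
    """Run pandoc in the current directory; raises if pandoc reports an error."""
    subprocess.run(pandoc_epub_command(**options), check=True)
```

Calling build_epub() from inside the book directory runs the command as written, and build_epub(toc_depth=3) would show one more level in the table of contents.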

You should then end up with an EPUB that you can open up in iBooks or any other reader that supports this format. A couple of things to note.

I encountered a bug in pandoc where it did not read my custom CSS file. I found a discussion on this, and if you run into the same bug, the fix is to name your CSS file epub.css and put it in a hidden directory called .pandoc in your home folder (I had to create this directory).

The second issue becomes a problem in the next stage: if you have a title in your HTML document as well as in the metadata XML file, you get two titles and creating your MOBI will fail. I just removed the title from the HTML document.

It is worth reading the full breakdown of commands on the Puppetlabs post as they do a great job of explaining everything.

If you are having issues or just want to poke around your EPUB, it helps to know that an EPUB is simply a ZIP archive with an .epub extension. I found that BetterZip on the Mac allowed me to open the archive and have a look at the contents. I have seen suggestions that some tools will require you to change the extension to .zip before they will open it.
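If you would rather not install an archive tool, Python’s standard zipfile module can open an EPUB directly, with no need to rename it. A minimal sketch, with function names invented for this example:

```python
import zipfile


def list_epub(path):
    """Return the names of the files stored inside an EPUB archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_epub_member(path, member):
    """Return one file from the archive as text, such as the table of contents."""
    with zipfile.ZipFile(path) as archive:
        return archive.read(member).decode("utf-8")
```

For example, list_epub("book.epub") shows what pandoc put in the archive, and read_epub_member lets you check whether the stylesheet that ended up inside is the one you expected. The member names vary between tools, so list the archive first rather than guessing a path.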

Making a MOBI

This next step is easy once you have your EPUB. Download the KindleGen tool from Amazon. It is another command line tool but you are there already, so you may as well continue.

I unpacked the archive and placed the KindleGen tool into /usr/local/bin on my Mac, which should work for you if you are also on a Mac.

Back on the command line, still in the book folder, run the KindleGen command:

kindlegen book.epub

As if by magic a MOBI file appears! The only issue I had was with my duplicate titles as mentioned above. However, the error messaging from the KindleGen tool was pretty good and I was able to figure out the issue.

If you are hitting up against errors then the first thing to check would be that your EPUB file is valid. I used a tool called FlightCrew, which is basic but does the job.

I should note that the less you mess about with formatting for the Kindle the better, as the Kindle does a pretty nice job of making readable books on its own. I eventually ended up using different stylesheets for the EPUB that I shipped and for the EPUB I used to create the MOBI.

The PDF

I had thought that the previous two formats would be the difficult ones. However, going from HTML into other HTML-based formats is not so tricky. My troubles really began when I tried to create a PDF.

Pandoc can generate a PDF, but to style it you need to use LaTeX, which I really didn’t want to have to get into when I already had a nice looking HTML document styled with CSS.

I had read Jonathan Snook’s article How to write a book, in which he describes using the commercial, and rather expensive, Prince software to generate his book. Prince looked like a great option but I saw in the comments that there were other tools available. As my book was all text (no images or code to worry about), surely a basic tool could do the job.

I installed wkhtmltopdf, a tool that uses WebKit to convert HTML pages to PDF. The OS X version doesn’t use the libraries that would enable the generation of page numbers or a table of contents, so I ended up building my own TOC in HTML first (I actually opened up the EPUB, grabbed the one that had been generated and fixed the links).

If you want to try wkhtmltopdf yourself then installation is straightforward and it is another commandline tool. To run it on the Mac, change into the wkhtmltopdf directory:

cd /Applications/wkhtmltopdf.app/Contents/MacOS/

Then run it:

./wkhtmltopdf -B 30mm -T 20mm /Users/rachel/path_to_book/book.html /Users/rachel/path_to_book/book.pdf

You will get a PDF of your book. The user manual for wkhtmltopdf explains the options, such as the -B for margin-bottom and -T for margin-top as used in my command above.

WTF!

This all seemed to be working nicely enough, until I realised that the PDF that was being generated was 180MB. I tried some options, tried to compress the PDF via various means, with no results. A quick Google confirmed I was not the only person with this issue.

The problem seemed unresolvable. It was Sunday night and I wanted to launch Monday morning. The book was ready and I was going to be held up by a PDF. Suddenly that expensive bit of software didn’t seem so bad! I downloaded the free non-commercial version, which adds a logo, just to check that Prince would solve the issue.

Prince is also a commandline tool on the Mac and is essentially a drop-in replacement for wkhtmltopdf. Download it, install it, then run:

prince book.html -o book.pdf

A non-giant PDF will appear! Having confirmed it would work I then purchased a license. The site does not deliver licenses immediately, so I was faced with a 48 hour delay. Knowing that Jonathan Snook had a copy, I sent him a begging email to see if he would generate my PDF for me, which he very kindly did.

Jonathan also helped me out with a CSS snippet that would create numbering and improved page sizes in my PDF. As my license for Prince came through overnight, I was able to quickly rebuild the PDF to take advantage of that before launching the book. I can recommend the A List Apart article Building Books with CSS if you are using Prince and want to take advantage of the Paged Media spec that Prince supports.
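Given how late the 180MB surprise showed up, a rebuild script could check the size of the PDF straight after generating it. The sketch below is one way to do that, assuming the Prince command shown earlier and a made-up limit of 20MB. The function names and the limit are illustrative only.

```python
import os
import subprocess

# Hypothetical limit; pick whatever is reasonable for your own book.
MAX_PDF_MB = 20


def size_in_mb(path):
    """Return the size of a file in megabytes (1 MB = 1024 * 1024 bytes)."""
    return os.path.getsize(path) / (1024 * 1024)


def check_size(path, limit_mb=MAX_PDF_MB):
    """Raise RuntimeError if the file is over limit_mb, else return its size."""
    size = size_in_mb(path)
    if size > limit_mb:
        raise RuntimeError(f"{path} is {size:.1f} MB, over the {limit_mb} MB limit")
    return size


def build_pdf(source="book.html", output="book.pdf", limit_mb=MAX_PDF_MB):
    """Run the Prince command from the post, then check the size of the result."""
    subprocess.run(["prince", source, "-o", output], check=True)
    return check_size(output, limit_mb)
```

The check is separate from the build step, so it works just as well on a PDF produced by wkhtmltopdf or any other tool.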

I hope that is all useful to anyone else searching for answers. Jonathan takes a slightly different approach in his 24 ways article, so that is another useful resource.

If you have also worked through this, let me know in the comments if you have any useful tips to share.

4 Comments

Eliot, January 18, 2014

I’ve been wondering whether to use HTML for writing a book and how to create an acceptable PDF conversion.

This is the first time I’ve heard about Prince, and the $495 price tag was a little surprising. Did you see the Prince download page links to an online conversion service called DocRaptor that has slightly more palatable pricing? Have you tried it? It wasn’t clear if DocRaptor is using the same underlying software as Prince.

Oliver Doepner, March 15, 2014

Hi Rachel,

I read your article with interest because I am looking for an efficient approach to “write once, convert to many formats” authoring, too.

Have you tried AsciiDoc?
http://asciidoctor.org/docs/what-is-asciidoc/

It is touted as being as powerful as DocBook but without the XML clutter. Free and Open Source authoring and conversion tools are available.

Thanks
Oliver

Pradeep Verma, September 13, 2014

Exactly what I was looking for, thanks Rachel!

Bojan, May 10, 2015

How about converting to DOCX (using pandoc), and then using LibreOffice to convert the DOCX to PDF?
