Amazon Kindle - Command line tools for converting .txt to .html to .mobi


TXT2PDBDoc

PyrPub

Calibre

Grutatxt

Additional Information

Introduction

Both txt2pdbdoc and pyrpub convert HTML, but they lose all HTML links, including the Table of Contents. On the other hand, you get a .pdb (AKA .prc) file without using any proprietary tools or having to install X or Qt.

Calibre (ebook-convert) does a better formatting job and preserves links. However, it still requires X to be running, along with Qt, even though ebook-convert is a command-line tool. As such, I tend to avoid this solution. That said, the author of Calibre seems to be really into understanding the .mobi format. (See the MobileRead Calibre forums for more information.)

Another, much better option for converting text to HTML is a newer tool called Grutatxt. I think it does an awesome job! The resulting HTML file is readily compatible with the kindlegen tool!

TXT2PDBDoc

app-text/txt2pdbdoc
http://homepage.mac.com/pauljlucas/software/txt2pdbdoc/
Text/HTML to Doc file converter for the Palm Pilot

Opinion: Not an easy method for converting these simple text/HTML files, but this open source solution's output is less flawed than PyrPub's.

If you're working with an HTML file, you'll first need to use txt2pdbdoc's tools to convert the HTML file to text, and then convert the text file to PDB/PRC.

1) Convert HTML file to TEXT file
html2pdbtxt -u [Optional URL] [HTML File] [Output Text file]

$ html2pdbtxt -u http://www.wonderland.org/ alice.html alice.txt

2) Convert TEXT file to PDB file
txt2pdbdoc [Document Name] [Input Text file] [Output PDB File]

$ txt2pdbdoc "$(head -n 1 alice.txt)" alice.txt alice.pdb

Lastly, give the .pdb file a .mobi (or .prc) suffix, because the Kindle only recognizes files with the .mobi or .prc suffixes.

$ cp alice.pdb alice.mobi
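The three steps above can be wrapped into a single shell function. This is a minimal sketch. It assumes that html2pdbtxt and txt2pdbdoc are on your PATH and accept the arguments shown above. The function name html2kindle_pdb is hypothetical, and this version renames the .pdb instead of copying it.

```shell
# Hypothetical helper: HTML -> text -> PDB -> .mobi using txt2pdbdoc's tools.
# Assumes html2pdbtxt and txt2pdbdoc take the arguments shown above.
html2kindle_pdb() {
    html="$1"
    base="${html%.html}"              # alice.html -> alice

    # 1) HTML -> plain text (the optional -u URL argument is omitted here)
    html2pdbtxt "$html" "$base.txt" || return 1

    # 2) Text -> PDB, using the first line of the text as the document name.
    #    Quoting keeps a multi-word first line together as one argument.
    title="$(head -n 1 "$base.txt")"
    txt2pdbdoc "$title" "$base.txt" "$base.pdb" || return 1

    # 3) Rename so the Kindle recognizes the file by its suffix.
    mv "$base.pdb" "$base.mobi"
}
```

For example, `$ html2kindle_pdb alice.html` should leave alice.txt and alice.mobi in the current folder.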

PyrPub

app-text/pyrpub
http://www.pyrite.org/publisher/
content conversion tool for Palm

Personal Opinion: Probably the simplest open source method for converting Text/HTML to PDB/PRC.

Convert the HTML to .pdb, then rename the result, since the Kindle prefers the .prc suffix.

$ pyrpub blah.html

$ mv blah.pdb blah.prc (or blah.mobi)
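To handle a whole folder of HTML files, the two PyrPub commands can be run in a loop. This is a sketch only. It assumes, as in the example above, that pyrpub writes blah.pdb next to blah.html. The function name pyrpub_all is hypothetical.

```shell
# Hypothetical batch helper: run pyrpub on every .html file in the current
# folder, then rename each resulting .pdb to .prc for the Kindle.
# Assumes pyrpub writes NAME.pdb next to NAME.html, as in the example above.
pyrpub_all() {
    for f in ./*.html; do
        [ -e "$f" ] || continue       # no .html files: the glob stays literal
        base="${f%.html}"
        pyrpub "$f" && mv "$base.pdb" "$base.prc"
    done
}
```

Run `$ pyrpub_all` from the folder that holds the HTML files. Change `.prc` to `.mobi` if you prefer that suffix.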

Calibre

app-text/calibre
http://calibre-ebook.com/
Ebook management application.

Opinion: Largely bloated code involving Python and Qt. It requires 10-20 package dependencies and cannot execute ebook-convert without X running! I strongly suggest using one of the simpler methods for converting text/HTML files to PRC/MOBI. The author, however, has done quite a bit of Python coding and research into the MOBI file specification. I would just suggest he be a little more structured when coding, rather than linking in the whole world to perform such simple tasks!

Convert an .html file to .mobi

$ ebook-convert file.html blah.mobi --output-profile kindle_dx


To convert all *.html files within the current folder to .mobi:

$ for file in ./*.html; do ebook-convert "$file" "${file%.html}.mobi" --output-profile kindle_dx; done


For further options, see:
http://manual.calibre-ebook.com/cli/ebook-convert-3.html#html-input-to-mobi-output

You can set the author with --authors "blah".
(Does anybody know how to specify an empty (null) author field without getting "Unknown" on the device?)
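If you want to set the title and author on each conversion, a small wrapper can pass both options. This is a sketch under the assumption that your Calibre version supports the --title and --authors metadata options and the kindle_dx output profile used above. The function name html2mobi_meta and its argument order are hypothetical.

```shell
# Hypothetical wrapper: convert one HTML file to MOBI with explicit metadata.
# Arguments: HTML file, title, author.
html2mobi_meta() {
    in="$1"; title="$2"; author="$3"
    ebook-convert "$in" "${in%.html}.mobi" \
        --output-profile kindle_dx \
        --title "$title" \
        --authors "$author"
}
```

For example: `$ html2mobi_meta alice.html "Alice in Wonderland" "Lewis Carroll"`.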

Grutatxt & KindleGen to convert basic ASCII Text files to HTML

app-text/grutatxt
http://triptico.com/software/grutatxt.html
A converter from plain text to HTML and other markup languages

Opinion: Another failsafe method, with Grutatxt adding some troff-like formatting to basic text. (You might be able to use pyrpub to convert the resulting HTML, rather than using KindleGen (AKA MobiGen).)

This is a better option than txt2pdbdoc and the other text-to-HTML options above. It's very quick and light on system resources. The only downside is that you need kindlegen to convert the .html to .mobi, although grutatxt's output seems to be very kindlegen-friendly!

First, edit the text file and indent all lists (per the grutatxt man page). With this trick, grutatxt treats the indented lines as preformatted text instead of joining them. Otherwise, grutatxt strips the newline at the end of each list item and merges the items into a single paragraph. We don't want a paragraph; we want lists. The same trick can keep any lines of text from having their newlines stripped. Later, if you don't like the indentation or leading white space in front of your lists in the final HTML, you can edit the HTML page and simply delete the white space within the lists (between the <pre> tags).
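Indenting a list by hand gets tedious for long files, so the indentation can be done with sed instead. This is a minimal sketch that assumes GNU sed, for in-place editing with -i. You supply the line numbers of the list yourself. The function name indent_lines and the four-space indent are arbitrary choices.

```shell
# Hypothetical helper: indent lines FIRST..LAST of a text file by four spaces
# so grutatxt keeps their line breaks. Blank lines in the range are left
# alone. Requires GNU sed (-i and the { ... } grouping).
indent_lines() {
    file="$1"; first="$2"; last="$3"
    sed -i "${first},${last}{/./s/^/    /;}" "$file"
}
```

For example, `$ indent_lines MyFile.txt 12 20` indents lines 12 through 20 before you run grutatxt.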

To convert the text file into a new HTML file:
$ grutatxt < MyFile.txt > MyFile.html


To add a title, edit the resulting HTML file with a text editor and enter the title between the <Title></Title> HTML tags, e.g.:
<Title>A Stone for Danny Fisher</Title>

To add an Author, add the following entry directly below the first and only <META> HTML tag entry found at the top of the HTML file:

<meta name="Author" content="Harold Robbins">
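The title and author edits can also be scripted instead of done in an editor. This is a sketch under several assumptions. It needs GNU sed for -i, the I (case-insensitive) flag and the one-line `a` command. It also assumes the title element sits on a single line and that the file has only one <meta> tag, as described above. Titles containing `|` or `&` would need escaping. The function name set_title_author is hypothetical.

```shell
# Hypothetical helper: set the <title> and add an Author <meta> tag in
# grutatxt's output. Assumes GNU sed, a single-line title element, and a
# single existing <meta> tag. '|' or '&' in the title must be escaped.
set_title_author() {
    file="$1"; title="$2"; author="$3"
    # Replace whatever sits between the title tags (case-insensitive).
    sed -i "s|<title>.*</title>|<title>$title</title>|I" "$file"
    # Append the Author meta tag directly below the existing <meta> line.
    sed -i "/<meta/I a <meta name=\"Author\" content=\"$author\">" "$file"
}
```

For example: `$ set_title_author MyFile.html "A Stone for Danny Fisher" "Harold Robbins"`.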

Use kindlegen to convert the .html to a .mobi file:
$ kindlegen MyFile.html

Copy the file to your Kindle device:
$ cp MyFile.mobi /media/Kindle/documents/recipes/
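Put together, the Grutatxt steps above become text to HTML to MOBI to Kindle. The following is a sketch, not a tested recipe. The Kindle mount point in KINDLE_DOCS is an assumption, so adjust it to wherever your device appears. The function name txt2kindle is hypothetical. As far as I know, kindlegen can return a non-zero exit status even when it only issues warnings. For that reason, the sketch checks for the .mobi file rather than trusting the exit status alone.

```shell
# Hypothetical end-to-end helper: text -> HTML (grutatxt) -> MOBI (kindlegen)
# -> Kindle. The default mount point below is an assumption.
KINDLE_DOCS="${KINDLE_DOCS:-/media/Kindle/documents}"

txt2kindle() {
    txt="$1"
    base="${txt%.txt}"
    grutatxt < "$txt" > "$base.html" || return 1

    # Remove any stale output so the existence check below is meaningful.
    rm -f "$base.mobi"
    # kindlegen may exit non-zero on mere warnings, so don't stop here;
    # check for the output file instead.
    kindlegen "$base.html"
    if [ ! -f "$base.mobi" ]; then
        echo "kindlegen produced no $base.mobi" >&2
        return 1
    fi
    cp "$base.mobi" "$KINDLE_DOCS/"
}
```

If you want a title and author in the book, edit the generated HTML between the grutatxt and kindlegen steps.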


Convert the Grutatxt README.bz2 to HTML:
$ bzcat /usr/share/doc/grutatxt-2.0.14/README.bz2 | grutatxt > grutatxt.html

Additional Information

To create a clickable table of contents (TOC), you just add HTML anchor tags, as you would in any other HTML document. But getting the Kindle to jump to the TOC (and to jump between chapters via the joystick) requires a .NCX file. The .NCX file is an XML file that defines the jump points for the joystick. You also need an .OPF file, a manifest that defines the cover image and tells kindlegen where to find the HTML and .NCX files.
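The following sketch shows roughly what the .NCX and .OPF files could look like. It writes both with shell here-documents. It assumes an OPF 2.0 package and an NCX (version 2005-1) file. All file names (MyFile.html, toc.ncx, cover.jpg, MyFile.opf), anchor names (toc, ch1) and the identifier mybook-0001 are hypothetical placeholders. MyFile.html is assumed to already contain anchors such as `<a name="toc"></a>` and `<a name="ch1"></a>`. Check the Amazon Kindle Publishing Guidelines for what kindlegen actually requires.

```shell
# Hypothetical generator for a minimal NCX + OPF pair. Every file name,
# anchor and identifier below is a placeholder to adapt to your book.
make_opf_ncx() {
    cat > toc.ncx <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content="mybook-0001"/>
  </head>
  <docTitle><text>A Stone for Danny Fisher</text></docTitle>
  <navMap>
    <navPoint id="nav-toc" playOrder="1">
      <navLabel><text>Table of Contents</text></navLabel>
      <content src="MyFile.html#toc"/>
    </navPoint>
    <navPoint id="nav-ch1" playOrder="2">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="MyFile.html#ch1"/>
    </navPoint>
  </navMap>
</ncx>
EOF

    cat > MyFile.opf <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>A Stone for Danny Fisher</dc:title>
    <dc:creator>Harold Robbins</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">mybook-0001</dc:identifier>
    <meta name="cover" content="cover-img"/>
  </metadata>
  <manifest>
    <item id="content" href="MyFile.html" media-type="application/xhtml+xml"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="cover-img" href="cover.jpg" media-type="image/jpeg"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="content"/>
  </spine>
  <guide>
    <reference type="toc" title="Table of Contents" href="MyFile.html#toc"/>
  </guide>
</package>
EOF
}
```

After running `$ make_opf_ncx`, you would pass the manifest to kindlegen (`$ kindlegen MyFile.opf`) instead of the HTML file.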

An excellent guide (though you might need to email her for a copy in .mobi format):
Kindle Formating

(Thanks to James Martin for the above info relayed via Amazon Kindle Publishing Forums!)


Amazon Kindle Publishing Guidelines: Find and download a copy of AmazonKindlePublishingGuidelines.pdf; it explains how to create an HTML document to pass to kindlegen.


Lazy Methods of creating an HTML page:

Use SeaMonkey (Mozilla) Composer: type or paste text into it, then save the page. Or, use SeaMonkey to view an existing page and then save it. This saves the page, along with all its additional image files, into a subfolder. Run kindlegen on the main HTML file that was saved.

Use Vim's :TOhtml<CR> to convert the currently open text document to HTML. Then edit the HTML and remove the font and background colors (and font styles, if you wish); otherwise you'll see a blank file. Save the HTML file using :write. A quick sed method of dropping the color declarations (GNU sed; -i edits the file in place):

$ sed -i 's/\(background-color\|color\): #[0-9A-Fa-f]\{6\}; //g' yourfile.html


MOBI file specification: Can be found within Calibre's sources as "calibre/calibre/format_docs/pdb/mobi.txt" or on my server as mobi.txt or mobi-format.mobi. The author/maintainer of Calibre regularly updates this file, and the latest version can be found at his site.
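A quick way to see what kind of file you have is to look at the Palm database header. Renaming a .pdb to .mobi only changes the suffix, not the contents. In the PDB header, the 4-byte type and 4-byte creator fields start at byte offset 60: 32 bytes of name, followed by 28 bytes of attributes, version, dates, modification number and info offsets. MOBI books normally carry BOOKMOBI there, while PalmDOC files, which is what the Doc converters above are expected to produce, carry TEXtREAd. The function name pdb_type is hypothetical.

```shell
# Hypothetical helper: print the 8-byte type+creator field of a Palm
# database file (PDB/PRC/MOBI), which starts at byte offset 60.
pdb_type() {
    dd if="$1" bs=1 skip=60 count=8 2>/dev/null
    echo
}
```

For example, `$ pdb_type alice.mobi` should print TEXtREAd for a renamed txt2pdbdoc file and BOOKMOBI for a kindlegen or Calibre MOBI.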

TODO: XML to HTML documentation still needs to be written (i.e., converting Gentoo XML documentation to MOBI/EPUB).