Amazon Kindle - Command line tools for converting .txt to .html to .mobi
Introduction
Both txt2pdbdoc & pyrpub convert HTML but lose all HTML links,
including the Table of Contents. On the other hand, you get a .pdb (AKA .prc) file
without using any proprietary tools or dealing with the prerequisites
of installing X or Qt.
Calibre (ebook-convert) does a better formatting job & preserves
links. However, it still requires X running, along with Qt,
even though ebook-convert is a command line tool. As such, I
tend to avoid this solution. That said, the author of Calibre
seems to be really into trying to understand the .mobi format.
(See the MobileRead Calibre forums for more information.)
Another, much better option for converting text to HTML is
something new called Grutatxt. I think it does an awesome
job! The resulting HTML file is readily compatible with the
kindlegen tool!
TXT2PDBDoc
app-text/txt2pdbdoc
http://homepage.mac.com/pauljlucas/software/txt2pdbdoc/
Text/HTML to Doc file converter for the Palm Pilot
Opinion: Not an easy method for converting these simple text/html
files, but this open source solution's output is less flawed
than PyrPub's.
If you're working with an HTML file, you'll first need to use
txt2pdbdoc's html2pdbtxt tool to convert the HTML file to text, then
convert the text file to PDB/PRC.
1) Convert HTML file to TEXT file
html2pdbtxt -u [Optional URL] [HTML File] [Output Text file]
$ html2pdbtxt -u http://www.wonderland.org/ alice.html alice.txt
2) Convert TEXT file to PDB file
txt2pdbdoc [Document Name] [Input Text file] [Output PDB File]
$ txt2pdbdoc "$(head -n 1 alice.txt)" alice.txt alice.pdb
Lastly, copy (or rename with mv) the .pdb to a .mobi (or .prc) suffix, as the Kindle
only sees the .mobi or .prc suffixes.
$ cp alice.pdb alice.mobi
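If you do this often, the two steps plus the final copy can be chained in one small bash function. This is a minimal sketch, assuming html2pdbtxt and txt2pdbdoc are installed and on your PATH and that -u can simply be left out when you have no URL; html_to_mobi is a hypothetical helper name, not part of txt2pdbdoc. It quotes the document name so a multi-word first line stays a single argument.

```shell
# Hypothetical helper: HTML -> text -> PDB -> .mobi copy, using the txt2pdbdoc tools.
# Assumes html2pdbtxt and txt2pdbdoc are on PATH.
html_to_mobi() {
    local html=$1               # input HTML file, e.g. alice.html
    local url=$2                # optional base URL for html2pdbtxt -u
    local base=${html%.html}    # alice.html -> alice

    # Step 1: HTML -> text (pass -u only when a URL was given).
    if [ -n "$url" ]; then
        html2pdbtxt -u "$url" "$html" "$base.txt" || return 1
    else
        html2pdbtxt "$html" "$base.txt" || return 1
    fi

    # Step 2: text -> PDB, using the first line of the text as the document name.
    # The quotes keep a multi-word first line as one argument.
    txt2pdbdoc "$(head -n 1 "$base.txt")" "$base.txt" "$base.pdb" || return 1

    # The Kindle looks for .mobi/.prc, so keep a .mobi copy of the PDB.
    cp "$base.pdb" "$base.mobi"
}
```

$ html_to_mobi alice.html http://www.wonderland.org/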
PyrPub
app-text/pyrpub
http://www.pyrite.org/publisher/
content conversion tool for Palm
Personal Opinion: Probably the simplest open source method for
converting Text/HTML to PDB/PRC.
Convert the HTML to .pdb, then rename it, as the Kindle prefers the .prc
suffix.
$ pyrpub blah.html
$ mv blah.pdb blah.prc    (or blah.mobi)
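To do the same for every .html file in the current folder, a small loop is enough. This is a sketch that assumes, as in the example above, that pyrpub writes <name>.pdb next to <name>.html; pyrpub_all is a hypothetical helper name.

```shell
# Hypothetical helper: run pyrpub on each ./*.html and rename the .pdb to .prc.
# Assumes pyrpub writes <name>.pdb next to <name>.html.
pyrpub_all() {
    local f
    for f in ./*.html; do
        [ -e "$f" ] || continue            # no .html files: the glob stays literal
        pyrpub "$f" || continue            # skip files pyrpub fails on
        mv "${f%.html}.pdb" "${f%.html}.prc"
    done
}
```

Change .prc to .mobi in the mv line if you prefer that suffix.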
Calibre
app-text/calibre
http://calibre-ebook.com/
Ebook management application.
Opinion: Largely bloated code involving Python and Qt.
Requires 10-20 package dependencies and cannot execute ebook-convert
without running X! I strongly
suggest using one of the simpler methods for converting text/html
files to prc/mobi. The author, however, has done quite a bit of
Python coding & research of the MOBI file specification. I
would just suggest he be a little more structured when coding rather
than linking in the whole world to perform such simple tasks!
Convert an .html file to .mobi:
$ ebook-convert file.html blah.mobi --output-profile kindle_dx
To convert all *.html files within the current folder to .mobi
(the quotes keep file names containing spaces intact):
$ for file in ./*.html; do
    ebook-convert "$file" "$(basename "$file" .html).mobi" --output-profile kindle_dx
  done
For further options, see:
http://manual.calibre-ebook.com/cli/ebook-convert-3.html#html-input-to-mobi-output
You can also set the author with --authors "blah".
(Does anybody know how to specify a null (\n) author field without
getting "Unknown" on the device?)
Grutatxt & KindleGen to convert basic ASCII Text files to HTML
app-text/grutatxt
http://triptico.com/software/grutatxt.html
A converter from plain text to HTML and other markup languages
Opinion: Another failsafe method, with Grutatxt providing some troff-like
formatting qualities to basic text. (You might be able to
use pyrpub to convert the resulting HTML, rather than using
KindleGen, aka MobiGen.)
This is a better option than the above txt2pdbdoc and other text-to-HTML
options. It's very quick and lightweight on system
resources. The only downside is that you need kindlegen to convert the .html to
.mobi. That said, the output of grutatxt seems to be very kindlegen
friendly!
First, edit the text file and indent all lists (per the grutatxt man
page). With this trick, grutatxt treats all indented lines as
lists. This, in turn, prevents grutatxt from stripping the
newline (carriage return) at the end of each list item and merging the items
into a paragraph. We don't want a paragraph for lists; we want lists,
and this trick can be used to keep any lines of text from having
their newlines stripped. Later, if you don't like the look of the
indentation or the white space prefixing your lists within the final HTML,
you can edit the HTML page and simply delete the white space within the
lists (between the <pre> tags).
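Indenting every list by hand gets tedious on a long file. A minimal bash sketch, assuming your list items start with "-", "*", or a number followed by "." or ")" (adjust the pattern to match your own text); indent_lists is a hypothetical helper name, and the two-space indent is an arbitrary choice.

```shell
# Hypothetical helper: indent list items so grutatxt keeps their line breaks.
# Assumes items start with "-", "*", or a number like "1." / "1)" followed by whitespace.
# Lines that are already indented are left alone, so running it twice is harmless.
indent_lists() {
    sed -E 's/^([-*]|[0-9]+[.)])[[:space:]]/  &/'
}
```

$ indent_lists < MyFile.txt > MyFile-indented.txt
$ grutatxt < MyFile-indented.txt > MyFile.html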
To convert the text file into a new HTML file:
$ grutatxt < MyFile.txt > MyFile.html
To add a Title, edit the resulting HTML file with a text editor and
enter a Title within the <Title></Title> HTML
tags, e.g.:
<Title>A Stone for Danny Fisher</Title>
To add an Author, add the following entry directly below the first
and only <META> HTML tag entry found at the top of the HTML
file:
<meta name="Author" content="Harold Robbins">
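Both edits can also be scripted instead of done by hand. This is a sketch that assumes the grutatxt output has its <title>...</title> pair on a single line and at least one <meta ...> line, and that the title and author contain no backslashes or HTML special characters (awk -v interprets backslashes, and nothing is escaped); set_title_author is a hypothetical helper name. It writes to stdout, so redirect to a new file.

```shell
# Hypothetical helper: set the <title> text and add an Author <meta> tag.
# Usage: set_title_author FILE TITLE AUTHOR > NEWFILE
# Assumes one <title>...</title> pair on one line and at least one <meta ...> line.
set_title_author() {
    awk -v title="$2" -v author="$3" '
        # Replace whatever sits between <title> and </title>, in any letter case.
        {
            line = tolower($0)
            s = index(line, "<title>")
            e = index(line, "</title>")
            if (s && e > s)
                $0 = substr($0, 1, s + 6) title substr($0, e)
        }
        { print }
        # Insert the Author tag right after the first <meta> line only.
        !done && tolower($0) ~ /<meta/ {
            print "<meta name=\"Author\" content=\"" author "\">"
            done = 1
        }
    ' "$1"
}
```

$ set_title_author MyFile.html "A Stone for Danny Fisher" "Harold Robbins" > MyFile-titled.html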
Use kindlegen to convert the .html to a .mobi file:
$ kindlegen MyFile.html
Copy the file to your Kindle device:
$ cp MyFile.mobi /media/Kindle/documents/recipes/
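The kindlegen and copy steps can be wrapped together. This is a sketch, assuming kindlegen writes MyFile.mobi next to MyFile.html and that the Kindle is mounted under /media/Kindle (that mount point depends on your system); send_to_kindle is a hypothetical helper name. It checks for the output file rather than relying on kindlegen's exit status.

```shell
# Hypothetical helper: build a .mobi with kindlegen, then copy it to the Kindle.
# Usage: send_to_kindle FILE.html [DESTINATION_DIR]
send_to_kindle() {
    local html=$1
    local dest=${2:-/media/Kindle/documents}   # assumed mount point; adjust as needed
    local mobi=${html%.html}.mobi

    rm -f "$mobi"                # so a stale .mobi can't pass the check below
    kindlegen "$html"
    # Check for the output file rather than trusting kindlegen's exit status.
    if [ ! -f "$mobi" ]; then
        echo "send_to_kindle: kindlegen did not produce $mobi" >&2
        return 1
    fi
    if [ ! -d "$dest" ]; then
        echo "send_to_kindle: $dest not found; is the Kindle mounted?" >&2
        return 1
    fi
    cp "$mobi" "$dest/"
}
```

$ send_to_kindle MyFile.html /media/Kindle/documents/recipes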
Convert the Grutatxt README.bz2 to HTML:
$ bzcat /usr/share/doc/grutatxt-2.0.14/README.bz2 | grutatxt > grutatxt.html
Additional Information
For creating a clickable table of contents (TOC), you just add HTML
anchor tags, as you would in any other HTML document. But
getting the Kindle to jump to the TOC (and jump between chapters via
the joystick) requires an .NCX file. The .NCX file is an XML file
that defines the jumps for the joystick. You also need an .OPF file,
a manifest that defines the cover image and tells kindlegen
where to find the HTML and .NCX files.
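The NCX and OPF files described above can be generated with two small bash functions. This is a minimal sketch, not a complete implementation: it assumes a single HTML file whose chapters carry id anchors (for example <h2 id="ch1">) and a JPEG cover; make_ncx, make_opf and the file names are hypothetical, the title doubles as the unique identifier purely for brevity, and the language is hard-coded to en. The element names follow the NCX 2005-1 and OPF 2.0 formats; check the Amazon Kindle Publishing Guidelines for what your kindlegen version actually requires.

```shell
# Hypothetical helper: write a minimal NCX to stdout.
# Usage: make_ncx TITLE FILE.html anchor:label [anchor:label ...]
make_ncx() {
    local title=$1 html=$2 n=0 pair
    shift 2
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="$title"/></head>
  <docTitle><text>$title</text></docTitle>
  <navMap>
EOF
    for pair in "$@"; do
        n=$((n + 1))
        # Each argument is "anchor:label": anchor before the first colon, label after it.
        cat <<EOF
    <navPoint id="nav$n" playOrder="$n">
      <navLabel><text>${pair#*:}</text></navLabel>
      <content src="$html#${pair%%:*}"/>
    </navPoint>
EOF
    done
    printf '  </navMap>\n</ncx>\n'
}

# Hypothetical helper: write a minimal OPF 2.0 manifest to stdout.
# Usage: make_opf TITLE AUTHOR FILE.html TOC.ncx COVER.jpg
make_opf() {
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>$1</dc:title>
    <dc:creator>$2</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">$1</dc:identifier>
    <meta name="cover" content="cover-image"/>
  </metadata>
  <manifest>
    <item id="content" href="$3" media-type="application/xhtml+xml"/>
    <item id="ncx" href="$4" media-type="application/x-dtbncx+xml"/>
    <item id="cover-image" href="$5" media-type="image/jpeg"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="content"/>
  </spine>
</package>
EOF
}
```

$ make_ncx "My Book" MyFile.html ch1:"Chapter One" ch2:"Chapter Two" > toc.ncx
$ make_opf "My Book" "Me" MyFile.html toc.ncx cover.jpg > book.opf
$ kindlegen book.opf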
An excellent guide is Kindle Formatting (but you might need to email her
for a copy in .mobi format?).
(Thanks to James Martin for the above info, relayed via the Amazon Kindle
Publishing Forums!)
Amazon Kindle Publishing Guidelines: Find and download a copy of
AmazonKindlePublishingGuidelines.pdf, as it explains how to create
an HTML document to pass to kindlegen.
Lazy methods of creating an HTML page:
Use Seamonkey (Mozilla) Composer and type or copy text into it, then
save the page. Or, use Seamonkey to view an existing page and then
save the page. This saves the page, along with all its additional
image files, into a subfolder. Use kindlegen on the initial or main
HTML file saved.
Use Vim's :TOhtml<CR> to convert an existing open text document to
HTML. Next, edit the HTML and remove the font/background colors (and
font styles if you wish), otherwise you'll see a blank file. Save the
HTML file using :write. A quick sed method of dropping the color
declarations (GNU sed; -i edits the file in place):
$ sed -i 's/\(background-color\|color\): #[0-9A-Fa-f]\{6\}; //g' yourfile.html
MOBI file specification: Can be found within Calibre's sources
as "calibre/calibre/format_docs/pdb/mobi.txt" or on my server as mobi.txt
or mobi-format.mobi.
The author/maintainer of Calibre regularly updates this file, and the
latest version can be found at his site.
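The specification builds on the Palm database (PDB) header that .pdb, .prc and .mobi files share. A quick way to see which kind of file you actually have (for instance, after renaming a txt2pdbdoc .pdb to .mobi) is to read the 8-byte type/creator field at byte offset 60 of that header: PalmDOC files should show TEXtREAd and MOBI files BOOKMOBI. A minimal sketch; pdb_type is a hypothetical helper name.

```shell
# Hypothetical helper: print the PDB type+creator field (bytes 60-67 of the header).
# PalmDOC (e.g. txt2pdbdoc output) should print TEXtREAd; MOBI files BOOKMOBI.
pdb_type() {
    dd if="$1" bs=1 skip=60 count=8 2>/dev/null   # 2>/dev/null hides dd's record counts
    echo
}
```

$ pdb_type alice.mobi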
TODO: XML to HTML documentation needs to be done (e.g. Gentoo XML
documentation to MOBI/EPUB).