A Roadmap for TexInfo without Info
TexInfo is a decent system for writing documentation. Its weakness is the “info” file format, which is an obsolete kludge. Replacing it, however, is a multi-pronged task.
This paper attempts to specify a replacement format and associated tooling.
I will call this format “hinfo”
as it is a replacement for info format using html syntax.
(However “hinfo” is not proposed as a file extension;
hinfo files should use the
Problems with info
- Info is a non-standard format used by no-one else. Hence there is very little tooling.
- Paragraphs are pre-split into lines, so they cannot adjust to different screen widths.
- Info requires a monospace font, and so valuable semantic information is lost.
Info cannot distinguish
@code. It indicates
@var using upper-case. Info-reading programs have limited ability to recover the lost semantic information. Being able to use proportional fonts for descriptive text and monospaced fonts for code and examples makes documentation easier to read.
- Info documentation looks ugly. Using info as the publicly visible front-end for GNU documentation presents a bare-bones and behind-the-times image for texinfo and GNU generally. This is bad marketing.
It is possible to improve html/DocBook support without deprecating or dropping info format support. However, that has its own problems:
- Installing both info files and html files wastes disk space.
- Having two primary formats for documentation is likely to lead to inconsistencies and other problems. What if one package only installs html, another package only installs info, and a third installs both? We would need more complex and brittle installation directory and search path standards.
- Which format should Emacs info prefer? If you accept what I wrote above, clearly “hinfo” rather than “info”. But in that case, why continue to install info files, or (longer-term) maintain tooling for them?
XHTML as an Info (format) replacement
The obvious replacement for Info is some variant of HTML or XML.
The format should follow the recommendations for Polyglot markup. This means documents are well-formed as both HTML and XML.
If hinfo is well-formed XML then various processing tools
(such as XSLT)
can be used to analyze or transform the output.
For that to be useful, it is a goal that hinfo contain all or most
of the semantic information from the texinfo file.
Specifically, it should include all the information currently
makeinfo --xml or
That can be done with
class and other attributes.
It is hoped hinfo can make at least
makeinfo --xml obsolete.
EPUB uses the
epub:type attribute to optionally indicate semantics.
Texinfo could do the same.
Note the html currently generated by
is rather poorly structured, and it should be cleaned up.
See this thread for discussion on this topic.
The Info UI in a plain browser
It would be useful to have
info keyboard shortcuts when
reading an hinfo file in a vanilla web browser. Most of the
This message discusses a proof of concept.
table-of-contents-file, which is just a nested HTML list.
(This is the ToC format used by EPUB3.)
Below I describe a prototype that implements navigation with a “smart” sidebar, and which could be straightforwardly enhanced to support keyboard navigation and searching.
Emacs Info mode
For reading html it makes sense to use eww mode. An “hinfo mode” would be a hybrid of the existing info and eww modes, with the file handling and layout mostly using eww mode, while the keymap and user interface would come from info mode.
The standalone Info program
The info program needs to be able to
read hinfo files. There are multiple web-browsers
that work in a terminal; one of these can be used.
However, it seems to make more sense to just use emacs
in terminal (-nw) mode. We might add an option to Emacs
to leave off the menubar and other undesired “decoration”.
There is a special case when the terminal emulator is
DomTerm, since DomTerm is built
on web technologies: In that case we could have DomTerm create
an <iframe> and load the html file into it.
Installation of hinfo
There is a standard for installing info files (possibly compressed) in a central location so info mode can find them given just the name of a manual. The existing standard installs all info files in the same directory.
For hinfo, I think a better structure would be to have a separate directory for each manual. For example:
/usr/share/hinfo
  emacs
    index.html
    Search.html
    ...
  kawa
    index.html
    Tutorial.html
    screenshot-1.png
    ...
Using epub for packaging
To save disk space, it is desirable to compress each manual. The epub format is a standard for electronic books, and it satisfies texinfo’s needs pretty well. An epub file is essentially a zip archive with web pages (xhtml), images, a table of contents, and resources like css styling. There are many epub-reading devices, programs, and browser plugins.
For example the Kawa binary distribution (see final section) ships the texinfo-derived manual
in epub format.
Kawa includes a
--browse-manual option that works by starting a mini-webserver
that reads and uncompresses the epub manual, and then displays it in
Emacs can already process zip archives, including epub files. So it shouldn’t be difficult to enhance info mode to deal with epub files.
Recommendation: Change the preferred output format of
be an epub file.
“Installing” an hinfo file would involve copying the epub
to a system location. For example:
/usr/share/hinfo
  emacs.epub
  kawa.epub
  ...
If someone wants to publish a manual on the web, they can just unzip the epub file into a server directory.
Prototype of a browser interface
You can try it out at http://per.bothner.com/kawa/invoke/.
To download the manual, grab http://per.bothner.com/kawa/kawa-manual.epub.
If you read the latter in an epub reader you will get that reader’s
user interface, rather than the interface discussed here. However, you
can unzip the epub file and browse
The prototype manual is generated from
kawa.texi in a
rather convoluted manner, using
plus the DocBook XSLT stylesheets, plus some
sed script kludges.
I hope in the future we can just do
as discussed later.
Smart navigation bar
On startup, the code creates a navigation sidebar.
It does this by loading the table-of-contents file into
an internal frame (<iframe>). What makes it “smart”
is that it only displays “interesting” links, rather than
displaying the entire table of contents.
An “interesting” link is the link to the current node,
its ancestors, siblings, and immediate children.
As you navigate the document, the sidebar is automatically
updated (using JavaScript and CSS), rather than being re-loaded.
The sidebar does include a “Table of contents” link, which takes you to the full table-of-contents.
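The “interesting link” rule described above can be sketched as a pure function over a table-of-contents tree. This is only an illustration, not the prototype's code: the node shape ({ id, children }) and the name interestingLinks are assumptions made here.

```javascript
// Hypothetical ToC node shape: { id: string, children: Node[] }.
// Returns the ids the sidebar would show for the page `currentId`:
// the current node, its ancestors, its siblings, and its immediate children.
function interestingLinks(root, currentId) {
  // Find the chain of nodes from the root down to the current node.
  function findPath(node, path) {
    const here = path.concat([node]);
    if (node.id === currentId) return here;
    for (const child of node.children || []) {
      const found = findPath(child, here);
      if (found) return found;
    }
    return null;
  }

  const path = findPath(root, []);
  if (!path) return [];

  const current = path[path.length - 1];
  const parent = path.length > 1 ? path[path.length - 2] : null;
  const shown = new Set(path.map((n) => n.id)); // ancestors + current
  if (parent) parent.children.forEach((n) => shown.add(n.id)); // siblings
  (current.children || []).forEach((n) => shown.add(n.id)); // children
  return [...shown];
}
```

A sidebar could then hide every table-of-contents entry whose id is not in the returned list, for instance by toggling a CSS class.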
Lazy loading of pages
Certain manuals can be very big, so it is desirable to split them into multiple web pages, and only load pages as needed.
However, this complicates (not-yet-implemented) whole-document search. Navigating back and forth may also cause wasteful re-loading of pages, depending on the browser’s caching strategy.
The implemented solution is to load each page into its own
<iframe> (internal frame).
The “master page” stays loaded as long as the document is being read
(i.e. its window is open) and you don’t navigate from it.
When it starts up, it creates an empty placeholder
for each page (using the table-of-contents file loaded into the sidebar).
When a page is first visited, an
<iframe> is created for the page,
and added as a child of the placeholder
click event handler overrides the default action of
internal (same-document) links, by if necessary creating the
and then hiding all the other pages, using CSS styling.
When a page is first loaded, same-document links are re-written
(so “hover” shows the correct URL), while external links get a
target="_blank" attribute, so they open in a fresh window/tab.
The initial “welcome” page is not loaded in an
but is directly in the top-level
The main reason for this is to have a clean fall-back
A secondary reason is to speed up loading of the welcome page.
The initial contents of the welcome page are moved into a
element, to make it easy to hide it when navigating to another page.
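The placeholder-and-load-on-first-visit scheme can be sketched independently of the DOM. In the sketch below the createFrame callback stands in for creating an <iframe>. That callback, the class name PageSet, and the hidden flag are assumptions for illustration, not the prototype's actual code.

```javascript
// Minimal sketch of lazy page loading with one placeholder per page.
// `createFrame(name)` is a hypothetical callback standing in for
// "create an <iframe> for this page"; it is invoked at most once per page.
class PageSet {
  constructor(pageNames, createFrame) {
    this.createFrame = createFrame;
    // One empty placeholder per page, created up front from the ToC.
    this.placeholders = new Map(
      pageNames.map((name) => [name, { frame: null, hidden: true }])
    );
  }

  // Show `name`, creating its frame on first visit and hiding all others.
  show(name) {
    const target = this.placeholders.get(name);
    if (!target) throw new Error(`unknown page: ${name}`);
    if (target.frame === null) target.frame = this.createFrame(name);
    for (const [other, slot] of this.placeholders) {
      slot.hidden = other !== name; // a real page would toggle a CSS class
    }
    return target.frame;
  }
}
```

Because frames are kept once created, revisiting a page does not call createFrame again, which is the point of keeping the master page loaded.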
Works using http or file
The same set of files should be readable using either
when served by a web server (
or other mechanism (such as packaged in an archive).
Browsers enforce a “same-origin” security policy,
which limits interaction between frames.
The Google Chrome browser views
as having different origins, so communicating between frames is restricted.
The solution is to use
communicate between frames.
Clean bookmarkable URLs
When a page is selected, we want to update the browser’s location bar so it contains a URL for that page. However, in general we can only update the hash part of the location, without causing the entire page to be re-loaded.
Thus when navigating to the
Buffers page in the Emacs manual,
we can’t update
However, we can change the location bar
to end with
This is what the prototype does.
Such URLs work in external links, because when the document is
initially loaded, the prototype checks for a “hash” string.
If one is specified, the corresponding page is loaded.
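The startup check for a “hash” string could look roughly like the following. The function name pageFromLocation and the fallback page name are assumptions for illustration. Only the standard URL class is used.

```javascript
// Given the URL the document was opened with, decide which page to show.
// A fragment such as "#Buffers" selects that page; with no fragment we
// fall back to a default page name (hypothetical: "index").
function pageFromLocation(href, defaultPage = "index") {
  const hash = new URL(href).hash; // "" or "#Buffers"
  if (hash.length <= 1) return defaultPage;
  return decodeURIComponent(hash.slice(1));
}
```

In a browser the argument would typically be window.location.href.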
The browser’s Back button does not yet work, but that can
presumably be fixed with the
Newer browsers have a history feature which gives you more flexibility
in updating the location bar. Thus you could update it to
say (for example)
emacs/Buffers.html. However, this is
limited by the “same-origin” policy: Updating
http: URLs works
on both Firefox and Google Chrome;
file: URLs work on Firefox;
file: does not work on Google Chrome.
It may make sense to use the
emacs/Buffers.html style for
https:, but use the
style otherwise. Especially for existing websites that have established
URLs using the
This would require some modest changes to startup code:
Now any page can be the initial “master page”,
index.html is not special.
index.html is for initial page
It is desirable that the initial pathname ends with
because web servers generally have that as the default.
This initial page is the same as the “master page” mentioned before.
which allows you to abbreviate it as
Buffers page could be accessed as either
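The scheme-dependent choice of URL style discussed in this section could be sketched as follows. The function name locationFor is hypothetical, and treating both http: and https: as the schemes that get per-page paths is an assumption based on the browser behavior listed above.

```javascript
// Choose how to expose the current page in the location bar.
// For http: and https: use a per-page path; otherwise (for example file:)
// keep the master page and change only the fragment.
function locationFor(masterHref, page) {
  const url = new URL(masterHref);
  if (url.protocol === "http:" || url.protocol === "https:") {
    return new URL(`${page}.html`, url).href; // .../emacs/Buffers.html
  }
  url.hash = page; // .../emacs/index.html#Buffers
  return url.href;
}
```

In a browser, the first form would be passed to something like history.pushState, while the second only requires changing the fragment.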
Unimplemented features of the browser interface
We should be able to read, navigate and search using the keyboard only. To the extent possible the key bindings (at least the default ones) should match those of the info program or mode.
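As a rough illustration of matching Info's defaults, the sketch below maps three standard Info keys (n for next, p for previous, u for up) to navigation targets. The node shape ({ next, prev, up }) and the names INFO_KEYS and navigate are assumptions made here, not part of the prototype.

```javascript
// A few default Info bindings, mapped to hypothetical link names.
const INFO_KEYS = { n: "next", p: "prev", u: "up" };

// Given the current ToC entry and a pressed key, return the page to go to,
// or null when the key is unbound or the entry has no such neighbour.
function navigate(node, key) {
  if (!Object.prototype.hasOwnProperty.call(INFO_KEYS, key)) return null;
  return node[INFO_KEYS[key]] || null;
}
```

A keydown handler in the master page could call navigate with event.key and then show the returned page. Commands such as search need extra state and are left out of this sketch.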
Note that some commands will need to request a string to be
typed by the user. Do not create a popup window for this. Instead
create a temporary input field inside an absolutely-positioned
<div> on top of the regular content. This
<input> element or just set
Allow whole-document search
Info mode and the info program allow you to “search for a sequence of characters throughout an entire Info file”.
An <iframe> per page, as in the prototype,
enables searching the whole document. It works
by having the master page load any pages that haven’t yet
been loaded, and then sending a message to each page to
have it search its own contents. Each page reports
the search result to a master page (using another
which takes action based on those results.
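The flow described above (the master asks every page to search itself, and each page reports back) can be sketched without frames. Here each “page” is a plain object with name and text properties, and searchPage plays the role of the per-page handler. These names and shapes are assumptions for illustration. A real implementation would exchange messages between frames rather than call functions directly.

```javascript
// Per-page side: search this page's own text, report match offsets.
function searchPage(page, needle) {
  const offsets = [];
  let from = 0;
  while (needle.length > 0) {
    const at = page.text.indexOf(needle, from);
    if (at < 0) break;
    offsets.push(at);
    from = at + 1;
  }
  return { page: page.name, offsets };
}

// Master side: ask every page, keep only pages that reported a match.
function searchDocument(pages, needle) {
  return pages
    .map((page) => searchPage(page, needle))
    .filter((result) => result.offsets.length > 0);
}
```

The master page could then jump to the first reported match, or offer the list of matching pages.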
A scrolling interface
A different style option (depending on user preference) could conceptually show all the pages as one big page, allowing scrolling between them. Pages that haven’t been loaded yet would initially be represented by an empty placeholder of approximately correct height. The page is automatically loaded if the placeholder is scrolled into view.
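Deciding which placeholders have been “scrolled into view” is a simple interval-overlap test. The sketch below assumes hypothetical placeholder records of the form { name, top, height, loaded }, with positions measured in pixels from the top of the document.

```javascript
// Returns the names of not-yet-loaded pages whose placeholder overlaps
// the visible range [scrollTop, scrollTop + viewportHeight).
function pagesToLoad(placeholders, scrollTop, viewportHeight) {
  const bottom = scrollTop + viewportHeight;
  return placeholders
    .filter((p) => !p.loaded && p.top < bottom && p.top + p.height > scrollTop)
    .map((p) => p.name);
}
```

In a browser, an IntersectionObserver on each placeholder could provide the same information without manual arithmetic.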
HTML or XHTML
The existing prototype uses files with the
for most of the pages, with the exception of
This is because ereaders expect xhtml files, and the DocBook
stylesheets generate that.
Using xhtml files is mostly invisible when using the prototype,
because the URLs it makes visible look like
However, if we use URLs of the form
then it would be preferable for the actual files to have the
extension. And this would be less confusing in general.
The EPUB3 specification states content files
SHOULD use the file extension
.xhtml., but note it does not say MUST.
What it does require is that the documents must meet the conformance constraints for XML documents and be an [HTML5] document that conforms to the XHTML syntax. With a little care,
makeinfo can generate
html files that conform to both HTML and XML, and that
should be our goal.
Navigation with Back button
This should be implemented using the
The following two projects should be suitable for Google Summer of Code or similar.
Implement --xhtml and --epub output formats
The makeinfo program converts texinfo into a number of
different output formats, including
I see this project as having these parts:
- Create a new output format
xhtml, similar to
html. However, each output file has the
xhtml extension, and must conform to the syntax for the xhtml variant of HTML5. Links should use the
This would be a hybrid of the existing html and xml output formats.
- Clean up the generated xhtml so it is well-structured, and the
logical structure of the xhtml follows the structure of the texinfo.
We also want xhtml format to preserve all the “interesting” information present in the texinfo source. This is so it can be used for further processing using xml tools such as xslt, and allow flexible styling with css. By preserving “interesting” information we mean whatever information is currently emitted by the existing xml or docbook output formats.
- Another new output format possibly called
--phtml for “polyglot HTML”. This could be the same as
--xhtml format, but using
.html file extensions. The important thing is that each file be valid both as XML and HTML.
At some point in the future, the
--html option could be switched to act as
- Create a new
epub output format. This is essentially the same as the xhtml or phtml output format, but all the output files (including any image files) are packaged in an epub archive, which is essentially just a zip archive with a few extra files, such as a table of contents.
Generating files with extension
.html (and thus using phtml format) has practical benefits, in that one can distribute documentation as epub, and then unzip it to yield html files. (A web server can of course do this on-the-fly.)
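To make “a zip archive with a few extra files” a little more concrete, the sketch below lists the two fixed entries that the EPUB container format expects at the start of the archive. The function name epubSkeleton, the entry-record shape, and the default package path are assumptions for illustration. Actually writing the zip file is left to whatever archive library is used.

```javascript
// Sketch of the fixed "extra files" an epub archive carries besides content.
// `packagePath` (the location of the package document) is a hypothetical
// choice; the mimetype string and container.xml layout follow the
// EPUB Open Container Format.
function epubSkeleton(packagePath = "OEBPS/package.opf") {
  const containerXml =
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">\n' +
    "  <rootfiles>\n" +
    `    <rootfile full-path="${packagePath}" media-type="application/oebps-package+xml"/>\n` +
    "  </rootfiles>\n" +
    "</container>\n";
  // Order matters: "mimetype" must be the first entry, stored uncompressed.
  return [
    { path: "mimetype", data: "application/epub+zip", compress: false },
    { path: "META-INF/container.xml", data: containerXml, compress: true },
  ];
}
```

The package document, the table of contents, and the generated pages and images would follow these two entries as ordinary compressed files.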
The makeinfo program is written in Perl,
so this project will require some familiarity with Perl.
The main goal is to re-implement the user-interface features
of the terminal-based
info program (such as convenient
keyboard navigation from a document), but in the context of a
web browser displaying xhtml, as generated by the previous
The high-level summary:
- Start with the above-referenced prototype.
- Implement basic (non-search)
keyboard navigation, similar to the
- Implement search-based navigation, again similar to the
- Implement an option for URLs to have the form
emacs/Buffers.html, rather than
emacs/index.html#Buffers. (This should only be used for
For more details, see the above discussion.
If the previous project (to create xhtml format) has not been completed, you can use the above-referenced Kawa manual as a test-bed.