A Roadmap for TexInfo without Info
TexInfo is a decent system for writing documentation. Its weakness is the “info” file format, which is an obsolete kludge; replacing it, however, is a multi-pronged task.
This paper attempts to specify a replacement format and associated tooling.
I will call this format “hinfo”, as it is a replacement for info format using html syntax. (However “hinfo” is not proposed as a file extension; hinfo files should use the .html or .xhtml extensions.)
Problems with info
- Info is a non-standard format used by no-one else. Hence there is very little tooling.
- Paragraphs are pre-split into lines, so they cannot adjust to different screen widths.
- Info requires a monospace font, and so valuable semantic information is lost. Info cannot distinguish @samp and @code. It indicates @var using upper-case. Info-reading programs have limited ability to recover the lost semantic information. Being able to use proportional fonts for descriptive text and monospaced fonts for code and examples makes documentation easier to read.
- Info documentation looks ugly. Using info as the publicly visible front-end for GNU documentation presents a bare-bones and behind-the-times image for texinfo and GNU generally. This is bad marketing.
It is possible to improve html/DocBook support without deprecating or dropping info format support. However, that has its own problems:
- Installing both info files and html files wastes disk space.
- Having two primary formats for documentation is likely to lead to inconsistencies and other problems. What if one package only installs html, another package only installs info, and a third installs both? We would need more complex and brittle installation directory and search path standards.
- Which format should Emacs info prefer? If you accept what I wrote above, clearly “hinfo” rather than “info”. But in that case, why continue to install info files, or (longer-term) maintain tooling for them?
XHTML as an Info (format) replacement
The obvious replacement for Info is some variant of HTML or XML.
The format should follow the recommendations for Polyglot markup. This means documents are well-formed as both HTML and XML.
If hinfo is well-formed XML then various processing tools (such as XSLT) can be used to analyze or transform the output. For that to be useful, it is a goal that hinfo contain all or most of the semantic information from the texinfo file. Specifically, it should include all the information currently produced by makeinfo --xml or makeinfo --docbook. That can be done with class and other attributes. It is hoped hinfo can make at least makeinfo --xml obsolete.
EPUB uses the epub:type attribute to optionally indicate semantics. Texinfo could do the same.
Note the html currently generated by makeinfo --html is rather poorly structured, and it should be cleaned up.
See this thread for discussion on this topic.
The Info UI in a plain browser
It would be useful to have info
keyboard shortcuts when
reading an hinfo file in a vanilla web browser. Most of the
navigation would be trivial to implement using JavaScript.
This message discusses a proof of concept.
JavaScript can also add a navigation sidebar, using the makeinfo-generated table-of-contents file, which is just a nested HTML list. (This is the ToC format used by EPUB3.)
Below I describe a prototype that implements navigation with a “smart” sidebar, and which could be straightforwardly enhanced to support keyboard navigation and searching.
Emacs Info mode
For reading hinfo files in Emacs we want something with the user interface of traditional info mode, but able to read hinfo files. (It does not need to process the JavaScript in an hinfo file, since elisp can be used instead.)
For the html-reading it makes sense to use eww mode. An “hinfo mode” would be a hybrid of the existing info and eww modes, with the file handling and layout mostly using eww mode, while the keymap and user interface would come from info mode.
The standalone Info program
The standalone info program needs to be able to read hinfo files. There are multiple web browsers that work in a terminal; one of these can be used. However, it seems to make more sense to just use emacs in terminal (-nw) mode. We might add an option to Emacs to leave off the menubar and other undesired “decoration”.
There is a special case when the terminal emulator is DomTerm, since DomTerm is built on web technologies: in that case we could have DomTerm create an <iframe> and load the html file into it.
Installation of hinfo
There is a standard for installing info files (possibly compressed) in a central location so info mode can find a manual given just its name. The existing standard installs all info files in the same directory.
For hinfo, I think a better structure would be to have a separate directory for each manual. For example:
/usr/share/hinfo
    emacs
        index.html
        Search.html
        ...
    kawa
        index.html
        Tutorial.html
        screenshot-1.png
        ...
Using epub for packaging
To save disk space, it is desirable to compress each manual. The epub format is a standard for electronic books, and it satisfies texinfo’s needs pretty well. An epub file is essentially a zip archive with web pages (xhtml), images, a table of contents, and resources like css styling. There are many epub-reading devices, programs, and browser plugins.
For example the Kawa binary distribution (see final section) ships the texinfo-derived manual
in epub format.
Kawa includes a --browse-manual
option that works by starting a mini-webserver
that reads and uncompresses the epub manual, and then displays it in
a browser.
Emacs can already process zip archives, including epub files. So it shouldn’t be difficult to enhance info mode to deal with epub files.
Recommendation: Change the preferred output format of makeinfo to be an epub file.
“Installing” an hinfo file would involve copying the epub
to a system location. For example:
/usr/share/hinfo
    emacs.epub
    kawa.epub
    ...
If someone wants to publish a manual on the web, they can just unzip the epub file into a server directory.
Prototype of a browser interface
I have implemented a JavaScript package that I believe to be a good baseline for a documentation browser. It has some missing features, most notably keyboard navigation and searching, which I will discuss in the next section. In this section I will focus on the design and features of the existing prototype.
You can try it out at http://per.bothner.com/kawa/invoke/.
To download the manual, grab http://per.bothner.com/kawa/kawa-manual.epub.
If you read the latter in an epub reader you will get that reader’s user interface, rather than the interface discussed here. However, you can unzip the epub file and browse OEBPS/index.html.
The prototype manual is generated from kawa.texi in a rather convoluted manner, using makeinfo --docbook, plus the DocBook XSLT stylesheets, plus some sed script kludges. I hope in the future we can just do makeinfo --epub, as discussed later.
Smart navigation bar
On startup, the code creates a navigation sidebar. It does this by loading the table-of-contents file into an internal frame (<iframe>). What makes it “smart” is that it only displays “interesting” links, rather than displaying the entire table of contents. An “interesting” link is the link to the current node, its ancestors, siblings, and immediate children. As you navigate the document, the sidebar is automatically updated (using JavaScript and CSS), rather than being re-loaded.
The sidebar does include a “Table of contents” link, which takes you to the full table-of-contents.
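The selection rule for “interesting” links can be sketched as a small function over the table-of-contents tree. This is only an illustration, not the prototype's code: the node shape ({ id, children }) and all node names other than Buffers are assumptions made up for the example.

```javascript
// Hypothetical ToC node shape: { id: string, children: Node[] }.
// Returns the set of ids the sidebar would show when `currentId` is selected.
function interestingIds(root, currentId) {
  // Depth-first search that records the path from the root to the current node.
  function findPath(node, path) {
    const here = path.concat(node);
    if (node.id === currentId) return here;
    for (const child of node.children || []) {
      const found = findPath(child, here);
      if (found) return found;
    }
    return null;
  }

  const path = findPath(root, []);
  if (!path) return new Set(); // unknown node: nothing to show

  const shown = new Set(path.map((n) => n.id)); // current node and its ancestors
  const current = path[path.length - 1];
  const parent = path[path.length - 2]; // undefined when current is the root
  for (const sibling of parent ? parent.children : []) shown.add(sibling.id);
  for (const child of current.children || []) shown.add(child.id);
  return shown;
}

// Made-up miniature table of contents.
const toc = {
  id: "Top",
  children: [
    {
      id: "Buffers",
      children: [
        { id: "Select-Buffer", children: [] },
        { id: "List-Buffers", children: [] },
      ],
    },
    { id: "Windows", children: [{ id: "Split-Window", children: [] }] },
  ],
};

console.log([...interestingIds(toc, "Buffers")].sort());
```

In a browser, the resulting set would presumably be used to toggle a CSS class on the sidebar's list items, so that everything outside the set is hidden without re-loading the table of contents.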
Lazy loading of pages
Certain manuals can be very big, so it is desirable to split them into multiple web pages, and only load pages as needed.
However, this complicates (not-yet-implemented) whole-document search. Navigating back and forth may also cause wasteful re-loading of pages, depending on the browser’s caching strategy.
The implemented solution is to load each page into its
own <iframe>
(internal frame).
The “master page” stays loaded as long as the document is being read
(i.e. its window is open) and you don’t navigate from it.
When it starts up, it creates an empty placeholder <div>
element
for each page (using the table-of-contents file loaded into the sidebar).
When a page is first visited, an <iframe>
is created for the page,
and added as a child of the placeholder <div>
.
A click event handler overrides the default action of internal (same-document) links: it creates the <iframe> if necessary, and then hides all the other pages using CSS styling.
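The bookkeeping behind this scheme can be modelled without a browser. The sketch below is a simplified stand-in, not the prototype's code: the class name PageStore, the .xhtml suffix on frame sources, and the page names are assumptions, and a plain object takes the place of a real <iframe> element.

```javascript
// Models one placeholder per page; a frame is created only on first visit.
class PageStore {
  constructor(pageNames) {
    this.pages = new Map(
      pageNames.map((name) => [name, { frame: null, visible: false }])
    );
    this.loads = 0; // counts how many frames were actually created
  }

  // Called by the click handler for an internal link.
  show(name) {
    const entry = this.pages.get(name);
    if (!entry) return false; // not a page of this manual
    if (entry.frame === null) {
      // In a browser this would be document.createElement("iframe"),
      // appended to the page's placeholder <div>.
      entry.frame = { src: name + ".xhtml" };
      this.loads++;
    }
    // Hide every other page; in a browser this would toggle a CSS class.
    for (const [other, e] of this.pages) e.visible = other === name;
    return true;
  }

  visiblePages() {
    return [...this.pages].filter(([, e]) => e.visible).map(([n]) => n);
  }
}

const store = new PageStore(["Buffers", "Windows", "Files"]);
store.show("Buffers");
store.show("Windows");
store.show("Buffers"); // revisiting does not create a second frame
console.log(store.loads, store.visiblePages());
```

Because a visited page keeps its frame, going back to it costs nothing beyond changing visibility, which is the point of keeping the master page loaded.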
When a page is first loaded, same-document links are re-written
(so “hover” shows the correct URL), while external links get
a target="_blank"
attribute, so they open in a fresh window/tab.
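As an illustration of the link rewriting, the following sketch classifies a link as internal or external using the standard URL class. The function name classifyLink, the returned object shape, and the example URLs are assumptions for this sketch; the prototype may well do this differently.

```javascript
// href:    the link's raw attribute value
// pageUrl: URL of the page that contains the link
// docBase: URL of the manual's directory (assumed to end with "/")
function classifyLink(href, pageUrl, docBase) {
  const target = new URL(href, pageUrl);
  const base = new URL(docBase);
  const internal =
    target.protocol === base.protocol &&
    target.host === base.host &&
    target.pathname.startsWith(base.pathname);

  if (!internal) {
    // External links open in a fresh window/tab.
    return { kind: "external", href: target.href, target: "_blank" };
  }

  // Strip the directory prefix and the .html/.xhtml suffix to get the page name.
  const file = target.pathname.slice(base.pathname.length);
  const page = file.replace(/\.x?html$/, "") || "index";
  return {
    kind: "internal",
    page,
    fragment: target.hash.replace(/^#/, ""), // position within the page, if any
    // Rewritten so that hovering shows the bookmarkable URL.
    href: new URL("index.html#" + page, base).href,
  };
}

const base = "http://example.com/docs/emacs/";
console.log(classifyLink("Buffers.xhtml", base + "Windows.xhtml", base));
console.log(classifyLink("https://www.gnu.org/", base + "Windows.xhtml", base));
```

Comparing protocol and host rather than the origin property is deliberate in this sketch, since file: URLs report an opaque origin.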
Clean fallback when JavaScript or CSS is missing/disabled
The initial “welcome” page is not loaded in an <iframe>, but is directly in the top-level index.html. The main reason for this is to have a clean fall-back if JavaScript is missing or disabled. A secondary reason is to speed up loading of the welcome page. The initial contents of the welcome page are moved into a <div> element, to make it easy to hide them when navigating to another page.
Works using http or file
The same set of files should be readable directly using file: URLs, when served by a web server (http: or https:), or through some other mechanism (such as packaged in an archive).
Browsers enforce a “same-origin” security policy,
which limits interaction between frames.
The Google Chrome browser views file:
frames
as having different origins, so communicating between frames is restricted.
The solution is to use postMessage
to
communicate between frames.
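A minimal sketch of such message passing follows. The message shapes ({ type: "navigate", page } and { type: "loaded", page }) and the state object are invented for this example; only postMessage and the "message" event are standard browser features. The handler is written as a plain function so it can be exercised outside a browser.

```javascript
// Pure handler: takes a message event and the master page's state,
// and returns the new state. Unknown messages are ignored.
function handleFrameMessage(event, state) {
  const msg = event.data;
  if (!msg || typeof msg !== "object") return state;
  switch (msg.type) {
    case "navigate": // a page asks the master to show another page
      return { ...state, current: msg.page };
    case "loaded": // a page reports that it has finished loading
      return { ...state, loaded: [...state.loaded, msg.page] };
    default:
      return state;
  }
}

// Browser-only wiring; skipped when run outside a browser.
if (typeof window !== "undefined") {
  let state = { current: "index", loaded: [] };
  window.addEventListener("message", (event) => {
    state = handleFrameMessage(event, state);
  });
  // A page inside an <iframe> would notify the master page like this:
  //   window.parent.postMessage({ type: "navigate", page: "Buffers" }, "*");
  // The "*" target origin is used here because file: pages have an opaque origin.
}

// Simulated exchange with hand-built event objects.
let demo = { current: "index", loaded: [] };
demo = handleFrameMessage({ data: { type: "loaded", page: "Buffers" } }, demo);
demo = handleFrameMessage({ data: { type: "navigate", page: "Buffers" } }, demo);
console.log(demo);
```

Since "*" accepts any recipient, a real implementation served over http: or https: would probably want to pass a specific origin and check event.origin on receipt.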
Clean bookmarkable URLs
When a page is selected, we want to update the browser’s location bar so it contains a URL for that page. However, in general we can only update the hash part of the location, without causing the entire page to be re-loaded.
Thus when navigating to the Buffers page in the Emacs manual, we can’t update emacs/index.html to emacs/Buffers.html. However, we can change the location bar to end with emacs/index.html#Buffers. This is what the prototype does.
Such URLs work in external links, because when the document is
initially loaded, the prototype checks for a “hash” string.
If one is specified, the corresponding page is loaded.
The browser’s Back button does not yet work, but that can
presumably be fixed with the history
mechanism.
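The startup check for a “hash” string might look roughly like the following. The function name pageFromHash, the list of known pages, and the "index" fallback are assumptions for illustration.

```javascript
// Maps the location's hash (e.g. "#Buffers") to a page name,
// falling back to the welcome page for empty, unknown, or malformed hashes.
function pageFromHash(hash, knownPages, fallback = "index") {
  let name;
  try {
    name = decodeURIComponent(hash.replace(/^#/, ""));
  } catch (e) {
    return fallback; // malformed percent-encoding
  }
  return knownPages.includes(name) ? name : fallback;
}

const known = ["index", "Buffers", "Windows"];
console.log(pageFromHash("#Buffers", known));
console.log(pageFromHash("", known));
console.log(pageFromHash("#No-Such-Node", known));
```

In a browser the first argument would be window.location.hash, read once when the master page loads.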
Newer browsers have a history feature which gives you more flexibility in updating the location bar. Thus you could update it to say (for example) emacs/Buffers.html. However, this is limited by the “same-origin” policy: updating http: URLs works on both Firefox and Google Chrome; updating file: URLs works on Firefox but not on Google Chrome.
It may make sense to use the emacs/Buffers.html style for http: and https:, but use the emacs/index.html#Buffers style otherwise. The former is especially appropriate for existing websites that have established URLs using the emacs/Buffers.html style.
This would require some modest changes to startup code:
Now any page can be the initial “master page”,
so index.html
is not special.
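The choice between the two URL styles could be made by a helper along these lines. This is a sketch under the assumption that page files carry the .html extension; the name visibleUrl is made up.

```javascript
// Given the URL of the current master page and the name of the page being
// shown, returns the URL that should appear in the location bar.
function visibleUrl(masterUrl, page) {
  const url = new URL(masterUrl);
  if (url.protocol === "http:" || url.protocol === "https:") {
    // Per-page style, e.g. emacs/Buffers.html
    return new URL(page + ".html", url).href;
  }
  // Hash style, e.g. emacs/index.html#Buffers
  return new URL("index.html#" + page, url).href;
}

console.log(visibleUrl("http://example.com/docs/emacs/index.html", "Buffers"));
console.log(visibleUrl("file:///usr/share/hinfo/emacs/index.html", "Buffers"));
```

In a browser, the per-page result would be handed to history.pushState, while the hash-style result can simply be assigned to the location's hash.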
Use index.html for initial page
It is desirable that the initial pathname ends with index.html
because web servers generally have that as the default.
This initial page is the same as the “master page” mentioned before.
Instead of http://example.com/docs/emacs/start.html use http://example.com/docs/emacs/index.html, which allows you to abbreviate it as http://example.com/docs/emacs.
The Buffers
page could be accessed as either
http://example.com/docs/emacs/index.html#Buffers
or
http://example.com/docs/emacs/#Buffers
(preferred).
Unimplemented features of the browser interface
Keyboard navigation
We should be able to read, navigate and search using the keyboard only. To the extent possible the key bindings (at least the default ones) should match those of the info program or mode.
Note that some commands will need to request a string to be typed by the user. Do not create a popup window for this. Instead create a temporary input field inside an absolutely-positioned <div> on top of the regular content. This <div> can contain an <input> element or just set contenteditable.
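A key-to-command table is one plausible starting point. The keys below follow the usual Info bindings (n, p, u, t, l, s, g, i); the command names and the needsInput flag are invented for this sketch.

```javascript
// Single-key bindings modelled on Info; command names are hypothetical.
const INFO_KEYS = new Map([
  ["n", "next-node"],
  ["p", "prev-node"],
  ["u", "up-node"],
  ["t", "top-node"],
  ["l", "history-back"],
  ["s", "search"],
  ["g", "goto-node"],
  ["i", "index-lookup"],
]);

// Commands that must prompt for a string in the temporary input field.
const NEEDS_INPUT = new Set(["search", "goto-node", "index-lookup"]);

// Takes a keydown-like event and returns the command to run, or null.
function commandForKey(event) {
  // Leave browser shortcuts such as Ctrl-N alone.
  if (event.ctrlKey || event.altKey || event.metaKey) return null;
  const name = INFO_KEYS.get(event.key);
  if (!name) return null;
  return { name, needsInput: NEEDS_INPUT.has(name) };
}

console.log(commandForKey({ key: "n" }));
console.log(commandForKey({ key: "s" }));
console.log(commandForKey({ key: "n", ctrlKey: true }));
```

A keydown listener on the master page would call commandForKey and, when needsInput is set, show the overlay input field before running the command. Keys typed while that field has focus would need to bypass the table.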
Allow whole-document search
Info mode and the info program allow you to “search for a sequence of characters throughout an entire Info file”.
Using an <iframe> per page, as in the prototype, enables searching the whole document. It works by having the master page load any pages that haven’t yet been loaded, and then sending a message to each page to have it search its own contents. Each page reports the search result to the master page (using another postMessage), which takes action based on those results.
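The division of labour can be sketched with plain functions standing in for the frames. In this simplified model the calls are synchronous, whereas the real exchange would go through postMessage; the function names, the result shape, the case-insensitive matching, and the sample text are all assumptions.

```javascript
// What each page would do on receiving a search request:
// return the offsets of every (case-insensitive) match in its own text.
function searchPage(pageName, text, query) {
  const hits = [];
  const needle = query.toLowerCase();
  if (needle === "") return { page: pageName, hits };
  const hay = text.toLowerCase();
  for (let i = hay.indexOf(needle); i !== -1; i = hay.indexOf(needle, i + 1)) {
    hits.push(i);
  }
  return { page: pageName, hits };
}

// What the master page would do: ask every page in document order,
// then act on the first page that reports a match.
function searchDocument(pages, order, query) {
  const results = order.map((name) => searchPage(name, pages[name], query));
  return results.find((r) => r.hits.length > 0) || null;
}

// Made-up page contents.
const pages = {
  index: "Welcome to the manual.",
  Buffers: "A buffer holds text. Each buffer has a name.",
  Windows: "A window displays a buffer.",
};
console.log(searchDocument(pages, ["index", "Buffers", "Windows"], "buffer"));
console.log(searchDocument(pages, ["index", "Buffers", "Windows"], "frame"));
```

A fuller version would start from the current page and position rather than from the beginning, so that repeating the search moves on to the next match.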
A scrolling interface
A different style option (depending on user preference) could conceptually show all the pages as one big page, allowing scrolling between them. Pages that haven’t been loaded yet would initially be represented by an empty placeholder of approximately correct height. The page is automatically loaded if the placeholder is scrolled into view.
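Deciding which placeholders need loading is a simple overlap test. The sketch below works on plain numbers so it runs anywhere; the placeholder shape ({ name, top, height, loaded }) and the sample heights are assumptions.

```javascript
// Returns the names of unloaded pages whose placeholder overlaps the viewport.
// `top` and `height` are in pixels, measured from the start of the document.
function pagesToLoad(placeholders, scrollTop, viewportHeight) {
  const viewBottom = scrollTop + viewportHeight;
  return placeholders
    .filter((p) => !p.loaded)
    .filter((p) => p.top < viewBottom && p.top + p.height > scrollTop)
    .map((p) => p.name);
}

// Made-up layout: the welcome page is loaded, the rest are placeholders.
const layout = [
  { name: "index", top: 0, height: 800, loaded: true },
  { name: "Buffers", top: 800, height: 1200, loaded: false },
  { name: "Windows", top: 2000, height: 900, loaded: false },
];
console.log(pagesToLoad(layout, 500, 600));
```

In a browser, an IntersectionObserver watching each placeholder would likely be a more convenient way to get the same notification than recomputing on every scroll event.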
HTML or XHTML
The existing prototype uses files with the xhtml
extension
for most of the pages, with the exception of index.html
.
This is because ereaders expect xhtml files, and the DocBook
stylesheets generate that.
Using xhtml files is mostly invisible when using the prototype, because the URLs it makes visible look like emacs/index.html#Buffers.
However, if we use URLs of the form emacs/Buffers.html
then it would be preferable for the actual files to have the html
extension. And this would be less confusing in general.
The EPUB3 specification states content files SHOULD use the file extension .xhtml, but note it does not say MUST. What it does require is that the documents must meet the conformance constraints for XML documents and be an [HTML5] document that conforms to the XHTML syntax. With a little care, makeinfo can generate an html file that conforms to both HTML and XML, and that should be our goal.
Navigation with Back button
This should be implemented using the history
mechanism.
Student Projects
The following two projects should be suitable for Google Summer of Code or similar.
Implement --xhtml and --epub output formats
The makeinfo
program converts texinfo into a number of
different output formats, including html
and docbook
.
I see this project as having these parts:
- Create a new output format xhtml, similar to html. However, each output file has the xhtml extension, and must conform to the syntax for the xhtml variant of HTML5. Links should use the id attribute (not <a name="N">).
  This would be a hybrid of the existing html and xml output formats.
- Clean up the generated xhtml so it is well-structured, and the logical structure of the xhtml follows the structure of the texinfo.
  We also want xhtml format to preserve all the “interesting” information present in the texinfo source. This is so it can be used for further processing using xml tools such as xslt, and allow flexible styling with css. By preserving “interesting” information we mean whatever information is currently emitted by the existing xml or docbook output formats.
- Another new output format possibly called --phtml for “polyglot HTML”. This could be the same as --xhtml format, but using .html file extensions. The important thing is that each file be valid both as XML and HTML.
  At some point in the future, the --html option could be switched to act as --phtml.
- Create a new epub output format. This is essentially the same as the xhtml or phtml output format, but all the output files (including any image files) are packaged in an epub archive, which is essentially just a zip archive with a few extra files, such as table of contents.
  Generating files with extension .html (and thus using phtml format) has practical benefits, in that one can distribute documentation as epub, and then unzip it to yield html files. (A web server can of course do this on-the-fly.)
The makeinfo
program is written in Perl,
so this project will require some familiarity with Perl.
Improve JavaScript navigation
The main goal is to re-implement the user-interface features
of the terminal-based info
program (such as convenient
keyboard navigation from a document), but in the context of a
web browser displaying xhtml, as generated by the previous
project.
The high-level summary:
- Start with the above-referenced prototype.
- Implement basic (non-search) keyboard navigation, similar to the info program.
- Implement search-based navigation, again similar to the info program.
- Implement an option for URLs to have the form emacs/Buffers.html, rather than emacs/index.html#Buffers. (This should only be used for http: or https:.)
For more details, see the above discussion.
If the previous project (to create xhtml format) has not been completed, you can use the above-referenced Kawa manual as a test-bed.
This project requires some familiarity with JavaScript, CSS, and DOM, or interest in learning more about these technologies.