Section 8 Special, Reserved, and Escape Characters
Subsection 8.1 Reserved Characters
One of the goals of PreTeXt is to relieve an author of managing the numerous conflicts when mixing languages that use different characters for special purposes. But, of course, XML has its own special characters.
Everybody wants the ampersand, which makes it the most dangerous character. It is the escape character for XML, and LaTeX uses it to organize tables and arrays and to align equations. Consistently use the element <ampersand /> to make a literal ampersand in normal text, such as in “A&P.” In mathematics, and other places where you are using LaTeX syntax, use the pre-defined \amp macro. For code listings and other verbatim text, use the escaped XML entity &amp;.
The left angle bracket (<) is the second most dangerous character in your source, since it looks to the XML processor like the start of a new XML element. The right angle bracket (>) is less dangerous, but for symmetry we treat it the same as the left. Consistently use the elements <less /> and <greater /> to make left and right angle brackets in normal text. In mathematics, and other places where you are using LaTeX syntax, use the pre-defined \lt and \gt macros. For code listings and other verbatim text, use the XML entities &lt; and &gt;.
Sage defines generators of algebraic structures with a syntax that might remind one of common notation for all “combinations” of some generators. It is non-standard Python, but is instead pre-parsed by Sage. No matter, at issue here is the left angle bracket used to specify generators. Here is an example, which can be doctested by Sage to verify the example behaves correctly. Look at the source to see how the generator syntax is created with the XML entities.
There is an alternate Sage syntax, which avoids the angle brackets.
Ampersands and angle brackets are likely to be necessary in source code, such as Sage code (think generators of field extensions) or TikZ code (think arrowheads), and in matrices (think separating entries). If you have a big matrix, or a huge chunk of TikZ code, you can protect it all at once from the XML processor by wrapping it in a CDATA section, which opens with <![CDATA[ and closes with ]]>. It should be possible to write without ever using the CDATA mechanism, but it might get tedious in places to use the supplied macros or XML entities.
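As a rough illustration of the two ways to protect angle brackets, the sketch below feeds three fragments to Python's standard XML parser. The `input` element name and the Sage line `F.<a> = GF(3^2)` are stand-ins chosen for this example, not copied from the source of this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical source fragments; the Sage line is only an illustration.
escaped = "<input>F.&lt;a&gt; = GF(3^2)</input>"
cdata = "<input><![CDATA[F.<a> = GF(3^2)]]></input>"
unprotected = "<input>F.<a> = GF(3^2)</input>"


def content(fragment):
    """Return the text an XML parser hands on for a fragment."""
    return ET.fromstring(fragment).text


def parses(fragment):
    """Report whether the fragment is well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(content(escaped))     # text recovered from the entity version
print(content(cdata))       # text recovered from the CDATA version
print(parses(unprotected))  # the raw "<a>" looks like an unclosed element
```

Both protected forms deliver the same characters to whatever processes the code next, which is why the choice between them is mostly about convenience.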
The other XML reserved characters are the quotation marks, single and double, ' and ". Their use is only constrained in attributes, so they do not present a problem elsewhere. Here are the three XML reserved characters rendered as normal text; see the source to see how they were authored.
& < >
We test the three LaTeX macros for these characters with a pair of aligned equations:
\begin{align*}
a^2 + b^2 \amp\lt c^2\\
c^2 \amp\gt a^2 + b^2
\end{align*}
So as a summary of how to avoid conflicts with XML's reserved characters:
- “Normal” Text
Use <ampersand />, <less />, <greater />.
- Mathematics
Within m, me, men, and mrow elements, use \amp, \lt, \gt. Or use CDATA to enclose a large chunk of LaTeX with many of these characters.
- Verbatim, Code
Within verbatim text (c and pre elements), Sage code, program listings, and console sessions, use the XML entities &amp;, &lt;, &gt; to get exactly the characters desired.
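The three cases in the summary above can be seen side by side in one small paragraph. This is a minimal sketch using Python's standard XML parser; the paragraph itself is invented for the example, and only the element and macro names come from the text above.

```python
import xml.etree.ElementTree as ET

# A made-up paragraph mixing normal text, mathematics, and verbatim text.
source = r"""<p>Shop at A<ampersand />P.
Math: <m>a \lt b</m>.
Code: <c>if (a &lt; b &amp;&amp; b &gt; 0)</c></p>"""

p = ET.fromstring(source)

# Normal text: the ampersand is an empty element, so nothing needs escaping.
tags = [child.tag for child in p]

# Mathematics: the macro passes through the XML parser untouched.
math_text = p.find("m").text

# Verbatim: the parser turns each entity back into the literal character.
code_text = p.find("c").text

print(tags)
print(math_text)
print(code_text)
```

The verbatim text comes out of the parser with real ampersands and angle brackets, while the mathematics still carries the macro for LaTeX (or MathJax) to interpret later.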
It might be instructive to see how the paragraphs above about escape characters were written without inadvertently using an escape character improperly.
There are a handful of characters that might render just fine in HTML, but LaTeX reserves them for special purposes. So if they appear unadorned in your source, they will wreak havoc with the LaTeX processing. And if you escape them with backslashes to migrate to the LaTeX output, then you will see those backslashes in your HTML. And the backslash is the escape character for Markdown and JSON. You can't win. Thus, you need to be aware of these symbols and use the provided PreTeXt elements for each in order to get the right behavior in each type of output. Here are the outputs; look at the source of this document to see the input elements.
# $ % ^ & _ { } ~ \ ∗
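To make the problem concrete, here is a sketch of the sort of translation a LaTeX conversion has to perform on these characters. The table of replacements is our assumption about reasonable LaTeX escapes, and `escape_for_latex` is a hypothetical helper; this is not the actual PreTeXt conversion.

```python
# Assumed LaTeX replacements for the reserved characters (illustrative only).
LATEX_ESCAPES = {
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
    "\\": r"\textbackslash{}",
}


def escape_for_latex(text):
    """Escape each reserved character once, in a single left-to-right pass."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


print(escape_for_latex("50% of $10 & more"))
```

Working character by character matters: a chain of search-and-replace calls would re-escape the backslashes and braces introduced by earlier replacements. The escaped string is only right for LaTeX, which is the reason to author with neutral elements and let each conversion choose its own form.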
Subsection 8.2 Pseudo-Characters and Constructions
There are a few other very common abbreviations of Latin phrases that can be achieved in HTML one way, and in LaTeX with a slightly different mechanism. These are due to LaTeX's treatment of a period (full stop), depending on its surroundings. So not reserved characters, but just divergent treatment. Again, outputs here, see the source for inputs. Using these will lead to the best quality in all your outputs. See Will Robertson's informative and arcane blog post on the topic if you want the full story for the treatment of a full stop in LaTeX.
e.g. i.e. etc. c.
There are a few other characters and marks that get special treatment. Some do not appear on your keyboard, such as the symbol for copyright (and similar business or legal marks in common use). Then there are some characters that do not appear on your keyboard but frequently a keyboard character is used as a substitute. For example, a fraction bar and a forward slash (solidus and slash, respectively) have slightly different slopes. Also, compare a tilde and a swungdash. You can fake a midpoint in LaTeX by going to math mode, but the midpoint is really a text character. Again, outputs here, see the source for inputs. Using these uniformly will lead to the best quality in all your outputs, though some of these are very infrequent, or the distinctions are not always that important.
© ® ™ … · ⁓ ‰ ¶ § × / ⁄
We also distinguish between abbreviations (vs.), acronyms (SCUBA) and initialisms (XML). This is a test of the text version of a multiplication symbol: 2 × 4.
Subsection 8.3 URLs
An internet URL can very well contain some of the characters that LaTeX needs to escape. But the packages we use for embedded links should be smart about this. So we include a long URL for testing LaTeX output, with one reserved character, though maybe someday it will become stale and we need to change it out: www.pcc.edu/enroll/registration/dropping.html#withdraw. Notice in the source that you cannot put a tag inside the href attribute, and do need to use an element within the content (unless you like to wrap the content in a c element). Here is a totally bogus URL, which contains every possible legal character, so if this fails to convert there is some problematic character. Four combinations: with the content as normal text versus with the characters as verbatim text, and as a URL versus not.
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789%-._~:/?#[]@!$&'()*+,;=
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789%-._~:/?#[]@!$&'()*+,;=
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789%-._~:/?#[]@!$&'()*+,;=
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789%-._~:/?#[]@!$&'()*+,;=
The source of the four above examples can be instructive.
Four ampersands need to be authored as &amp;: two href attributes and two strings of verbatim text.
Two ampersands are authored as <ampersand />: two strings of normal text.
For LaTeX output, the verbatim c element will be automatically delimited by a character that is not in the string. The default is a question mark, which you see here in the string. So we have twice used the latexsep attribute with the value | (the pipe character), which cannot ever appear in a URL.
When a url has no content, then its href attribute is displayed as the text, automatically in a verbatim font (so no need to consider the latexsep attribute in any way).
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789%-._~:/?#[]@!$&'()*+,;=
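The delimiter issue described above can be sketched in a few lines. The function `latex_verb` is a hypothetical helper that wraps a string in LaTeX's `\verb` with a chosen separator; it only illustrates why the separator must not occur in the string, and is not how PreTeXt implements the latexsep attribute.

```python
import string

# The bogus URL from the examples above, built from every legal character.
bogus_url = (
    string.ascii_uppercase
    + string.ascii_lowercase
    + string.digits
    + "%-._~:/?#[]@!$&'()*+,;="
)


def latex_verb(text, sep="?"):
    r"""Wrap text as \verb<sep>text<sep>; the separator must not occur in text."""
    if sep in text:
        raise ValueError(f"separator {sep!r} occurs in the verbatim text")
    return "\\verb" + sep + text + sep


print("?" in bogus_url)                  # the default separator collides
print("|" in bogus_url)                  # the pipe does not
print(latex_verb(bogus_url, sep="|"))
```

Calling `latex_verb(bogus_url)` with the default separator raises an error here; in real LaTeX the verbatim text would simply end early at the first question mark.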
We are not fans of footnotes; they are totally unstructured (footnote 1: Carleson's Theorem). A URL in a footnote migrates around, and so care must be taken with special characters, such as the percent and hash (footnote 2: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789%-._~:/?#[]@!$&'()*+,;=). This paragraph has two footnotes, one with a real URL from Jesse Oldroyd, another with a fake URL from the above suite (the fourth one). For good measure, we repeat the URL found in the first footnote: Carleson's Theorem. And we include a no-content version of the same link: https://en.wikipedia.org/wiki/Carleson%27s_theorem.
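The percent sign in the real URL above comes from percent-encoding the apostrophe. A short sketch with Python's standard library shows the round trip; nothing here is specific to PreTeXt.

```python
from urllib.parse import quote, unquote

page = "Carleson's_theorem"

# The apostrophe is not in the always-safe set, so it becomes %27.
encoded = quote(page)
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

The encoded form is what lands in the href attribute, and that literal percent sign is exactly the character LaTeX would otherwise read as the start of a comment.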
Subsection 8.4 Quotations
The q tag will provide beginning and ending double quotations, while the sq tag will behave similarly but provide single quotes.
“The roots of education are bitter, but the fruit is sweet.” (Aristotle)
‘It is always wise to look ahead, but difficult to look further than you can see.’ (Winston Churchill)
A large quote can be accommodated with the blockquote tag, which can carry within itself an attribution element.
The problem with writing a book in verse is, to be successful, it has to sound like you knocked it off on a rainy Friday afternoon. It has to sound easy. When you can do it, it helps tremendously because it's a thing that forces kids to read on. You have this unconsummated feeling if you stop.
―Dr. Seuss
We say that again, to test a multiline attribution of a block quotation. Notice how the dash appears automatically, and that it is a quotation dash in HTML, distinct from other sorts of dashes.
The problem with writing a book in verse is, to be successful, it has to sound like you knocked it off on a rainy Friday afternoon. It has to sound easy. When you can do it, it helps tremendously because it's a thing that forces kids to read on. You have this unconsummated feeling if you stop.
―Dr. Seuss
Children's Author
Sometimes a quote may extend across several paragraphs. Or a balanced pair of quotation marks crosses an XML boundary, so we need left, right, single and double versions. (For example, see Section 24 on poetry.) Here are all four in a haphazard order: ”, ‘, “, ’. These should be a last resort, and not a replacement for the q and sq tags. The left/right versions are used for the following quote from Abraham Lincoln, which we have edited into two paragraphs.
“I am not bound to win, but I am bound to be true. I am not bound to succeed, but I am bound to live by the light that I have.
I must stand with anybody that stands right, and stand with him while he is right, and part with him when he goes wrong.”
And as a test, we try some crazy combinations of quotes, which would normally give LaTeX some trouble where the quotation marks are adjacent.
“we use ‘single quotes inside of double quotes’”
‘“double quotes inside of single quotes” with more’
“‘single quotes tight inside of double quotes’”
‘“double quotes tight inside of single quotes”’
An “‘‘“absurd test”’’” of two adjacent single quotes inside a pair of double quotes
you would never do this, but a ‘‘pair of single quotes’’
N.B. We have taken no special care to protect against interactions of the actual quote characters (described above) in LaTeX with themselves, or with the grouping tags.
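The nesting behavior of the q and sq tags can be mimicked with a tiny recursive renderer. This is a sketch under our own assumptions: `render` is a hypothetical function producing plain text with curly quotes, and it says nothing about how the real conversions emit quotation marks for LaTeX.

```python
import xml.etree.ElementTree as ET

# Opening and closing marks for each quotation element.
QUOTES = {
    "q": ("\u201c", "\u201d"),   # double curly quotes
    "sq": ("\u2018", "\u2019"),  # single curly quotes
}


def render(element):
    """Flatten an element to text, wrapping q and sq content in curly quotes."""
    left, right = QUOTES.get(element.tag, ("", ""))
    parts = [left, element.text or ""]
    for child in element:
        parts.append(render(child))
        parts.append(child.tail or "")
    parts.append(right)
    return "".join(parts)


source = "<p><q>we use <sq>single quotes inside of double quotes</sq></q></p>"
rendered = render(ET.fromstring(source))
print(rendered)
```

Because the marks come from the element structure rather than from typed characters, adjacent quotation marks are always balanced, whatever the output format later does to space them.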
Subsection 8.5 Groupings
It is possible to make some other groupings like quotations, such as {some emphasized text grouped within braces}, or [a Book Title inside brackets], 〈some foreign words inside angle brackets〉, or ⟦just a bit of text within double brackets⟧. Some of these are used extensively by scholars who study texts to note various restorations or deletions.
Subsection 8.6 Biological Names
The taxon element can be used all by itself to get an italicized scientific name, as in Escherichia coli. It can also be structured with the elements genus and species, using both together as in Cyclops kolensis. Or the subelements can be used individually. Rules for capitalization are presently your responsibility as an author. Possible improvements include new subelements, attributes for database identifiers, and checks on capitalization. Also, we might automatically abbreviate the genus after first use.
There is an attribute, @ncbi that you can use on the taxon element to precisely identify the organism you are discussing using an identification number from the National Center for Biotechnology Information. Their taxonomy is at www.ncbi.nlm.nih.gov/taxonomy. Right now, we do not do anything with this attribute, but things like links are certainly possible. See the source of this document to see it in use with Drosophila miranda which could be used to construct a link to further information via id number or even further information via just the name.
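As a sketch of what could be done with the attribute, the code below reads a structured taxon and builds a link from its identifier. Several things are assumptions made for illustration: the identifier value 7229, the combination of @ncbi with genus and species subelements, and the URL pattern for the NCBI taxonomy browser. None of this is current PreTeXt behavior.

```python
import xml.etree.ElementTree as ET

# Assumed URL pattern for the NCBI taxonomy browser (not used by PreTeXt).
NCBI_BROWSER = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id="

# Hypothetical markup; the identifier is a placeholder for illustration.
source = (
    '<p><taxon ncbi="7229">'
    "<genus>Drosophila</genus> <species>miranda</species>"
    "</taxon></p>"
)

taxon = ET.fromstring(source).find("taxon")

# Join the text of the subelements to recover the displayed name.
name = "".join(taxon.itertext())

# Build a link only when the author supplied an identifier.
ncbi_id = taxon.get("ncbi")
link = NCBI_BROWSER + ncbi_id if ncbi_id else None

print(name)
print(link)
```

Keeping the identifier in an attribute means the visible name stays exactly as authored, while a conversion is free to decide whether and how to link it.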
Subsection 8.7 Verbatim in titles, \a&b#c%d~e{f}g$h_i^j, OK
You can test the migration of the LaTeX special characters in this section title by requesting a 2-deep Table of Contents with --stringparam toc.level 2.