Sortix 1.1dev ports manual
This manual documents Sortix 1.1dev ports.
MANDOC_ESCAPE(3) | Library Functions Manual | MANDOC_ESCAPE(3) |
NAME
mandoc_escape - parse roff escape sequences
SYNOPSIS
#include <sys/types.h>
#include <mandoc.h>
enum mandoc_esc
mandoc_escape(const char **end, const char **start, int *sz);
DESCRIPTION
This function scans a roff(7) escape sequence. An escape sequence consists of
- an initial backslash character (‘\’),
- a single ASCII character called the escape sequence identifier,
- and, with only a few exceptions, an argument.
Arguments can be given in the following forms; some escape sequence identifiers only accept some of these forms, as specified below. The first three forms are called the standard forms.
In brackets: [argument]
- The argument starts after the initial ‘[’, ends before the final ‘]’, and the escape sequence ends with the final ‘]’.
Two-character argument short form: (ar
- This form can only be used for arguments consisting of exactly two characters. It has the same effect as [ar].
One-character argument short form: a
- This form can only be used for arguments consisting of exactly one character. It has the same effect as [a].
Delimited form: CargumentC
- The argument starts after the initial delimiter character C, ends before the next occurrence of the delimiter character C, and the escape sequence ends with that second C. Some escape sequences allow arbitrary characters C as quoting characters; others restrict the range of characters that can be used as quoting characters.
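The argument forms listed above can be illustrated with a small standalone C sketch. It is not the mandoc implementation: the helper names scan_std_arg() and scan_delim_arg() and their calling convention are hypothetical, and the sketch ignores nested escape sequences inside arguments. Both helpers take a pointer to the first character after the escape sequence identifier.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: locate an argument given
 * in one of the three standard forms.  p points just past the escape
 * sequence identifier.  On success, *start and *sz describe the
 * argument and the return value points just past the sequence.
 * Returns NULL if the input ends before the argument does.
 */
const char *
scan_std_arg(const char *p, const char **start, size_t *sz)
{
	const char *close;

	switch (*p) {
	case '\0':			/* nothing follows the identifier */
		return NULL;
	case '[':			/* bracketed form: [argument] */
		close = strchr(p + 1, ']');
		if (close == NULL)
			return NULL;
		*start = p + 1;
		*sz = (size_t)(close - *start);
		return close + 1;
	case '(':			/* two-character short form: (ar */
		if (p[1] == '\0' || p[2] == '\0')
			return NULL;
		*start = p + 1;
		*sz = 2;
		return p + 3;
	default:			/* one-character short form: a */
		*start = p;
		*sz = 1;
		return p + 1;
	}
}

/*
 * Hypothetical helper for the delimited form CargumentC.  The first
 * character is taken as the delimiter C; no restriction on C is
 * enforced here.
 */
const char *
scan_delim_arg(const char *p, const char **start, size_t *sz)
{
	const char *close;

	if (*p == '\0')
		return NULL;
	close = strchr(p + 1, *p);
	if (close == NULL)
		return NULL;
	*start = p + 1;
	*sz = (size_t)(close - *start);
	return close + 1;
}
```

For the input "(emx", for example, scan_std_arg() reports the two-character argument "em" and returns a pointer to the "x" that follows the sequence.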
The function mandoc_escape() is used
- recursively by itself, because some escape sequence arguments can in turn contain other escape sequences,
- for error detection internally by the roff(7) parser part of the mandoc(3) library, see the file roff.c,
- above all externally by the mandoc formatting modules, in particular -Tascii and -Thtml, for formatting purposes, see the files term.c and html.c,
- and rarely externally by high-level utilities using the mandoc library, for example makewhatis(8), to purge escape sequences from text.
RETURN VALUES
Upon function return, the pointer end is set to the character after the end of the escape sequence, such that the calling higher-level parser can easily continue. For escape sequences taking an argument, the pointer start is set to the beginning of the argument and sz is set to the length of the argument. For escape sequences not taking an argument, start is set to the character after the end of the sequence and sz is set to 0. Both start and sz may be NULL; in that case, the argument and the length are not returned.
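The calling pattern described above, in which the caller resumes at end and may pass NULL for start and sz, can be sketched as follows. To stay compilable without mandoc.h, the sketch takes the scanner as a function pointer with the same parameter list as mandoc_escape() but returning int, and supplies a toy scanner that treats every sequence as an identifier without an argument. The names purge_escapes(), escape_fn, and toy_escape() are hypothetical. The sketch also assumes that on entry the pointer end refers to the character just after the backslash, which this page does not state.

```c
#include <stddef.h>

/* Same parameter list as mandoc_escape(); int stands in for the enum. */
typedef int (*escape_fn)(const char **end, const char **start, int *sz);

/*
 * Copy in to out, dropping every escape sequence.  out must have room
 * for strlen(in) + 1 bytes.  Return values of scan are ignored here;
 * a real caller would at least check for an error result.
 * Assumption: scan expects *end to point just past the backslash.
 */
void
purge_escapes(const char *in, char *out, escape_fn scan)
{
	while (*in != '\0') {
		if (*in != '\\') {
			*out++ = *in++;
			continue;
		}
		in++;			/* skip the backslash */
		if (*in == '\0')	/* trailing backslash: stop */
			break;
		/* The argument is not needed, so pass NULL twice. */
		(void)scan(&in, NULL, NULL);
		/* in now points after the sequence; just continue. */
	}
	*out = '\0';
}

/* Toy stand-in: every sequence is one identifier, no argument. */
int
toy_escape(const char **end, const char **start, int *sz)
{
	(*end)++;			/* consume the identifier */
	if (start != NULL)
		*start = *end;		/* character after the sequence */
	if (sz != NULL)
		*sz = 0;
	return 0;
}
```

With the toy scanner, the input a\zb\cc is reduced to abc. Passing a scanner that understands arguments would change which characters are dropped, but not the loop.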
For sequences taking an argument, the function mandoc_escape() returns one of the following values:
ESCAPE_FONT
- The escape sequence \f taking an argument in standard form: \f[, \f(, \fa. Two-character arguments starting with the character ‘C’ are reduced to one-character arguments by skipping the ‘C’. More specific values are returned for the most commonly used arguments:
argument | return value |
R or 1 | ESCAPE_FONTROMAN |
I or 2 | ESCAPE_FONTITALIC |
B or 3 | ESCAPE_FONTBOLD |
P | ESCAPE_FONTPREV |
BI | ESCAPE_FONTBI |
ESCAPE_SPECIAL
- The escape sequence \C taking an argument delimited with the single quote character and, as a special exception, the escape sequences not having an identifier, that is, those where the argument, in standard form, directly follows the initial backslash: \C', \[, \(, \a. Note that the one-character argument short form can only be used for argument characters that do not clash with escape sequence identifiers.
If the argument matches one of the forms described below under ESCAPE_UNICODE, that value is returned instead. The ESCAPE_SPECIAL special character escape sequences can be rendered using the functions mchars_spec2cp() and mchars_spec2str() described in the mchars_alloc(3) manual.
ESCAPE_UNICODE
- Escape sequences of the same format as described above under ESCAPE_SPECIAL, but with an argument of the forms uXXXX, uYXXXX, or u10XXXX where X and Y are hexadecimal digits and Y is not zero: \C'u, \[u. As a special exception, start is set to the character after the u, and the sz return value does not include the u either. Such Unicode character escape sequences can be rendered using the function mchars_num2uc() described in the mchars_alloc(3) manual.
ESCAPE_NUMBERED
- The escape sequence \N followed by a delimited argument. The delimiter character is arbitrary except that digits cannot be used. If a digit is encountered instead of the opening delimiter, that digit is considered to be the argument and the end of the sequence, and ESCAPE_IGNORE is returned. Such ASCII character escape sequences can be rendered using the function mchars_num2char() described in the mchars_alloc(3) manual.
ESCAPE_OVERSTRIKE
- The escape sequence \o followed by an argument delimited by an arbitrary character.
ESCAPE_IGNORE
- The escape sequence \s followed by an argument in standard form or by an argument delimited by the single quote character: \s', \s[, \s(, \sa. As a special exception, an optional ‘+’ or ‘-’ character is allowed after the ‘s’ for all forms.
- The escape sequences \F, \g, \k, \M, \m, \n, \V, and \Y followed by an argument in standard form.
- The escape sequences \A, \b, \D, \R, \X, and \Z followed by an argument delimited by an arbitrary character.
- The escape sequences \H, \h, \L, \l, \S, \v, and \x followed by an argument delimited by a character that cannot occur in numerical expressions. However, if any character that can occur in numerical expressions is found instead of a delimiter, the sequence is considered to end with that character, and ESCAPE_ERROR is returned.
ESCAPE_ERROR
- Escape sequences taking an argument but not matching any of the above patterns. In particular, that happens if the end of the logical input line is reached before the end of the argument.
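Two of the entries above describe a refinement step: a \f argument is mapped to a more specific font value, and a special character argument of a certain shape is reported as ESCAPE_UNICODE instead. The following standalone C sketch restates both rules. It is not the mandoc code; enum font_kind, classify_font(), and is_unicode_arg() are hypothetical names, and the local enum merely stands in for the ESCAPE_FONT* values. Note that is_unicode_arg() expects the argument including its leading u, whereas mandoc_escape() reports start and sz without it.

```c
#include <ctype.h>
#include <stddef.h>

/* Local stand-ins for illustration; not the constants from mandoc.h. */
enum font_kind {
	FONT_OTHER,		/* corresponds to plain ESCAPE_FONT */
	FONT_ROMAN,
	FONT_ITALIC,
	FONT_BOLD,
	FONT_PREV,
	FONT_BI
};

/* Map a \f argument to the more specific kinds from the table. */
enum font_kind
classify_font(const char *arg, size_t sz)
{
	/* Two-character arguments starting with 'C' drop the 'C'. */
	if (sz == 2 && arg[0] == 'C') {
		arg++;
		sz--;
	}
	if (sz == 2 && arg[0] == 'B' && arg[1] == 'I')
		return FONT_BI;
	if (sz != 1)
		return FONT_OTHER;
	switch (arg[0]) {
	case 'R': case '1':
		return FONT_ROMAN;
	case 'I': case '2':
		return FONT_ITALIC;
	case 'B': case '3':
		return FONT_BOLD;
	case 'P':
		return FONT_PREV;
	default:
		return FONT_OTHER;
	}
}

/*
 * Return 1 if arg, including the leading 'u', has one of the forms
 * uXXXX, uYXXXX with Y not zero, or u10XXXX; otherwise return 0.
 */
int
is_unicode_arg(const char *arg, size_t sz)
{
	size_t i;

	if (sz < 5 || sz > 7 || arg[0] != 'u')
		return 0;
	for (i = 1; i < sz; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return 0;
	if (sz == 6)			/* uYXXXX */
		return arg[1] != '0';
	if (sz == 7)			/* u10XXXX */
		return arg[1] == '1' && arg[2] == '0';
	return 1;			/* uXXXX */
}
```

Under these rules, the arguments CB and 3 both classify as bold, and u2014 is a Unicode form while u01F60 is not.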
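For sequences not taking an argument, the function mandoc_escape() returns one of the following values:
ESCAPE_SKIPCHAR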
- The escape sequence “\z”.
ESCAPE_NOSPACE
- The escape sequence “\c”.
ESCAPE_IGNORE
- The escape sequences “\d” and “\u”.
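As a compact summary of the entries above, the following standalone C sketch maps an escape sequence identifier to the argument form this page describes for it. The enum arg_form and the function form_for() are hypothetical names, not part of the mandoc library, and the special character sequences without an identifier, such as \[ and \(, are deliberately left out.

```c
/* Hypothetical classification of identifiers by argument form. */
enum arg_form {
	ARG_UNLISTED,		/* identifier not covered on this page */
	ARG_NONE,		/* no argument */
	ARG_STANDARD,		/* [argument], (ar, or a */
	ARG_STD_OR_QUOTED,	/* standard form or single quotes */
	ARG_QUOTED,		/* single quotes only */
	ARG_DELIMITED,		/* arbitrary delimiter */
	ARG_NONDIGIT_DELIM,	/* any delimiter except a digit */
	ARG_NUMERIC_DELIM	/* delimiter not valid in numerical
				 * expressions */
};

enum arg_form
form_for(char id)
{
	switch (id) {
	case 'f': case 'F': case 'g': case 'k': case 'M':
	case 'm': case 'n': case 'V': case 'Y':
		return ARG_STANDARD;
	case 's':		/* an optional + or - may follow the s */
		return ARG_STD_OR_QUOTED;
	case 'C':
		return ARG_QUOTED;
	case 'N':
		return ARG_NONDIGIT_DELIM;
	case 'o': case 'A': case 'b': case 'D': case 'R':
	case 'X': case 'Z':
		return ARG_DELIMITED;
	case 'H': case 'h': case 'L': case 'l': case 'S':
	case 'v': case 'x':
		return ARG_NUMERIC_DELIM;
	case 'z': case 'c': case 'd': case 'u':
		return ARG_NONE;
	default:
		return ARG_UNLISTED;
	}
}
```

A table like this only selects how to look for the argument; it says nothing about which value mandoc_escape() returns for a given sequence.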
FILES
This function is implemented in mandoc.c.
SEE ALSO
mchars_alloc(3), mandoc_char(7), roff(7)
HISTORY
This function has been available since mandoc 1.11.2.
AUTHORS
Kristaps Dzonsons <kristaps@bsd.lv>
Ingo Schwarze <schwarze@openbsd.org>
BUGS
The function doesn't cleanly distinguish between sequences that are valid and supported, valid and ignored, valid and unsupported, syntactically invalid, or undefined. For sequences that are ignored or unsupported, it doesn't tell whether that deficiency is likely to cause major formatting problems and/or loss of document content. The function is already rather complicated and still parses some sequences incorrectly.
January 21, 2015 | Debian |