charmap - Defines character symbols as character encodings
The character set description (charmap) file defines character
symbols as character encodings. This file is the
source file for a coded character set, or codeset. All
supported codesets have the Portable Character Set (PCS)
as a proper subset. The PCS consists of the following
character symbols (listed by their standardized symbolic
names) and hexadecimal encodings:
-------------------------------------------
Symbol Name Hexadecimal Encoding
-------------------------------------------
<NUL> \x00
<SOH> \x01
<STX> \x02
<ETX> \x03
<EOT> \x04
<ENQ> \x05
<ACK> \x06
<alert> \x07
<backspace> \x08
<tab> \x09
<newline> \x0A
<vertical-tab> \x0B
<form-feed> \x0C
<carriage-return> \x0D
<SO> \x0E
<SI> \x0F
<DLE> \x10
<DC1> \x11
<DC2> \x12
<DC3> \x13
<DC4> \x14
<NAK> \x15
<SYN> \x16
<ETB> \x17
<CAN> \x18
<EM> \x19
<SUB> \x1A
<ESC> \x1B
<IS4> \x1C
<IS3> \x1D
<IS2> \x1E
<IS1> \x1F
<space> \x20
<exclamation-mark> \x21
<quotation-mark> \x22
<number-sign> \x23
<dollar-sign> \x24
<percent> \x25
<ampersand> \x26
<apostrophe> \x27
<left-parenthesis> \x28
<right-parenthesis> \x29
<asterisk> \x2A
<plus-sign> \x2B
<comma> \x2C
<hyphen> \x2D
<period> \x2E
<slash> \x2F
<zero> \x30
<one> \x31
<two> \x32
<three> \x33
<four> \x34
<five> \x35
<six> \x36
<seven> \x37
<eight> \x38
<nine> \x39
<colon> \x3A
<semi-colon> \x3B
<less-than> \x3C
<equal-sign> \x3D
<greater-than> \x3E
<question-mark> \x3F
<commercial-at> \x40
<A> \x41
<B> \x42
<C> \x43
<D> \x44
<E> \x45
<F> \x46
<G> \x47
<H> \x48
<I> \x49
<J> \x4A
<K> \x4B
<L> \x4C
<M> \x4D
<N> \x4E
<O> \x4F
<P> \x50
<Q> \x51
<R> \x52
<S> \x53
<T> \x54
<U> \x55
<V> \x56
<W> \x57
<X> \x58
<Y> \x59
<Z> \x5A
<left-bracket> \x5B
<backslash> \x5C
<right-bracket> \x5D
<circumflex> \x5E
<underscore> \x5F
<grave-accent> \x60
<a> \x61
<b> \x62
<c> \x63
<d> \x64
<e> \x65
<f> \x66
<g> \x67
<h> \x68
<i> \x69
<j> \x6A
<k> \x6B
<l> \x6C
<m> \x6D
<n> \x6E
<o> \x6F
<p> \x70
<q> \x71
<r> \x72
<s> \x73
<t> \x74
<u> \x75
<v> \x76
<w> \x77
<x> \x78
<y> \x79
<z> \x7A
<left-brace> \x7B
<vertical-line> \x7C
<right-brace> \x7D
<tilde> \x7E
<DEL> \x7F
-------------------------------------------
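Because every supported codeset contains the PCS with the encodings listed above, a candidate codeset can be spot-checked against the table. The following is a minimal sketch only. It assumes that Python's standard codecs named "ascii", "iso8859-1", and "utf-8" stand in for supported codesets; the helper pcs_consistent and the sample dictionary are hypothetical names introduced for illustration.

```python
# Sample of PCS entries copied from the table above:
# symbolic name -> (character, encoding value).
PCS_SAMPLE = {
    "<NUL>": ("\x00", 0x00),
    "<newline>": ("\n", 0x0A),
    "<space>": (" ", 0x20),
    "<zero>": ("0", 0x30),
    "<A>": ("A", 0x41),
    "<backslash>": ("\\", 0x5C),
    "<a>": ("a", 0x61),
    "<tilde>": ("~", 0x7E),
    "<DEL>": ("\x7f", 0x7F),
}


def pcs_consistent(codec_name):
    """Return True if the codec encodes every sampled PCS character
    as the single byte given in the table."""
    return all(
        char.encode(codec_name) == bytes([value])
        for char, value in PCS_SAMPLE.values()
    )


if __name__ == "__main__":
    for name in ("ascii", "iso8859-1", "utf-8"):
        print(name, pcs_consistent(name))
```

A codec that uses a different byte layout for these characters (for example, a 16-bit encoding) fails the check.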
The charmap file has the following components:

An optional special symbolic name declarations section
Each declaration in this section consists of a special
symbolic name, followed by one or more space
or tab characters, and a value. The following list
describes the special symbolic names that you can
include in the declarations section:

<code_set_name>
    Specifies the name of the codeset for which the
    charmap file is defined. This value determines the
    value returned by the nl_langinfo(CODESET)
    subroutine. If <code_set_name> is not declared, the
    name for the Portable Character Set is used.
<mb_cur_max>
    Specifies the maximum number of bytes in a character
    for the codeset. Valid values are 1 to 4. The
    default value is 1.
<mb_cur_min>
    Specifies the minimum number of bytes in a character
    for the codeset. Since all supported codesets have
    the Portable Character Set as a proper subset, this
    value must be 1.
<escape_char>
    Specifies the escape character that indicates
    encodings in hexadecimal or octal notation. The
    default value is a \ (backslash).
<comment_char>
    Specifies the character used to indicate a comment
    within a charmap file. The default value is a
    # (number sign).

The CHARMAP section header
This header marks the beginning of the section that
associates character symbols with encodings.

Mapping statements for characters in the codeset
Each statement lists a symbolic name for a character
and its associated encoding. The format of a
mapping statement is:

<char_symbol> encoding

A symbolic name begins with the < (left-angle
bracket) character and ends with the > (right-angle
bracket) character. The characters for char_symbol
(between < and >) can be any characters from the
Portable Character Set, except for control and
space characters. The right-angle bracket (>) can
occur within char_symbol as well as in the last
position of the name. You must precede all > characters
but the last one with the escape character (as specified
by the <escape_char> special symbolic name).
An encoding is specified as one or more character
constants, with the maximum number of character
constants specified by the <mb_cur_max> special
symbolic name. The encoding may be listed as decimal,
octal, or hexadecimal constants with the following
formats:

\xhh, where h is a hexadecimal digit
\ooo or \oo, where o is an octal digit
\dnnn or \dnn, where n is a decimal digit

Some examples of character symbol definitions are
the following:

<A>       \d65      #decimal constant
<B>       \x42      #hexadecimal constant
<j10101>  \x81\xA1  #multiple hexadecimal constants
A range of symbolic names and corresponding encoded
values may also be defined, where the nonnumeric
prefix for each symbolic name is common, and the
numeric portion of the second symbolic name is
equal to or greater than the numeric portion of the
first symbolic name. In this format, a symbolic
name value consists of zero or more nonnumeric
characters followed by an integer of one or more
decimal digits. This format defines a series of
symbolic names. For example, the string
<j0101>...<j0104> is interpreted as the <j0101>,
<j0102>, <j0103>, and <j0104> symbolic names, in
that order.
In statements defining ranges of symbolic names,
the encoded value listed is the value for the first
symbolic name in the range. Subsequent symbolic
names have encoded values in increasing order. For
example:
<j0101>...<j0104> \d129\d254
The preceding statement is interpreted as follows:
<j0101> \d129\d254
<j0102> \d129\d255
<j0103> \d130\d0
<j0104> \d130\d1
Although you cannot assign multiple encodings to
one symbolic name, you can create multiple names
for one encoded value. This is allowed because
some characters have several common names. For
example, the "." character is called a period in
some parts of the world, and a full stop in others.
Both names may appear in the charmap. For example:
<period>    \x2e
<full-stop> \x2e
If used, comments must begin with the character
specified by the <comment_char> special symbolic
name. When an entire line is a comment, you must
specify <comment_char> in the first column of the
line.

The END CHARMAP trailer
This entry denotes the end of character map statements.
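The character constant formats and the range rule described above can be sketched in code. This is an illustration, not the localedef implementation. The function names parse_encoding and expand_range are hypothetical. The sketch assumes the default escape character (a backslash), accepts one to three decimal digits because the range example above uses \d0 and \d1, and treats a multibyte encoding as a big-endian integer when incrementing, which is inferred from the \d129\d255 to \d130\d0 step in that example.

```python
import re

# One character constant: \dNNN (decimal), \xHH (hexadecimal), or \OOO (octal).
# Assumption: the escape character is the default backslash.
_CONSTANT = re.compile(r"\\(?:d(\d{1,3})|x([0-9A-Fa-f]{2})|([0-7]{2,3}))")

# A symbolic name usable in a range: nonnumeric prefix, then decimal digits.
_RANGE_NAME = re.compile(r"<(\D*)(\d+)>")


def parse_encoding(text):
    """Convert an encoding such as r'\x81\xA1' into a list of byte values."""
    values = []
    pos = 0
    while pos < len(text):
        match = _CONSTANT.match(text, pos)
        if match is None:
            raise ValueError(f"bad character constant at offset {pos}: {text!r}")
        dec, hexa, octal = match.groups()
        if dec is not None:
            value = int(dec, 10)
        elif hexa is not None:
            value = int(hexa, 16)
        else:
            value = int(octal, 8)
        if value > 255:
            raise ValueError(f"constant does not fit in one byte: {text!r}")
        values.append(value)
        pos = match.end()
    return values


def expand_range(first, last, encoding):
    """Expand a range such as <j0101>...<j0104> into (name, bytes) pairs.

    The encoding belongs to the first name; later names get increasing values.
    """
    m_first = _RANGE_NAME.fullmatch(first)
    m_last = _RANGE_NAME.fullmatch(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("range endpoints must share a nonnumeric prefix")
    prefix, start_digits = m_first.groups()
    start, end = int(start_digits), int(m_last.group(2))
    if end < start:
        raise ValueError("second name must not be numerically smaller")
    base = parse_encoding(encoding)
    base_value = int.from_bytes(bytes(base), "big")
    width = len(start_digits)  # keep leading zeros, e.g. 0101 -> 0102
    pairs = []
    for offset in range(end - start + 1):
        name = f"<{prefix}{start + offset:0{width}d}>"
        value = base_value + offset
        pairs.append((name, list(value.to_bytes(len(base), "big"))))
    return pairs


if __name__ == "__main__":
    for name, encoded in expand_range("<j0101>", "<j0104>", r"\d129\d254"):
        print(name, encoded)
```

The sketch raises OverflowError if a range would need more bytes than the first encoding supplies. How a real implementation handles that case is not stated here.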
The following example is a portion of a possible charmap
file:
CHARMAP
<code_set_name>    "ISO8859-1"
<mb_cur_max>       1
<mb_cur_min>       1
<escape_char>      \
<comment_char>     #
<NUL>              \x00
<SOH>              \x01
<STX>              \x02
<ETX>              \x03
<EOT>              \x04
<ENQ>              \x05
<ACK>              \x06
<alert>            \x07
<backspace>        \x08
<tab>              \x09
<newline>          \x0a
<vertical-tab>     \x0b
<form-feed>        \x0c
<carriage-return>  \x0d
END CHARMAP
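A charmap source of this shape can be split into its declarations and mapping statements with a short routine. The following is a minimal sketch under stated assumptions, not a description of how localedef reads the file. The function parse_charmap and the SAMPLE text are hypothetical. The sketch accepts declarations on either side of the CHARMAP header, takes the second whitespace-separated field as the value, recognizes comments only in the first column, and leaves encodings as uninterpreted strings. It does not handle ranges or escaped > characters in names.

```python
def parse_charmap(text):
    """Split charmap source text into (declarations, mappings).

    declarations: dict of special symbolic name -> value (defaults filled in).
    mappings: list of (symbolic name, raw encoding string) in file order.
    """
    # Defaults stated in the description of the declarations section.
    declarations = {
        "<mb_cur_max>": "1",
        "<mb_cur_min>": "1",
        "<escape_char>": "\\",
        "<comment_char>": "#",
    }
    special = set(declarations) | {"<code_set_name>"}
    mappings = []
    in_map = False
    for raw in text.splitlines():
        # Skip blank lines and whole-line comments (comment char in column one).
        if not raw.strip() or raw.startswith(declarations["<comment_char>"]):
            continue
        fields = raw.split()
        if fields == ["CHARMAP"]:
            in_map = True
            continue
        if fields == ["END", "CHARMAP"]:
            break
        if len(fields) < 2:
            raise ValueError(f"malformed line: {raw!r}")
        name, value = fields[0], fields[1]
        if name in special:
            declarations[name] = value.strip('"')
        elif in_map:
            mappings.append((name, value))
        else:
            raise ValueError(f"mapping statement before CHARMAP: {raw!r}")
    return declarations, mappings


# Hypothetical sample, shortened from the kind of file shown above.
SAMPLE = "\n".join([
    "CHARMAP",
    '<code_set_name> "ISO8859-1"',
    "<mb_cur_max> 1",
    "<mb_cur_min> 1",
    "<escape_char> \\",
    "<comment_char> #",
    "<NUL> \\x00",
    "<backspace> \\x08",
    "<tab> \\x09",
    "END CHARMAP",
])

if __name__ == "__main__":
    decls, maps = parse_charmap(SAMPLE)
    print(decls["<code_set_name>"], len(maps))
```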
The /usr/lib/nls/loc/charmaps directory contains the
character set description (charmap) source files for
supported locales. This directory does not exist when
source files for installed locales are not provided.
Commands: locale(1), localedef(1)
Files: locale(4)
charmap(4)