eucTW - A character encoding system (codeset) for Traditional Chinese
The Taiwanese EUC (Extended UNIX Code), or eucTW, codeset
consists of two character sets: ASCII and CNS 11643
(Plane 1 to Plane 16).
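Taiwanese EUC uses a combination of single-byte and
multibyte data to represent ASCII characters, symbols, and
ideographic characters: an ASCII character occupies one
byte, a character in plane 1 of CNS 11643 occupies two
bytes, and a character in planes 2 through 16 occupies four
bytes (a two-byte leading code followed by two bytes).
Because CNS 11643 spans many character planes, Taiwanese
EUC uses different leading codes to designate the different
planes.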
ASCII characters are represented as single-byte, 7-bit
data in Taiwanese EUC; that is, the most significant bit
(MSB) of the byte that represents an ASCII character is
always off (0). For more information, refer to ascii(5).
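As an illustration only, the following Python sketch applies the MSB rule described above: a byte whose most significant bit (0x80) is clear is a 7-bit ASCII byte, so an eucTW string made up only of such bytes is plain ASCII. The function names is_ascii_byte and is_pure_ascii are hypothetical and are not part of any operating system interface.

```python
def is_ascii_byte(b: int) -> bool:
    """Return True if the byte's most significant bit (0x80) is off."""
    return (b & 0x80) == 0


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte of an eucTW string is a 7-bit ASCII byte.

    Such a string is byte-for-byte identical to its ASCII form.
    """
    return all(is_ascii_byte(b) for b in data)


print(is_pure_ascii(b"hello, world"))  # True
print(is_pure_ascii(b"abc\xa4\xa1"))   # False: 0xA4 has the MSB set
```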
Although the standard Taiwanese EUC codeset includes all
characters defined by the CNS 11643-1992 standard, the
operating system's eucTW implementation currently supports
only the following: characters defined in the first and
second planes of CNS 11643; the EDPC Recommended Character
Set (refer to dechanyu(5) for more information); and
CNS 11643-1986 and DTSCS characters that have been remapped
into the third and fourth character planes by the
CNS 11643-1992 standard.
Characters that were added to CNS 11643-1986 by the CNS
11643-1992 standard are not supported.
The characters that are defined in plane 1 and plane 2 of
CNS 11643-1992 and that are the same as those defined in
CNS 11643-1986 are as follows:
--------------------------------------------------------------------
Character Plane   Character Type                   Number of Characters
--------------------------------------------------------------------
1                 Special characters               651
                  Control characters               33
                  Frequently-used characters       5401
2                 Less frequently-used characters  7650
--------------------------------------------------------------------
The characters defined in plane 3 and plane 4 of CNS
11643-1992 are as follows:
--------------------------------------------------------------------------
Character Plane   Character Type                          Number of
                                                          Characters
--------------------------------------------------------------------------
3                 Rarely-used characters (EDPC Part I)    6148
4                 Used for residency system, ISO 2nd      7298
                  edition DIS 10646 Han characters
                  EDPC Part II characters                 171
--------------------------------------------------------------------------
The characters that have been remapped into the third and
fourth character planes of CNS 11643-1992 as specified by
the EDPC are as follows:
---------------------------------------------------------
EDPC Characters Character Plane Number of Characters
---------------------------------------------------------
Part I Plane 3 6148
Part II Plane 4 171
---------------------------------------------------------
Taiwanese EUC Encoding
Except for characters in the first plane of CNS
11643-1986, Taiwanese EUC uses a leading code to indicate
which character plane a character belongs to. The leading
code consists of the 8-bit Single-Shift 2 control character
(SS2, hexadecimal 8E) followed by one additional byte that
identifies the plane.
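The leading-code rule can be sketched in Python as follows. This is an illustrative sketch, not the operating system's implementation; it assumes, as the encoding table on this page shows, that plane 1 has no leading code and that planes 2 through 16 use SS2 followed by the byte A0 (hex) plus the plane number (A2 for plane 2 through B0 for plane 16). The function name leading_code is hypothetical.

```python
SS2 = 0x8E  # Single-Shift 2 control character


def leading_code(plane: int) -> bytes:
    """Return the eucTW leading code for CNS 11643 plane 1-16 (sketch)."""
    if not 1 <= plane <= 16:
        raise ValueError(f"CNS 11643 plane out of range: {plane}")
    if plane == 1:
        return b""                      # plane 1 has no leading code
    return bytes([SS2, 0xA0 + plane])  # plane 2 -> 8E A2, ..., plane 16 -> 8E B0


for p in (1, 2, 16):
    print(p, leading_code(p).hex())
```

Computing the second byte as an offset from A0 keeps the function to one line of arithmetic instead of a 15-entry lookup table; both approaches produce the same bytes for planes 2 through 16.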
The position of a character on a plane is specified by two
bytes. The first byte determines the character's row number
and the second byte determines the character's column
number. The MSB of both bytes is always set (1).
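A minimal Python sketch of the row and column mapping follows. It assumes the usual convention for a 94 x 94 character plane: rows and columns are numbered 1 through 94 and stored as the byte value A0 (hex) plus the number, which yields the A1 to FE byte range shown in the encoding table. The function names position_to_bytes and bytes_to_position are hypothetical.

```python
from typing import Tuple


def position_to_bytes(row: int, col: int) -> bytes:
    """Encode a 1-based (row, column) pair as two bytes with the MSB set."""
    for n in (row, col):
        if not 1 <= n <= 94:
            raise ValueError(f"row/column out of range: {n}")
    return bytes([0xA0 + row, 0xA0 + col])


def bytes_to_position(b1: int, b2: int) -> Tuple[int, int]:
    """Decode two bytes in the range 0xA1-0xFE to a 1-based (row, column) pair."""
    for b in (b1, b2):
        if not 0xA1 <= b <= 0xFE:
            raise ValueError(f"position byte out of range: {b:#04x}")
    return b1 - 0xA0, b2 - 0xA0


encoded = position_to_bytes(4, 1)
print(encoded.hex())                              # a4a1
print(bytes_to_position(encoded[0], encoded[1]))  # (4, 1)
```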
The following table shows the encoding of Taiwanese EUC
characters:
-------------------------------------------------------
CNS 11643-1986 Code Plane Leading Code Code Range
-------------------------------------------------------
1 [nil] A1A1 - FEFE
2 SS2 A2 A1A1 - FEFE
3 SS2 A3 A1A1 - FEFE
4 SS2 A4 A1A1 - FEFE
5 SS2 A5 A1A1 - FEFE
6 SS2 A6 A1A1 - FEFE
7 SS2 A7 A1A1 - FEFE
8 SS2 A8 A1A1 - FEFE
9 SS2 A9 A1A1 - FEFE
10 SS2 AA A1A1 - FEFE
11 SS2 AB A1A1 - FEFE
12 SS2 AC A1A1 - FEFE
13 SS2 AD A1A1 - FEFE
14 SS2 AE A1A1 - FEFE
15 SS2 AF A1A1 - FEFE
16 SS2 B0 A1A1 - FEFE
-------------------------------------------------------
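Putting the pieces together, the following hypothetical Python decoder splits an eucTW byte string into characters according to the table above. It is a sketch for illustration, not the operating system's converter: it reports ASCII bytes as plane 0 with row 0 and the byte value in the column position (a convention invented for this example), and it raises ValueError on truncated sequences or on bytes outside the ranges in the table. It does not check whether a decoded position is actually assigned a character, or whether the plane is one that this implementation supports.

```python
from typing import Iterator, Tuple

SS2 = 0x8E  # Single-Shift 2: introduces a plane 2-16 character


def _position(b1: int, b2: int, offset: int) -> Tuple[int, int]:
    """Map two position bytes (0xA1-0xFE) to a 1-based (row, column) pair."""
    for b in (b1, b2):
        if not 0xA1 <= b <= 0xFE:
            raise ValueError(f"invalid position byte {b:#04x} near offset {offset}")
    return b1 - 0xA0, b2 - 0xA0


def decode_euctw(data: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (plane, row, column) for each character in an eucTW byte string.

    ASCII bytes are yielded as (0, 0, byte_value); this convention is
    used only in this sketch.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # ASCII: a single byte with the MSB off.
            yield (0, 0, b)
            i += 1
        elif b == SS2:
            # Planes 2-16: SS2, a plane byte (0xA2-0xB0), then two position bytes.
            if i + 4 > n:
                raise ValueError(f"truncated SS2 sequence at offset {i}")
            plane = data[i + 1] - 0xA0
            if not 2 <= plane <= 16:
                raise ValueError(f"invalid plane byte {data[i + 1]:#04x} at offset {i + 1}")
            row, col = _position(data[i + 2], data[i + 3], i + 2)
            yield (plane, row, col)
            i += 4
        else:
            # Plane 1: two position bytes with no leading code.
            if i + 2 > n:
                raise ValueError(f"truncated plane 1 character at offset {i}")
            row, col = _position(data[i], data[i + 1], i)
            yield (1, row, col)
            i += 2


# Hypothetical input: "A", one plane 1 character, one plane 2 character.
sample = b"A" + bytes([0xA4, 0xA1]) + bytes([SS2, 0xA2, 0xA1, 0xA1])
print(list(decode_euctw(sample)))  # [(0, 0, 65), (1, 4, 1), (2, 1, 1)]
```

Because the first byte alone tells the decoder how long the character is (below 80: one byte; 8E: four bytes; otherwise two bytes), the scan never needs to look backward, which is the property EUC encodings are designed around.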
Codeset Conversion
The following codeset converter pairs are available for
converting Traditional Chinese characters between eucTW
and other encoding formats. Refer to iconv_intro(5) for
an introduction to codeset conversion. For more information
about the other codeset for which eucTW is the input
or output, see the reference page listed with each
converter pair.
big5_eucTW, eucTW_big5
    Converting from and to the Big-5 codeset: big5(5).
    Note that Big-5 encoding is equivalent to the
    Microsoft code-page format used on PCs for Traditional
    Chinese. You can therefore use this set of
    converters to convert Traditional Chinese text
    between the eucTW and PC code-page formats. For
    information about how the operating system supports
    PC code pages, see code_page(5).
dechanyu_eucTW, eucTW_dechanyu
    Converting from and to the DEC Hanyu codeset:
    dechanyu(5).
dechanzi_eucTW, eucTW_dechanzi
    Converting from and to the DEC Hanzi codeset:
    dechanzi(5).
sbig5_eucTW, eucTW_sbig5
    Converting from and to the Shift Big-5 codeset:
    sbig5(5).
telecode_eucTW, eucTW_telecode
    Converting from and to the Telecode codeset:
    telecode(5).
UTF-16_eucTW, eucTW_UTF-16
    Converting from and to UTF-16 format: Unicode(5).
UCS-4_eucTW, eucTW_UCS-4
    Converting from and to UCS-4 format: Unicode(5).
UTF-8_eucTW, eucTW_UTF-8
    Converting from and to UTF-8 format: Unicode(5).
Fonts for Taiwanese EUC
For both display devices and printers, the operating system
supports Taiwanese EUC through internal conversion to
DEC Hanyu code and use of DEC Hanyu fonts (see
dechanyu(5)).
For general information on printing non-English text,
refer to i18n_printing(5).
See Also
Commands: locale(1)
Others: ascii(5), big5(5), Chinese(5), code_page(5),
dechanzi(5), GBK(5), iconv_intro(5), i18n_intro(5),
i18n_printing(5), l10n_intro(5), sbig5(5), telecode(5),
Unicode(5)
eucTW(5)