dechanzi - A character encoding system (codeset) for Simplified Chinese
The DEC Hanzi (dechanzi) codeset consists of the following
character sets:
    ASCII
    GB2312-80
    Extended GB
DEC Hanzi uses a 2-byte data representation for symbols
and ideographic characters that are defined in GB2312-80.
ASCII Characters
All ASCII characters are represented as single-byte,
7-bit data in the DEC Hanzi codeset; that is,
the most significant bit (MSB) of a byte that represents
an ASCII character is always off (0). For more information
on ASCII characters, refer to ascii(5).
GB2312-80 Characters
The code table for GB2312-80 characters is divided into 94
rows (Qu), numbered from 1 to 94. Each row has 94
columns (Wei), also numbered from 1 to 94. The code table
defines a total of 7445 characters, of which 6763 are Chinese
characters. The characters are grouped as follows:
Graphic symbols
    There are 682 graphic symbols, which occupy rows 1
    to 9 in the code table.
Frequently used (Level 1) characters
    There are 3755 frequently used characters, which
    occupy rows 16 to 55 in the code table.
Less frequently used (Level 2) characters
    There are 3008 less frequently used characters,
    which occupy rows 56 to 87 in the code table.
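The row ranges above can be checked against the character counts with a short sketch. The helper names `gb2312_row_group` and `cells_in_group` are hypothetical, and reporting rows 10 to 15 and 88 to 94 as "unassigned" is an assumption of this sketch, since the text names no group for them.

```python
# Hypothetical helpers: map a GB2312-80 row (Qu) number to the group
# described above, and count how many row/column cells each group spans.
# Rows outside the three listed ranges are reported as "unassigned"
# (an assumption; the reference text names no group for them).

COLUMNS_PER_ROW = 94


def gb2312_row_group(row: int) -> str:
    if not 1 <= row <= 94:
        raise ValueError(f"row must be in 1..94, got {row}")
    if row <= 9:
        return "graphic symbols"
    if 16 <= row <= 55:
        return "level 1"
    if 56 <= row <= 87:
        return "level 2"
    return "unassigned"


def cells_in_group(group: str) -> int:
    """Number of row/column cells (not characters) that a group spans."""
    rows = sum(1 for r in range(1, 95) if gb2312_row_group(r) == group)
    return rows * COLUMNS_PER_ROW


if __name__ == "__main__":
    # Compare the cells each group spans with the character counts above.
    for group, chars in [("graphic symbols", 682), ("level 1", 3755), ("level 2", 3008)]:
        print(f"{group}: {chars} characters in {cells_in_group(group)} cells")
```

By this arithmetic, Level 2 fills its 32 rows exactly (32 x 94 = 3008), while Level 1 leaves 5 of its 3760 cells unused.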
To differentiate GB2312-80 character codes from ASCII and
Extended GB character codes, the most significant bit
(MSB) of both the first byte and the second byte is set
(1). The following formulas show how to calculate the value
of a GB2312-80 character from its row and column numbers
(A0 is hexadecimal; the row and column numbers are decimal):
1st byte = A0 + Row number
2nd byte = A0 + Column number
For example, if a GB2312-80 character is in the first column
of the 16th row, the character's value is B0A1, which
is calculated as follows:
1st byte = A0(hex) + 16 = B0(hex)
2nd byte = A0(hex) + 01 = A1(hex)
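The formulas above can be sketched as a pair of Python functions. The names `gb2312_encode` and `gb2312_decode` are hypothetical; the cross-check uses Python's standard `gb2312` codec (EUC-CN), which shares this byte layout for GB2312-80 characters but is not the DEC Hanzi converter itself.

```python
from __future__ import annotations


def gb2312_encode(row: int, col: int) -> bytes:
    """Encode a GB2312-80 row/column pair: each byte is A0(hex) + number."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    return bytes([0xA0 + row, 0xA0 + col])


def gb2312_decode(code: bytes) -> tuple[int, int]:
    """Recover (row, column) from a 2-byte code whose bytes both have the MSB set."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError(f"not a GB2312-80 code: {code!r}")
    return code[0] - 0xA0, code[1] - 0xA0


if __name__ == "__main__":
    code = gb2312_encode(16, 1)
    print(code.hex().upper())      # B0A1
    print(gb2312_decode(code))     # (16, 1)
    # Python's EUC-CN codec places the same character at B0A1.
    print(code.decode("gb2312"))   # 啊
```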
Extended GB Characters
The Extended GB code table is similar to the GB2312-80 code
table and is divided into 94 rows and 94 columns (8836
code points). However, the Extended GB code table provides
code points for user-defined characters (UDC). The 8836
code points in this table are divided into two areas:
User-defined area
    This area spans rows 1 to 87 and provides 8178 code
    points.
User-defined (reserved) area
    This area spans rows 88 to 94 and provides 658 code
    points. This area is where users can define special
    and long-lasting user-defined characters.
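The area sizes follow directly from the row ranges, since every row has 94 columns. A minimal sketch, with the helper names `extended_gb_area` and `area_size` being hypothetical:

```python
# Hypothetical helpers: classify an Extended GB row into one of the two
# areas described above and count the code points each area provides.

ROWS = 94
COLUMNS = 94


def extended_gb_area(row: int) -> str:
    if not 1 <= row <= ROWS:
        raise ValueError(f"row must be in 1..{ROWS}, got {row}")
    return "user-defined" if row <= 87 else "user-defined (reserved)"


def area_size(area: str) -> int:
    """Code points in an area: 94 columns for every row that belongs to it."""
    return sum(COLUMNS for r in range(1, ROWS + 1) if extended_gb_area(r) == area)


if __name__ == "__main__":
    print(area_size("user-defined"))             # 8178  (87 rows x 94)
    print(area_size("user-defined (reserved)"))  # 658   (7 rows x 94)
    print(ROWS * COLUMNS)                        # 8836  (whole table)
```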
To differentiate Extended GB codes from ASCII codes and
GB2312-80 codes, the most significant bit (MSB) of the
first byte is set (1), while that of the second byte is
cleared (0). The following formulas show how the code value
of an Extended GB character is calculated from its row and
column numbers:
1st byte = A0 + Row number
2nd byte = 20 + Column number
For example, if a character is positioned in the first
column of the 16th row on the Extended GB code plane, the
character's value is B021, which is calculated as follows:
1st byte = A0(hex) + 16 = B0(hex)
2nd byte = 20(hex) + 01 = 21(hex)
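Taken together, the MSB rules for ASCII, GB2312-80, and Extended GB let a reader split a DEC Hanzi byte string without any lookup table. The following is a minimal sketch; the function name `tokenize_dechanzi` and its tuple output format are hypothetical, and rejecting out-of-range row or column bytes with `ValueError` is a choice of this sketch, not documented converter behavior.

```python
from __future__ import annotations


def tokenize_dechanzi(data: bytes) -> list[tuple]:
    """Split DEC Hanzi bytes into ("ascii", char), ("gb2312", row, col)
    or ("extended_gb", row, col) tokens using only the MSB rules."""
    tokens: list[tuple] = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 < 0x80:                       # MSB off: single-byte ASCII
            tokens.append(("ascii", chr(b1)))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError(f"truncated 2-byte code at offset {i}")
        b2 = data[i + 1]
        row = b1 - 0xA0                     # 1st byte = A0 + row (both sets)
        if b2 >= 0x80:                      # both MSBs on: GB2312-80
            kind, col = "gb2312", b2 - 0xA0
        else:                               # 1st MSB on, 2nd off: Extended GB
            kind, col = "extended_gb", b2 - 0x20
        if not (1 <= row <= 94 and 1 <= col <= 94):
            raise ValueError(f"invalid 2-byte code at offset {i}: {data[i:i + 2]!r}")
        tokens.append((kind, row, col))
        i += 2
    return tokens


if __name__ == "__main__":
    print(tokenize_dechanzi(b"A\xb0\xa1\xb0\x21"))
    # [('ascii', 'A'), ('gb2312', 16, 1), ('extended_gb', 16, 1)]
```

Because an Extended GB second byte (21 to 7E hex) falls in the same range as printable ASCII, the sketch always decides on the first byte before looking at the second; scanning a string from an arbitrary byte offset could misread a trailing byte as ASCII.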
Codeset Conversion
The following codeset converter pairs are available for
converting Simplified Chinese characters between dechanzi
and other encoding formats. Refer to iconv_intro(5) for an
introduction to codeset conversion. For more information
about the other codeset in each pair, see the reference
page listed with that pair.
big5_dechanzi, dechanzi_big5
    Converting from and to the Big-5 codeset: big5(5)
dechanyu_dechanzi, dechanzi_dechanyu
    Converting from and to the DEC Hanyu codeset:
    dechanyu(5)
eucTW_dechanzi, dechanzi_eucTW
    Converting from and to Taiwanese Extended UNIX
    Code: eucTW(5)
UTF-16_dechanzi, dechanzi_UTF-16
    Converting from and to UTF-16 format: Unicode(5)
UCS-4_dechanzi, dechanzi_UCS-4
    Converting from and to UCS-4 format: Unicode(5)
UTF-8_dechanzi, dechanzi_UTF-8
    Converting from and to UTF-8 format: Unicode(5)
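Python has no dechanzi codec, so the following is only a sketch of what a dechanzi-to-Unicode conversion has to do, not the system's dechanzi_UTF-8 converter. It assumes that the GB2312-80 portion can be decoded with Python's `gb2312` codec and, because the mapping of Extended GB user-defined characters is not specified here, it substitutes a placeholder (U+FFFD by default) for them. The function name `dechanzi_to_str` and the `udc_placeholder` parameter are hypothetical.

```python
def dechanzi_to_str(data: bytes, udc_placeholder: str = "\ufffd") -> str:
    """Hypothetical decoder: ASCII and GB2312-80 are decoded, while Extended GB
    user-defined characters (UDC) are replaced by a placeholder (assumption)."""
    out = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 < 0x80:                          # ASCII: one byte, MSB off
            out.append(chr(b1))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError(f"truncated 2-byte code at offset {i}")
        if data[i + 1] >= 0x80:                # GB2312-80: both MSBs on
            # Python's EUC-CN codec raises UnicodeDecodeError for unassigned codes.
            out.append(data[i:i + 2].decode("gb2312"))
        else:                                  # Extended GB: second MSB off
            out.append(udc_placeholder)
        i += 2
    return "".join(out)


if __name__ == "__main__":
    print(ascii(dechanzi_to_str(b"A\xb0\xa1\xb0\x21")))   # 'A\u554a\ufffd'
```

Whatever a real converter does with UDCs, they have no standard Unicode equivalent, which is why conversions involving them are the first place to look when characters do not survive a round trip.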
DEC Hanzi encoding is identical to the Microsoft code-page
format (cp936) used for Simplified Chinese characters on
PC systems. However, DEC Hanzi supports fewer characters
than the code page does. Therefore, using converters
with dechanzi in the converter name to convert between
cp936 and other formats can result in some data loss.
Refer to code_page(5) for more information about PC code
pages.
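The source of the data loss is a difference in character repertoire. Python's standard codecs can illustrate it, with two stand-ins that are assumptions of this sketch: `gb2312` for the GB2312-80 portion of DEC Hanzi, and `cp936` (an alias of Python's GBK codec) for the PC code page. Neither one is the DEC Hanzi converter, and the example does not model Extended GB user-defined characters. The helper `fits_gb2312` is hypothetical.

```python
def fits_gb2312(text: str) -> bool:
    """True if every character of text exists in GB2312-80 (via Python's codec)."""
    try:
        text.encode("gb2312")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    shared = "A啊"
    # Characters in both repertoires encode to the same bytes.
    print(shared.encode("cp936") == shared.encode("gb2312"))   # True

    extra = "龍"   # traditional form: present in cp936 (GBK), absent from GB2312-80
    print(len(extra.encode("cp936")))   # 2
    print(fits_gb2312(extra))           # False
```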
DEC Hanzi Fonts
The operating system provides both screen and printer
fonts for DEC Hanzi characters. In addition to the TrueType
fonts described in this section, the operating system also
provides bitmap fonts. For a complete description of
DEC Hanzi fonts, see Technical Reference for
Using Chinese Features.
The following set of Simplified Chinese TrueType fonts is
installed as the operating system default fonts for DEC
Hanzi:
    -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
The following set of Simplified Chinese TrueType fonts is
available as an installation option:
    -huatian-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
With either the default or optional font sets installed,
the SongTi fonts are the default screen fonts for the DEC
Hanzi codeset.
The operating system provides the following PostScript
printer fonts for DEC Hanzi characters:
    Hei-GB2312-80
For general information on printing Asian language text,
refer to i18n_printing(5).
See Also
Commands: locale(1)
Others: ascii(5), big5(5), Chinese(5), code_page(5),
dechanyu(5), eucTW(5), GB18030(5), GBK(5), i18n_intro(5),
i18n_printing(5), iconv_intro(5), l10n_intro(5), sbig5(5),
telecode(5), Unicode(5)