GB18030, gb18030 - A Chinese character set that extends
GBK by means of 4-byte code points
The GB18030-2000 character set, defined by the Chinese
national standard organization, is an extension of the GBK
character set, which itself is an extension to the
GB2312-80 character set. (See the GBK(5) reference page.)
GB18030 incorporates GBK support for all the Hanzi characters
specified by the Unicode Version 3.0 and ISO/IEC
10646-2001 standards.
GB18030 Code Space and Code Points
The GB18030 character set has 1-byte, 2-byte, and 4-byte
encoding with the following structure:
Number of Bytes   Code Space                           Total Code Points
1-byte            0x00 to 0x7F                         128
2-byte            Byte 1: 0x81 to 0xFE                 23940
                  Byte 2: 0x40 to 0xFE (except 0x7F)
4-byte            Byte 1: 0x81 to 0xFE                 1587600
                  Byte 2: 0x30 to 0x39
                  Byte 3: 0x81 to 0xFE
                  Byte 4: 0x30 to 0x39
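
The byte ranges in the table are enough to tell the three
forms apart and to reproduce the code point totals. The
following Python sketch illustrates this; the function and
constant names are hypothetical and are not part of any
operating system interface.

```python
def gb18030_sequence_length(data: bytes, pos: int = 0) -> int:
    """Return 1, 2, or 4: the length of the GB18030 sequence at data[pos].

    Raises ValueError if the bytes fit none of the three forms.
    """
    b1 = data[pos]
    if b1 <= 0x7F:
        return 1                      # 1-byte form (ASCII)
    if not 0x81 <= b1 <= 0xFE:
        raise ValueError(f"invalid lead byte 0x{b1:02X}")
    if pos + 1 >= len(data):
        raise ValueError("truncated sequence")
    b2 = data[pos + 1]
    if 0x40 <= b2 <= 0xFE and b2 != 0x7F:
        return 2                      # 2-byte form
    if 0x30 <= b2 <= 0x39:
        if pos + 3 >= len(data):
            raise ValueError("truncated 4-byte sequence")
        b3, b4 = data[pos + 2], data[pos + 3]
        if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
            return 4                  # 4-byte form
    raise ValueError("invalid trail bytes")


# Sizes of the byte ranges listed in the table.
LEAD_COUNT = 0xFE - 0x81 + 1          # 126 values, 0x81 to 0xFE
TRAIL2_COUNT = (0xFE - 0x40 + 1) - 1  # 190 values, 0x7F excluded
DIGIT_COUNT = 0x39 - 0x30 + 1         # 10 values, 0x30 to 0x39

ONE_BYTE_TOTAL = 0x7F + 1
TWO_BYTE_TOTAL = LEAD_COUNT * TRAIL2_COUNT
FOUR_BYTE_TOTAL = LEAD_COUNT * DIGIT_COUNT * LEAD_COUNT * DIGIT_COUNT
```

The second byte alone separates the 2-byte form from the
4-byte form, because the ranges 0x30 to 0x39 and 0x40 to
0xFE do not overlap.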
The GB18030 1-byte code provides support for ASCII. The
2-byte code provides support for all the CJK characters
(Chinese, Japanese, and Korean) defined in the Unicode 2.1
standard. The 4-byte code provides support for the Unicode
Version 3.0 additions to Version 2.1. The 4-byte code also
leaves a large number of unassigned code points that are
available for future use.
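
As an illustration of the three code lengths, the following
Python sketch encodes one character of each kind. It assumes
Python's built-in gb18030 codec, which is separate from the
operating system's converters and may differ from them in
detail; the sample characters are chosen for the example
only.

```python
# One sample per encoded length; labels are illustrative only.
samples = {
    "ASCII letter": "A",
    "Hanzi in the 2-byte area": "\u4e2d",
    "CJK Extension A, added in Unicode 3.0": "\u3400",
}


def encoded_lengths(chars):
    """Map each label to the number of bytes its character needs."""
    return {label: len(ch.encode("gb18030")) for label, ch in chars.items()}


for label, ch in samples.items():
    raw = ch.encode("gb18030")
    # Each sample must survive a round trip through the codec.
    assert raw.decode("gb18030") == ch
    print(f"{label}: U+{ord(ch):04X} -> {raw.hex()} ({len(raw)} bytes)")
```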
The GB18030 character set maps the invalid Unicode code
points U+FFFE and U+FFFF to 4-byte codes. Because these
two code points are not valid characters in UCS, this
mapping can cause problems with round-trip character
conversions.
The GB18030 character set does not map any 4-byte code
to the UCS surrogate area (U+D800 through U+DFFF).
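
An application that depends on round-trip conversion might
screen its text for these code points before converting.
The following Python sketch shows one possible check; the
function names and return labels are hypothetical.

```python
SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF
NONCHARACTERS = (0xFFFE, 0xFFFF)


def conversion_risk(code_point: int):
    """Classify a code point that may not round-trip, else return None."""
    if SURROGATE_FIRST <= code_point <= SURROGATE_LAST:
        return "surrogate"        # no GB18030 code maps to this area
    if code_point in NONCHARACTERS:
        return "noncharacter"     # mapped by GB18030 but invalid in UCS
    return None


def risky_code_points(text: str):
    """List (index, code point, reason) for each risky character in text."""
    found = []
    for index, ch in enumerate(text):
        reason = conversion_risk(ord(ch))
        if reason is not None:
            found.append((index, ord(ch), reason))
    return found
```

A caller could reject or replace the reported characters
before passing the text to a converter.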
Codeset Converters for GB18030
The following codeset converter pairs are available for
converting Simplified Chinese characters between GB18030
and UCS formats. Refer to Unicode(5) for more information
about the UTF-16, UCS-4, and UTF-8 encoding formats. Refer
to iconv_intro(5) for an introduction to codeset conversion.
UTF-16_GB18030, GB18030_UTF-16
    Converting from and to UTF-16 format
UCS-4_GB18030, GB18030_UCS-4
    Converting from and to UCS-4 format
UTF-8_GB18030, GB18030_UTF-8
    Converting from and to UTF-8 format
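
The same three conversions can be approximated outside the
operating system's converters. The following Python sketch
uses Python's own codecs as stand-ins, not the converters
listed above; the transcode helper is hypothetical, and
utf-32-be is assumed here as the equivalent of UCS-4.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode data from the source codeset and re-encode it in the target."""
    return data.decode(source).encode(target)


text = "\u4e2d\u6587"  # two Simplified Chinese characters

# Big-endian codec names are used so that no byte order mark is added.
for ucs_format in ("utf-16-be", "utf-32-be", "utf-8"):
    ucs_bytes = text.encode(ucs_format)
    gb_bytes = transcode(ucs_bytes, ucs_format, "gb18030")
    # Converting back must reproduce the original UCS bytes.
    assert transcode(gb_bytes, "gb18030", ucs_format) == ucs_bytes
```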
Fonts for GB18030
The operating system provides the following Simplified
Chinese TrueType fonts for GB18030:
    -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-iso8859-1
These fonts can be used for printing with Chinese text
printers. The operating system uses Unicode fonts and the
SongTi font style as the default screen font for the
GB18030 codeset. See wwpsof(8) for information on the
PostScript print filter and TrueType fonts.
Commands: locale(1)
Others: ascii(5), big5(5), Chinese(5), dechanyu(5),
dechanzi(5), eucTW(5), GBK(5), i18n_intro(5),
i18n_printing(5), l10n_intro(5), sbig5(5), telecode(5)