code_page, cp437, cp737, cp775, cp850, cp852, cp855,
cp857, cp860, cp861, cp862, cp863, cp865, cp866, cp869,
cp874, cp932, cp936, cp949, cp950, cp1250, cp1251, cp1252,
cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, dingbats,
symbol - Coded character sets that are used on Microsoft
Windows and NT systems
Code pages are coded character sets that are used on
Microsoft Windows, Windows 95, and NT systems. Just as
there are different UNIX codesets, there are different PC
code pages, each encoding a particular set of characters.
A Tru64 UNIX system supplies one locale, en_US.cp850, that
directly supports a PC code-page format (cp850, MS-DOS
Multilingual Latin-1). For all other locales, data in
code-page format is supported only through codeset
converters. Users can invoke these converters directly with
the iconv(1) command, and applications that exchange data
between PC and Tru64 UNIX systems can invoke them through
the iconv(3) functions.
Fonts and other kinds of character support are
available only for the native UNIX codeset to which a code
page can be converted. See the i18n_intro(5) reference
page for introductory information on locales and codesets.
See the iconv_intro(5) reference page for an introduction
to codeset conversion and the name format and location of
codeset converters.
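As a rough sketch of how an application might invoke a converter, the C function below wraps iconv_open(3), iconv(3), and iconv_close(3), the functions listed at the end of this page. The function name convert_buffer is hypothetical. The codeset names used in the usage note assume GNU libc spellings ("CP850", "UTF-8"). On a Tru64 UNIX system, the names would follow the conventions described in iconv_intro(5).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert inlen bytes from codeset `fromcode`
 * to codeset `tocode` with iconv(3). Returns the number of bytes
 * written to `out`, or -1 (errno set) if no converter exists for
 * the pair or the input cannot be converted.
 * Assumption: codeset names are platform-specific; GNU libc accepts
 * "CP850" and "UTF-8", Tru64 UNIX uses the names in iconv_intro(5).
 */
long convert_buffer(const char *tocode, const char *fromcode,
                    const char *in, size_t inlen,
                    char *out, size_t outlen)
{
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return -1;                    /* no converter for this pair */

    char *inp = (char *)in;           /* iconv() takes char ** */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outlen;

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        int saved = errno;            /* EILSEQ, EINVAL, or E2BIG */
        iconv_close(cd);
        errno = saved;
        return -1;
    }

    /* Flush any pending shift state (a no-op for single-byte code pages). */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        int saved = errno;
        iconv_close(cd);
        errno = saved;
        return -1;
    }

    iconv_close(cd);
    return (long)(outlen - outleft);
}
```

On such a system, convert_buffer("UTF-8", "CP850", "caf\x82", 4, out, sizeof out) should write the five UTF-8 bytes of "café", because byte 0x82 is é in cp850. Saving errno before iconv_close() keeps the caller's error code intact.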
The following table lists and describes the code pages
that have conversion support on a Tru64 UNIX system. An
asterisk (*) follows the names of code pages that include
support for the Euro currency sign (€).
---------------------------------------------------------------
Code Page   Description
---------------------------------------------------------------
cp437       MS-DOS United States
cp737       Greek
cp775       Baltic languages (1)
cp850       MS-DOS Multilingual (Latin-1)
cp852       MS-DOS Slavic (Latin-2) (2)
cp855       IBM Cyrillic (3)
cp857       IBM Turkish
cp860       MS-DOS Portuguese
cp861       MS-DOS Icelandic
cp862       Hebrew
cp863       MS-DOS Canadian French
cp865       MS-DOS Nordic languages
cp866       MS-DOS Russian
cp869       IBM Modern Greek
cp874 *     MS-DOS Thai
cp932       Japanese
cp936       Simplified Chinese (People's Republic of China)
cp949       Korean
cp950       Traditional Chinese (Taiwan, Hong Kong)
cp1250 *    Windows Latin-2 (2)
cp1251 *    Windows Cyrillic (3)
cp1252 *    Windows Latin-1
cp1253 *    Windows Greek
cp1254 *    Windows Turkish
cp1255 *    Windows Hebrew
cp1256 *    Windows Arabic
cp1257 *    Windows Baltic (1)
cp1258 *    Windows Vietnamese
dingbats    Microsoft dingbat characters
symbol      Microsoft miscellaneous symbol characters
---------------------------------------------------------------
(1) Baltic languages include Estonian, Latvian, and
Lithuanian.
(2) Latin-2 languages include Albanian, Croatian, Czech,
Hungarian, Polish, Romanian, Latin Serbian, Slovak,
and Slovenian.
(3) Cyrillic languages include Byelorussian, Bulgarian,
and Russian.
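To make the Euro marker in the table concrete, the sketch below decodes cp1252 by hand. It assumes the standard Windows-1252 assignments for bytes 0x80 through 0x9F, where the Euro sign sits at 0x80. This hand-written table only illustrates the encoding and is not the Tru64 UNIX converter. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Code points for cp1252 bytes 0x80-0x9F (standard Windows-1252
 * assignments). -1 marks the five bytes that cp1252 leaves undefined. */
static const int32_t cp1252_high[32] = {
    0x20AC, -1,     0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, /* 80-87 */
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, -1,     0x017D, -1,     /* 88-8F */
    -1,     0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, /* 90-97 */
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, -1,     0x017E, 0x0178  /* 98-9F */
};

/* Map one cp1252 byte to a Unicode code point, or -1 if undefined.
 * Bytes 0x00-0x7F and 0xA0-0xFF have the same values as in ISO 8859-1. */
int32_t cp1252_to_ucs4(unsigned char b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_high[b - 0x80];
    return (int32_t)b;
}

/* Encode a code point below U+10000 as UTF-8 and return the byte count.
 * Every cp1252 character lies below U+10000, so three bytes suffice. */
static size_t utf8_encode_bmp(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Convert n cp1252 bytes to UTF-8; `out` must hold at least 3 * n bytes.
 * Returns the number of bytes written, or -1 at the first undefined byte. */
long cp1252_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t cp = cp1252_to_ucs4(in[i]);
        if (cp < 0)
            return -1;
        written += utf8_encode_bmp((uint32_t)cp, out + written);
    }
    return (long)written;
}
```

For the single byte 0x80, cp1252_to_utf8 writes E2 82 AC, the UTF-8 encoding of U+20AC.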
Every code page can be converted to and from the
UCS-2, UCS-4, and UTF-8 codesets. In addition, some code
pages can be converted directly to ISO codesets, as shown
in the following table. In such a direct conversion,
characters that have no equivalent in the target ISO
codeset are lost.
------------------------------------------
Code Page Can Be Converted Directly to:
------------------------------------------
cp437 ISO8859-1
cp737 ISO8859-7
cp775 ISO8859-4
cp850 ISO8859-1
cp852 ISO8859-2
cp855 ISO8859-5
cp857 ISO8859-9
cp860 ISO8859-1
cp861 ISO8859-1
cp862 ISO8859-8
cp863 ISO8859-1
cp865 ISO8859-1
cp866 ISO8859-5
cp869 ISO8859-7
cp874 TACTIS
cp1252 ISO8859-1, ISO8859-15
------------------------------------------
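The data loss mentioned above is easy to see for cp1252 and ISO8859-1. The two codesets agree on bytes 0x00-0x7F and 0xA0-0xFF. However, cp1252 places printable characters such as the Euro sign in 0x80-0x9F, where ISO8859-1 has only control codes. The following hypothetical sketch copies the shared ranges and replaces every other byte with a substitute. The substitution policy is an assumption: a real converter may handle unconvertible characters differently, for example by stopping with an error.

```c
#include <stddef.h>

/*
 * Hypothetical direct converter from cp1252 to ISO 8859-1.
 * Bytes 0x00-0x7F and 0xA0-0xFF encode the same characters in both
 * codesets and are copied unchanged. Bytes 0x80-0x9F are printable
 * characters (or undefined) in cp1252 and have no ISO 8859-1
 * equivalent, so each is replaced with `subst` and counted as lost.
 * `out` must hold n bytes. Returns the number of lost characters.
 */
size_t cp1252_to_latin1(const unsigned char *in, size_t n,
                        unsigned char *out, unsigned char subst)
{
    size_t lost = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] >= 0x80 && in[i] <= 0x9F) {
            out[i] = subst;          /* e.g. the Euro sign at 0x80 */
            lost++;
        } else {
            out[i] = in[i];          /* identical in both codesets */
        }
    }
    return lost;
}
```

Returning a loss count lets the caller decide whether a lossy result is acceptable instead of silently discarding characters.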
See Unicode(5) for information about UCS-2, UCS-4, and
UTF-8. Reference pages for UNIX implementations of the ISO
codesets have the name format iso8859-number(5).
For Traditional Chinese and Japanese, there are no codeset
converters whose names include the name of a code page
because identical character encoding is provided in existing
UNIX codesets. For Traditional Chinese, character
encoding in PC code-page format (cp950) is identical to
that in the Big-5 (big5) codeset. For Japanese, character
encoding in PC code-page format (cp932) is identical to
that in the Shift JIS (SJIS) codeset. Therefore, the codeset
converters whose names include big5 or SJIS can be
used to convert data in and out of PC code-page format for
these languages.
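The rule above can be captured in a small lookup, sketched here as a hypothetical C helper. The helper maps a code-page name to the codeset name that appears in the names of the converters to use. It assumes that the other code pages in the preceding tables have converters named after the code page itself.

```c
#include <string.h>

/*
 * Hypothetical helper: return the codeset name whose converters
 * handle data in the given PC code page. cp950 and cp932 have no
 * converters of their own because Big-5 and Shift JIS use identical
 * encodings; other code-page names are returned unchanged (assumption).
 */
const char *converter_codeset_for(const char *code_page)
{
    if (strcmp(code_page, "cp950") == 0)
        return "big5";          /* Traditional Chinese */
    if (strcmp(code_page, "cp932") == 0)
        return "SJIS";          /* Japanese */
    return code_page;
}
```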
Caution for Conversion of Korean and Simplified Chinese
Conversion of text that starts out in code-page format
(cp949) to the DEC Korean (deckorean) codeset may result
in loss of data. All of the Tru64 UNIX codeset equivalents
for cp949 support all the Hanja and miscellaneous characters
also supported by the code page. However, only the
UCS-2, UCS-4, and UTF-8 codesets support the complete set
of Hangul characters supported by the cp949 code page.
The deckorean codeset supports only a subset of these
Hangul characters. Therefore, if data is converted from
cp949 format to UCS-2, UCS-4, or UTF-8, no data is lost.
However, if the data is then converted from UCS-2, UCS-4,
or UTF-8 to deckorean, the unsupported Hangul characters
will be lost.
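The loss described above can be demonstrated by converting UTF-8 text into a codeset that carries only part of the Hangul repertoire and counting the characters that are dropped. deckorean is not generally available outside Tru64 UNIX, so this sketch assumes GNU libc's EUC-KR converter as a stand-in. EUC-KR (KS X 1001) encodes 2,350 precomposed Hangul syllables, while cp949 and Unicode cover all 11,172. No claim is made that deckorean and EUC-KR have the same repertoire. The function name count_lost_chars is hypothetical.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Length of the UTF-8 sequence that starts with lead byte b
 * (1 for bytes that cannot start a sequence). */
static size_t utf8_len(unsigned char b)
{
    if (b < 0x80)
        return 1;
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    if ((b & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/*
 * Convert UTF-8 text to `tocode`, skipping every character that the
 * target codeset cannot represent (malformed UTF-8 is skipped too).
 * Returns the number of skipped characters, or -1 if no converter
 * exists, the output buffer is too small, or the input is truncated.
 */
long count_lost_chars(const char *tocode, const char *utf8, size_t len,
                      char *out, size_t outlen)
{
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)utf8;
    char *outp = out;
    size_t inleft = len;
    size_t outleft = outlen;
    long lost = 0;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                      /* remaining input converted */
        if (errno != EILSEQ) {          /* E2BIG or EINVAL */
            iconv_close(cd);
            return -1;
        }
        /* inp now points at the unconvertible character: skip it. */
        size_t skip = utf8_len((unsigned char)*inp);
        if (skip > inleft)
            skip = inleft;
        inp += skip;
        inleft -= skip;
        lost++;
    }

    iconv(cd, NULL, NULL, &outp, &outleft);   /* flush shift state */
    iconv_close(cd);
    return lost;
}
```

For example, passing the UTF-8 encoding of "가똠" with tocode "EUC-KR" should report one lost character, because 똠 (U+B620) is not among the KS X 1001 syllables. Converting first to UTF-8 and only then to the subset codeset, as the text describes, isolates the loss in this last step.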
The DEC Hanzi (dechanzi) codeset uses the same encoding
format as the PC code page used for Simplified Chinese
(cp936) but does not support all the characters supported
by the code page. Therefore, you can use converters with
dechanzi in the converter name to convert text to and from
cp936 format, but the operation may result in some loss of
data.
Commands: iconv(1)
Functions: iconv(3), iconv_close(3), iconv_open(3)
Others: i18n_intro(5), iconv_intro(5), iso8859-1(5),
iso8859-2(5), iso8859-4(5), iso8859-5(5), iso8859-7(5),
iso8859-8(5), iso8859-15(5), Unicode(5)
code_page(5)