code_page(5)

NAME

       code_page,  cp437,  cp737,  cp775,  cp850,  cp852,  cp855,
       cp857, cp860, cp861, cp862, cp863,  cp865,  cp866,  cp869,
       cp874, cp932, cp936, cp949, cp950, cp1250, cp1251, cp1252,
       cp1253, cp1254, cp1255, cp1256, cp1257, cp1258,  dingbats,
       symbol  -  Coded character sets that are used on Microsoft
       Windows and NT systems

DESCRIPTION

       Code pages are coded  character  sets  that  are  used  on
       Microsoft  Windows,  Windows  95,  and NT systems. Just as
       there are different UNIX codesets, there are different  PC
       code  pages, each supporting a particular set of character
       encodings.
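
       As an illustration, the short Python sketch below decodes one
       byte value under three code pages. It uses Python's built-in
       codecs rather than Tru64 UNIX converters, so the mappings it
       shows are Python's and are only assumed to match the Tru64
       tables.

```python
import unicodedata

# One byte value, read under three different PC code pages.
BYTE = b"\xe9"
PAGES = ("cp437", "cp850", "cp1252")

# Python's codec tables decide which character each page assigns.
decoded = {page: BYTE.decode(page) for page in PAGES}

for page, char in decoded.items():
    print(f"{page}: 0x{BYTE[0]:02X} -> U+{ord(char):04X} {unicodedata.name(char)}")
```

       The same byte yields a Greek letter, an accented capital, and
       an accented small letter, which is why data must be tagged
       with, or converted from, the code page it was written in.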

       A Tru64 UNIX system supplies one locale, en_US.cp850, that
       directly supports a PC code-page format (MS-DOS Latin-1).
       For all other locales, data in code-page format is supported
       only through codeset converters. These converters can be run
       directly by users or by software or applications that
       exchange data between PC and Tru64 UNIX systems. Fonts and
       other kinds of character support are available only for the
       native UNIX codeset to which a code page can be converted.
       See the i18n_intro(5) reference page for introductory
       information on locales and codesets. See the iconv_intro(5)
       reference page for an introduction to codeset conversion and
       to the name format and location of codeset converters.
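
       A codeset converter decodes data from one codeset and encodes
       it in another. The following minimal Python sketch shows that
       idea; the function name convert_codeset is hypothetical, and
       Python's codecs stand in for the Tru64 UNIX converters
       described in iconv_intro(5).

```python
def convert_codeset(data: bytes, from_code: str, to_code: str) -> bytes:
    """Hypothetical converter: decode `data` from `from_code` to Unicode,
    then encode it in `to_code`. Raises UnicodeError if a byte is invalid
    in the source codeset or a character has no mapping in the target."""
    return data.decode(from_code).encode(to_code)


# Text written on a PC in code page 850 (MS-DOS Latin-1) ...
pc_data = "S\u00e3o Paulo, caf\u00e9".encode("cp850")
# ... converted to UTF-8 for use on the UNIX side.
unix_data = convert_codeset(pc_data, "cp850", "utf-8")
print(unix_data)
```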

       The following table lists and describes the code pages that
       have conversion support on a Tru64 UNIX system. An asterisk
       (*) follows the names of code pages that include support for
       the Euro currency sign (€).

       ---------------------------------------------------------------
       Code Page            Description
       ---------------------------------------------------------------
       cp437                MS-DOS United States
       cp737                Greek
       cp775                Baltic languages (1)
       cp850                MS-DOS Multilingual (Latin-1)
       cp852                MS-DOS Slavic (Latin-2) (2)
       cp855                IBM Cyrillic (3)
       cp857                IBM Turkish
       cp860                MS-DOS Portuguese
       cp861                MS-DOS Icelandic
       cp862                Hebrew
       cp863                MS-DOS Canadian French
       cp865                MS-DOS Nordic languages
       cp866                MS-DOS Russian
       cp869                IBM Modern Greek
       cp874 *              MS-DOS Thai
       cp932                Japanese
       cp936                Chinese (People's Republic of China)
       cp949                Korean
       cp950                Chinese (Hong Kong)
       cp1250 *             Windows Latin-2 (2)
       cp1251 *             Windows Cyrillic (3)
       cp1252 *             Windows Latin-1
       cp1253 *             Windows Greek
       cp1254 *             Windows Turkish
       cp1255 *             Windows Hebrew
       cp1256 *             Windows Arabic
       cp1257 *             Windows Baltic (1)
       cp1258 *             Windows Vietnamese
       dingbats             Microsoft dingbat characters
       symbol               Microsoft miscellaneous symbol characters
       ---------------------------------------------------------------

       (1)   Baltic  languages  include  Estonian,  Latvian,  and
       Lithuanian.

       (2) Latin-2 languages include Albanian, Croatian, Czech,
       Faeroese, Hungarian, Polish, Romanian, Latin Serbian,
       Slovak, and Slovenian.

       (3) Cyrillic languages  include  Byelorussian,  Bulgarian,
       and Russian.
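
       Whether a code page can carry the Euro sign can be checked by
       trying to encode it. The Python sketch below does this with a
       hypothetical helper, supports_euro; it reflects Python's codec
       tables, which are assumed (not guaranteed) to agree with the
       asterisks in the table above.

```python
EURO = "\u20ac"  # EURO SIGN


def supports_euro(code_page: str) -> bool:
    """Hypothetical helper: True if Python's codec for `code_page`
    can encode the euro sign."""
    try:
        EURO.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


for page in ("cp437", "cp850", "cp1250", "cp1251", "cp1252"):
    print(page, supports_euro(page))
```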

       In all cases, a code page can be converted to and from the
       UCS-2, UCS-4, and UTF-8 codesets. In addition,  some  code
       pages  can  be converted directly to ISO codesets as shown
       in the following table, although some data loss may occur.

       ------------------------------------------
       Code Page   Can Be Converted Directly to:
       ------------------------------------------
       cp437       ISO8859-1
       cp737       ISO8859-7
       cp775       ISO8859-4
       cp850       ISO8859-1
       cp852       ISO8859-2
       cp855       ISO8859-5
       cp857       ISO8859-9
       cp860       ISO8859-1
       cp861       ISO8859-1
       cp862       ISO8859-8
       cp863       ISO8859-1
       cp865       ISO8859-1
       cp866       ISO8859-5
       cp869       ISO8859-7
       cp874       TACTIS
       cp1252      ISO8859-1, ISO8859-15
       ------------------------------------------
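
       The data loss mentioned above occurs when a code-page character
       has no counterpart in the ISO codeset. The Python sketch below
       converts two cp437 bytes to ISO8859-1 using Python's codecs as
       a stand-in for the Tru64 converters: 0x82 (e with acute
       accent) survives, but 0xC9 (a box-drawing corner) has no
       ISO8859-1 code.

```python
# Two cp437 bytes: 0x82 is LATIN SMALL LETTER E WITH ACUTE, which
# ISO 8859-1 also has; 0xC9 is a box-drawing corner, which it lacks.
pc_bytes = b"\x82\xc9"
text = pc_bytes.decode("cp437")

# A strict conversion fails on the unmappable character ...
try:
    text.encode("iso8859_1")
    strict_ok = True
except UnicodeEncodeError as exc:
    strict_ok = False
    print("no ISO8859-1 code for", hex(ord(exc.object[exc.start])))

# ... while a lenient conversion substitutes '?', losing data.
latin1 = text.encode("iso8859_1", errors="replace")
print(latin1)
```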

       See  Unicode(5)  for  information  about UCS-2, UCS-4, and
       UTF-8. Reference pages for UNIX implementations of the ISO
       codesets have the name format iso8859-number(5).
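
       By contrast, conversion through UCS-2, UCS-4, or UTF-8 keeps
       every character. The Python sketch below round-trips all 256
       cp850 byte values. Python has no codecs named UCS-2 or UCS-4,
       so, as an assumption, UTF-16BE stands in for UCS-2 (the two
       coincide for the Basic Multilingual Plane characters cp850
       uses) and UTF-32BE stands in for UCS-4.

```python
original = bytes(range(256))      # every byte value, read as cp850 data
text = original.decode("cp850")

round_trip_ok = {}
for ucs in ("utf-16-be", "utf-32-be", "utf-8"):
    # cp850 -> UCS/UTF form -> back to cp850
    restored = text.encode(ucs).decode(ucs).encode("cp850")
    round_trip_ok[ucs] = restored == original
print(round_trip_ok)
```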

       For Traditional Chinese and Japanese, there are no codeset
       converters whose names include the name of a code page,
       because identical character encoding is provided in existing
       UNIX codesets. For Traditional Chinese, character encoding in
       PC code-page format (cp950) is identical to that in the Big-5
       (big5) codeset. For Japanese, character encoding in PC
       code-page format (cp932) is identical to that in the Shift
       JIS (SJIS) codeset. Therefore, the codeset converters whose
       names include big5 and SJIS can be used to convert data in
       and out of PC code-page format for these languages.
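
       The Python sketch below compares the code-page codecs with the
       UNIX-codeset codecs for a few common characters. It uses
       Python's cp932, shift_jis, cp950, and big5 codecs, not the
       Tru64 converters; note that Python's cp932 table also includes
       NEC and IBM vendor extensions that its shift_jis codec rejects,
       so the comparison is limited to the shared repertoire.

```python
# (code page, UNIX codeset, sample text in the shared repertoire)
pairs = [
    ("cp932", "shift_jis", "\u65e5\u672c\u8a9e"),  # "Japanese" in Japanese
    ("cp950", "big5", "\u4e2d\u6587"),              # "Chinese" in Chinese
]

same_bytes = {}
for code_page, unix_codeset, text in pairs:
    encoded = text.encode(code_page)
    same_bytes[code_page] = encoded == text.encode(unix_codeset)
    print(code_page, unix_codeset, encoded, same_bytes[code_page])
```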

        Caution for Conversion of Korean and Simplified Chinese

       Conversion  of  text  that  starts out in code-page format
       (cp949) to the DEC Korean (deckorean) codeset  may  result
       in loss of data. All of the Tru64 UNIX codeset equivalents
       for cp949 support all the Hanja and miscellaneous characters
       also supported by the code page. However, only the
       UCS-2, UCS-4, and UTF-8 codesets support the complete  set
       of  Hangul  characters  supported  by the cp949 code page.
       The deckorean codeset supports  only  a  subset  of  these
       Hangul  characters.  Therefore,  if data is converted from
       cp949 format to UCS-2, UCS-4, or UTF-8, no data  is  lost.
       However,  if the data is then converted from UCS-2, UCS-4,
       or UTF-8 to deckorean, the unsupported  Hangul  characters
       will be lost.
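
       The two-step loss described above can be modeled in Python.
       Python has no deckorean codec, so this sketch assumes that the
       deckorean Hangul repertoire matches the KS X 1001 subset, and
       models it with a hypothetical helper, in_ksx1001, that accepts
       a character only if its cp949 code has both bytes at 0xA1 or
       above (the range cp949 shares with KS X 1001). The syllable
       U+B620 lies outside that subset; U+BC29 lies inside it.

```python
def in_ksx1001(ch: str) -> bool:
    """Hypothetical helper: True if `ch` is ASCII or its cp949 code lies
    in the KS X 1001 range (both bytes 0xA1 or higher). cp949 places its
    extra Hangul syllables outside that range."""
    code = ch.encode("cp949")
    if len(code) == 1:
        return True
    return code[0] >= 0xA1 and code[1] >= 0xA1


def to_subset(text: str) -> str:
    """Model a lossy conversion: characters outside the subset become '?'."""
    return "".join(ch if in_ksx1001(ch) else "?" for ch in text)


pc_bytes = "\ub620\ubc29".encode("cp949")   # cp949 data from the PC side

# Step 1: cp949 -> UTF-8 and back loses nothing.
utf8 = pc_bytes.decode("cp949").encode("utf-8")
lossless = utf8.decode("utf-8").encode("cp949") == pc_bytes

# Step 2: UTF-8 -> the modeled subset drops the first syllable.
subset = to_subset(utf8.decode("utf-8"))
print(lossless, ascii(subset))
```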

       The  DEC  Hanzi  (dechanzi) codeset uses the same encoding
       format as the PC code page  used  for  Simplified  Chinese
       (cp936)  but does not support all the characters supported
       by the code page.  Therefore, you can use converters  with
       dechanzi in the converter name to convert text to and from
       cp936 format, but the operation may result in some loss of
       data.
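
       The same pattern applies to cp936. Python has no dechanzi
       codec, so this sketch assumes that Python's gb2312 codec
       approximates the dechanzi repertoire. Characters in the shared
       repertoire encode identically, while a character that cp936
       supports but GB 2312 lacks (here U+9555) is lost.

```python
# Assumption: Python's gb2312 codec approximates the dechanzi repertoire.
common = "\u4e2d\u6587"                   # characters in both repertoires
same = common.encode("cp936") == common.encode("gb2312")
print(same)                               # same bytes for shared characters

name = "\u6731\u9555\u57fa"               # U+9555 is in cp936, not GB 2312
pc_bytes = name.encode("cp936")
converted = pc_bytes.decode("cp936").encode("gb2312", errors="replace")
print(converted)                          # U+9555 is replaced by b"?"
```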
