l10n_intro(5)                                              Tru64 UNIX



NAME

       l10n_intro,  l10n,  locales,  LOCPATH  -  Introduction  to
       localization (L10N)

DESCRIPTION

       Localization refers to the process of establishing information
       within a computer system that is specific to each supported
       combination of language, cultural data, and coded character set
       (codeset). Each such combination gives rise to the definition
       of one locale. The abbreviation L10N is often used to stand for
       localization, because there are 10 characters between the
       beginning "L" and the ending "N" of that word.

       See i18n_intro(5) for introductory information about
       internationalization and how to use system commands to set a
       locale. See localedef(1), charmap(4), and locale(4) for
       information about creating locales. See Writing Software for the
       International Market for information about creating locales and
       writing applications that use locales.

       The current release of the operating system supports the
       following languages with locales. Each language is discussed
       separately in its own reference page.

       Catalan
       Chinese (Simplified and Traditional)
       Czech
       Danish
       Dutch
       English (discussed in this reference page)
       Finnish
       Flemish
       French
       German
       Greek
       Hebrew
       Hungarian
       Icelandic
       Italian
       Japanese
       Korean
       Lithuanian
       Norwegian
       Polish
       Portuguese
       Russian
       Slovak
       Slovene
       Spanish
       Swedish
       Thai
       Turkish

       For some languages, more than one codeset and more than one
       country or territory are supported, so multiple locales exist
       for those languages. The following list describes all the
       supported locales. For information about the character encoding
       used by a particular locale, see the reference page for the
       codeset specified in the last part of the locale name or, for
       locales whose names end in UTF-8, see Unicode(5).

       Catalan locale for Spain (uses the Latin-1 codeset)
       Catalan locale for Spain (uses the Latin-9 codeset)
       Catalan locale for Spain (uses the UTF-8 codeset)
       Czech locale for the Czech Republic (uses the Latin-2 codeset)
       Czech locale for the Czech Republic (uses the UTF-8 codeset)
       Danish locale for Denmark (uses the Latin-1 codeset)
       Danish locale for Denmark (uses the Latin-9 codeset)
       Danish locale for Denmark (uses the UTF-8 codeset)
       German locale for Switzerland (uses the Latin-1 codeset)
       German locale for Switzerland (uses the Latin-9 codeset)
       German locale for Switzerland (uses the UTF-8 codeset)
       German locale for Germany (uses the Latin-1 codeset)
       German locale for Germany (uses the Latin-9 codeset)
       German locale for Germany (uses the UTF-8 codeset)
       Greek locale for Greece (uses the ISO Greek codeset)
       Greek locale for Greece (uses the UTF-8 codeset)
       English locale that includes the euro character (uses the
          UTF-8 codeset)

              This locale both supports the euro character and
              defines the decimal point as a comma (,) and the
              thousands separator as a period (.). Therefore, when
              assigned only to the LC_MONETARY locale category or
              environment variable, this locale is useful in many
              European countries, not just those where English is the
              native language.

       English locale for Great Britain (uses the Latin-1 codeset)
       English locale for Great Britain (uses the Latin-9 codeset)
       English locale for Great Britain (uses the UTF-8 codeset)
       English locale for the United States (uses the Latin-1 codeset)
       English locale for the United States (uses the Latin-9 codeset)
       English locale for the United States (uses cp850 encoding)

              Use this locale with data that contains accented
              characters and that was generated on a PC using the
              cp850 code page for character encoding. This character
              encoding is usually the default for the DOS and Windows
              operating systems in Europe. The en_US.ISO8859-1 and
              en_US.cp850 locales encode English characters the same
              way but use different values for accented and other
              non-English characters in the Latin-1 character set.

       English locale for the United States (uses the UTF-8 codeset)
       English locale for the United States (uses the UTF-8 codeset;
          @euro variant)

              The @euro variant defines the local currency sign to be
              the euro character and the international currency sign
              to be EUR. See also en_EU.UTF-8@euro.

       Spanish locale for Spain (uses the Latin-1 codeset)
       Spanish locale for Spain (uses the Latin-9 codeset)
       Spanish locale for Spain (uses the UTF-8 codeset)
       Finnish locale for Finland (uses the Latin-1 codeset)
       Finnish locale for Finland (uses the Latin-9 codeset)
       Finnish locale for Finland (uses the UTF-8 codeset)
       French locale for Belgium (uses the Latin-1 codeset)
       French locale for Belgium (uses the Latin-9 codeset)
       French locale for Belgium (uses the UTF-8 codeset)
       French locale for Canada (uses the Latin-1 codeset)
       French locale for Canada (uses the Latin-9 codeset)
       French locale for Canada (uses the UTF-8 codeset)
       French locale for Switzerland (uses the Latin-1 codeset)
       French locale for Switzerland (uses the Latin-9 codeset)
       French locale for Switzerland (uses the UTF-8 codeset)
       French locale for France (uses the Latin-1 codeset)
       French locale for France (uses the Latin-9 codeset)
       French locale for France (uses the UTF-8 codeset)
       Hebrew locale for Israel (uses the ISO Hebrew codeset)
       Hungarian locale for Hungary (uses the Latin-2 codeset)
       Hungarian locale for Hungary (uses the UTF-8 codeset)
       Icelandic locale for Iceland (uses the Latin-1 codeset)
       Icelandic locale for Iceland (uses the Latin-9 codeset)
       Icelandic locale for Iceland (uses the UTF-8 codeset)
       Italian locale for Italy (uses the Latin-1 codeset)
       Italian locale for Italy (uses the Latin-9 codeset)
       Italian locale for Italy (uses the UTF-8 codeset)
       Hebrew locale for Israel (uses the ISO Hebrew codeset)

              This locale name is supported for backward
              compatibility. The recommended name to use for the ISO
              Hebrew locale is he_IL.ISO8859-8.

       Japanese locale for Japan (uses the DEC Kanji codeset)
       Japanese locale for Japan (uses the Japanese EUC codeset)
       Japanese locale for Japan (uses the Super DEC Kanji codeset)
       Japanese locale for Japan (uses the Shift JIS codeset)
       Japanese locale for Japan (uses the UTF-8 codeset)
       Korean locale for Korea (uses the DEC Korean codeset)
       Korean locale for Korea (uses the Korean EUC codeset)
       Korean locale for Korea (uses the UTF-8 codeset)
       Lithuanian locale for Lithuania (uses the Latin-4 codeset)
       Lithuanian locale for Lithuania (uses the UTF-8 codeset)
       Flemish locale for Belgium (uses the Latin-1 codeset)
       Flemish locale for Belgium (uses the Latin-9 codeset)
       Flemish locale for Belgium (uses the UTF-8 codeset)
       Dutch locale for the Netherlands (uses the Latin-1 codeset)
       Dutch locale for the Netherlands (uses the Latin-9 codeset)
       Dutch locale for the Netherlands (uses the UTF-8 codeset)
       Norwegian locale for Norway (uses the Latin-1 codeset)
       Norwegian locale for Norway (uses the Latin-9 codeset)
       Norwegian locale for Norway (uses the UTF-8 codeset)
       Polish locale for Poland (uses the Latin-2 codeset)
       Polish locale for Poland (uses the UTF-8 codeset)
       Portuguese locale for Portugal (uses the Latin-1 codeset)
       Portuguese locale for Portugal (uses the Latin-9 codeset)
       Portuguese locale for Portugal (uses the UTF-8 codeset)
       Russian locale for Russia (uses the ISO Cyrillic codeset)
       Russian locale for Russia (uses the UTF-8 codeset)
       Slovak locale for Slovakia (uses the Latin-2 codeset)
       Slovak locale for Slovakia (uses the UTF-8 codeset)
       Slovene locale for Slovenia (uses the Latin-2 codeset)
       Slovene locale for Slovenia (uses the UTF-8 codeset)
       Swedish locale for Sweden (uses the Latin-1 codeset)
       Swedish locale for Sweden (uses the Latin-9 codeset)
       Swedish locale for Sweden (uses the UTF-8 codeset)
       Thai locale for Thailand (uses the TACTIS codeset)
       Turkish locale for Turkey (uses the Latin-5 codeset)
       Turkish locale for Turkey (uses the UTF-8 codeset)
       Simplified Chinese locale for the People's Republic of China
          (uses the DEC Hanzi codeset)
       Simplified Chinese locale for the People's Republic of China
          (uses the GBK codeset, an extension of the GB 2312-80 codeset)
       Simplified Chinese locale for the People's Republic of China
          (uses the GB18030 codeset, which extends GBK by means of
          4-byte encoding)
       Simplified Chinese locale for the People's Republic of China
          (uses the UTF-8 codeset)
       Traditional Chinese locale for Hong Kong (uses the BIG-5 codeset)
       Traditional Chinese locale for Hong Kong (uses the DEC Hanyu
          codeset)
       Simplified Chinese locale for Hong Kong (uses the DEC Hanzi
          codeset)
       Traditional Chinese locale for Hong Kong (uses the Taiwanese EUC
          codeset)
       Traditional Chinese locale for Hong Kong (uses the UTF-8 codeset)
       Traditional Chinese locale for Taiwan (uses the BIG-5 codeset)
       Traditional Chinese locale for Taiwan (uses the DEC Hanyu
          codeset)
       Traditional Chinese locale for Taiwan (uses the Taiwanese EUC
          codeset)
       Traditional Chinese locale for Taiwan (uses the UTF-8 codeset)

              This locale supports Simplified Chinese as well  as
              Traditional Chinese.

       For the zh_CN.dechanzi locale, the @pinyin, @radical, and
       @stroke variants are available for sorting by pinyin, radical,
       and stroke, respectively. For the zh_TW.big5, zh_TW.dechanyu,
       and zh_TW.eucTW locales, the @chuyin, @radical, and @stroke
       variants are available for sorting by chuyin, radical, and
       stroke, respectively. These variant locale names (those that
       include the @collation_modifier suffix) can be assigned to the
       LC_COLLATE variable.
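
       The following C sketch shows one way a program might apply one
       of these sorting variants. The helper collate_in_locale() is a
       hypothetical name and not a system interface. It changes only
       the LC_COLLATE category with setlocale(), compares two strings
       with strcoll(), and then restores the previous setting. The
       sketch assumes that the requested variant (for example,
       zh_TW.big5@stroke) is installed. If it is not, setlocale()
       returns NULL and the helper reports failure.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compares a and b using the collation rules of
 * locale_name (for example "zh_TW.big5@stroke"), changing only the
 * LC_COLLATE category. On success, stores the strcoll() result
 * (<0, 0, or >0, as for strcmp()) in *result and returns 0. Returns
 * -1 if the locale cannot be loaded. The previous LC_COLLATE setting
 * is restored before returning. */
int collate_in_locale(const char *locale_name, const char *a,
                      const char *b, int *result)
{
    /* setlocale() may overwrite the string it returns on the next
     * call, so keep a private copy of the current setting. */
    const char *cur = setlocale(LC_COLLATE, NULL);
    size_t len = strlen(cur) + 1;
    char *saved = malloc(len);
    if (saved == NULL)
        return -1;
    memcpy(saved, cur, len);

    int rc = -1;
    if (setlocale(LC_COLLATE, locale_name) != NULL) {
        *result = strcoll(a, b);
        rc = 0;
        setlocale(LC_COLLATE, saved);   /* put the old rules back */
    }
    /* A failed setlocale() leaves the current setting unchanged. */
    free(saved);
    return rc;
}
```

       A program that sorts many strings would normally set
       LC_COLLATE once rather than switching it for each comparison.
       Use locale -a to see which variants are installed.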

       The Latin-9 (ISO8859-15) and UTF-8 locales are the only locales
       that include the euro monetary symbol in the coded character
       set. The *.UTF-8@euro locales also define the local currency
       symbol to be the euro character and the international currency
       symbol to be EUR. See euro(5) for more information about the
       euro symbol and how it is supported.

       You can use the -a option with the locale command to list all
       the locales available on the system. The POSIX (or C) locale is
       always available because it must exist on all systems that
       conform to The Open Group's UNIX specifications. The POSIX
       locale is the default locale when locale variables are not set.
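
       In C, a program runs in the POSIX (C) locale from startup until
       it calls setlocale(). Passing an empty string selects the
       locale named by the LC_ALL, LC_*, and LANG environment
       variables. The sketch below uses hypothetical helper names. It
       shows both behaviors and reads the decimal point of the current
       locale through localeconv().

```c
#include <locale.h>

/* Hypothetical helper: returns the decimal-point string of the
 * LC_NUMERIC category currently in effect. In the POSIX (C) locale
 * this is ".". */
const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}

/* Hypothetical helper: selects the locale named by the LC_ALL, LC_*,
 * and LANG environment variables. When none of them is set, the
 * POSIX locale is used. Returns the resulting locale name, or NULL
 * if the variables name a locale that is not installed (the old
 * setting then remains in effect). */
const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}
```

       A typical program calls setlocale(LC_ALL, "") once at startup.
       Before that call, formatting follows the POSIX locale.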

                                  Note

       The dxterm terminal emulator does not support locales based on
       the Unicode (UTF-8) or Latin-9 (ISO8859-15) codesets. Use
       dtterm, the default terminal emulator for the Common Desktop
       Environment (CDE), with locales based on the Latin-9 and UTF-8
       codesets.


   System Locales
       When  you install Worldwide Language Support, localization
       is supported by two types of locales: Unicode locales  and
       dense code locales.

       Unicode locales conform to the Unicode and ISO/IEC 10646
       standards and use UTF-32 as the wide character encoding. Under
       UTF-32 wide character encoding, wchar_t values represent the
       same characters regardless of the locale and, because Unicode
       standards prevail, implementation is consistent across
       platforms.

       Locales whose names end in .UTF-8 use the file code and the
       internal process code (wchar_t encoding) defined in the ISO
       10646 and Unicode standards.
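
       In a .UTF-8 locale, converting file code to process code
       amounts to decoding UTF-8 into UTF-32 values. For example, the
       euro sign is stored in a file as the bytes 0xE2 0x82 0xAC and
       held in process code as 0x20AC. The hand-written C decoder
       below, whose name is hypothetical, illustrates this conversion.
       It is not the system's implementation; applications would
       normally call mbrtowc() in such a locale. It follows current
       Unicode rules, which limit UTF-8 to 4-byte sequences and to
       code points up to U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one UTF-8 sequence (file code) from s,
 * reading at most n bytes, into a UTF-32 code point (process code).
 * Returns the number of bytes consumed, or -1 for an invalid or
 * truncated sequence. */
int utf8_to_utf32(const unsigned char *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return -1;
    unsigned char b0 = s[0];
    int len;
    uint32_t v, min;

    if (b0 < 0x80)                { *cp = b0; return 1; }  /* ASCII */
    else if ((b0 & 0xE0) == 0xC0) { len = 2; v = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; v = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; v = b0 & 0x07; min = 0x10000; }
    else return -1;           /* stray continuation byte or bad lead */

    if ((size_t)len > n)
        return -1;            /* sequence is truncated */
    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;        /* expected a 10xxxxxx continuation */
        v = (v << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF. */
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return -1;
    *cp = v;
    return len;
}
```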

       Other, non-UTF-8 Unicode locales use traditional UNIX and
       proprietary codesets for the file code while using UTF-32 as
       the internal process code. A subset of these Unicode locales
       have a @ucs4 modifier; however, they are the same as the
       locales without the @ucs4 modifier. The @ucs4 subset is
       provided for backward compatibility and may be removed in the
       future. You cannot select @ucs4 locales from the CDE login
       menu; you must specify the locale name in the LANG environment
       variable.

       The universal.UTF-8 locale is also available (for use by
       applications rather than end users). It supports the complete
       set of characters in the universal character set (UCS).

       See Unicode(5) for more information about encoding formats.

       For UTF-8 locales, file code may include characters encoded in
       more than 1 byte; therefore, use these locales in applications
       that can process multibyte data. Design new applications around
       multibyte locales, which incorporate a large character
       repertoire, so that the application can expand its character
       support in the future without changing the character set.
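
       In a multibyte locale, the number of bytes in a string and the
       number of characters can differ. In a UTF-8 locale, for
       example, the euro sign occupies 3 bytes. The C sketch below,
       with a hypothetical helper name, counts characters rather than
       bytes by calling mbrlen(). That function interprets bytes
       according to the LC_CTYPE category of the current locale, so
       the program should call setlocale() first.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the characters (not bytes) in a
 * NUL-terminated multibyte string, using the LC_CTYPE category of
 * the current locale. Returns -1 if s contains an invalid or
 * incomplete multibyte sequence. */
long count_mb_chars(const char *s)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    size_t left = strlen(s);
    long count = 0;

    while (left > 0) {
        size_t k = mbrlen(s, left, &st);
        if (k == (size_t)-1 || k == (size_t)-2)
            return -1;              /* invalid or truncated sequence */
        if (k == 0)
            break;                  /* null character; cannot advance */
        s += k;
        left -= k;
        count++;
    }
    return count;
}
```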

       Dense code locales use dense code for wide character encoding
       to minimize table size (that is, codepoints are assigned
       consecutively with no empty positions). Under dense code
       locales, a wchar_t value for one locale may not represent the
       same character in another locale and, thus, is locale specific.
       Dense code locales are appropriate for applications that have
       no dependencies on the internal process code. Because dense
       code locales are slightly more efficient than Unicode locales,
       they also suit applications that require better performance.
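
       A dense code wchar_t value is meaningful only within its own
       locale. Data that is saved to a file or exchanged with another
       process is therefore safer in file code. The C sketch below,
       with a hypothetical helper name, converts a wide-character
       string back to multibyte file code with the standard wcrtomb()
       function, using the LC_CTYPE category of the current locale.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts the wide-character string ws to file
 * code (multibyte) in the current locale. Writes at most size bytes
 * into buf, including the terminating NUL. Returns the number of
 * bytes written, excluding the NUL, or -1 if a character cannot be
 * represented or buf is too small. */
long wide_to_file_code(const wchar_t *ws, char *buf, size_t size)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);
    char tmp[MB_LEN_MAX];
    size_t used = 0;

    for (; *ws != L'\0'; ws++) {
        size_t k = wcrtomb(tmp, *ws, &st);
        if (k == (size_t)-1)
            return -1;          /* not representable in this codeset */
        if (used + k + 1 > size)
            return -1;          /* no room for this character plus NUL */
        memcpy(buf + used, tmp, k);
        used += k;
    }
    /* Converting L'\0' emits any shift sequence needed to return to
     * the initial state, followed by the terminating NUL byte. */
    size_t k = wcrtomb(tmp, L'\0', &st);
    if (k == (size_t)-1 || used + k > size)
        return -1;
    memcpy(buf + used, tmp, k);
    return (long)(used + k - 1);
}
```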


       All valid codepoints in multibyte character sets are mapped to
       valid codepoints in Unicode. Codepoints that have no standard
       Unicode equivalent are mapped to codepoints in the Unicode
       private use area. Thus, dense code locales are equivalent to
       Unicode locales. In general, the same charmaps and locale
       source can be used for Unicode and dense code locales. However,
       Unicode and dense code characters that are not defined in the
       LC_COLLATE section may be sorted differently.

       A Unicode locale exists for each dense code locale. (However,
       not all Unicode locales have a dense code version.) For Latin-1
       (ISO8859-1) locales, the dense code and Unicode locales are
       identical because the Latin-1 characters are the same as the
       first 256 characters in Unicode.
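
       Because of this correspondence, no lookup table is needed to
       move Latin-1 text into Unicode: each byte value is already the
       code point. The C sketch below, with a hypothetical function
       name, uses that fact to convert Latin-1 bytes to UTF-8. Bytes
       0x80 through 0xFF, such as é (0xE9), become two-byte sequences
       (0xC3 0xA9 for é).

```c
#include <stddef.h>

/* Hypothetical helper: converts n Latin-1 (ISO8859-1) bytes to UTF-8.
 * Each Latin-1 byte value is already its Unicode code point, so bytes
 * below 0x80 are copied unchanged and bytes 0x80-0xFF become two-byte
 * UTF-8 sequences. out must have room for 2 * n bytes. Returns the
 * number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[j++] = c;
        } else {
            out[j++] = (unsigned char)(0xC0 | (c >> 6));   /* 110000xx */
            out[j++] = (unsigned char)(0x80 | (c & 0x3F)); /* 10xxxxxx */
        }
    }
    return j;
}
```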

       The operating system also supports three UCS transformation
       formats (UTFs): UTF-8, UTF-16, and UTF-32, all of which are
       defined in the Unicode standard. See Unicode(5) for a full
       description of Unicode, UCS-4, and the transformation formats.
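
       The transformation formats encode the same code points in
       different unit sizes. UTF-32 stores each code point as one
       32-bit value. UTF-16 needs a surrogate pair for code points
       above U+FFFF, such as the CJK ideographs in the block that
       begins at U+20000. The C sketch below, with a hypothetical
       function name, performs that UTF-32 to UTF-16 step.

```c
#include <stdint.h>

/* Hypothetical helper: encodes one Unicode code point (a UTF-32
 * value) as UTF-16. Code points above U+FFFF become a surrogate
 * pair. Returns the number of 16-bit units written (1 or 2), or 0
 * if cp is a surrogate or lies beyond U+10FFFF. */
int utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                    /* one unit suffices */
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));     /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));   /* low surrogate */
    return 2;
}
```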

       The Unicode locales are installed in /usr/i18n/lib/nls/ucsloc/.
       Dense code locales are installed in /usr/i18n/lib/nls/loc/. A
       symbolic link, /usr/i18n/lib/nls/dloc, points to the system
       default locales. For example, the Japanese locale filename,
       /usr/lib/nls/loc/ja_JP.eucJP, is a symbolic link to
       /usr/i18n/lib/nls/dloc/ja_JP.eucJP, where /dloc is a symbolic
       link either to /ucsloc for the Unicode version or to /loc for
       the dense code version of the Japanese locale. Keep in mind
       that the same locale name can refer to a Unicode locale or to a
       dense code locale, depending on the setting of the symbolic
       link. Thus, if an application has problems running in a
       locale, check the symbolic link.
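
       Because the same locale name can resolve to either set of
       locales, a diagnostic tool might report where the dloc link
       points. The C sketch below uses hypothetical names. It reads
       the link with readlink() and classifies the target by its last
       path component, assuming the ucsloc and loc directory names
       described above. On a system without the link, it returns
       DLOC_UNKNOWN.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <unistd.h>

/* Hypothetical classification of a dloc-style symbolic link. */
enum dloc_kind { DLOC_UNKNOWN, DLOC_UNICODE, DLOC_DENSE };

/* Reports which locale set the symbolic link at link_path selects,
 * for example "/usr/i18n/lib/nls/dloc". A target whose last component
 * is "ucsloc" means the Unicode locales; "loc" means the dense code
 * locales. Returns DLOC_UNKNOWN if the path is not a readable
 * symbolic link or the target is something else. */
enum dloc_kind classify_dloc(const char *link_path)
{
    char target[1024];
    ssize_t n = readlink(link_path, target, sizeof target - 1);
    if (n <= 0)
        return DLOC_UNKNOWN;
    target[n] = '\0';            /* readlink() does not NUL-terminate */

    /* Strip trailing slashes, as in "/usr/i18n/lib/nls/ucsloc/". */
    while (n > 1 && target[n - 1] == '/')
        target[--n] = '\0';

    const char *base = strrchr(target, '/');
    base = (base != NULL) ? base + 1 : target;

    if (strcmp(base, "ucsloc") == 0)
        return DLOC_UNICODE;
    if (strcmp(base, "loc") == 0)
        return DLOC_DENSE;
    return DLOC_UNKNOWN;
}
```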

       Because Unicode locales use consistent values for characters in
       wchar_t form, a default link to Unicode locales can increase
       consistency across locales and platforms. However, some users
       may prefer the older dense code locales, which use proprietary
       algorithms to convert characters to wchar_t form, or an
       application may have dependencies on dense code wchar_t
       encoding. To switch between Unicode and dense code locales, the
       system administrator, as root, either uses i18nconfig to change
       the systemwide default or manually changes the symbolic link
       /usr/i18n/lib/nls/dloc to point to /usr/i18n/lib/nls/ucsloc
       (Unicode locales) or /usr/i18n/lib/nls/loc (dense code
       locales).

   Environment Variables Related to Localization
       The following system environment variables can be set to
       override the default search path for certain kinds of localized
       files. Usually, only installed applications, or programmers who
       are testing applications or converters under development, set
       them.

       LOCPATH
              Specifies the search path for locales and codeset
              converters. This environment variable is not defined by
              current industry standards. See iconv_intro(5),
              iconv_open(3), and setlocale(3) for more information.

              Because the LOCPATH variable is not defined by
              standards, it is recommended for use only when testing
              locales or converters under development, and not as a
              systemwide method for finding installed converters or
              locales. When you set LOCPATH, make sure that the search
              path is valid for both locales and converters.
              Otherwise, in environments where both kinds of files are
              required, application and system software can find only
              locales or only converters.

       NLSPATH
              Specifies the search path for message catalogs, which
              contain translated text for programs. This variable is
              used primarily by the catopen() function. See catopen(3)
              for detailed information on NLSPATH.
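
       Programs normally reach message catalogs through catopen().
       For a catalog name without a slash, catopen() searches the
       directories given by NLSPATH. With the NL_CAT_LOCALE flag, it
       uses the LC_MESSAGES setting rather than LANG when expanding
       NLSPATH. The C sketch below, with hypothetical helper and
       catalog names, looks up one message and falls back to built-in
       text if the catalog or message is missing.

```c
#define _XOPEN_SOURCE 700
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: copies message msg_id of set set_id from the
 * message catalog named `catalog` into buf (size bytes). Falls back
 * to `fallback` when the catalog or the message cannot be found.
 * Returns buf. */
const char *lookup_message(const char *catalog, int set_id, int msg_id,
                           const char *fallback, char *buf, size_t size)
{
    const char *text = fallback;
    nl_catd cd = catopen(catalog, NL_CAT_LOCALE);
    if (cd != (nl_catd)-1)
        text = catgets(cd, set_id, msg_id, fallback);
    /* catgets() may return a pointer into the open catalog, so copy
     * the text before closing the catalog. */
    snprintf(buf, size, "%s", text);
    if (cd != (nl_catd)-1)
        catclose(cd);
    return buf;
}
```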

   Customizing Locales
       Partial source files, along with an associated Makefile, are
       available for many locales in the /usr/lib/nls/loc/src
       directory. By editing one of these source files and using the
       Makefile to rebuild the locale (make locale_name), you can
       customize one or more of the following features:

              The format of affirmative and negative responses
              (LC_MESSAGES section)

              Rules and symbols for formatting monetary numeric
              information (LC_MONETARY section)

              Rules and symbols for formatting nonmonetary numeric
              information (LC_NUMERIC section)

              Rules and symbols for formatting date and time
              information (LC_TIME section)
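
       In the LC_MESSAGES category, POSIX defines regular expressions
       that match affirmative and negative responses. Programs can
       retrieve them through nl_langinfo() with the YESEXPR and
       NOEXPR items. The C sketch below, with a hypothetical helper
       name, shows how an application might apply those expressions.
       A rebuilt locale with customized responses then takes effect
       without code changes. The sketch assumes that the program has
       already selected the locale with setlocale().

```c
#define _XOPEN_SOURCE 700
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `answer` matches the current
 * locale's affirmative expression (YESEXPR), 0 if it matches the
 * negative expression (NOEXPR), and -1 if it matches neither or a
 * pattern fails to compile. */
int classify_response(const char *answer)
{
    const nl_item items[2] = { YESEXPR, NOEXPR };
    const int results[2] = { 1, 0 };

    for (int i = 0; i < 2; i++) {
        regex_t re;
        if (regcomp(&re, nl_langinfo(items[i]),
                    REG_EXTENDED | REG_NOSUB) != 0)
            return -1;
        int matched = (regexec(&re, answer, 0, NULL, 0) == 0);
        regfree(&re);
        if (matched)
            return results[i];
    }
    return -1;
}
```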

       As described in locale(4), the LC_CTYPE and LC_COLLATE sections
       of these locale sources are not customizable using this method.
       This means that you cannot use one of these sources to change
       how characters are classified or collated. By implication, this
       also means that you cannot add a new character to a locale that
       does not already support it. For example, you cannot add the
       European monetary character (euro) to a locale that does not
       already support that character. However, you can edit the
       LC_MONETARY section to define a string identifier for the euro
       by using characters that the locale does support. For example,
       you could replace the existing monetary symbol with EUR.
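
       After rebuilding a customized locale, a program can confirm the
       new monetary symbols through localeconv(). The C sketch below,
       with a hypothetical helper name, copies the currency_symbol and
       int_curr_symbol members. Under the C standard, int_curr_symbol
       holds a three-letter ISO 4217 code such as EUR followed by a
       separator character. The program must first select the locale,
       for example with setlocale(LC_MONETARY, ""). In the POSIX
       locale, both strings are empty.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: copies the LC_MONETARY currency symbols of the
 * current locale into the caller's buffers. localeconv() returns data
 * that later calls may overwrite, so the strings are copied out
 * immediately. */
void get_currency_symbols(char *local, size_t local_size,
                          char *intl, size_t intl_size)
{
    const struct lconv *lc = localeconv();
    snprintf(local, local_size, "%s", lc->currency_symbol);
    snprintf(intl, intl_size, "%s", lc->int_curr_symbol);
}
```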

       See  locale(4)  for  more  information  on a locale source
       file. See Writing Software for  the  International  Market
       for  information  on  user  customization  of LC_CTYPE and
       LC_COLLATE.

                                Caution

       Customized versions of locales that are provided with the
       operating system are not preserved when the operating system is
       reinstalled, even when an update installation procedure is
       used. Therefore, you must back up files for customized locales
       and their sources before reinstalling the operating system.
       After the reinstallation is complete, you must restore your
       customized locales to the system. If the newly installed
       sources have been revised relative to the old sources, it might
       be preferable to apply your customizations to the newly
       installed sources and rebuild your customized locales.

SEE ALSO

       Commands: locale(1), localedef(1)

       Functions: catopen(3)

       Files: charmap(4), locale(4)

       Others: Catalan(5), Chinese(5), Czech(5), dechanyu(5),
       dechanzi(5), deckanji(5), deckorean(5), Dutch(5), eucJP(5),
       eucKR(5), eucTW(5), euro(5), Finnish(5), French(5), GB18030(5),
       GBK(5), German(5), Greek(5), Hebrew(5), Hungarian(5),
       i18n_intro(5), i18n_printing(5), Icelandic(5), iconv_intro(5),
       iso2022(5), iso2022jp(5), iso8859-1(5), iso8859-2(5),
       iso8859-4(5), iso8859-5(5), iso8859-7(5), iso8859-8(5),
       iso8859-9(5), iso8859-15(5), Italian(5), Japanese(5),
       jiskanji(5), Korean(5), Lithuanian(5), Norwegian(5), Polish(5),
       Portuguese(5), Russian(5), sbig5(5), sdeckanji(5), shiftjis(5),
       Slovak(5), Slovene(5), Spanish(5), Swedish(5), TACTIS(5),
       telecode(5), Thai(5), Turkish(5), Unicode(5)

       Writing Software for the International Market

       Using International Software


