localedef(4) localedef(4)
NAME
localedef - format and semantics of locale definition file
DESCRIPTION
This is a description of the syntax and meaning of the locale
definition that is provided as input to the localedef command to
create a locale (see localedef(1M)).
The following is a list of category tags, keywords and subsequent
expressions which are recognized by localedef. The order of keywords
within a category is irrelevant with the exception of the copy keyword
and other exceptions noted under the LC_COLLATE description. (Note
that, as a convention, the category tags are composed of uppercase
characters, while the keywords are composed of lowercase characters).
Category Tags and Keywords
The following keywords do not belong to any category and should appear
in the beginning of the locale definition file:
comment_char
Single character indicating the character to be interpreted
as starting a comment line within the locale definition
file. This character should be in the first column of a
comment line. The default comment_char is #. All lines
with a comment_char in the first column are ignored.
escape_char
A single character indicating the character to be
interpreted as an escape character within the script. The
default escape_char is \. escape_char is used to escape
localedef metacharacters to remove special meaning and in
the character constant decimal, octal, and hexadecimal
formats. It is also used to continue a line onto the next,
if escape_char is the last character on the line (before the
new-line character).
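As a minimal sketch (the replacement characters % and / are arbitrary
choices for illustration, not defaults), a definition file that
redefines both characters might begin as follows:

```
comment_char %
escape_char  /
% From here on, lines with % in the first column are comments,
% and / is used for escapes and line continuation.
```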
The following keywords can be used in any category:
copy A string naming another valid locale available on the
system. This causes the category in the locale being
created to be a copy of the same category in the named
locale. Since the copy keyword defines the entire category,
if used, it must be the only keyword in the category.
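For example, a category could be copied wholesale from an existing
locale. The locale name "C" is an assumption here; any valid locale
installed on the system could be named instead:

```
LC_NUMERIC
copy "C"
END LC_NUMERIC
```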
The following six categories are recognized:
LC_CTYPE:
This category defines character classification, case conversion
and other character attributes. The following predefined
character classifications are recognized:
Hewlett-Packard Company - 1 - HP-UX 11i Version 2: August 2003
upper Character codes classified as uppercase
letters. Characters specified in the cntrl,
digit, punct or space classifications cannot
be specified in this category.
lower Character codes classified as lowercase
letters. Same restrictions applicable to the
upper category apply to this classification.
digit Character codes classified as numeric. Only
ten characters in contiguous ascending
sequence by numerical value can be specified.
Alternative digits cannot be specified here.
space Character codes classified as white-space. No
character specified for the upper, lower,
alpha, digit, graph or xdigit categories can
be included in this classification.
punct Character codes classified as punctuation
characters. No character included in the
upper, lower, alpha, digit, cntrl, xdigit or
space categories can be specified.
cntrl Character codes classified as control
characters. No character included in the
upper, lower, alpha, digit, punct, graph,
print or xdigit can be included here.
blank Character codes classified as blank
characters. The <space> and <tab> characters
are automatically included.
xdigit Character codes classified as hexadecimal
digits. Only the characters defined for the
digit class can be specified, followed by one
or more sets of six characters, with each set
in ascending order.
alpha Character codes classified as letters.
Characters classified as cntrl, digit, punct
or space cannot be specified. Characters
specified as upper and lower classes are
automatically included in this class.
print Character codes classified as printable
characters. Characters specified for upper,
lower, alpha, digit, xdigit, and punct
classes and the <space> character are
automatically included. No character from the
cntrl category can be specified.
graph Character codes classified as printable
characters, except the <space> character. In
all other respects this classification is
similar to the print category.
The following two are special classifications, used to designate
valid first-of-two and second-of-two bytes. Note that these are
byte classifications and not character classifications; hence,
they cannot be used with the iswctype interface (see wctype(3C)),
in the same manner as the other classifications can be used.
first Valid first bytes of two-byte characters.
second Valid second bytes of two-byte characters.
Character case conversion definitions:
toupper Lowercase to uppercase character
relationships.
tolower Uppercase to lowercase character
relationships.
Miscellaneous character attribute and classifications:
alt_punct String mapped into the ASCII equivalent
string ``b!"#$%&'()*+,-./:;<=>?@[\]^_`{}~'',
where b is a blank (a langinfo(5) item).
charclass Defines one or more locale-specific character
class names as strings separated by
semicolons. Each named character class can
then be defined subsequently in the LC_CTYPE
definition. The first character of a
character class name must be a letter and the
class name cannot match any of the predefined
classifications (e.g., space, letter, cntrl).
direction String operand indicates text direction (a
langinfo(5) item). String operand "1"
indicates right-to-left text direction.
context String operand indicates character context
analysis. String "1" indicates Arabic context
analysis is required.
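A shortened LC_CTYPE definition might look like the following sketch.
The symbolic names (<A>, <a>, <zero> and so on) are assumed to be
defined in the accompanying charmap file, and only a few characters
are listed to keep the example brief:

```
LC_CTYPE
upper     <A>;<B>;<C>
lower     <a>;<b>;<c>
digit     <zero>;<one>;<two>;<three>;<four>;\
          <five>;<six>;<seven>;<eight>;<nine>
toupper   (<a>,<A>);(<b>,<B>);(<c>,<C>)
tolower   (<A>,<a>);(<B>,<b>);(<C>,<c>)
END LC_CTYPE
```

The digit list is continued onto a second line by ending the first
line with the default escape character.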
LC_COLLATE:
The LC_COLLATE category provides collation sequence definition
for relative ordering between collating elements (single- and
multi-character collating elements) in the locale. The following
keywords belong to this category and should come between the
category tag LC_COLLATE and END LC_COLLATE. The first two
keywords can be in any order, but must come before the
order_start keyword. Any number of the first two keywords can be
specified.
collating-element <symbol> from string
Defines a multi-character collating element,
symbol, composed of the characters in string.
String is limited to two characters.
collating-symbol <symbol>
Makes symbol a collating symbol which can be
used to define a place in the collating
sequence. Symbol does not represent any
actual character.
order_start Denotes the start of the collation sequence.
The directives have an effect on string
collation.
The lines following the order_start keyword
and before the order_end keyword contain
collating element entries, one per line.
Operands can optionally appear after the
order_start keyword to define rules for
string comparison using a multiple-weight
scheme (if no operands are specified, a
single forward operand is assumed). The
possible operands are:
forward Specifies that comparison
operations proceed from start of
string towards the end of it.
backward Specifies that comparison
operations proceed from end of
string towards the beginning of it.
order_end Marks the end of the list of collating
element entries.
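Putting these keywords together, a skeleton LC_COLLATE category might
be laid out as in the sketch below. The symbolic names are assumed to
come from a charmap file, and the list of collating element entries is
abbreviated to a handful of lines:

```
LC_COLLATE
collating-element <CH> from "CH"
order_start forward
<A>
<B>
<C>
<CH>
<D>
order_end
END LC_COLLATE
```

Here the multi-character element <CH> is given its own position
between <C> and <D>.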
LC_MONETARY:
The LC_MONETARY category defines the rules and symbols used to
format monetary numeric information. The following keywords
belong to this category and should come between the category tag
LC_MONETARY and END LC_MONETARY:
int_curr_symbol
The operand is a four-character string used
to designate the international currency
symbol.
currency_symbol
The operand is a string used as the local
currency symbol.
mon_decimal_point
The operand is a string containing the symbol
used as the decimal delimiter (radix
character).
mon_thousands_sep
The operand is a string containing the symbol
used as a separator for groups of digits to
the left of decimal delimiter.
mon_grouping The operand is a semicolon-separated list of
integers. The initial integer defines the
size of the group immediately preceding the
decimal delimiter, and the following integers
define the preceding groups. If the last
integer is not -1, then the size of the
previous group (if any) will be repeatedly
used for the remainder of the digits. If the
last integer is -1, then no further grouping
will be performed.
positive_sign The operand is a string used to indicate a nonnegative
monetary quantity.
negative_sign The operand is a string used to indicate a
negative monetary quantity.
int_frac_digits
The operand is an integer representing the
number of fractional digits used in formatted
monetary values using int_curr_symbol.
frac_digits The operand is an integer representing the
number of fractional digits used in formatted
monetary values using currency_symbol.
p_cs_precedes The operand is an integer which if set to 1
indicates the currency_symbol or
int_curr_symbol precedes a monetary quantity,
and if set to 0 the symbol succeeds the
value.
p_sep_by_space The operand is an integer which if set to 1
indicates a space separates the
currency_symbol or int_curr_symbol from the
value; if set to 0, no space separates them.
n_cs_precedes The operand is an integer which if set to 1
indicates the currency_symbol or
int_curr_symbol precedes a negative monetary
quantity, and if set to 0 the symbol succeeds
the negative value.
n_sep_by_space The operand is an integer which if set to 1
indicates a space separates the
currency_symbol or int_curr_symbol from a
negative monetary value; if set to 0, no space
separates them.
p_sign_posn The operand is an integer whose setting
indicates the positioning of the
positive_sign for a non-negative monetary
quantity. The possible values are:
0 Parentheses surround the quantity
and the currency_symbol or
int_curr_symbol.
1 The sign string precedes the
quantity and the currency_symbol or
int_curr_symbol.
2 The sign string succeeds the
quantity and the currency_symbol or
int_curr_symbol.
3 The sign string precedes the
currency_symbol or int_curr_symbol.
4 The sign string succeeds the
currency_symbol or int_curr_symbol.
n_sign_posn The operand is an integer whose setting
parallels that of p_sign_posn, but for
negative monetary quantities.
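The keywords above might be combined as in the following sketch of a
US-style category. The values are illustrative assumptions and are not
taken from a locale shipped with the system:

```
LC_MONETARY
int_curr_symbol    "USD "
currency_symbol    "$"
mon_decimal_point  "."
mon_thousands_sep  ","
# Groups of three throughout: 1234567 is formatted as 1,234,567.
# With 3;2 the same digits would be grouped as 12,34,567, and
# with 3;-1 as 1234,567.
mon_grouping       3
positive_sign      ""
negative_sign      "-"
int_frac_digits    2
frac_digits        2
p_cs_precedes      1
p_sep_by_space     0
n_cs_precedes      1
n_sep_by_space     0
p_sign_posn        1
n_sign_posn        1
END LC_MONETARY
```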
LC_NUMERIC:
The LC_NUMERIC category defines rules and symbols used to format
non-monetary numeric information. The following keywords belong
to this category and should come between the category tag
LC_NUMERIC and END LC_NUMERIC:
decimal_point The operand is a string containing the symbol
used as the decimal delimiter (radix
character) in numeric, non-monetary formatted
quantities. This keyword cannot be omitted
and cannot be set to the empty string.
thousands_sep The operand is a string containing the symbol
used as a separator for groups of digits to
the left of the decimal delimiter.
grouping The operand is a semicolon-separated list of
integers. The initial integer defines the
size of the group immediately preceding the
decimal delimiter, and the following integers
define the preceding groups. If the last
integer is not -1, then the size of the
previous group (if any) will be repeatedly
used for the remainder of the digits. If the
last integer is -1, then no further grouping
will be performed.
alt_digit String mapped into the ASCII equivalent
string ``0123456789b+-.,eE'', where b is a
blank (a langinfo(5) item). The alt_digit
keyword is an HP extension to the localedef
POSIX standards and it has a different
meaning than the alt_digits defined in POSIX
standards.
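A sketch of a category that uses a comma as the radix character and a
period as the group separator (a convention common in continental
Europe; the values are illustrative only):

```
LC_NUMERIC
decimal_point  ","
thousands_sep  "."
grouping       3
END LC_NUMERIC
```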
LC_TIME:
The LC_TIME category defines the rules for generating locale-specific
formatted date strings. The following mandatory
keywords belong to this category and should come between the
category tag LC_TIME and END LC_TIME:
abday Seven semicolon-separated strings giving
abbreviated names for the days of the week
beginning with Sunday.
day Seven semicolon-separated strings giving full
names for the days of the week beginning with
Sunday.
abmon Twelve semicolon-separated strings giving
abbreviated names for the months, beginning
with January.
mon Twelve semicolon-separated strings giving
full names for the months, beginning with
January.
d_t_fmt The operand is a string defining the
appropriate date and time representation.
d_fmt The operand is a string defining the
appropriate date representation.
t_fmt The operand is a string defining the
appropriate time representation.
am_pm The operand is two semicolon-separated
strings giving the representations for AM and
PM.
t_fmt_ampm The operand is a string defining the
appropriate time representation in the 12-
hour clock format with am_pm.
era The operand is a semicolon-separated list of
strings. Each string defines the name and
date of an era or emperor for a locale. Each
string should conform to the following
format:
direction:offset:start_date:end_date:name:format
where:
direction Either a + or - character.
The + character indicates
the time axis should be such
that the years count in the
positive direction when
moving from the starting
date towards the ending
date. The - character
indicates the time axis
should be such that the
years count in the negative
direction when moving from
the starting date towards
the ending date.
offset A number in the range
[SHRT_MIN,SHRT_MAX]
indicating the number of the
first year of the era.
start_date A date in the form
yyyy/mm/dd where yyyy, mm,
and dd are the year, month
and day numbers,
respectively, of the start
of the era. Years prior to
the year 0 A.D. are
represented as negative
numbers. For example, an
era beginning March 5th in
the year 100 B.C. would be
represented as -100/3/5.
Years in the range
[SHRT_MIN+1,SHRT_MAX-1] are
supported.
end_date The ending date of the era
in the same form as the
start_date above or one of
the two special values -* or
+*. A value of -* indicates
the ending date of the era
extends to the beginning of
time while +* indicates it
extends to the end of time.
The ending date can be
chronologically either
before or after the starting
date of an era. For
example, the expressions for
the Christian eras A.D. and
B.C. would be:
+:0:0000/01/01:+*:A.D.:%o %N
+:1:-0001/12/31:-*:B.C.:%o %N
name A string representing the
name of the era which is
substituted for the %N
directive of date and
strftime() (see date(1) and
strftime(3C)).
format A string for formatting the
%E directive of date(1) and
strftime(3C). This string
is usually a function of the
%o and %N directives. If
format is not specified, the
string specified for the
LC_TIME category keyword
era_d_fmt (see below) is
used as a default.
era_d_fmt The operand is a string defining the format
of date in era notation.
era_t_fmt The operand is a string defining the format
of time in era notation.
era_d_t_fmt The operand is a string defining the format
of date and time in era notation.
alt_digits The operand is a semicolon-separated list of
strings. The first string is the alternative
symbol corresponding to zero, the second
string is the alternative symbol
corresponding to one, and so on. Note that
if the HP-UX-proprietary alt_digit keyword
has been specified in the same locale, the
first ten symbols should be identical for
these two keywords.
In addition to the above, the following HP-UX-proprietary
keywords are recognized (these are provided for backward
compatibility and their use is otherwise not recommended):
year_unit, mon_unit, day_unit, hour_unit, min_unit, sec_unit.
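An English-language excerpt of this category might be sketched as
follows. The era keywords and alt_digits are omitted, so this is not a
complete category, and the format strings assume the directives
described in strftime(3C):

```
LC_TIME
abday       "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
day         "Sunday";"Monday";"Tuesday";"Wednesday";\
            "Thursday";"Friday";"Saturday"
abmon       "Jan";"Feb";"Mar";"Apr";"May";"Jun";\
            "Jul";"Aug";"Sep";"Oct";"Nov";"Dec"
mon         "January";"February";"March";"April";\
            "May";"June";"July";"August";\
            "September";"October";"November";"December"
d_t_fmt     "%a %b %d %H:%M:%S %Y"
d_fmt       "%m/%d/%y"
t_fmt       "%H:%M:%S"
am_pm       "AM";"PM"
t_fmt_ampm  "%I:%M:%S %p"
END LC_TIME
```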
LC_MESSAGES:
The LC_MESSAGES category defines the format and values for
affirmative and negative responses. The following keywords
belong to this category and should come between the category tag
LC_MESSAGES and END LC_MESSAGES:
yesexpr The string operand is an Extended Regular
Expression matching acceptable affirmative
responses to yes/no queries.
noexpr The string operand is an Extended Regular
Expression matching acceptable negative
responses to yes/no queries.
yesstr The string operand identifies the affirmative
response for yes/no questions. This keyword
is now obsolete and yesexpr should be used
instead.
nostr The string operand identifies the negative
response for yes/no questions. This keyword is
now obsolete and noexpr should be used
instead.
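A minimal sketch of this category for English responses, in which any
reply beginning with y or Y counts as affirmative and any reply
beginning with n or N as negative (the expressions are illustrative):

```
LC_MESSAGES
yesexpr  "^[yY]"
noexpr   "^[nN]"
END LC_MESSAGES
```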
Keyword Operands
Keyword operands consist of character-code constants and symbols,
strings, and metacharacters. The types of legal expressions are:
character lists, string lists, integer lists, shift, collating element
entries, regular expression, character constants and string:
character lists
character list operands consist of single character-code
constants or symbolic names separated by
semicolons, or a character-code range consisting of a
constant or symbolic name followed by an ellipsis
followed by another constant or symbolic name. The
constant preceding the ellipsis must have a smaller
code value than the constant following the ellipsis. A
range represents a set of consecutive character codes.
If the list is longer than a single line, the escape
character must be used at the end of each line as a
continuation character. It is an error to use any
symbolic name that is not defined in an accompanying
charmap file (see charmap(4)).
string lists
string list operands consist of strings separated by
semicolons. If longer than one line, the escape
character must be used for continuation.
string string operands consist of a sequence of zero or more
characters surrounded by double quotes ("). Within a
string, the double-quote character must be preceded by
an escape character. The following escape sequences
also can be used:
\n newline
\t horizontal tab
\b backspace
\r carriage return
\f form feed
\\ backslash
\' single quote
\ddd bit pattern
The escape \ddd consists of the escape
character followed by 1, 2, or 3 octal digits
specifying the value of the desired character
(for other possible bit pattern specification,
see character constants below). Also, an
escape character (\) and an immediately following
newline are ignored.
Although the backslash (\) has been used for
illustration, another escape character can be
substituted by the escape_char keyword.
character constants
Constants represent character codes in the operands.
They can be used in the following forms:
decimal constants An escape character followed by
a 'd' followed by up to three
decimal digits.
octal constants An escape character followed by
up to three octal digits.
hexadecimal constants An escape character followed by
a 'x' followed by two
hexadecimal digits.
character constants A single character (e.g., A)
having the numerical value of
the character in the machine's
character set.
symbolic names A string enclosed between < and
> is a symbolic name. localedef
input files are recommended to
be written entirely in symbolic
names, utilizing a user defined
or system-supplied charmap file.
This aids portability of
localedef input files between
different encoded character sets
(see charmap(4)).
Symbolic names can be defined
within a locale definition file
by the collating-element and
collating-symbol keywords.
These are not character
constants. It is an error if
such an internally defined
symbolic name collides with one
defined in a charmap file.
integer lists
Integer list operands consist of one or more decimal
digits separated by semicolons.
shift Shift operands follow keywords toupper and tolower, and
must consist of two character-code constants enclosed
by left and right parentheses and separated by a comma.
Each such character pair is separated from the next by
a semicolon. For tolower, the first constant
represents an uppercase character and the second the
corresponding lowercase character. For toupper, the
first constant represents a lowercase character and
the second the corresponding uppercase character.
collating element entry
The order_start keyword is followed by collating
element entries, one per line, in ascending order by
collating position. The collating element entries have
the form:
collation_element[weight[;weight]]
collation_element can be a character, a collating
symbol enclosed in angle brackets representing a
character or collating element, the special symbol
UNDEFINED or an ellipsis (...).
A character stands for itself; a collating symbol can
be a symbolic name for a character that is interpreted
by the charmap file, a multi-character collating
element defined by a collating-element keyword, or a
collating symbol defined by the collating-symbol
keyword.
The special symbol UNDEFINED specifies the collating
position of any characters not explicitly defined by
collating element entries. For example, if some group
of characters is to be omitted from the collation
sequence and just collate after all defined characters,
a collating symbol might be defined before the
order_start keyword:
collating-symbol <HIGH>
Then somewhere in the list of collating element
entries:
UNDEFINED <HIGH>
Notice that there is no second weight. This means that
on a second pass all characters collate by their
encoded value.
An ellipsis is interpreted as a list of characters with
an encoded value higher than that of the character on
the preceding line and lower than that on the following
line. Because it is tied to encoded value of
characters, the ellipsis is inherently non-portable.
If it is used, a warning is issued and no output
generated unless the -c option was given.
The weight operands provide information about how the
collating element is to be collated on first and
subsequent passes. Weight can be a two-character
string, the special symbol IGNORE, or a collating
element of any of the forms specified for
collating_element except UNDEFINED. If there are no
weights, the character collates strictly by its
position in the list. If there is only one weight
given, the character sorts by its relative position in
the list on the second collation pass.
An equivalence class is defined by a series of
collating element entries all having the same character
or symbol in the first weight position. For example,
in many locales all forms of the character 'A' collate
equal on the first pass. This is represented in the
collating element entries as:
'A' 'A';'A' # first element of equivalence class
'a' 'A';'a' # next element of class
Two-to-one collating elements are specified by
collating-elements defined before the order_start
keyword. For example, the two-to-one collating element
CH in Spanish, would be defined before the order_start
keyword as
collating-element <CH> from "CH"
It would then be used in a collating element entry as
<CH>.
A one-to-two collating element is defined by having a
two-character string in one of the weight positions.
For example, if the character 'X' collates equal to the
pair "AE", the collating element entry would be:
'X' "AE";'X'
A don't-care character is defined by the special symbol
IGNORE. For example, the dash character, '-' may be a
don't care on the first collation pass. The collating
element entry is:
'-' IGNORE;'-'
Symbols defined by the collating-symbol keyword can be
used to indicate that a given character collates higher
or lower than some position in the sequence. For
example if all characters with an encoded value less
than that of '0' are to collate lower than all other
characters on the first pass, and in relative order on
the second pass, define a collating symbol before the
order_start keyword:
collating-symbol <LOW>
The first two collating element entries are then:
... <LOW>;...
'0' '0';'0'
This also illustrates the use of the ellipsis to
indicate a range. The first ellipsis is interpreted as
"all characters in the encoded character set with a
value lower than '0'"; the second ellipsis means that
all characters in the range defined by the first
collate in relative order.
regular expression
regular expression operands conform to the Extended
Regular Expressions specifications as described in
regexp(5).
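The operand forms described in this section can be sketched side by
side. The fragment below assumes an ASCII-based code set, so that A, B
and C have the decimal codes 65, 66 and 67. Each upper line is an
alternative spelling of the same character list, and a real file would
contain only one of them. The lines belong to different categories and
are grouped here only for comparison:

```
# single characters
upper    A;B;C
# decimal constants
upper    \d65;\d66;\d67
# octal constants
upper    \101;\102;\103
# hexadecimal constants
upper    \x41;\x42;\x43
# symbolic names (assumed to be defined in the charmap file)
upper    <A>;<B>;<C>
# shift pairs for tolower: uppercase first, then lowercase
tolower  (\x41,\x61);(\x42,\x62);(\x43,\x63)
# string list continued onto a second line with the escape character
am_pm    "AM";\
         "PM"
# integer list
grouping 3;3
```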
Metacharacters
Metacharacters are characters having a special meaning to localedef in
operands. To escape the special meaning of these characters, surround
them with single quotes or precede them by an escape character.
localedef meta-characters include:
< Indicates the beginning of a symbolic name.
> Indicates the end of a symbolic name.
( Indicates the beginning of a character shift pair
following the toupper and tolower keywords.
) Indicates the end of a character shift pair.
, Used to separate the characters of a character shift
pair.
" Used to quote strings.
; Used as a separator in list operands.
escape character
Used to escape special meaning from other metacharacters
and itself. It is backslash (\) by default, but can be
redefined by the escape_char keyword.
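As a sketch of both escaping styles, the character list below places
several metacharacters in the punct class, surrounding some with single
quotes and preceding others with the default escape character. Which
style is used for which character is an arbitrary choice here:

```
punct  ';';',';\<;\>;\(;\)
```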
Comments
Comments are lines beginning with a comment character. The comment
character is pound sign (#) by default, but can be redefined by the
comment_char keyword. Comments and blank lines are ignored.
Separators
Separator characters include blanks and tabs. Any number of
separators can be used to delimit the keywords, metacharacters,
constants and strings that comprise a localedef script except that all
characters between < and > are considered to be part of the symbolic
name even if they are <blank>s.
EXAMPLE
Please see the files under /usr/lib/nls/loc/src for examples of locale
description files. These files were used to create the various
locales which are delivered with HP-UX.