sort - Sorts or merges files
sort [-m] [-o output_file] [-Abdfinru] [-k keydef]...
[-t character] [-T directory] [-y] [kilobytes]
[-z record_size]... file...
sort -c [-u] [-Abdfinru] [-k keydef]... [-t character]
[-T directory] [-y] [kilobytes] [-z record_size]...
file...
The following older syntax is now maintained for backward
compatibility, but may be withdrawn in future issues:
sort [-Abcdfimnru] [-o output_file] [-t character] [-T directory]
[-y] [kilobytes] [-z record_size] [+fskip] [.cskip]
[-fskip] [.cskip] [-bdfinr]... file...
Interfaces documented on this reference page conform to
industry standards as follows:
sort: XCU5.0
Refer to the standards(5) reference page for more information
about industry standards and associated tags.
The -d, -f, -i, -n, and -r options override the default
ordering rules. When ordering options appear independent
of any key field specifications, the requested field
ordering rules are applied globally to all sort keys.
When attached to a specific key (see -k), the specified
ordering options override all global ordering options for
that key. In the obsolescent forms, if one or more of
these options follows a +fskip option, it affects only the
key field specified by that preceding option.
-A
[Tru64 UNIX] Sorts on a byte-by-byte basis using each character's
encoded value. On some systems, extended characters
will be considered negative values, and so sort before
ASCII characters. If you are sorting ASCII characters in
a non-C/POSIX locale, this option performs much faster.
-b
Ignores leading spaces and tabs when determining the
starting and ending positions of a restricted sort key.
If the -b option is specified before the first -k option,
the -b option is applied to all -k options on the command
line; otherwise, the -b option can be independently
attached to each -k field_start or field_end argument.
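The effect of attaching b to a key can be seen in a small sketch. It assumes the C locale (forced here with LC_ALL=C), the default blank field separator, and made-up input lines; the variable names are illustrative only.

```shell
# Two made-up lines; the first has two blanks before its second field.
input=$(printf 'a  b\na a\n')

# Without b, the blanks that precede field 2 are part of the key,
# so "  b" sorts before " a" (a space collates before a letter).
without_b=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2)

# With b attached to field_start, the key starts at the first
# nonblank character, so "a" sorts before "b".
with_b=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2b)

printf 'without b:\n%s\nwith b:\n%s\n' "$without_b" "$with_b"
```

The two runs should order the same two lines differently, which is why -b (or the b modifier) is usually wanted when keys are selected by field number with the default separator.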
-c
Checks that the input is sorted according to the ordering
rules specified in the options and the collating sequence
of the current locale. No output is produced; only the
exit code is affected.
-d
Specifies that only spaces and alphanumeric characters
(according to the current setting of LC_CTYPE) are
significant in comparisons.
-f
Treats all lowercase characters as their uppercase equivalents
(according to the current setting of LC_CTYPE) for the
purposes of comparison.
-i
Sorts only by printable characters (according to the current
setting of LC_CTYPE).
-k keydef
Specifies one or more (up to 50) restricted sort key field definitions.
This option replaces the obsolescent +fskip.cskip
and -fskip.cskip options. A field comprises a maximal
sequence of non-separating characters and, in the absence
of the -t option, any preceding field separator.
The format of a key field definition is as follows:
field_start[type][,field_end[type]]
The field_start and field_end arguments define a
key field that is restricted to a portion of the
line, and type is a modifier specified by b, d, f,
i, n, r, or t. The b modifier behaves like the -b
option, but applies only to the field_start or
field_end argument to which it is attached. The t
modifier indicates that the key field is processed
as CPU time. The other modifiers behave like their
corresponding options, but apply only to the key
field to which they are attached; these modifiers
have this effect if specified with field_start,
field_end or both.
Modifiers attached to a field_start or field_end
argument override any specifications made by the
options. A missing field_end argument means the
last character of the line. When multiple sort
keys are specified, it is advisable to specify a
field_end argument to avoid possible confusion.
The field_start portion of the keydef argument
takes the following form: field_number[.first_character]
Fields and characters within fields are numbered
starting with 1. The field_number and first_character
pieces, interpreted as positive decimal integers,
specify the character to be used as part of a
sort key. If first_character is not specified, the
default is the first character of the field.
The field_end portion of the keydef argument takes
the following form: field_number[.last_character]
The field_number syntax is the same as that
described for field_start. The last_character
argument, interpreted as a nonnegative decimal
integer, specifies the last character to be used as
part of the sort key. If last_character evaluates
to 0 (zero) or is not specified, the default is the
last character of the field specified by field_number.
If -b is in effect, characters within a field are
counted from the first nonspace character in the
field. (This applies separately to first_character
and last_character.)
If -k is not specified, the default sort key is the
entire line.
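As a sketch of the field_number.character notation, the following restricts the key to the second and third characters of the second field. The colon-separated records are made up for illustration, and the C locale is assumed.

```shell
# Made-up records: a name, a colon, and a code such as x12.
input=$(printf 'alpha:x12\nbravo:x03\ncharlie:x07\n')

# Key starts at character 2 of field 2 and ends at character 3 of
# field 2, so only the digits "12", "03" and "07" are compared.
by_chars=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2.2,2.3)

printf '%s\n' "$by_chars"
```

Because the key excludes the names, the lines come out in the order of the two digits rather than alphabetically.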
When there are multiple key fields, later keys are
compared only after all earlier keys compare as
equal. Except when the -u option is specified,
lines that otherwise compare as equal are ordered
as though none of the options -d, -f, -i, -n, or -k
were present (but with -r still in effect, if it
was specified) and with all bytes in the lines significant
to the comparison.
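The tie-breaking rule can be illustrated with a short sketch. The input lines are invented, the C locale is assumed, and the ordering options are given globally so that they apply to the single key.

```shell
# Made-up lines: "pear 2" and "apple 2" have equal keys in field 2.
input=$(printf 'pear 2\napple 2\nfig 1\n')

# Numeric key on field 2. The two lines that tie on the key are then
# ordered by comparing the whole lines, so apple precedes pear.
forward=$(printf '%s\n' "$input" | LC_ALL=C sort -n -k 2,2)

# With -r in effect, the key order and the tie-break are both reversed.
reverse=$(printf '%s\n' "$input" | LC_ALL=C sort -n -r -k 2,2)

printf 'forward:\n%s\nreverse:\n%s\n' "$forward" "$reverse"
```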
The algorithm for the -k option can be summarized
as follows:
/*
 * -ka.b,c.d = if d==0 then +(a-1).(b-1) -c.d
 *             else         +(a-1).(b-1) -(c-1).d
 */
-m
Merges only (assumes sorted input).
-n
Sorts any initial numeric strings (optional spaces, an
optional minus sign, and zero (0) or more digits with an
optional radix character and thousands separator, as defined
by the current locale) by arithmetic value. An
empty digit string is treated as zero; leading
zeros and signs on zeros do not affect ordering.
Only one period (.) can be used in a numeric string.
Any subsequent period (.) and all characters to the
right of it are ignored.
-o output_file
Directs output to output_file instead of standard output.
The output_file can be the same as one of the input
files.
-r
Reverses the order of the specified sort.
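The statement that the -o output file can be one of the input files is worth contrasting with shell redirection. The sketch below uses a temporary directory and made-up file names (list, clobbered); it assumes a POSIX shell and the mktemp utility.

```shell
workdir=$(mktemp -d)

# -o may name an input file: sort reads its input before it
# writes the output file.
printf 'pear\napple\nfig\n' > "$workdir/list"
LC_ALL=C sort -o "$workdir/list" "$workdir/list"
in_place=$(cat "$workdir/list")

# With redirection, the shell truncates the file before sort
# starts reading it, so the data is lost.
printf 'pear\napple\nfig\n' > "$workdir/clobbered"
LC_ALL=C sort "$workdir/clobbered" > "$workdir/clobbered"
clobbered_size=$(wc -c < "$workdir/clobbered")

rm -r "$workdir"
printf 'in place:\n%s\nbytes left after redirection: %s\n' "$in_place" "$clobbered_size"
```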
-t character
Sets the field separator character to character.
The character argument is not considered to be part
of a field (although it can be included in a sort
key). Each occurrence of character is significant
(for example, two consecutive occurrences of character
delimit an empty field). To specify the tab
character as the field separator, you must enclose
it in ' ' (single quotes).
The default field separator is one or more spaces.
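A brief sketch of both points about -t: consecutive separators delimit an empty field, and a tab separator has to reach sort as a single quoted character. The records are invented and the C locale is assumed; here printf generates the tab so that it is visible in the script.

```shell
# Made-up records; the second line has an empty second field.
input=$(printf 'b:x:1\na::2\nc:m:3\n')

# Sort on field 2 only. The empty field sorts before "m" and "x".
by_second=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2,2)

# Pass a literal tab as the separator, quoted so that the shell
# hands it to sort as one character.
tab=$(printf '\t')
by_tab=$(printf 'k\tz\nj\ty\n' | LC_ALL=C sort -t "$tab" -k 2,2)

printf '%s\n' "$by_second" "$by_tab"
```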
-T directory
[Tru64 UNIX] Places all the temporary files that
are created in directory.
-u
Suppresses all but one
in each set of equal lines (for example, lines
whose sort keys match exactly). Ignored characters
such as leading tabs and spaces, and characters
outside of sort keys are not considered in this
type of comparison.
If used with the -c option, -u checks that there
are no lines with duplicate keys, in addition to
checking that the input file is sorted.
-y [kilobytes]
[Tru64 UNIX] Starts the sort command using kilobytes of
main storage and adds storage as needed. (If kilobytes
is less than the minimum storage size or
greater than the maximum, the minimum or maximum is
used instead.) If the -y option is omitted, the
sort command starts with the default storage size;
-y 0 starts with minimum storage, and -y (with no
value) starts with the maximum storage. The amount
of storage used by the sort command has a significant
impact on performance. Sorting a small file
in a large amount of storage is wasteful.
-z record_size
Prevents
abnormal termination if lines being sorted are
longer than the default buffer size can handle.
When the -c or -m options are specified, the sorting
phase is omitted and a system default size
buffer is used. If sorted lines are longer than
this size, sort terminates abnormally. The -z
option specifies that the longest line be recorded
in the sort phase so that adequate buffers can be
allocated in the merge phase. The record_size
argument must be a value in bytes equal to or
greater than the number of bytes in the longest
line to be merged.
+fskip.cskip
Specifies the start position of
a key field. See the -k option for a description
of the current way to perform this operation.
(Obsolescent)
The fskip variable specifies the number of fields
to skip from the beginning of the input line, and
the cskip variable specifies the number of additional
characters to skip to the right beyond that
point. For both the starting point (+fskip.cskip)
and the ending point (-fskip.cskip) of a sort key,
fskip is measured from the beginning of the input
line, and cskip is measured from the last field
skipped. If you omit cskip, 0 (zero) is assumed. If you omit fskip,
0 (zero) is assumed. If you omit the ending field
specifier (-fskip.cskip), the end of the line is
the end of the sort key.
You can supply more than one sort key by repeating
+fskip.cskip and -fskip.cskip. In cases where you
specify more than one sort key, keys specified further
to the right on the command line are compared
only after all earlier keys are sorted. For example,
if the first key is to be sorted in numerical
order and the second according to the collating
sequence, all strings that start with the number 1
are sorted according to the collating order before
the strings that start with the number 2. Lines
that are identical in all keys are sorted with all
characters significant. You can also specify different
options for different sort keys in multiple
sort keys.
-fskip.cskip
Specifies the end position of a key
field. See the -k option for a description of the
current way to perform this operation. (Obsolescent)
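The interaction of -u with restricted keys, and of -u with -c, can be sketched as follows. The input is made up and the C locale is assumed. Since it is not specified which of several lines with equal keys is kept, the sketch only counts the surviving lines.

```shell
# Made-up lines: two of them share the key "apple" in field 1.
input=$(printf 'apple 3\napple 1\nfig 2\n')

# -u with a restricted key keeps one line per distinct key value.
unique_count=$(printf '%s\n' "$input" | LC_ALL=C sort -u -k 1,1 | wc -l)

# -c -u reports failure when two lines share a key, even though
# the lines themselves are in sorted order.
if printf 'apple 1\napple 3\nfig 2\n' | LC_ALL=C sort -c -u -k 1,1 2>/dev/null
then
    dup_check=passed
else
    dup_check=failed
fi

printf 'lines kept: %s\nduplicate check: %s\n' "$unique_count" "$dup_check"
```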
The sort command sorts lines in its input files and writes
the result to standard output.
The sort command performs one of the following functions:
Sorts lines of all the named files together and writes the
result to the specified output.
Merges lines of all the named (presorted) files together and
writes the result to the specified output.
Checks that a single input file is correctly presorted.
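The merge and check functions listed above can be sketched with two small presorted files. The file names a and b and their contents are made up, the C locale is assumed, and a temporary directory created with mktemp holds the files.

```shell
workdir=$(mktemp -d)

# Two files that are already sorted on their own.
printf 'apple\nfig\n' > "$workdir/a"
printf 'banana\npear\n' > "$workdir/b"

# -m interleaves presorted inputs without sorting them again.
merged=$(LC_ALL=C sort -m "$workdir/a" "$workdir/b")

# -c checks a single file and reports the result through its exit status.
if LC_ALL=C sort -c "$workdir/a"
then
    presorted=yes
else
    presorted=no
fi

rm -r "$workdir"
printf '%s\n' "$merged" "presorted: $presorted"
```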
Comparisons are based on one or more sort keys extracted
from each line of input (or the entire line if no sort
keys are specified), and are performed using the collating
sequence of the current locale.
The sort command treats all of its input files as one file
when it performs the sort. A - (dash) in place of a file
name specifies standard input. If you do not specify a
file name, it sorts standard input.
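A short sketch of the two ways standard input is read: explicitly through a - operand next to a named file, and implicitly when no file is named. The file name fruits and the data are illustrative, and the C locale is assumed.

```shell
workdir=$(mktemp -d)
printf 'pear\napple\n' > "$workdir/fruits"

# "-" stands for standard input, sorted together with the named file.
combined=$(printf 'fig\nbanana\n' | LC_ALL=C sort - "$workdir/fruits")

# With no file operand, only standard input is sorted.
stdin_only=$(printf 'fig\nbanana\n' | LC_ALL=C sort)

rm -r "$workdir"
printf 'combined:\n%s\nstdin only:\n%s\n' "$combined" "$stdin_only"
```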
The sort command can handle a variety of collation rules
typically used in Western European languages, including
primary/secondary sorting, one-to-two character mapping,
N-to-one character mapping, and ignore-character mapping.
To summarize briefly:
Primary/Secondary Sorting
In this system, a group of characters all sort to the same
primary location. If there is a tie, a secondary sort is
applied. For example, in French, the plain and accented
a's all sort to the same primary location. If two strings
collate to the same primary location, the secondary sort
goes into effect. These words are in correct French
order:
abord âpre après âpreté azur
One-to-Two Character Mappings
This system requires that certain single characters be
treated as if they were two characters. For example, in
German, the ß (scharfes-S) is collated as if it were ss.
N-to-One Character Mappings
Some languages treat a string of characters as if it were
one single collating element. For example, in Spanish,
the ch and ll sequences are treated as their own elements
within the alphabet. (ch comes between c and d in the
alphabet, and ll comes between l and m.)
Ignore-Character Mappings
In some cases, certain characters may be ignored in collation.
For example, if - were defined as an ignore-character,
the strings re-locate and relocate would sort to the
same place.
The results that you get from sort depend on
the collating sequence as defined by the current setting
of the LC_COLLATE environment variable. The configuration
files for collation and character classification information
are /usr/lib/nls/loc/src/locale.src.
A field is one or more characters bounded by the beginning
of a line and the current field separator, or one or more
characters bounded by a field separator on either side. The
space character is the default field separator.
Lines longer than 1024 bytes are truncated by sort. The
maximum number of fields on a line is 50.
The sort command returns the following exit values:
0
All input files were output successfully, or -c was specified
and the input file was correctly sorted.
1
Under the -c option, the file was not ordered as specified, or if the
-c and -u options were both specified, two input lines
were found with equal keys.
>1
An error occurred.
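A script can branch on these exit values. The sketch below wraps sort -c in a helper named check_sorted, which is a hypothetical name introduced for this example; the file names are made up and the C locale is assumed.

```shell
# Hypothetical helper: translate the exit value of sort -c into a word.
check_sorted() {
    LC_ALL=C sort -c "$@" 2>/dev/null
    case $? in
        0) echo sorted ;;
        1) echo unsorted ;;
        *) echo error ;;
    esac
}

workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/good"
printf 'b\na\n' > "$workdir/bad"

r_good=$(check_sorted "$workdir/good")
r_bad=$(check_sorted "$workdir/bad")
# A file that does not exist is an error, not an ordering failure.
r_missing=$(check_sorted "$workdir/no-such-file")

rm -r "$workdir"
printf '%s\n' "$r_good" "$r_bad" "$r_missing"
```

Keeping 1 and values greater than 1 apart lets a caller distinguish input that is merely out of order from a failure such as an unreadable file.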
The following examples apply to the C locale, unless it is
specifically stated otherwise.
1. To perform a simple sort, enter:
sort fruits
This displays the contents of fruits sorted in
ascending lexicographic order. This means that the
characters in each column are compared one by one,
including spaces, digits, and special characters.
For instance, if fruits contains the text:
banana
orange
Persimmon
apple
%%banana
apple
ORANGE
Then sort fruits displays:
%%banana
ORANGE
Persimmon
apple
apple
banana
orange
This order follows from the fact that in the ASCII
collating sequence, symbols (such as %) precede
uppercase letters, and all uppercase letters precede
the lowercase letters. If you are using a different
collating order, your results may be different.
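The simple sort above can be reproduced with the following sketch. It creates the fruits file in a temporary directory and forces the C locale with LC_ALL=C, so that the result does not depend on the caller's locale settings; the variable names are illustrative.

```shell
workdir=$(mktemp -d)

# Recreate the sample fruits file, one entry per line.
cat > "$workdir/fruits" <<'EOF'
banana
orange
Persimmon
apple
%%banana
apple
ORANGE
EOF

# Byte order in the C locale: symbols, then uppercase, then lowercase.
sorted=$(LC_ALL=C sort "$workdir/fruits")

rm -r "$workdir"
printf '%s\n' "$sorted"
```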
2. To group lines that contain uppercase and
special characters with similar lowercase lines,
and remove duplicate lines, enter:
sort -d -f -u fruits
The -u option tells sort to remove duplicate lines,
making each line of the file unique. This displays:
apple
%%banana
orange
Persimmon
Not only was the duplicate apple removed, but
banana and ORANGE were removed as well. The -d
option told sort to ignore symbols, so %%banana and
banana were considered to be duplicate lines and
banana was removed. The -f option told sort not to
differentiate between uppercase and lowercase, so
ORANGE and orange were considered to be duplicate
lines and ORANGE was removed.
When the -u option is used with input that contains
nonidentical lines that are considered by sort (due
to other options) to be duplicates, there is no way
to predict which lines sort will keep and which it
will remove.
3. To sort as in Example 2, but remove
duplicates unless capitalized or punctuated differently,
enter:
sort -u -k 1df -k 1 fruits
Options appearing between sort key specifiers apply
only to the specifier preceding them. There are
two sorts specified in this command line. The -k
1df argument specifies the first sort, of the same
type done with -d -f in Example 2. Then -k 1 performs
another comparison to distinguish lines that
are not actually identical. This prevents -u,
which applies to both sorts because it precedes the
first sort key specifier, from removing lines that
are not exactly identical to other lines.
Given the fruits file shown in Example 1, the added
-k 1 distinguishes %%banana from banana and ORANGE
from orange. However, the two instances of apple
are exactly identical, so one of them is deleted.
This displays:
apple
%%banana
banana
ORANGE
orange
Persimmon
4. To specify a new field separator, enter:
sort -t : -k 2 vegetables
This sorts vegetables, comparing the text that follows
the first colon on each line. The -t : option
tells sort that colons separate fields. The -k 2
argument tells sort to ignore the first field and
to compare from the start of the second field to
the end of the line. If vegetables contains:
yams:104
turnips:8
potatoes:15
carrots:104
green beans:32
radishes:5
lettuce:15
then sort -t : -k 2 vegetables displays:
carrots:104
yams:104
lettuce:15
potatoes:15
green beans:32
radishes:5
turnips:8
The numbers are not in ascending order. This is
because a lexicographic sort compares each
character from left to right. In other words, 3
comes before 5 so 32 comes before 5.
5. To sort on more than one field, enter:
sort -t : -k 2n -k 1r vegetables
This performs a numeric sort on the second field
(-k 2n) and then, within that ordering, sorts the
first field in reverse collating order (-k 1r).
The output looks like this:
radishes:5
turnips:8
potatoes:15
lettuce:15
green beans:32
yams:104
carrots:104
The lines are sorted in numeric order; when two
lines have the same number, they appear in reverse
collating order.
6. To replace the original file with the sorted text, enter:
sort -o vegetables vegetables
The -o vegetables option stores the sorted output
into the file vegetables.
7. To collate using Spanish rules, set the LC_COLLATE
(or LANG) environment variable to a Spanish locale, and
then use sort in the regular way. For example, enter:
sort sp.words
If an input file named sp.words contains the following
Spanish words:
dama
loro
chapa
canto
mover
chocolate
curioso
llanura
The sorted file looks like this:
canto
curioso
chapa
chocolate
dama
loro
llanura
mover
If you sort the file in the default C locale, the
output looks like this:
canto
chapa
chocolate
curioso
dama
llanura
loro
mover
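The C locale ordering of sp.words can be reproduced with the sketch below, which forces the locale with LC_ALL=C and builds the file in a temporary directory. Only the C locale run is shown, because it behaves the same everywhere.

```shell
workdir=$(mktemp -d)

# Recreate sp.words with one word per line.
printf '%s\n' dama loro chapa canto mover chocolate curioso llanura \
    > "$workdir/sp.words"

# LC_ALL overrides LC_COLLATE and LANG, so this always uses C collation.
c_order=$(LC_ALL=C sort "$workdir/sp.words")

rm -r "$workdir"
printf '%s\n' "$c_order"
```

Whether a Spanish locale reproduces the first ordering shown above depends on how that locale's collation is defined on the system in use.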
ENVIRONMENT VARIABLES
The following environment variables affect the execution
of sort:
LANG
Provides a default value for the internationalization
variables that are unset or null. If LANG is unset
or null, the corresponding value from the default locale
is used. If any of the internationalization variables
contains an invalid setting, the utility behaves as if none
of the variables had been defined.
LC_ALL
If set to a non-empty string value, overrides the values of
all the other internationalization variables.
LC_CTYPE
Determines the locale for the interpretation of sequences of
bytes of text data as characters (for example, single-byte as
opposed to multibyte characters in arguments) and the behavior
of character classification for the -b, -d, -f, -i, and -n options.
LC_MESSAGES
Determines the locale for the format and contents of diagnostic
messages written to standard error.
NLSPATH
Determines the location of message catalogues for the
processing of LC_MESSAGES.
Configuration files
Commands: comm(1), join(1), uniq(1)
Functions: setlocale(3), tolower(3)
Files: locale(4)
Standards: standards(5)
sort(1)