sort - Tru64

sort(1)

NAME

       sort - Sorts or merges files

SYNOPSIS

       sort [-m] [-o output_file] [-Abdfinru] [-k keydef]...
            [-t character] [-T directory] [-y [kilobytes]]
            [-z record_size]... file...

       sort -c [-u] [-Abdfinru] [-k keydef]... [-t character]
            [-T directory] [-y [kilobytes]] [-z record_size]...
            file...

       The following older syntax is maintained for backward
       compatibility, but may be withdrawn in future issues:

       sort [-Abcdfimnru] [-o output_file] [-t character]
            [-T directory] [-y [kilobytes]] [-z record_size]
            [+fskip[.cskip]] [-fskip[.cskip]] [-bdfinr]... file...

STANDARDS

       Interfaces documented on this reference page conform to
       industry standards as follows:

       sort:  XCU5.0

       Refer to the standards(5) reference page for more
       information about industry standards and associated tags.

OPTIONS

       The -d, -f, -i, -n, and -r options override the default
       ordering rules.  When ordering options appear independent
       of any key field specifications, the requested field
       ordering rules are applied globally to all sort keys.
       When attached to a specific key (see -k), the specified
       ordering options override all global ordering options for
       that key.  In the obsolescent forms, if one or more of
       these options follows a +fskip option, it affects only the
       key field specified by that preceding option.

       -A     [Tru64 UNIX]  Sorts on a byte-by-byte basis using
              each character's encoded value.  On some systems,
              extended characters are treated as negative values
              and so sort before ASCII characters.  If you are
              sorting ASCII characters in a non-C/POSIX locale,
              sort runs much faster with this option.

       -b     Ignores leading spaces and tabs when determining the
              starting and ending positions of a restricted sort
              key.  If the -b option is specified before the first
              -k option, it is applied to all -k options on the
              command line; otherwise, the -b option can be
              independently attached to each -k field_start or
              field_end argument.

       -c     Checks that the input is sorted according to the
              ordering rules specified in the options and the
              collating sequence of the current locale.  No output
              is produced; only the exit code is affected.

       -d     Specifies that only spaces and alphanumeric
              characters (according to the current setting of
              LC_CTYPE) are significant in comparisons.

       -f     Treats all lowercase characters as their uppercase
              equivalents (according to the current setting of
              LC_CTYPE) for the purposes of comparison.

       -i     Sorts only by printable characters (according to the
              current setting of LC_CTYPE).

       -k keydef
              Specifies one or more (up to 50) restricted sort key
              field definitions.  This option replaces the
              obsolescent +fskip.cskip and -fskip.cskip options.
              A field comprises a maximal sequence of
              nonseparating characters and, in the absence of the
              -t option, any preceding field separator.

              The format of a key field definition is as follows:
              field_start[type][,field_end[type]]

              The field_start and field_end  arguments  define  a
              key  field  that  is restricted to a portion of the
              line, and type is a modifier specified by b, d,  f,
              i,  n, r, or t.  The b modifier behaves like the -b
              option, but applies  only  to  the  field_start  or
              field_end  argument to which it is attached.  The t
              modifier indicates that the key field is  processed
              as  CPU time. The other modifiers behave like their
              corresponding options, but apply only  to  the  key
              field  to  which they are attached; these modifiers
              have this effect  if  specified  with  field_start,
              field_end or both.

              Modifiers  attached  to  a field_start or field_end
              argument override any specifications  made  by  the
              options.   A  missing  field_end argument means the
              last character of the  line.   When  multiple  sort
              keys  are  specified,  it is advisable to specify a
              field_end argument to avoid possible confusion.

              The field_start portion of the keydef argument
              takes the following form:
              field_number[.first_character]

              Fields and characters within fields are numbered
              starting with 1.  The field_number and
              first_character pieces, interpreted as positive
              decimal integers, specify the first character to be
              used as part of a sort key.  If first_character is
              not specified, the default is the first character of
              the field.

              The  field_end portion of the keydef argument takes
              the following form: field_number[.last_character]

              The  field_number  syntax  is  the  same  as   that
              described   for  field_start.   The  last_character
              argument,  interpreted  as  a  nonnegative  decimal
              integer, specifies the last character to be used as
              part of the sort key.  If last_character  evaluates
              to 0 (zero) or is not specified, the default is the
              last character of the field specified by field_number.
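
              As a rough illustration of character positions
              within a field, the following sketch sorts on
              characters 3 through 4 of the second
              colon-separated field.  The sample data and the
              by_digits variable are made up for this example,
              and a POSIX sort in the C locale is assumed.

```shell
# The key is restricted to characters 3-4 of field 2 ("30", "10", "20").
by_digits=$(printf '%s\n' 'a:xx30' 'b:xx10' 'c:xx20' |
    LC_ALL=C sort -t : -k 2.3,2.4)
printf '%s\n' "$by_digits"
```

              The lines come out in the order of their two-digit
              suffixes, regardless of the first field.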


              If  -b  is in effect, characters within a field are
              counted from the first nonspace  character  in  the
              field.  (This applies separately to first_character
              and last_character.)
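
              The effect of -b on where a key starts can be
              sketched as follows.  The two input lines and the
              variable names are invented for illustration; the
              sketch assumes a POSIX sort, the C locale, and the
              default field separator.

```shell
# Without -b, field 2 keeps its leading spaces, so "  b" sorts before " a".
plain=$(printf '%s\n' 'y  b' 'x a' | LC_ALL=C sort -k 2)

# With the b modifier, counting starts at the first nonspace character,
# so the keys compared are "b" and "a".
skip_blanks=$(printf '%s\n' 'y  b' 'x a' | LC_ALL=C sort -k 2b)

printf '%s\n---\n%s\n' "$plain" "$skip_blanks"
```

              The two runs produce the lines in opposite orders,
              which is why -b (or the b modifier) is usually
              wanted when fields are separated by runs of spaces.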

              If -k is not specified, the default sort key is the
              entire line.

              When there are multiple key fields, later keys are
              compared only after all earlier keys compare as
              equal.  Except when the -u option is specified,
              lines that otherwise compare as equal are ordered
              as though none of the options -d, -f, -i, -n, or -k
              were present (but with -r still in effect, if it
              was specified) and with all bytes in the lines
              significant to the comparison.
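
              A small sketch of this last-resort comparison,
              using made-up data and a hypothetical tie_sorted
              variable (POSIX sort and the C locale assumed):

```shell
# "b:1" and "a:1" tie on the numeric key in field 2, so the whole
# line decides their relative order.
tie_sorted=$(printf '%s\n' 'b:1' 'a:1' 'c:0' | LC_ALL=C sort -t : -k 2,2n)
printf '%s\n' "$tie_sorted"
```

              Here c:0 comes first on the numeric key, and a:1
              precedes b:1 only because the full lines are
              compared once the keys tie.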

              The algorithm for the -k option can  be  summarized
              as follows:

              /*
               * -ka.b,c.d = if d==0 then +(a-1).(b-1) -c.d
               *              else +(a-1).(b-1) -(c-1).d
               */

       -m     Merges only (assumes sorted input).

       -n     Sorts any initial numeric strings (that is, strings
              consisting of optional spaces, an optional dash,
              and zero (0) or more digits with an optional radix
              character and thousands separators, as defined by
              the current locale) by arithmetic value.  An empty
              digit string is treated as zero; leading zeros and
              signs on zeros do not affect ordering.  Only one
              period (.) can be used in a numeric string.  Any
              subsequent period (.) and all characters to its
              right are ignored.

       -o output_file
              Directs output to output_file instead of standard
              output.  The output_file can be the same as one of
              the input files.

       -r     Reverses the order of the specified sort.

       -t character
              Sets the field separator character to character.
              The character argument is not considered to be part
              of a field (although it can be included in a sort
              key).  Each occurrence of character is significant
              (for example, two consecutive occurrences of
              character delimit an empty field).  To specify the
              tab character as the field separator, you must
              enclose it in ' ' (single quotes).

              The default field separator is one or more spaces.
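
              One way to pass a tab as the separator from a
              script is sketched below, combined with a numeric
              key.  The tab and by_count variables and the sample
              data are assumptions for this example; a POSIX
              shell and sort are assumed.

```shell
# Capture a literal tab so it can be passed, quoted, to -t.
tab=$(printf '\t')

# Sort on the second tab-separated field by arithmetic value.
by_count=$(printf 'pears\t10\nplums\t9\nfigs\t100\n' |
    LC_ALL=C sort -t "$tab" -k 2,2n)
printf '%s\n' "$by_count"
```

              With -n the counts are ordered 9, 10, 100; without
              it they would be compared character by character.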

       -T directory
              [Tru64 UNIX]  Places all the temporary files that
              are created in directory.

       -u     Suppresses all but one in each set of equal lines
              (for example, lines whose sort keys match exactly).
              Ignored characters, such as leading tabs and
              spaces, and characters outside of sort keys are not
              considered in this type of comparison.

              If used with the -c option, -u checks that there
              are no lines with duplicate keys, in addition to
              checking that the input file is sorted.

       -y [kilobytes]
              [Tru64 UNIX]  Starts the sort command using
              kilobytes of main storage and adds storage as
              needed.  (If kilobytes is less than the minimum
              storage size or greater than the maximum, the
              minimum or maximum is used instead.)  If the -y
              option is omitted, the sort command starts with the
              default storage size; -y 0 starts with minimum
              storage, and -y (with no value) starts with the
              maximum storage.  The amount of storage used by the
              sort command has a significant impact on
              performance.  Sorting a small file in a large
              amount of storage is wasteful.

       -z record_size
              Prevents abnormal termination if lines being sorted
              are longer than the default buffer size can handle.
              When the -c or -m options are specified, the
              sorting phase is omitted and a system default size
              buffer is used.  If sorted lines are longer than
              this size, sort terminates abnormally.  The -z
              option specifies that the longest line be recorded
              in the sort phase so that adequate buffers can be
              allocated in the merge phase.  The record_size
              argument must be a value in bytes equal to or
              greater than the number of bytes in the longest
              line to be merged.

       +fskip.cskip
              Specifies the start position of a key field.  See
              the -k option for a description of the current way
              to perform this operation.  (Obsolescent)

              The fskip variable specifies the number of fields
              to skip from the beginning of the input line, and
              the cskip variable specifies the number of
              additional characters to skip to the right beyond
              that point.  For both the starting point
              (+fskip.cskip) and the ending point (-fskip.cskip)
              of a sort key, fskip is measured from the beginning
              of the input line, and cskip is measured from the
              last field skipped.  If you omit cskip, 0 (zero) is
              assumed.  If you omit fskip, 0 (zero) is assumed.
              If you omit the ending field specifier
              (-fskip.cskip), the end of the line is the end of
              the sort key.

              You can supply more than one sort key by repeating
              +fskip.cskip and -fskip.cskip.  In cases where you
              specify more than one sort key, keys specified
              further to the right on the command line are
              compared only after all earlier keys compare as
              equal.  For example, if the first key is to be
              sorted in numerical order and the second according
              to the collating sequence, all strings that start
              with the number 1 are sorted according to the
              collating order before the strings that start with
              the number 2.  Lines that are identical in all keys
              are sorted with all characters significant.  You
              can also specify different options for different
              sort keys.

       -fskip.cskip
              Specifies the end position of a key field.  See the
              -k option for a description of the current way to
              perform this operation.  (Obsolescent)
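
       The relationship between a -k key definition and the
       obsolescent +fskip.cskip and -fskip.cskip forms, as
       summarized in the -k entry, can be sketched as a small shell
       function.  The name k_to_old is hypothetical, and the sketch
       assumes that all four numbers (a.b,c.d) are given
       explicitly; it only prints the equivalent old-style
       arguments and does not run sort.

```shell
# Translate "a.b,c.d" into the old "+pos -pos" pair:
#   if d == 0:  +(a-1).(b-1) -c.d
#   otherwise:  +(a-1).(b-1) -(c-1).d
k_to_old() {
    start=${1%%,*}            # "a.b"
    end=${1#*,}               # "c.d"
    a=${start%%.*}; b=${start#*.}
    c=${end%%.*};   d=${end#*.}
    if [ "$d" -eq 0 ]; then
        printf '+%d.%d -%d.%d\n' $((a - 1)) $((b - 1)) "$c" "$d"
    else
        printf '+%d.%d -%d.%d\n' $((a - 1)) $((b - 1)) $((c - 1)) "$d"
    fi
}

k_to_old 2.1,2.0    # whole of field 2
k_to_old 3.2,3.4    # characters 2-4 of field 3
```

       For these two calls the function prints +1.0 -2.0 and
       +2.1 -2.4 respectively.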

DESCRIPTION

       The sort command sorts lines in its input files and writes
       the result to standard output.

       The sort command performs one of the following functions:

       o  Sorts lines of all the named files together and writes
          the result to the specified output.

       o  Merges lines of all the named (presorted) files together
          and writes the result to the specified output.

       o  Checks that a single input file is correctly presorted.
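
       The merge function can be illustrated with a minimal sketch.
       The file names odd and even and the workdir and merged
       variables are invented for this example, and mktemp -d is
       assumed to be available (it is common but not required by
       the standard).

```shell
# Create two presorted input files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1 3 5 > "$workdir/odd"
printf '%s\n' 2 4 > "$workdir/even"

# -m interleaves the presorted inputs without re-sorting them.
merged=$(LC_ALL=C sort -m -n "$workdir/odd" "$workdir/even")
rm -rf "$workdir"
printf '%s\n' "$merged"
```

       Because -m assumes its inputs are already in order, it
       should be given only files that sort -c would accept under
       the same options.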

       Comparisons are based on one or more sort  keys  extracted
       from  each  line  of  input (or the entire line if no sort
       keys are specified), and are performed using the collating
       sequence of the current locale.

       The sort command treats all of its input files as one file
       when it performs the sort.  A - (dash) in place of a  file
       name  specifies  standard  input.  If you do not specify a
       file name, it sorts standard input.

       The sort command can handle a variety of  collation  rules
       typically  used  in  Western European languages, including
       primary/secondary sorting, one-to-two  character  mapping,
       N-to-one  character mapping, and ignore-character mapping.
       To summarize briefly:

   Primary/Secondary Sorting
       In this system, a group of characters all sort to the same
       primary location.  If there is a tie, a secondary sort is
       applied.  For example, in French, the plain and accented
       a's all sort to the same primary location.  If two strings
       collate to the same primary location, the secondary sort
       goes into effect.  These words are in correct French
       order:

              abord
              âpre
              après
              âpreté
              azur


   One-to-Two Character Mappings
       This system requires that certain single characters be
       treated as if they were two characters.  For example, in
       German, the ß (scharfes-S) is collated as if it were ss.

   N-to-One Character Mappings
       Some languages treat a string of characters as if it  were
       one  single  collating  element.  For example, in Spanish,
       the ch and ll sequences are treated as their own  elements
       within  the  alphabet.   (ch  comes between c and d in the
       alphabet, and ll comes between l and m.)

   Ignore-Character Mappings
       In some cases, certain characters may be ignored in
       collation.  For example, if - were defined as an
       ignore-character, the strings re-locate and relocate would
       sort to the same place.

       The results that you get from sort depend on the collating
       sequence as defined by the current setting of the
       LC_COLLATE environment variable.  The configuration files
       for collation and character classification information are
       /usr/lib/nls/loc/src/locale.src.

       A field is one or more characters bounded by the beginning
       of a line and the current field separator, or one or more
       characters bounded by a field separator on either side.
       The space character is the default field separator.

       Lines longer than 1024 bytes are truncated by sort.  The
       maximum number of fields on a line is 50.

EXIT STATUS

       The sort command returns the following exit values:

       0      All input files were output successfully, or -c was
              specified and the input file was correctly sorted.

       1      Under the -c option, the file was not ordered as
              specified, or if the -c and -u options were both
              specified, two input lines were found with equal
              keys.

       >1     An error occurred.
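
       A script can act on these values directly.  The following
       sketch, with made-up input and hypothetical rc_* variable
       names, captures the exit value of sort -c for sorted input,
       unsorted input, and input with duplicate keys under -u
       (POSIX sort assumed):

```shell
# Already in numeric order: sort -c succeeds.
rc_sorted=0
printf '%s\n' 1 2 3 | sort -c -n || rc_sorted=$?

# Out of order: sort -c reports the disorder through its exit value.
rc_unsorted=0
printf '%s\n' 2 1 | sort -c -n 2>/dev/null || rc_unsorted=$?

# In order but with equal keys: -c together with -u also rejects it.
rc_dupes=0
printf '%s\n' 1 1 | sort -c -u -n 2>/dev/null || rc_dupes=$?

echo "$rc_sorted $rc_unsorted $rc_dupes"
```

       Any diagnostic text that sort writes to standard error is
       discarded here, since only the exit value is of interest.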

EXAMPLES

       The following examples apply to the C locale, unless it is
       specifically stated otherwise.

       1.     To perform a simple sort, enter:

                  sort fruits

              This displays the contents of fruits sorted in
              ascending lexicographic order.  This means that the
              characters in each column are compared one by one,
              including spaces, digits, and special characters.

              For instance, if fruits contains the text:

                  banana
                  orange
                  Persimmon
                  apple
                  %%banana
                  apple
                  ORANGE

              Then sort fruits displays:

                  %%banana
                  ORANGE
                  Persimmon
                  apple
                  apple
                  banana
                  orange

              This order follows from the fact that in the ASCII
              collating sequence, symbols (such as %) precede
              uppercase letters, and all uppercase letters
              precede the lowercase letters.  If you are using a
              different collating order, your results may be
              different.

       2.     To group lines that contain uppercase and special
              characters with similar lowercase lines, and remove
              duplicate lines, enter:

                  sort -d -f -u fruits

              The -u option tells sort to remove duplicate lines,
              making each line of the file unique.  This
              displays:

                  apple
                  %%banana
                  orange
                  Persimmon

              Not  only  was  the  duplicate  apple  removed, but
              banana and ORANGE were  removed  as  well.  The  -d
              option told sort to ignore symbols, so %%banana and
              banana were considered to be  duplicate  lines  and
              banana was removed.  The -f option told sort not to
              differentiate between uppercase and  lowercase,  so
              ORANGE  and  orange were considered to be duplicate
              lines and ORANGE was removed.

              When the -u option is used with input that contains
              nonidentical lines that are considered by sort (due
              to other options) to be duplicates, there is no way
              to predict which lines sort will keep and which it
              will remove.

       3.     To sort as in Example 2, but remove duplicates
              unless capitalized or punctuated differently,
              enter:

                  sort -u -k 1df -k 1 fruits

              Options appearing between sort key specifiers apply
              only to the specifier preceding them.  There are
              two sorts specified in this command line.  The -k
              1df argument specifies the first sort, of the same
              type done with -d -f in Example 2.  Then -k 1
              performs another comparison to distinguish lines
              that are not actually identical.  This prevents -u,
              which applies to both sorts because it precedes the
              first sort key specifier, from removing lines that
              are not exactly identical to other lines.

              Given the fruits file shown in Example 1, the added
              -k 1 distinguishes %%banana from banana and ORANGE
              from orange.  However, the two instances of apple
              are exactly identical, so one of them is deleted:

                  apple
                  %%banana
                  banana
                  ORANGE
                  orange
                  Persimmon

       4.     To specify a new field separator, enter:

                  sort -t : -k 2 vegetables

              This sorts vegetables, comparing the text that
              follows the first colon on each line.  The -t :
              option tells sort that colons separate fields.  The
              -k 2 argument tells sort to ignore the first field
              and to compare from the start of the second field
              to the end of the line.  If vegetables contains:

                  yams:104
                  turnips:8
                  potatoes:15
                  carrots:104
                  green beans:32
                  radishes:5
                  lettuce:15

              then sort -t : -k 2 vegetables displays:

                  carrots:104
                  yams:104
                  lettuce:15
                  potatoes:15
                  green beans:32
                  radishes:5
                  turnips:8

              The numbers are not in ascending order.  This is
              because a lexicographic sort compares each
              character from left to right.  In other words, 3
              comes before 5, so 32 comes before 5.

       5.     To sort on more than one field, enter:

                  sort -t : -k 2n -k 1r vegetables

              This performs a numeric sort on the second field
              (-k 2n) and then, within that ordering, sorts the
              first field in reverse collating order (-k 1r).
              The output looks like this:

                  radishes:5
                  turnips:8
                  potatoes:15
                  lettuce:15
                  green beans:32
                  yams:104
                  carrots:104

              The lines are sorted in numeric order; when two
              lines have the same number, they appear in reverse
              collating order.

       6.     To replace the original file with the sorted text,
              enter:

                  sort -o vegetables vegetables

              The -o vegetables option stores the sorted output
              into the file vegetables.

       7.     To collate using Spanish rules, set the LC_COLLATE
              (or LANG) environment variable to a Spanish locale,
              and then use sort in the regular way.  For example,
              enter:

                  sort sp.words

              If an input file named sp.words contains the
              following Spanish words:

                  dama
                  loro
                  chapa
                  canto
                  mover
                  chocolate
                  curioso
                  llanura

              The sorted file looks like this:

                  canto
                  curioso
                  chapa
                  chocolate
                  dama
                  loro
                  llanura
                  mover

              If you sort the file in the default C locale, the
              output looks like this:

                  canto
                  chapa
                  chocolate
                  curioso
                  dama
                  llanura
                  loro
                  mover
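
       The fruits and vegetables examples above can be reproduced
       with a short script.  This is only a sketch: it assumes a
       POSIX sort, the C locale, and the availability of mktemp -d,
       and the workdir, ex1, ex3, and ex5 names are chosen for
       illustration.  Example 2 is left out because its output
       depends on which duplicate sort happens to keep.

```shell
# Reproduce Examples 1, 3, and 5 in a scratch directory (C locale).
export LC_ALL=C
workdir=$(mktemp -d)

# Sample input files, one item per line, as shown in the examples.
printf '%s\n' banana orange Persimmon apple '%%banana' apple ORANGE \
    > "$workdir/fruits"
printf '%s\n' yams:104 turnips:8 potatoes:15 carrots:104 \
    'green beans:32' radishes:5 lettuce:15 > "$workdir/vegetables"

ex1=$(sort "$workdir/fruits")                        # Example 1
ex3=$(sort -u -k 1df -k 1 "$workdir/fruits")         # Example 3
ex5=$(sort -t : -k 2n -k 1r "$workdir/vegetables")   # Example 5

rm -rf "$workdir"
printf '%s\n\n%s\n\n%s\n' "$ex1" "$ex3" "$ex5"
```

       The three blocks of output should match the listings given
       for Examples 1, 3, and 5.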

ENVIRONMENT VARIABLES

       The following environment variables affect the execution
       of sort:

       LANG
              Provides a default value for the
              internationalization variables that are unset or
              null.  If LANG is unset or null, the corresponding
              value from the default locale is used.  If any of
              the internationalization variables contain an
              invalid setting, the utility behaves as if none of
              the variables had been defined.

       LC_ALL
              If set to a non-empty string value, overrides the
              values of all the other internationalization
              variables.

       LC_CTYPE
              Determines the locale for the interpretation of
              sequences of bytes of text data as characters (for
              example, single-byte as opposed to multibyte
              characters in arguments) and the behavior of
              character classification for the -b, -d, -f, -i,
              and -n options.

       LC_MESSAGES
              Determines the locale for the format and contents
              of diagnostic messages written to standard error.

       NLSPATH
              Determines the location of message catalogues for
              the processing of LC_MESSAGES.
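
       Because LC_ALL overrides the other internationalization
       variables, setting it for a single command is a simple way
       to get a predictable byte-value ordering.  A minimal sketch,
       with made-up input and a hypothetical c_order variable:

```shell
# Force the C locale for this one invocation, whatever LANG is set to.
c_order=$(printf '%s\n' b A a B | LC_ALL=C sort)
printf '%s\n' "$c_order"
```

       In the C locale all uppercase letters precede the lowercase
       ones, so the result is A, B, a, b; under another locale the
       same input may be ordered differently.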

FILES

       Configuration files
