sort sorts, merges, or compares all the lines from the given files, or standard input if none are given or for a file of ‘-’. By default, sort writes the results to standard output. Synopsis:
sort [option]... [file]...
sort has three modes of operation: sort (the default), merge, and check for sortedness. The following options change the operation mode:
A pair of lines is compared as follows: sort compares each pair of fields, in the order specified on the command line, according to the associated ordering options, until a difference is found or no fields are left. If no key fields are specified, sort uses a default key of the entire line. Finally, as a last resort when all keys compare equal, sort compares entire lines as if no ordering options other than --reverse (-r) were specified. The --stable (-s) option disables this last-resort comparison so that lines in which all fields compare equal are left in their original relative order. The --unique (-u) option also disables the last-resort comparison.
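The effect of the last-resort comparison can be seen with a small experiment. The following is a minimal sketch, assuming a POSIX shell and the C locale; the three input lines and the variable names are made up for illustration.

```shell
# Three lines; the second field is the only sort key.
input='b 1
a 1
c 0'

# Default: lines whose keys compare equal are ordered by a whole-line comparison.
default_order=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2n)

# --stable (-s): lines whose keys compare equal keep their input order.
stable_order=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k 2,2n)

printf 'default:\n%s\n\nstable:\n%s\n' "$default_order" "$stable_order"
```

In the default run, ‘a 1’ precedes ‘b 1’ because the tie on the key is broken by comparing entire lines; with -s the two lines stay in input order.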
Unless otherwise specified, all comparisons use the character collating sequence specified by the LC_COLLATE locale. [1]
GNU sort (as specified for all GNU utilities) has no limit on input line length or restrictions on bytes allowed within lines. In addition, if the final byte of an input file is not a newline, GNU sort silently supplies one. A line's trailing newline is not part of the line for comparison purposes.
Exit status:
0 if no error occurred
1 if invoked with -c or -C and the input is not sorted
2 if an error occurred
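The three exit statuses can be observed directly from a shell. This is a minimal sketch; the file name /nonexistent/input.txt is a hypothetical path assumed not to exist, and the variable names are illustrative only.

```shell
# Status 0: the input is already in numeric order.
sorted_status=0
printf '1\n2\n3\n' | sort -c -n || sorted_status=$?

# Status 1: the input is out of order (-C checks without printing a diagnostic).
unsorted_status=0
printf '2\n1\n3\n' | sort -C -n || unsorted_status=$?

# Status 2: an error, here an input file that cannot be read.
# /nonexistent/input.txt is a hypothetical path assumed not to exist.
error_status=0
sort /nonexistent/input.txt 2>/dev/null || error_status=$?

echo "sorted=$sorted_status unsorted=$unsorted_status error=$error_status"
```

A script can therefore distinguish "not sorted" from a genuine failure by testing for status 1 versus status 2.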
If the environment variable TMPDIR is set, sort uses its value as the directory for temporary files instead of /tmp. The --temporary-directory (-T) option in turn overrides the environment variable.
The following options affect the ordering of output lines. They may be specified globally or as part of a specific key field. If no key fields are specified, global options apply to comparison of entire lines; otherwise the global options are inherited by key fields that do not specify any special options of their own. In pre-POSIX versions of sort, global options affect only later key fields, so portable shell scripts should specify global options first.
Use this option only if there is no alternative; it is much slower than
--numeric-sort (-n) and it can lose information when
converting to floating point.
Comparison is exact; there is no rounding error.
Neither a leading ‘+’ nor exponential notation is recognized.
To compare such strings numerically, use the
--general-numeric-sort (-g) option.
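The difference between the two numeric modes shows up as soon as the input uses exponential notation. The sketch below assumes the C locale; the input values are made up for illustration.

```shell
# '1e2' means 100 to -g, but -n reads only the leading '1'.
input='1e2
5'

# -n: the exponent is not recognized, so '1e2' is compared as 1 and sorts first.
numeric=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# -g: '1e2' is converted to 100 and sorts after 5.
general=$(printf '%s\n' "$input" | LC_ALL=C sort -g)

printf -- '-n:\n%s\n\n-g:\n%s\n' "$numeric" "$general"
```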
If multiple random sort fields are specified, the same random hash function is used for all fields. To use different random hash functions for different fields, you can invoke sort more than once.
The choice of hash function is affected by the --random-source option.
Other options are:
With no arguments, prog must compress standard input to standard output, and when given the -d option it must decompress standard input to standard output.
Terminate with an error if prog exits with nonzero status.
White space and the backslash character should not appear in prog; they are reserved for future use.
Each pos has the form ‘f[.c][opts]’, where f is the number of the field to use, and c is the number of the first character from the beginning of the field. Fields and character positions are numbered starting with 1; a character position of zero in pos2 indicates the field's last character. If ‘.c’ is omitted from pos1, it defaults to 1 (the beginning of the field); if omitted from pos2, it defaults to 0 (the end of the field). opts are ordering options, allowing individual keys to be sorted according to different rules; see below for details. Keys can span multiple fields.
Example: To sort on the second field, use --key=2,2
(-k 2,2). See below for more notes on keys and more examples.
See also the --debug option to help determine the part
of the line being used in the sort.
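To illustrate why the end position matters, the following sketch compares a key limited to the second field with a key that runs from the second field to the end of the line. It assumes the C locale and the default blank-separated fields; the input lines are invented for the example.

```shell
input='x b 2
y a 9
z a 1'

# Key is field 2 only; the tie between the two 'a' lines is broken
# by the last-resort whole-line comparison.
field_only=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)

# Key starts at field 2 and extends to the end of the line,
# so the third field takes part in the comparison.
to_end_of_line=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2)

printf -- '-k 2,2:\n%s\n\n-k 2:\n%s\n' "$field_only" "$to_end_of_line"
```

With -k 2,2 the line ‘y a 9’ comes before ‘z a 1’; with -k 2 the order of those two lines is reversed because ‘ a 1’ sorts before ‘ a 9’.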
When sort has to merge more than nmerge inputs, it merges them in groups of nmerge, saving the result in a temporary file, which is then used as an input in a subsequent merge.
A large value of nmerge may improve merge performance and decrease temporary storage utilization at the expense of increased memory usage and I/O. Conversely a small value of nmerge may reduce memory requirements and I/O at the expense of temporary storage consumption and merge performance.
The value of nmerge must be at least 2. The default value is currently 16, but this is implementation-dependent and may change in the future.
The value of nmerge may be bounded by a resource limit for open
file descriptors. The commands ‘ulimit -n’ or ‘getconf
OPEN_MAX’ may display limits for your systems; these limits may be
modified further if your program already has some files open, or if
the operating system has other limits on the number of open files. If
the value of nmerge exceeds the resource limit, sort
silently uses a smaller value.
‘sort -o F F’ and ‘cat F | sort -o F’.
However, sort with --merge (-m) can open the output file before reading all input, so a command like ‘cat F | sort -m -o F - G’ is not safe, as sort might start writing F before cat is done reading it.
On newer systems, -o cannot appear after an input file if
POSIXLY_CORRECT is set, e.g., ‘sort F -o F’. Portable
scripts should specify -o output-file before any input
files.
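As a minimal sketch of the in-place form ‘sort -o F F’, the following sorts a scratch file onto itself, with -o given before the input file as recommended for portable scripts. It assumes mktemp is available and that --merge is not in use; the file contents are made up for illustration.

```shell
# Create a scratch file with unsorted contents.
tmpfile=$(mktemp)
printf 'pear\napple\nfig\n' > "$tmpfile"

# Sort the file in place; -o comes before the input file name.
LC_ALL=C sort -o "$tmpfile" "$tmpfile"

in_place=$(cat "$tmpfile")
rm -f "$tmpfile"

printf '%s\n' "$in_place"
```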
This option can improve the performance of sort by causing it
to start with a larger or smaller sort buffer than the default.
However, this option affects only the initial buffer size. The buffer
grows beyond size if sort encounters input lines larger
than size.
That is, given the input line ‘ foo bar’, sort breaks it into fields ‘ foo’ and ‘ bar’. The field separator is not considered to be part of either the field preceding or the field following, so with ‘sort -t " "’ the same input line has three fields: an empty field, ‘foo’, and ‘bar’. However, fields that extend to the end of the line, as -k 2, or fields consisting of a range, as -k 2,3, retain the field separators present between the endpoints of the range.
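The two field-splitting rules can give different results for the same key specification. The sketch below assumes the C locale; the two input lines (one with a leading space) are invented so that the second field differs between the rules.

```shell
# The first line begins with a space.
input=' z a
b c'

# Default separator: fields of ' z a' are ' z' and ' a', so the key is ' a'.
default_split=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)

# With -t ' ': fields of ' z a' are '', 'z' and 'a', so the key is 'z'.
space_split=$(printf '%s\n' "$input" | LC_ALL=C sort -t ' ' -k 2,2)

printf 'default:\n%s\n\n-t " ":\n%s\n' "$default_split" "$space_split"
```

By default the line ‘ z a’ sorts first (its key ‘ a’ precedes ‘ c’); with -t ' ' it sorts last (its key ‘z’ follows ‘c’).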
To specify ASCII nul as the field separator,
use the two-character string ‘\0’, e.g., ‘sort -t '\0'’.
This option also disables the default last-resort comparison.
The commands ‘sort -u’ and ‘sort | uniq’ are equivalent, but this equivalence does not extend to arbitrary sort options. For example, ‘sort -n -u’ inspects only the value of the initial numeric string when checking for uniqueness, whereas ‘sort -n | uniq’ inspects the entire line. See uniq invocation.
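The difference between ‘sort -n -u’ and ‘sort -n | uniq’ can be checked by counting output lines. This is a minimal sketch with made-up input in which two lines share the same leading number.

```shell
input='1 a
1 b
2 c'

# -n -u: '1 a' and '1 b' have equal numeric keys, so only one of them survives.
unique_count=$(printf '%s\n' "$input" | LC_ALL=C sort -n -u | wc -l)

# -n | uniq: uniq compares entire lines, and no two lines are identical.
uniq_count=$(printf '%s\n' "$input" | LC_ALL=C sort -n | uniq | wc -l)

echo "sort -n -u: $unique_count lines; sort -n | uniq: $uniq_count lines"
```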
Historical (BSD and System V) implementations of sort have differed in their interpretation of some options, particularly -b, -f, and -n. GNU sort follows the POSIX behavior, which is usually (but not always!) like the System V behavior. According to POSIX, -n no longer implies -b. For consistency, -M has been changed in the same way. This may affect the meaning of character positions in field specifications in obscure cases. The only fix is to add an explicit -b.
A position in a sort field specified with -k may have any of the option letters ‘MbdfghinRrV’ appended to it, in which case no global ordering options are inherited by that particular field. The -b option may be independently attached to either or both of the start and end positions of a field specification, and if it is inherited from the global options it will be attached to both. If input lines can contain leading or adjacent blanks and -t is not used, then -k is typically combined with -b or an option that implicitly ignores leading blanks (‘Mghn’) as otherwise the varying numbers of leading blanks in fields can cause confusing results.
If the start position in a sort field specifier falls after the end of the line or after the end field, the field is empty. If the -b option was specified, the ‘.c’ part of a field specification is counted from the first nonblank character of the field.
On older systems, sort supports an obsolete origin-zero syntax ‘+pos1 [-pos2]’ for specifying sort keys. The obsolete sequence ‘sort +a.x -b.y’ is equivalent to ‘sort -k a+1.x+1,b’ if y is ‘0’ or absent, otherwise it is equivalent to ‘sort -k a+1.x+1,b+1.y’.
This obsolete behavior can be enabled or disabled with the _POSIX2_VERSION environment variable (see Standards conformance); it can also be enabled when POSIXLY_CORRECT is not set by using the obsolete syntax with ‘-pos2’ present.
Scripts intended for use on standard hosts should avoid obsolete syntax and should use -k instead. For example, avoid ‘sort +2’, since it might be interpreted as either ‘sort ./+2’ or ‘sort -k 3’. If your script must also run on hosts that support only the obsolete syntax, it can use a test like ‘if sort -k 1 </dev/null >/dev/null 2>&1; then ...’ to decide which syntax to use.
Here are some examples to illustrate various combinations of options.
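Sort in descending (reverse) numeric order:
sort -n -r
Run no more than 4 sorts concurrently, using a buffer size of 10M:
sort --parallel=4 -S 10M
Sort alphabetically, omitting the first and second fields and the blanks at the start of the third field:
sort -k 3b
Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five, using ‘:’ as the field delimiter:
sort -t : -k 2,2n -k 5.3,5.4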
Note that if you had written -k 2n instead of -k 2,2n sort would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.
Also note that the ‘n’ modifier was applied to the field-end specifier for the first key. It would have been equivalent to specify -k 2n,2 or -k 2n,2n. All modifiers except ‘b’ apply to the associated field, regardless of whether the modifier character is attached to the field-start and/or the field-end part of the key specifier.
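The equivalence of the three spellings of the numeric key can be confirmed on a small sample. The sketch below assumes the C locale; the colon-separated input is made up for illustration.

```shell
input='3:10
1:9
2:100'

# The 'n' modifier applies to the whole key wherever it is attached.
end_only=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2,2n)
start_only=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2n,2)
both_ends=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2n,2n)

printf '%s\n' "$end_only"
```

All three variables hold the same result, ordered by the numeric value of the second field (9, 10, 100).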
sort -t : -k 5b,5 -k 3,3n /etc/passwd
sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
sort -t : -b -k 5,5 -k 3,3n /etc/passwd
These three commands have equivalent effect. The first specifies that the first key's start position ignores leading blanks and the second key is sorted numerically. The other two commands rely on global options being inherited by sort keys that lack modifiers. The inheritance works in this case because -k 5b,5b and -k 5b,5 are equivalent, as the location of a field-end lacking a ‘.c’ character position is not affected by whether initial blanks are skipped.
4.150.156.3 - - [01/Apr/2004:06:31:51 +0000] message 1
211.24.3.231 - - [24/Apr/2004:20:17:39 +0000] message 2
Fields are separated by exactly one space. Sort IPv4 addresses lexicographically, e.g., 212.61.52.2 sorts before 212.129.233.201 because 61 is less than 129.
sort -s -t ' ' -k 4.9n -k 4.5M -k 4.2n -k 4.14,4.21 file*.log | sort -s -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n
This example cannot be done with a single sort invocation, since IPv4 address components are separated by ‘.’ while dates come just after a space. So it is broken down into two invocations of sort: the first sorts by time stamp and the second by IPv4 address. The time stamp is sorted by year, then month, then day, and finally by hour-minute-second field, using -k to isolate each field. Except for hour-minute-second there's no need to specify the end of each key field, since the ‘n’ and ‘M’ modifiers sort based on leading prefixes that cannot cross field boundaries. The IPv4 addresses are sorted lexicographically. The second sort uses ‘-s’ so that ties in the primary key are broken by the secondary key; the first sort uses ‘-s’ so that the combination of the two sorts is stable.
find src -type f -print0 | sort -z -f | xargs -0 etags --append
The use of -print0, -z, and -0 in this case means that file names that contain blanks or other special characters are not broken up by the sort operation.
awk '{print length, $0}' /etc/passwd | sort -n | cut -f2- -d' '
In general this technique can be used to sort data that the sort command does not support, or is inefficient at, sorting directly.
ls */* | sort -t / -k 1,1R -k 2,2
[1] If you use a non-POSIX locale (e.g., by setting LC_ALL to ‘en_US’), then sort may produce output that is sorted differently than you're accustomed to. In that case, set the LC_ALL environment variable to ‘C’. Note that setting only LC_COLLATE has two problems. First, it is ineffective if LC_ALL is also set. Second, it has undefined behavior if LC_CTYPE (or LANG, if LC_CTYPE is unset) is set to an incompatible value. For example, you get undefined behavior if LC_CTYPE is ja_JP.PCK but LC_COLLATE is en_US.UTF-8.
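As a minimal sketch of the advice in this footnote, setting LC_ALL to ‘C’ for a single invocation gives byte-order collation, in which all uppercase ASCII letters sort before lowercase ones. The input is made up for illustration.

```shell
# Force the C locale for this one command only.
c_locale_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)

printf '%s\n' "$c_locale_order"
```

Setting the variable on the command line, rather than exporting it, leaves the locale of the rest of the script unchanged.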