Displaying more with less
==========================
(updated for version 1.55)
by Wolfgang Friebel (Wolfgang.Friebel AT desy.de)

Introduction
------------
Who does not know the situation: You have just downloaded a file from the
Internet that is supposed to contain a promising program. Before extracting
and installing it you just want to look into the README or consult the
accompanying man page. You know how to use the tar and zip commands, but how
the heck do you extract a single file from an RPM archive? There are tools
such as Midnight Commander that do the job quite well, at least when running
Linux. And do you always remember the options for the man command when you
want to display a man page that is not listed in the MANPATH? There are
solutions for all these problems, of course, but in some cases they ask too
much of casual UNIX users.

To browse files under UNIX you can use the excellent viewer less [1], the
better alternative to "more". By making use of the environment variable
LESSOPEN, less can be enhanced with external filters to become even more
powerful. Most Linux distributions come preconfigured with a filter
"lesspipe.sh" that covers the most common situations.

I would like to present here an input filter for less that understands many
of the more common file formats. It can easily be extended to handle new
formats.

Description
-----------
The input filter, which is also called "lesspipe.sh", is written in a ksh
compatible language (ksh, bash, zsh), as one of these shells is nearly always
installed on UNIX systems and uses comparatively few resources. An
implementation in perl, for example, would otherwise have been somewhat
simpler to code.

The input filter lesspipe.sh is based on two main ideas.

First, the recognition of the file format is not based on the file suffix.
That method from the DOS world is error prone, and keeping the suffix list up
to date is a tedious job. UNIX comes with the "file" command [2] that
recognizes lots of formats.
Up-to-date file descriptions are included in the "file" tarball, so
maintaining the list of file formats is only a matter of obtaining a current
version of the "file" package.

The second idea is to be able to call lesspipe.sh with a hierarchy of file
names and to finally pull out the file at the bottom of the hierarchy. This
makes it possible to look at individual files contained in an archive, which
itself could be part of a still bigger archive. As lesspipe.sh accepts only a
single argument, the file names in a hierarchical list have to be separated
by a nonblank character. As the colon is rarely found in file names, it has
been chosen as the separator character. At each stage of extracting files
from such a hierarchy the file type is determined. This guarantees correct
processing and display at each stage of the filtering.

Examples
--------
As an example I show how one could display the man page "file.man" found in
the RPM source archive file-xxx.spm. The less command enhanced with the
lesspipe.sh filter

  less file-3.27-43.i386.spm

yields the following output

  ...
  SuSE series: a
  -rw-r--r--   1 root     root        12953 Feb  3 11:45 file-3.27.dif
  -rw-r--r--   1 root     root       123541 Jul  6  1999 file-3.27.tar.gz
  -rw-r--r--   1 root     root         3398 Mar 25 07:31 file.spec

then the command

  less file-3.27-43.i386.spm:file-3.27.tar.gz

produces the output

  ...
  -rw-rw-r-- christos/christos  8740 1999-02-14 18:16 file-3.27/file.c
  -rw-rw-r-- christos/christos  4886 1999-02-14 18:16 file-3.27/file.h
  -rw-rw-r-- christos/christos 13428 1999-02-14 18:16 file-3.27/file.man
  ...

The desired man page can finally be viewed with

  less file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.man

The subcomponents of the argument to less were easily obtained by cut and
paste, using information contained in the previous lines of output.
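To make the structure of such an argument explicit, here is a minimal sketch
that prints each level of a colon-separated hierarchy on its own line. It is
only an illustration and not the code used in lesspipe.sh; the function name
split_levels is hypothetical.

```shell
# Illustration only, not taken from lesspipe.sh.
# split_levels (hypothetical name) prints one hierarchy level per line.
split_levels() {
    arg=$1
    while :; do
        case $arg in
            *:*)
                # Print everything before the first colon ...
                printf '%s\n' "${arg%%:*}"
                # ... and continue with what follows that colon.
                arg=${arg#*:}
                ;;
            *)
                # No colon left: this is the last level.
                printf '%s\n' "$arg"
                break
                ;;
        esac
    done
}

split_levels 'file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.man'
```

For the argument above the sketch prints the RPM file, the tarball and the
path of the man page, in that order. A trailing colon produces an empty last
component.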
If you had wanted to display the nroff source instead, appending another
colon at the end of the argument would have done the job:

  less file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.man:

Had the man page been compressed as well (e.g. as file.man.gz), it would
still have been uncompressed. To also suppress uncompressing file.man.gz, a
second colon would have to be appended to the argument.

It is even possible to extract single files from an archive, for example with

  less file-3.27-43.i386.spm:file-3.27.tar.gz:file-3.27/file.c > file.c

As less does not pass all bytes to STDOUT (e.g. it suppresses binary zeros),
it is recommended to invoke lesspipe.sh directly:

  lesspipe.sh file-3.27-43.i386.spm:file-3.27.tar.gz:: > file-3.27.tar.gz

Here the two colons after file-3.27.tar.gz are required to suppress the
unzipping of the resulting file and to extract the tar file instead of
interpreting it.

Features
--------
The script is able to extract files up to a depth of 6, where applying a
decompression algorithm counts as a separate level. In a few rare cases the
file command does not recognize the correct format (especially with nroff).
In such cases the filtering can be suppressed by a trailing colon on the file
name.

The script lesspipe.sh is constantly enhanced thanks to suggestions from
users. Among the additions to lesspipe.sh is the code to browse the ASCII
contents of Word or Openoffice files, to show characteristics of mp3 files
or to decode MacOS X formats.

To activate lesspipe.sh you have to define the environment variable LESSOPEN
in the following way:

  LESSOPEN="|lesspipe.sh %s"; export LESSOPEN     (sh like shells)
  setenv LESSOPEN "|lesspipe.sh %s"               (csh, tcsh)

If the wrong lesspipe.sh is found in the UNIX search path, or if lesspipe.sh
is not in your search path at all, then the full path to lesspipe.sh should
be given in the above commands.
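For sh like shells, the full-path variant could be sketched as follows for
use in a shell startup file. This is an assumption-laden example: the helper
name set_lessopen and the location $HOME/bin/lesspipe.sh are hypothetical
and have to be adapted to the actual installation.

```shell
# Sketch only: define LESSOPEN with a full path, but only if the filter
# is really there. set_lessopen is a hypothetical helper name.
set_lessopen() {
    filter=$1
    if [ -x "$filter" ]; then
        # Same form as shown in the text, with the full path filled in.
        LESSOPEN="|$filter %s"
        export LESSOPEN
    else
        echo "lesspipe.sh not found or not executable: $filter" >&2
        return 1
    fi
}

# Assumed install location; replace it with the real path on your system.
set_lessopen "$HOME/bin/lesspipe.sh" || true
```

Checking for an executable file first avoids pointing less at a filter that
does not exist; LESSOPEN is left untouched in that case.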
Syntax highlighting
-------------------
Experimental support for syntax highlighting was added through a perl script
'code2color', which is derived from code2html [5]. That script comes with
coloring support for the languages ada, asm, awk, c, c++, groff, html, xml,
java, javascript, lisp, m4, make, pascal, patch, perl, povray, python, ruby,
shellscript and sql.

ATTENTION: Syntax highlighting is only activated if the environment variable
LESS exists and contains the option -R or -r, or if less is called with one
of these options. This guarantees that colors are displayed instead of
literal escape sequences. The detection of -r/-R at runtime depends on the
operating system and may not work in all cases. Putting the option in the
LESS environment variable is guaranteed to work. Installing the perl module
Proc::ProcessTable reduces the OS dependence as well.

As syntax highlighting is rather resource intensive, it can be switched off
by appending a colon to the file name if the output was colored. If the
wrong language was chosen for syntax highlighting, another one can be forced
by appending a colon and a suffix to the file name as follows (assuming this
is a file with perl syntax):

  less config_file:.pl

The same syntax can be used to force the call of code2color for a given
language. The following suffixes are recognized:

  .ada .asm .inc .awk .c .h .cpp .cxx .groff .html .php .xml .java .js .lsp
  .m4 Makefile .pas .patch .diff .pm .pl .pod .pov .py .rb .sh .sql

I regard syntax highlighting as an experimental feature. Much better syntax
highlighting is obtained using the less emulation of vim: The editor vim
comes with a file less.sh, in my case located in /usr/share/vim/vim71/macros.
Assuming this file location, define a function lessc (bash, zsh, ksh users)

  lessc () { /usr/share/vim/vim71/macros/less.sh "$@"; }

or an alias lessc (csh, tcsh users)

  alias lessc /usr/share/vim/vim71/macros/less.sh

and use "lessc filename" to view the colored file contents.

Supported file formats
----------------------
Currently lesspipe.sh [3],[4] supports the following formats:

Compressed files
================
  gzip, pack and compress   uncompressed with gzip -c -d
  bzip2                     uncompressed with bzip2 -d
  zip                       uncompressed with unzip -lv
                            (extracting with unzip -avp)
  rar                       uncompressed with unrar lv
                            (extract with unrar p -idq)
  7-zip                     uncompressed with 7za l (extract with 7za e)
  lzma                      uncompressed with lzma -c -d

Other file formats
==================
  tar                       using GNU tar tvf
                            (extracting files with tar Oxf)
  nroff                     using groff -s -p -t -e -Tascii -mandoc
  ar library                using ar vt (extracting with ar p)
  shared lib                using nm
  executable                using strings
  directory                 using ls -lAL
  rpm                       using rpm -qiv -p and rpm2cpio | cpio -i -tv
                            (extracting with rpm2cpio and GNU cpio)
  Debian                    using ar, gzip and tar, optionally also dpkg
  html                      using html2text -style pretty or links -dump
                            or lynx -dump
  Word                      using antiword
  pdf                       using pdftotext
  rtf                       using unrtf
  dvi                       using dvi2tty
  ps                        using pstotext or ps2ascii and gs
  mp3                       using id3v2 or mp3info
  iso images                using isoinfo
  MacOSX archives           using lsbom and an updated /etc/magic file
  MacOSX bom files          using lsbom and an updated /etc/magic file
  MacOSX plist files        using plutil
  Microsoft cabinet files   using cabextract
  perl storable files       using perl and a recent /etc/magic file
  gpg encrypted files       using gpg
  ISO8859-1,UTF-8,UTF-16    using iconv
  image (gif, png) files    using identify
  Openoffice.org 1.x and Opendocument (OASIS) text documents
                            using o3read, html2text and an up-to-date
                            magic file from 'file' version 4.17 or later

File formats currently not supported
====================================
(code contributed but commented out)

  perl pod and pm files (the contained documentation would be displayed).
    This is mostly not wanted and better achieved using perldoc.
  jpeg and pbm graphics files to be displayed in ascii art.
    The ascii art library works with overprinting, which does not work
    properly within less. Therefore the quality of the converted picture is
    not satisfactory.
  Display of video streams using mplayer with -aadriver (again ascii art).
    This is considered abuse of less and is also commented out.
  Looking at the contents of DOS formatted disks by accessing the proper
    device file.

-------------
[1] http://www.greenwoodsoftware.com/less/
[2] ftp://ftp.astron.com/pub/file
[3] http://www.ifh.de/~friebel/unix/lesspipe.html
[4] ftp://ftp.ifh.de/pub/unix/utility/lesspipe.tar.gz
[5] http://www.palfrader.org/code2html/