4.3 Newline Handling

Within the database, file names are terminated with a null character. This is the case for both the old and the new format.

When the new database format is being used, however, the compression technique used to generate the database relies on the list of files being sorted before it is presented to frcode.
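To see why the order of the list matters, here is a minimal sketch. It assumes the compression is a form of front coding, in which each entry records how many leading bytes it shares with the entry before it. The function name `shared_prefix_lengths` and the sample file names are made up for this illustration; this is not the frcode source.

```python
def shared_prefix_lengths(names):
    """For each name, count the leading bytes it shares with the previous name."""
    previous = b""
    lengths = []
    for name in names:
        count = 0
        limit = min(len(previous), len(name))
        while count < limit and previous[count] == name[count]:
            count += 1
        lengths.append(count)
        previous = name
    return lengths


# Hypothetical sample data, not taken from a real database.
names = [b"/usr/bin/awk", b"/etc/passwd", b"/usr/bin/sort", b"/etc/hosts"]

# Bytes that need not be stored again, without and with sorting.
unsorted_total = sum(shared_prefix_lengths(names))
sorted_total = sum(shared_prefix_lengths(sorted(names)))
```

In a sorted list, neighbouring names tend to share long prefixes, so more bytes can be omitted. A name that lands in the wrong position loses that benefit.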

If the system's sort command allows its input list of files to be separated with null characters via the ‘-z’ option, this option is used, and updatedb and locate will both correctly handle file names containing newlines. If the sort command lacks support for this, the list of files is delimited with the newline character. In that case each file name containing a newline is split into several records, which are then sorted independently of one another. This can result in both incorrect matches and incorrect failures to match.
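The difference between the two delimiters can be illustrated with a short sketch. It only imitates splitting and sorting in memory; the helper `sort_records` and the sample names are hypothetical and do not reflect how updatedb is implemented.

```python
def sort_records(data, delimiter):
    """Split data on delimiter, drop the empty record after the last delimiter, and sort."""
    records = data.split(delimiter)
    if records and records[-1] == b"":
        records.pop()
    return sorted(records)


# Two file names; the first contains an embedded newline.
names = [b"/tmp/b\nfile", b"/tmp/a"]

# Null-delimited list: each name stays a single record.
nul_sorted = sort_records(b"".join(n + b"\0" for n in names), b"\0")

# Newline-delimited list: the first name is split into two records.
nl_sorted = sort_records(b"".join(n + b"\n" for n in names), b"\n")
```

With the null delimiter the result still holds two records. With the newline delimiter it holds three, and the fragment `file` is sorted as if it were a file name of its own.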

On the other hand, if you are using the old database format, file names with embedded newlines are not correctly handled. No technical limitation enforces this; the bigram program simply has not been updated to support lists of file names separated by nulls.

So, if you are using the new database format (this is the default) and your system uses GNU sort, file names containing newlines will be correctly handled at all times. Otherwise, they may not be correctly handled.
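One way to check whether the sort command on a given system accepts ‘-z’ is to run it on a small null-delimited input. The sketch below is only a probe a reader might try. The function name `sort_supports_null_delimiter` is made up for this example, and this is not necessarily how updatedb itself makes the decision.

```python
import subprocess


def sort_supports_null_delimiter():
    """Return True if `sort -z` runs and orders null-terminated records."""
    try:
        result = subprocess.run(
            ["sort", "-z"],
            input=b"b\0a\0",
            capture_output=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # No sort command was found, or it did not finish in time.
        return False
    return result.returncode == 0 and result.stdout == b"a\0b\0"


supported = sort_supports_null_delimiter()
```

If `supported` is false, file names containing newlines should be treated with suspicion when searching the database.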