Tip of the Trade: Fixing Filename Encodings

Monday Oct 15th 2007 by Carla Schroder

Unicode is the accepted standard for encoding text, but inconsistencies and messes remain. To help with the cleanup, Linux and Unix users can use convmv to convert the encodings of filenames and iconv to convert the contents of files.

In the beginning was ASCII, a 7-bit character set standardized by the American Standards Association (the forerunner of ANSI), and that was the common language of computers. But ASCII did not cover non-English languages, so dozens of incompatible 8-bit extensions to it, such as the ISO 8859 family and assorted vendor code pages, were created to include other languages. This became a big mess: the same byte meant a different character in each charset, so text written in one encoding came out garbled when read in another. Then, one day, some brainiacs invented Unicode. Unicode aims to replace all of those incompatible, messy ASCII-based charsets with a single giant character set that assigns a unique number to each of the world's characters.
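To see the clash for yourself, feed the same byte to iconv under two different legacy charsets. This is a small sketch assuming GNU iconv (as shipped with glibc) and a UTF-8 terminal: byte 0xE1 (octal 341) is "α" in ISO-8859-7 (Greek) but "á" in ISO-8859-1 (Latin-1), while UTF-8 gives each character its own two-byte sequence.

```shell
#!/bin/sh
# Decode one byte, 0xE1 (octal \341), under two legacy 8-bit charsets.
# The byte is identical; only the charset used to interpret it differs.
greek=$(printf '\341' | iconv -f ISO-8859-7 -t UTF-8)
latin=$(printf '\341' | iconv -f ISO-8859-1 -t UTF-8)
echo "ISO-8859-7: $greek"   # Greek small letter alpha
echo "ISO-8859-1: $latin"   # Latin small letter a with acute

# In UTF-8 each character has its own unambiguous byte sequence.
printf '%s' "$greek" | od -An -tx1   # prints: ce b1
printf '%s' "$latin" | od -An -tx1   # prints: c3 a1
```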

Discuss this article in the ServerWatch discussion forum

Unicode is still a work in progress, but it has been widely adopted and is now the accepted standard. However, we are still in transition, and there are often funny little messes to clean up, like archives of files whose names and contents are still in the old ASCII-based 8-bit encodings. Linux and Unix users have two great little commands to fix this: convmv for converting the encodings of filenames, and iconv for converting the contents of files.

convmv, written in Perl, converts file and directory names from one character encoding to another. It converts only the names, not the contents of the files. This example is a dry run that shows what will happen if you convert all the filenames in the convertme directory from ISO-8859-7 (Greek) to UTF-8:

$ convmv -f iso-8859-7 -t utf8 convertme/

By default, nothing gets changed, so when you're ready to do it for real, add the --notest option:

$ convmv -f iso-8859-7 -t utf8 --notest convertme/

Add -r to recurse through subdirectories.
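Conceptually, renaming a file into a new encoding just means running its name through a charset conversion and then calling mv. The sketch below is a rough illustration of that idea, not convmv's implementation, and it skips the extra care convmv takes. It assumes GNU iconv, handles a single directory only (no recursion, dotfiles ignored), and `rename_to_utf8` is a hypothetical helper name. Like convmv, it only reports what it would do unless you pass --notest.

```shell
#!/bin/sh
# Rough illustration of the idea behind convmv (NOT its implementation):
# convert each filename in ONE directory from charset FROM to UTF-8.
# rename_to_utf8 is a hypothetical helper; no recursion, dotfiles skipped.
# Usage: rename_to_utf8 FROM DIR [--notest]
rename_to_utf8() {
    from=$1 dir=$2 mode=${3:-dry-run}
    for path in "$dir"/*; do
        [ -e "$path" ] || continue          # glob matched nothing: empty dir
        old=${path##*/}                     # bare filename, directory stripped
        # Heuristic: names that are already valid UTF-8 (including plain
        # ASCII) are left alone, so a second run does not double-encode.
        if printf '%s' "$old" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            continue
        fi
        # Skip names containing bytes that FROM cannot decode.
        new=$(printf '%s' "$old" | iconv -f "$from" -t UTF-8) || continue
        if [ -e "$dir/$new" ]; then
            printf 'skipping %s: %s exists\n' "$path" "$dir/$new" >&2
        elif [ "$mode" = "--notest" ]; then
            mv -- "$path" "$dir/$new"
        else
            # Default: dry run, only report what would be renamed.
            printf 'would rename: %s -> %s\n' "$path" "$dir/$new"
        fi
    done
}
```

For example, `rename_to_utf8 ISO-8859-7 convertme` prints the planned renames and `rename_to_utf8 ISO-8859-7 convertme --notest` performs them. For real cleanups, stick with convmv; this only shows the mechanism.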

iconv takes the same -f (from) and -t (to) options, but it converts the contents of files, not their names:

$ iconv -f ISO-8859-7 -t UTF-8 convertme > converted

convertme is the input file and converted is the new output file. iconv writes its results to standard output, so without the redirection the converted text is simply displayed; GNU iconv can also write straight to a file with -o converted. See man convmv and man iconv for complete command options.
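A common trap is trying to convert a file onto itself with iconv ... file > file: the shell truncates the file before iconv ever reads it, leaving you with an empty file. One way around it, sketched here assuming GNU iconv and mktemp, is to convert into a temporary file and copy the result back only if iconv succeeded. `convert_in_place` is a hypothetical helper name.

```shell
#!/bin/sh
# Convert the contents of each FILE from charset FROM to UTF-8, in place.
# convert_in_place is a hypothetical helper. It never redirects iconv's
# output onto its own input: the shell would truncate the file first.
# Usage: convert_in_place FROM FILE...
convert_in_place() {
    from=$1
    shift
    for file in "$@"; do
        scratch=$(mktemp "$file.XXXXXX") || return 1  # temp file beside FILE
        if iconv -f "$from" -t UTF-8 "$file" > "$scratch"; then
            # Copy back over the original (keeps its permissions and owner).
            cat "$scratch" > "$file"
        else
            printf 'failed to convert %s; left unchanged\n' "$file" >&2
        fi
        rm -f -- "$scratch"
    done
}
```

For example, `convert_in_place ISO-8859-7 *.txt` converts every .txt file in the current directory and leaves any file that iconv rejects untouched.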
