Tip of the Trade: GNU Recode

Tuesday Dec 26th 2006 by Carla Schroder

When it comes to conquering character encoding chaos, GNU Recode is a simple key to Unicode conformity.

In the beginning, there were C and C++, as well as hosts of other computer programming languages. All grew up around ASCII (American Standard Code for Information Interchange), which, as the name implies, is based on the English alphabet. This wouldn't be an issue except there are many humans in the world, and they don't all use the English alphabet.

So along came Unicode to the rescue. Unicode provides a framework for all of the alphabets of the world to be represented on computers. UTF-8 is the most popular Unicode encoding because it preserves backward compatibility with ASCII: any plain ASCII file is already valid UTF-8. That is all fun to know, but what good is it when you're looking at piles of computer files that must be converted from ISO-8859-1 (Latin-1, Western European) into whatever encoding you prefer? Naturally, there are a number of utilities just for this task.
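To see what that backward compatibility means at the byte level, here is a small sketch that uses nothing but printf and od. The variable names are only for illustration. It dumps the bytes of a plain ASCII letter, then the bytes of "é" as ISO-8859-1 stores it and as UTF-8 stores it.

```shell
# A plain ASCII letter is the same single byte in ASCII, ISO-8859-1, and UTF-8.
ascii_hex=$(printf 'A' | od -An -tx1 | tr -d ' \n')

# "é" (U+00E9) is the single byte 0xE9 in ISO-8859-1 (octal escape \351) ...
latin1_hex=$(printf '\351' | od -An -tx1 | tr -d ' \n')

# ... but the two bytes 0xC3 0xA9 in UTF-8 (octal escapes \303 \251).
utf8_hex=$(printf '\303\251' | od -An -tx1 | tr -d ' \n')

echo "ASCII A:         $ascii_hex"    # 41
echo "ISO-8859-1 é:    $latin1_hex"   # e9
echo "UTF-8 é:         $utf8_hex"     # c3a9
```

Anything outside the ASCII range is where the two encodings part ways, and that is exactly the part a conversion tool has to rewrite.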

GNU Recode supports more than 150 character sets and converts just about anything to anything. For example, there are users of legacy Linux systems whose files are still encoded in ISO-8859-1. GNU Recode converts these files to nice modern UTF-8, like this:

$ recode ISO-8859-1..UTF-8 recode-test.txt
Check out the GNU Recode Manual for instructions.
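Recode rewrites the file in place, so it may be worth rehearsing on a throwaway copy first. The following is a minimal sketch, not a recipe from the manual: it builds a tiny ISO-8859-1 test file in a temporary directory, keeps a backup, converts it, and dumps the bytes so you can see the result. The file and variable names are made up for the example, and the iconv branch is a stand-in added here for machines where recode is not installed.

```shell
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
testfile="$workdir/recode-test.txt"

# "caf" + byte 0xE9 ("é" in ISO-8859-1) + newline.
printf 'caf\351\n' > "$testfile"

# Keep the original, because recode replaces the file's contents.
cp "$testfile" "$testfile.bak"

if command -v recode >/dev/null 2>&1; then
    recode ISO-8859-1..UTF-8 "$testfile"
else
    # Stand-in (an assumption of this sketch): iconv writes to stdout,
    # so read from the backup and redirect into the target file.
    iconv -f ISO-8859-1 -t UTF-8 "$testfile.bak" > "$testfile"
fi

# After conversion the 0xE9 byte should have become 0xC3 0xA9.
result_hex=$(od -An -tx1 "$testfile" | tr -d ' \n')
echo "$result_hex"
```

If the dump still shows a lone e9, the file was not converted; if it shows something longer than c3 a9 for that character, the file was probably UTF-8 already and got converted twice.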

That's fast and easy enough, but one job remains: converting the filenames. convmv is just the tool for this. This example converts all the ISO-8859-1 filenames in the files/ directory to UTF-8:

$ convmv -f iso-8859-1 -t utf8 -r --notest files/
Run without the --notest option, convmv does a dry run without changing anything, which is probably a wise first step.
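The dry-run-then-rename routine can be rehearsed safely in a scratch directory. This sketch assumes a Linux filesystem that accepts arbitrary bytes in filenames. The directory and file names are invented for the example, and the loop in the else branch is a rough stand-in of my own for systems without convmv, not something convmv itself does.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/files"

# Create an empty file whose name contains the ISO-8859-1 byte 0xE9.
latin1_name=$(printf 'caf\351.txt')
: > "$workdir/files/$latin1_name"

if command -v convmv >/dev/null 2>&1; then
    # First pass: dry run, only reports what would be renamed.
    convmv -f iso-8859-1 -t utf8 -r "$workdir/files"
    # Second pass: actually rename.
    convmv -f iso-8859-1 -t utf8 -r --notest "$workdir/files"
else
    # Stand-in: re-encode each name with iconv and rename by hand.
    # Unlike convmv, this naive loop has no safety checks and would
    # double-encode names that are already UTF-8.
    for path in "$workdir"/files/*; do
        old=$(basename "$path")
        new=$(printf '%s' "$old" | iconv -f ISO-8859-1 -t UTF-8)
        [ "$old" = "$new" ] || mv -- "$path" "$workdir/files/$new"
    done
fi

ls "$workdir/files"
```

On a UTF-8 terminal the listing should now show the accented name properly instead of a question mark or an escape sequence.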

Maybe you have a file whose encoding you don't know. Upload the file to this online tool, and it will tell you. You can even do file conversions here.
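If you would rather not upload anything, a quick local test can at least answer one question: is this file valid UTF-8 or not? The sketch below leans on the fact that iconv exits with an error when its input is not valid in the encoding you claim for it. The is_utf8 function and the sample files are hypothetical names for this example. Note the limits: a pure ASCII file also passes, and a failure tells you only that the file is not UTF-8, not which legacy encoding it actually uses.

```shell
workdir=$(mktemp -d)

# Two sample files: the same word in ISO-8859-1 and in UTF-8.
printf 'caf\351\n'     > "$workdir/latin1.txt"
printf 'caf\303\251\n' > "$workdir/utf8.txt"

# Succeeds only if the whole file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

for f in "$workdir/latin1.txt" "$workdir/utf8.txt"; do
    if is_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not UTF-8"
    fi
done
```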

Resources

The subject of character encoding is huge and bewildering, especially for us dinosaurs from the typewriter era, when hitting a typewriter key produced the same result every single time. Wikipedia has a number of excellent introductory articles.
