encoding of text files
Roger Oberholtzer
roger
Tue May 24 07:08:27 PDT 2005
On Tue, 2005-05-24 at 13:28, Jorge Almeida wrote:
> How can I find out what encoding is used in a text file?
> A command line utility (or, say, a Perl module) would be great, since I
> need to use it in a script.
That is tricky. The problem is that the same byte value in a text file
is displayed as a different character depending on the encoding used to
read it, and nothing in the file itself says which encoding that is.
MIME, for example, declares the encoding in a header. But it is the mail
program that sets this - and NOT based on the file itself. The most I
have ever seen is a program that can tell you whether a file contains
anything other than traditional 7-bit ASCII. Which encoding is present
is not really discernible from the file all by itself.
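To make that concrete, here is a small Python sketch (not a real
detector). It decodes the same bytes under a few single-byte encodings
and checks whether the data is plain ASCII. The function names and the
list of encodings are only my choice for illustration.

```python
def decode_variants(raw: bytes, encodings=("latin-1", "cp437", "koi8_r")) -> dict:
    """Decode the same bytes under several single-byte encodings.

    Every byte value is valid in each of these encodings, so none of the
    decodes fails. The results simply differ for bytes >= 0x80.
    """
    return {enc: raw.decode(enc) for enc in encodings}


def is_pure_ascii(raw: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in raw)


if __name__ == "__main__":
    sample = b"k\xf6ttbullar"  # byte 0xF6 is an o-umlaut only in some encodings
    for enc, text in decode_variants(sample).items():
        print(enc, text)
    print("pure ASCII:", is_pure_ascii(sample))
```

All three decodes succeed, which is exactly the problem: nothing in the
bytes says which reading is the intended one.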
If the file is on a mounted volume on Linux, you can check the mount
options to see which encoding the mount uses. That setting is what
allows proper display of directory and file names. You could MAYBE (a
big maybe) assume that the contents of the files use the same encoding
as the volume they live on.
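As a rough sketch of the mount check, the Python below pulls the
iocharset= option out of text in /proc/mounts format. The function name
is made up, and it assumes the filesystem reports its name encoding via
iocharset= (vfat, iso9660 and cifs mounts can). Many mounts have no such
option, and it only describes file names, not file contents.

```python
def mount_charsets(mounts_text: str) -> dict:
    """Map mount point -> iocharset value for mounts that declare one.

    Expects lines in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        for opt in fields[3].split(","):
            if opt.startswith("iocharset="):
                result[mount_point] = opt.split("=", 1)[1]
    return result


if __name__ == "__main__":
    # Made-up sample data, not taken from a real system.
    sample = (
        "/dev/sdb1 /mnt/usb vfat rw,iocharset=iso8859-1,shortname=mixed 0 0\n"
        "/dev/sda1 / ext3 rw 0 0\n"
    )
    print(mount_charsets(sample))
```

On a real Linux system you would pass in the contents of /proc/mounts
instead of the sample string.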
If the files are just coming from some source out there, you simply lack
enough info.
Perhaps look for a few key words in a likely language, and then use the
common encoding for that language. For example, in Sweden there is a 99%
chance a file is encoded in ISO8859-1. So if you find lyrical discussions
of meatballs (look for köttbullar, possibly Mammas) in the file...
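The keyword idea could be sketched in Python roughly like this. It is a
heuristic under my own assumptions, not a reliable detector: the
function name, the keyword list and the candidate encodings are all
invented for the example, and UTF-8 is tried first only because a strict
UTF-8 decode usually fails on ISO8859-1 text with accented letters.

```python
def guess_by_keywords(raw: bytes,
                      keywords=("k\u00f6ttbullar",),  # "koettbullar" with o-umlaut
                      candidates=("utf-8", "iso-8859-1")):
    """Return the first candidate encoding under which a keyword appears.

    Returns None if no candidate both decodes cleanly and contains one
    of the keywords.
    """
    for enc in candidates:
        try:
            text = raw.decode(enc)  # strict decode: invalid bytes raise
        except UnicodeDecodeError:
            continue
        lowered = text.lower()
        if any(word in lowered for word in keywords):
            return enc
    return None


if __name__ == "__main__":
    data = "Mammas k\u00f6ttbullar".encode("iso-8859-1")
    print(guess_by_keywords(data))
```

A file with no accented letters at all matches nothing here, which is
fair enough: such a file reads the same under either encoding.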
> Thanks,
>
> Jorge Almeida
+----------------------------+-------------------------------+
| Roger Oberholtzer          | E-mail: roger at opq.se       |
| OPQ Systems AB             | WWW: http://www.opq.se/       |
| Kapellgränd 7              |                               |
| P. O. Box 4205             | Phone: Int + 46 8 314223      |
| 102 65 Stockholm           | Mobile: Int + 46 733 621657   |
| Sweden                     | Fax: Int + 46 8 314223        |
+----------------------------+-------------------------------+