
Answer by HoldOffHunger for How does the Linux command `file` recognize the encoding of my files?


TLDR: Magic File Doesn't Support UTF-8 BOM Markers

(and that's the main charset you need to care about)

The source code is on GitHub, so anyone can search it. A quick search shows that strings like BOM, ef bb bf, and feff do not appear at all. That means reading the UTF-8 byte order mark is not supported. Files made in other applications that use or preserve the BOM will all be reported as "charset=unknown" by file.

In addition, none of the config files mentioned in the magic file manpage are part of magic file v. 4.17. In fact, /etc/magicfile/ doesn't exist at all, so I don't see any way to configure it.

If you're stuck trying to get the ACTUAL charset encoding and magic file is all you have, you can check for a BOM-marked UTF-8 file at the Linux CLI with:

hexdump -n 3 -C "$path_to_filename"

If the output shows the byte sequence ef bb bf, then you are very likely in possession of a BOM-marked UTF-8 file. This is not a 100% certainty, since any file could happen to begin with those three bytes, but it is far more useful than magic file, which has no handling whatsoever for byte order marks.
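If you'd rather script the check than read hexdump output by eye, here is a minimal Python sketch of the same idea. The function name has_utf8_bom is my own, not part of file or any library; it only compares the first three bytes against ef bb bf, so it carries the same caveat as the hexdump approach.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM (ef bb bf).

    Hypothetical helper for illustration: it only inspects the first
    three bytes and does not validate the rest of the file as UTF-8.
    """
    with open(path, "rb") as f:
        # codecs.BOM_UTF8 is the byte string b'\xef\xbb\xbf'
        return f.read(3) == codecs.BOM_UTF8
```

A file without the BOM can still be valid UTF-8, so a False result here only means "no BOM", not "not UTF-8".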

