TLDR: Magic File Doesn't Support UTF-8 BOM Markers
(and that's the main charset you need to care about)
The source code is on GitHub, so anyone can search it. A quick search shows that BOM, ef bb bf, and feff do not appear in it at all. That means reading the UTF-8 byte order mark is not supported. Files made in other applications that write or preserve the BOM will all be reported as "charset=unknown" when using file.
In addition, none of the config files mentioned in the Magic File manpage are part of magic file v. 4.17. In fact, /etc/magicfile/ doesn't exist at all, so I don't see any way to configure it.
If you're stuck trying to get the ACTUAL charset encoding and magic file is all you have, you can check whether a file is BOM-marked UTF-8 at the Linux CLI with:
hexdump -n 3 -C "$path_to_filename"
If the output starts with the byte sequence ef bb bf, then you are 99% likely in possession of a BOM-marked UTF-8 file. This is not a 100% certainty, since any file can happen to begin with those three bytes, but it is far more useful than magic file, which has no handling whatsoever for byte order marks.
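If you need to run this check from a script rather than by eye, the same idea can be wrapped in a small shell function. This is only a sketch: has_utf8_bom is a name I made up for illustration (it is not part of file or any other tool), and it assumes a POSIX-style od is available.

```shell
#!/bin/sh
# has_utf8_bom: hypothetical helper, not part of file(1) or hexdump.
# Returns 0 (true) if the file named by $1 starts with the bytes ef bb bf.
has_utf8_bom() {
    # od -An drops the offset column, -tx1 prints one hex byte at a time,
    # -N3 reads only the first three bytes; tr strips spaces and newlines.
    first_bytes=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    [ "$first_bytes" = "efbbbf" ]
}
```

For example, has_utf8_bom "$path_to_filename" && echo "looks like UTF-8 with a BOM". The same caveat applies as with hexdump: a match only means the file begins with those three bytes, and a UTF-8 file without a BOM will not match.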