Batch converting filenames from de- to pre-composed UTF-8

UTF-8 text comes in two flavors when it comes to encoding many accented characters such as ã. In its precomposed form (NFC), ã has its own code point, U+00E3, which UTF-8 encodes as the two bytes 0xC3A3. On some systems, macOS for one, ã is instead decomposed into its components, i.e. the base character a and the combining tilde ̃ (U+0061 followed by U+0303), and it is this decomposition (NFD) that gets encoded, which results in the three bytes 0x61CC83.
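The UTF-8 bytes of both forms can be inspected directly in the shell. The following is only a minimal sketch: hexbytes is a hypothetical helper name, and the strings are built from octal escapes so that the result does not depend on how an editor or terminal normalizes the pasted character.

```shell
#!/bin/sh
# hexbytes is a hypothetical helper: it prints the bytes of its argument
# as lowercase hex values separated by single spaces.
hexbytes() {
    # Word splitting collapses the padding that od(1) adds around each byte.
    set -- $(printf '%s' "$1" | od -An -tx1)
    echo "$*"
}

nfc=$(printf '\303\243')     # precomposed: U+00E3
nfd=$(printf 'a\314\203')    # decomposed:  U+0061 followed by U+0303

hexbytes "$nfc"    # prints: c3 a3
hexbytes "$nfd"    # prints: 61 cc 83
```

Both strings render as the same glyph, yet they differ in length and content at the byte level.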

Recently, I transferred a lot of files from an HFS+ volume on my Mac to the UFS volume of the FreeBSD home server using the scp(1) command. The files are part of a website served by the Apache web server. After the transfer, Apache could not find any file with accented characters in its name; requests for these files resulted in the 404 Not Found status code.
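A plausible explanation, which I am stating as an assumption here, is that the requests carry the precomposed form of the name while the files on disk carry the decomposed one, and that the file system compares names byte for byte. The sketch below reproduces this mismatch in a temporary directory; the file name São.html is made up for illustration, and on a file system that normalizes names itself (HFS+ for example) the lookup may succeed instead.

```shell
#!/bin/sh
# Hypothetical file name "São.html" in both normalization forms.
nfc_name=$(printf 'S\303\243o.html')     # precomposed, assumed to be what is requested
nfd_name=$(printf 'Sa\314\203o.html')    # decomposed, as copied over from HFS+

dir=$(mktemp -d)
: > "$dir/$nfd_name"                     # only the decomposed name exists on disk

# Look the file up under its precomposed name.
if [ -e "$dir/$nfc_name" ]; then
    lookup="found"
else
    lookup="not found"
fi
echo "$lookup"

rm -r "$dir"
```

On a file system that compares names byte for byte, the lookup under the precomposed name fails even though both names look identical in a directory listing.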

In order to solve the problem, I installed the file name conversion tool converters/convmv on my FreeBSD system, and then used convmv(1) to batch convert the file names to their precomposed UTF-8 form:

# pkg install convmv
# cd /path/to/the/web/directory
# convmv -r -f utf8 -t utf8 --nfc --notest ./
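To check whether any decomposed names are left after the conversion, one option is to search for the bytes that introduce combining marks. This is a rough sketch, not part of convmv: list_decomposed is a hypothetical helper, and it relies on the fact that the characters U+0300 to U+037F (mostly combining diacritical marks) are the only ones whose UTF-8 encoding starts with the byte 0xCC or 0xCD.

```shell
#!/bin/sh
# list_decomposed is a hypothetical helper, not part of convmv.
# It prints every path below the given directory that contains a byte
# 0xCC or 0xCD, i.e. a character in the range U+0300..U+037F.
list_decomposed() {
    # LC_ALL=C makes grep match raw bytes; -a keeps it from treating
    # the path list as binary data.
    find "$1" | LC_ALL=C grep -a "$(printf '[\314\315]')"
}
```

Calling list_decomposed /path/to/the/web/directory should print nothing once all names are precomposed. Note also that convmv only prints the renames it would perform unless --notest is given, so running the command above without that option first serves as a preview.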

Copyright © Dr. Rolf Jansen - 2018-02-26 10:26:14
