Sep 03 2012
Saving encoded as UTF-8 without BOM in notepad++

i nearly went crazy the other day when i tried to write a simple php script that would read a json-formatted database file into an array. it simply wouldn’t work, and the worst thing was that it had been working shortly before. but all of a sudden, json_decode(), which had been turning the contents of that perfectly well-formed json file into an array without complaint, would only give back NULL.

after hours and days of meticulously checking the json file and the function again and again, and searching the internet forums up and down to no avail, i found the bug. it was so annoying that i feel i need to write it down here to help anyone stumbling across the same problem.
i had saved the json file (utf-8 encoded, which matters, because php’s json_decode() only works on utf-8 encoded strings!) in different editors: notepad++ on windows and gedit on linux (and i had done so before, i.e., before the problem first emerged). as it turns out, for some reason i could neither trace nor understand, at some point the file was saved encoded as utf-8 with a byte order mark (BOM).
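to see what that invisible mark does to a json parser, here is a minimal sketch in python, used only as a stand-in for php: where php’s json_decode() quietly returns NULL, python’s json.loads() raises an error instead. it uses nothing but python’s standard library; the helper name parse_json_bytes is made up for this illustration.

```python
import json

# The UTF-8 byte order mark: the three bytes EF BB BF.
UTF8_BOM = b"\xef\xbb\xbf"

def parse_json_bytes(raw: bytes):
    """Decode raw file bytes as plain UTF-8 and parse them as JSON.

    Plain "utf-8" decoding keeps a leading BOM as the character U+FEFF,
    so the parser sees it in front of the opening brace and rejects it.
    """
    text = raw.decode("utf-8")
    return json.loads(text)

clean = b'{"name": "test"}'
with_bom = UTF8_BOM + clean

print(parse_json_bytes(clean))  # {'name': 'test'}

try:
    parse_json_bytes(with_bom)
except json.JSONDecodeError as err:
    # Same content, three extra bytes up front: the parse fails.
    print("rejected:", err.msg)
```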
don’t ask me why that BOM is needed and what it is exactly (i am sure there are good reasons for including it in documents), but a) it was not readily apparent that such a mark was in the file at all (fucker’s invisible!), and b) it was not clear that, let alone why, it would jack up php’s json_decode(). i finally realized something was wrong with the file when i saw that, unlike the file itself, which showed no problems in the editor, the content of the json file had this little mark at the beginning when i read it with file_get_contents() (without decoding it into an array) and print_r()’d it into the html:

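this character string is in fact the BOM (in utf-8, the three bytes EF BB BF at the very start of the file, which show up as ï»¿ when they are displayed as latin-1 or windows-1252 text). you need to get rid of it, using an editor that lets you choose the encoding, before json_decode() will do what you want. to fix this bugger in notepad++, simply choose ‘Encode in UTF-8 without BOM’: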

[screenshot: Saving encoded as UTF-8 without BOM in notepad++]

and magically, the damn thing works again.
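if several editors keep touching the same file, a tiny script can do what that notepad++ option does, so a BOM never survives for long. a minimal python sketch, assuming you simply want to drop the three BOM bytes and keep everything else byte for byte; the script name fix_bom.py and the helper names are hypothetical.

```python
import sys
from pathlib import Path

# The UTF-8 byte order mark: the three bytes EF BB BF.
UTF8_BOM = b"\xef\xbb\xbf"

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (unchanged if there is none)."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data

def fix_file(path: Path) -> bool:
    """Rewrite path without its BOM; return True if the file was changed."""
    data = path.read_bytes()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False  # already BOM-free, leave the file untouched
    path.write_bytes(cleaned)
    return True

if __name__ == "__main__":
    # Usage: python fix_bom.py file1.json file2.json ...
    for name in sys.argv[1:]:
        if fix_file(Path(name)):
            print(f"removed BOM from {name}")
```

working on raw bytes (rather than decoding and re-encoding) means nothing but the BOM can change, which keeps the fix in the spirit of correcting the file itself.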

lo and behold, i did of course find a similar comment warning about the problem among the user comments in the php manual, where i simply had not looked far enough, because i did not search for “BOM” (i didn’t know about it yet). that comment offers another solution, one that does not require caring much about the json file itself but instead prepares the contents before they are fed to json_decode(). that may be considered more elegant (or not, as it needs more convoluted code in the script), but it tempts you to stop paying attention to writing proper json files and instead rely on code that later fixes the problem if it exists. so i do not recommend it: why not avoid the problem altogether instead of fixing it later?
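for completeness, the parse-time workaround looks roughly like this, again as a minimal python sketch standing in for php (it is not the code from the manual comment, which is not reproduced here). python’s built-in "utf-8-sig" codec drops a leading BOM while decoding, so the parser never sees it. to keep the concern above in view, this version also emits a warning whenever it has to strip a BOM, so the file still gets noticed and fixed; the helper name load_json_tolerant is hypothetical.

```python
import json
import warnings

# The UTF-8 byte order mark: the three bytes EF BB BF.
UTF8_BOM = b"\xef\xbb\xbf"

def load_json_tolerant(raw: bytes):
    """Parse JSON bytes, accepting (but flagging) a leading UTF-8 BOM.

    The "utf-8-sig" codec decodes like "utf-8" but drops a leading BOM
    if one is present; without a BOM it behaves exactly like "utf-8".
    """
    if raw.startswith(UTF8_BOM):
        # Still tell someone, so the file itself can be fixed later.
        warnings.warn("input starts with a UTF-8 BOM; stripping it before parsing")
    return json.loads(raw.decode("utf-8-sig"))

print(load_json_tolerant(b'\xef\xbb\xbf{"ok": true}'))  # {'ok': True}
```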
