Hello Community,
Somebody (svasti?) noted in the roadmap regarding this issue:
"Seems some additional clarification is necessary"
Some general explanations about the BOM are given in the XH wiki. I'll try to elaborate on this specific issue in more detail, which seems appropriate for this forum (OD), but I'll still try to keep it simple. The exact details can be found in the HTTP specs.
- The BOM (byte order mark) of a UTF-8 encoded file is actually a sequence of 3 bytes at its beginning: "\xEF\xBB\xBF". These bytes are typically not shown by a text editor, but one can see them in a hex editor.
- If a PHP file is processed (it doesn't matter if it's browsed directly, or included from another file) everything outside of <?php ... ?> is sent to the client (i.e. browser) directly [1] as part of the body of the HTTP response.
- As soon as even a single octet (i.e. byte) of the body of the HTTP response has been sent, no further HTTP response headers can be set via header(); PHP raises a "headers already sent" warning and the header is discarded.
- The HTTP response headers contain vital information for the client (i.e. browser), among other things the cookies.
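To make the first point concrete, here is a minimal sketch of how one could check a file for a UTF-8 BOM from PHP instead of using a hex editor. The function names starts_with_utf8_bom() and file_has_utf8_bom() are made up for this illustration; they are not part of CMSimple_XH.

```php
<?php
// Hypothetical helper: true if the given bytes start with the UTF-8 BOM.
function starts_with_utf8_bom($bytes)
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: check a file by reading only its first three bytes.
function file_has_utf8_bom($filename)
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false; // unreadable file: treated as "no BOM" in this sketch
    }
    $head = fread($handle, 3);
    fclose($handle);
    return $head !== false && starts_with_utf8_bom($head);
}
```

Something like file_has_utf8_bom('config.php') could then be used to find the offending file.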
Now consider that config.php has a BOM (this can easily be reproduced by saving config.php with Windows Notepad as UTF-8). When config.php is included by cms.php, the three bytes which constitute the BOM are sent to the browser as part of the HTTP response body. When login.php is included after config.php, it tries to set the appropriate headers for the login credentials as cookies. But these headers can't be sent, as the HTTP response body has already "started". IOW: when config.php contains a BOM, successful login is impossible.
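This condition can be observed from PHP itself with headers_sent(), which also reports where the output started. The following is only a sketch; early_output_hint() and its message text are hypothetical and not the actual CMSimple_XH code.

```php
<?php
// Hypothetical helper: build a hint if the response body has already started.
function early_output_hint()
{
    $file = '';
    $line = 0;
    // headers_sent() fills $file and $line with the place where output began.
    if (headers_sent($file, $line)) {
        return "Headers could not be sent: output started in $file on line $line"
            . ' (check this file for a BOM or stray whitespace).';
    }
    return null;
}

// Usage sketch: call this after including config.php and before the
// login cookies are set.
$hint = early_output_hint();
if ($hint !== null) {
    echo htmlspecialchars($hint);
}
```

If config.php has a BOM, the reported file and line would point at the start of config.php.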
There are three possibilities to handle this:
- (1) simply ignore it, because no CMSimple_XH PHP file has to have a BOM (this was done before CMSimple_XH 1.5.4)
- (2) send a message to the client that some headers couldn't be sent (current solution)
- (3) filter out the BOM, so that it's never sent to the client in the first place
(1) is a simple, fast and pragmatic solution, quite suitable for CMSimple: just assume everything is alright, and fail otherwise. (2) is still quite simple and fast, but at least gives an explicit hint that something is wrong (even if the message might not be very helpful for end users). (3) effectively ignores any BOM, which is very convenient, but it has a price: output buffering has to be started very early in cms.php. As a result, requests to ?download=... are buffered, which increases the memory footprint of these requests (it might not be possible to deliver very large files, as the memory_limit might be exceeded). And for every request, the complete HTTP response body has to be searched for BOMs, which then have to be removed. Particularly the latter might be a measurable performance penalty (this has to be measured and confirmed, though).
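For illustration, (3) could look roughly like the following sketch. The callback name strip_utf8_boms() is made up; this is not the actual CMSimple_XH code, just the general technique of an output buffer callback.

```php
<?php
// Hypothetical callback: remove every UTF-8 BOM from the buffered output.
function strip_utf8_boms($buffer)
{
    return str_replace("\xEF\xBB\xBF", '', $buffer);
}

// Would have to run very early in cms.php, before any other file is
// included, so that the whole response body passes through the callback.
ob_start('strip_utf8_boms');
```

Note that the whole body is held in memory until the buffer is flushed, and that the three byte sequence is removed wherever it occurs, even inside a delivered download.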
Due to the lack of any representative benchmarks I waver between (2) and (3), but I lean towards (2), because slowing down the system on account of potentially erroneous file encodings does not seem very CMSimple-like.
And to be honest: I don't like the current output buffering at all. As it is now, it's not necessary at all. And I'm not convinced that the possible use cases would pay off, though this has to be measured, not guessed!
[1] The only exception is a single line break directly after the closing ?>
Christoph