XH 1.6.1 : Malformed UTF-8 detected!
Hello
I got this message from a customer who uses a French website running XH 1.6.1 (which I built: HTML5, correctly UTF-8 encoded). I cannot reproduce the error, but I know that it existed in XH 1.5.4. Should I look for the problem in the customer's browser, or does a problem still exist in the latest version?
Bob
Re: XH 1.6.1 : Malformed UTF-8 detected!
This message is triggered by the check that the input variables are valid UTF-8. Now that I think about your problem again, it seems the check (or at least the handling) is too strict. For instance, there could be problems with websites that were formerly encoded as ANSI, if someone has bookmarked a deep link to a page "Téléchargement". Encoded as ISO 8859-1 the link is ?T%E9l%E9chargement, which will result in said message under XH 1.6/1.6.1 (but not under XH < 1.6). I'm not sure whether there are other cases where this message will show up, unless the user has an old or even misconfigured browser.
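To illustrate why the bookmarked link trips the check, here is a minimal sketch. The helper name sketch_is_valid_utf8() is hypothetical and is not the XH_checkValidUtf8() implementation; it only assumes that PCRE was built with UTF-8 support, which is the case for standard PHP builds.

```php
<?php
// Hypothetical helper for illustration; not the XH_checkValidUtf8() source.
function sketch_is_valid_utf8($string)
{
    // With the /u modifier, preg_match() returns false for malformed UTF-8.
    return preg_match('//u', $string) === 1;
}

// The same page name, percent-encoded from ISO 8859-1 and from UTF-8.
$latin1Link = rawurldecode('T%E9l%E9chargement');
$utf8Link   = rawurldecode('T%C3%A9l%C3%A9chargement');

var_dump(sketch_is_valid_utf8($latin1Link)); // bool(false)
var_dump(sketch_is_valid_utf8($utf8Link));   // bool(true)
```

The lone byte 0xE9 starts a three-octet sequence in UTF-8, but the following "l" is not a continuation byte, so the old link is rejected.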
Anyway, we have to reconsider the check and maybe drop it altogether. Actually, it is not necessary for security per se, but is rather meant to suppress security issues with some mostly old plugins which were not written for UTF-8 encoding or newer PHP versions.
As a quick workaround for your client's installation, just remove the following from cmsimple/cms.php (line 330ff):
Code: Select all
XH_checkValidUtf8(
array($_GET, $_POST, $_SERVER, array_keys($_GET), array_keys($_POST))
);
Christoph M. Becker – Plugins for CMSimple_XH
Re: XH 1.6.1 : Malformed UTF-8 detected!
Thank you for this tip Christoph.
I'm going to test this solution. I checked the links in the content and did not see any strangely encoded text; all texts and variables are UTF-8 encoded. I verified this with the PHP function mb_check_encoding(). Note that only the customer sees this error (with 5 PCs at home and many browsers, caches and cookies emptied).
Re: XH 1.6.1 : Malformed UTF-8 detected!
I had completely forgotten this issue. As it seems to be a bug, I've put it on the 1.6.2 roadmap. I suggest that we simply revert to the less restrictive XH 1.5.x check for now.
PS: cf. http://cmsimpleforum.com/viewtopic.php?f=29&t=7127
Last edited by cmb on Tue Apr 15, 2014 10:11 pm, edited 1 time in total.
Reason: added PS
Christoph M. Becker – Plugins for CMSimple_XH
Re: XH 1.6.1 : Malformed UTF-8 detected!
cmb wrote: I suggest that we simply revert to the less restrictive XH 1.5.x check for now.
+1
Re: XH 1.6.1 : Malformed UTF-8 detected!
cmb wrote: I suggest that we simply revert to the less restrictive XH 1.5.x check for now.
As the check seems reasonable, why not just omit the check of array_keys($_GET)?
Re: XH 1.6.1 : Malformed UTF-8 detected!
manu wrote: As the check seems reasonable, why not just omit the check of array_keys($_GET)?
Might be the best option.
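A rough sketch of what this would mean for the old ISO 8859-1 deep link. The function sketch_all_valid_utf8() is a hypothetical stand-in for the real check, and parse_str() is used here only to imitate how PHP would fill $_GET for that query string.

```php
<?php
// Hypothetical recursive check; an assumption, not the XH source.
function sketch_all_valid_utf8($value)
{
    if (is_array($value)) {
        foreach ($value as $item) {
            if (!sketch_all_valid_utf8($item)) {
                return false;
            }
        }
        return true;
    }
    // preg_match() with /u returns false for malformed UTF-8 subjects.
    return preg_match('//u', (string) $value) === 1;
}

// Imitate $_GET for the bookmarked link ?T%E9l%E9chargement:
// the page name becomes a key with an empty value.
parse_str('T%E9l%E9chargement', $get);

var_dump(sketch_all_valid_utf8(array($get)));             // values only: bool(true)
var_dump(sketch_all_valid_utf8(array(array_keys($get)))); // keys: bool(false)
```

In this sketch the malformed bytes only show up in the key, so a check that skips array_keys($_GET) would let the old link through while still validating all values.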
Christoph M. Becker – Plugins for CMSimple_XH
Re: XH 1.6.1 : Malformed UTF-8 detected!
FWIW: I've made some quick benchmark tests regarding the UTF-8 check, with the following command:
Code: Select all
ab -n 1000 -c 10 http://localhost/xh161e/
Results:
- Plain XH 1.6.1 (i.e. full checking):
Code: Select all
Time per request: 96.486 [ms] (mean)
- No checks:
Code: Select all
Time per request: 68.764 [ms] (mean)
- Checks as with XH 1.5.10:
Code: Select all
Time per request: 67.204 [ms] (mean)
- full checking, except $_SERVER:
Code: Select all
Time per request: 65.764 [ms] (mean)
Code: Select all
Time per request: 66.564 [ms] (mean)
Christoph M. Becker – Plugins for CMSimple_XH
Re: XH 1.6.1 : Malformed UTF-8 detected!
I had a closer look at the utf8_is_valid() vs. utf8_compliant() issue. The sources (plugins/utf8/utils/validation.php) point to a comment by the original author of the utf8 library:
Quote: PCRE regards five and six octet UTF-8 character sequences as valid (both in patterns and the subject string) but these are not supported in Unicode
As this comment was made eight years ago, I double-checked it. Apparently the behavior has changed in newer PCRE versions: since PHP 4.4.9 and PHP 5.2.5 (standard builds), such sequences are no longer regarded as valid UTF-8 by PCRE.
Further investigation showed that the relevant behavior changed with PCRE 7.3 (2007-08-28), which is documented in the PCRE changelog as item 15:
Quote: Updated the test for a valid UTF-8 string to conform to the later RFC 3629. This restricts code points to be within the range 0 to 0x10FFFF, excluding the "low surrogate" sequence 0xD800 to 0xDFFF. Previously, PCRE allowed the full range 0 to 0x7FFFFFFF, as defined by RFC 2279.
So I can safely update utf8_is_valid() to make use of the much faster check when an appropriate PCRE version is installed.
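A minimal sketch of such a version-gated check. This is not the actual utf8_is_valid() code: the function name, the $forceSlowPath parameter (only there to exercise both branches) and the fallback byte pattern are assumptions made for illustration.

```php
<?php
// Hypothetical sketch; not the utf8 library's utf8_is_valid().
function sketch_utf8_is_valid($string, $forceSlowPath = false)
{
    // PCRE_VERSION looks like "10.42 2022-12-11"; keep only the number.
    $parts = explode(' ', PCRE_VERSION);
    if (!$forceSlowPath && version_compare($parts[0], '7.3', '>=')) {
        // Fast path: since PCRE 7.3 the /u subject check follows RFC 3629,
        // and preg_match() returns false for malformed subjects.
        return preg_match('//u', $string) === 1;
    }
    // Slow path: explicit byte ranges for RFC 3629 (no /u modifier).
    $bytePattern = '/\A(?:
          [\x00-\x7F]                        # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # 2 octets, no overlongs
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3 octets, no overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3 octets
        | \xED[\x80-\x9F][\x80-\xBF]         # 3 octets, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4 octets, no overlongs
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4 octets
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4 octets, up to U+10FFFF
    )*+\z/x';
    return preg_match($bytePattern, $string) === 1;
}

var_dump(sketch_utf8_is_valid("T\xC3\xA9l\xC3\xA9chargement")); // bool(true)
var_dump(sketch_utf8_is_valid("\xF8\x88\x80\x80\x80"));         // 5 octets: bool(false)
var_dump(sketch_utf8_is_valid("\xED\xA0\x80"));                 // U+D800: bool(false)
```

The slow pattern can run into PCRE backtracking limits on very long strings, in which case this sketch reports the input as invalid.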
manu wrote: As the check seems reasonable, why not just omit the check of array_keys($_GET)?
Considering the above: +1
Christoph M. Becker – Plugins for CMSimple_XH
Re: XH 1.6.1 : Malformed UTF-8 detected!
Done (r1300+r1301).
Christoph M. Becker – Plugins for CMSimple_XH