XH 1.5.4 utf8-plugin not compatible with calendar

Post by svasti » Tue Oct 09, 2012 7:16 pm

Hello all,

I received an email from Tata, who says that when he starts entering text with an accented character into the info-text field of Calendar 1.4.2, the result is:
blank page wrote:Malformed UTF-8 detected!
Hm, that's new in CMSimple_XH 1.5.4.

I tried to replicate the problem and found that any entry triggers this "Malformed UTF-8 detected!" message.
So Calendar 1.4.2 cannot be used with 1.5.4 :cry: ?

svasti

Post by cmb » Tue Oct 09, 2012 8:13 pm

Hi svasti,
svasti wrote:Hm, that's new in CMSimple_XH 1.5.4.
Yes. In the discussion about UTF-8, I asked whether this check should be added to XH 1.5.4:
cmb wrote:In addition we might consider checking all GPC data for UTF-8 validity to avoid potential security risks by malformed input.
As nobody objected, I added the check. But I have since found a bug in it and reported this some days ago.

However, I was not able to reproduce the problems with Calendar. Even if the check is removed from CMSimple_XH, I would be very interested in what caused the error message, as it might be caused by a bug in the UTF-8 library. So could you please do some debugging? The check happens in cmsimple/cms.php line 188ff. Just change it to:

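// Debugging aid: dump details of the first value that fails the check.
foreach (array('_GET', '_POST', '_COOKIE') as $temp => $i) {
    foreach ($$i as $j) {
        if (!utf8_is_valid($j)) {
            var_dump($i);    // name of the superglobal
            var_dump($temp); // its index in the list above
            var_dump($j);    // the offending value
            exit('Malformed UTF-8 detected!');
        }
    }
}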
and save the eventlist again. Then the array, its index in the list above, and the value of the "malformed" UTF-8 should be reported.

Christoph

PS: I wouldn't be surprised if the error message were caused by a cookie on the same domain that is not encoded as UTF-8. So we should at least exclude cookies from the general check.
Christoph M. Becker – Plugins for CMSimple_XH

Post by svasti » Tue Oct 09, 2012 9:11 pm

string(5) "_POST" int(1) array(3) { [1]=> string(7) "öm.pdf" [2]=> string(12) "test/öm.pdf" [3]=> string(30) "svasti.de/downloads/svasti.pdf" } Malformed UTF-8 detected!
Interesting, it complains about a pdf name. I used to check all kinds of strange pdf names as people use all kinds of names for their pdfs.

What now?
svasti

Post by cmb » Tue Oct 09, 2012 9:34 pm

svasti wrote:it complains about a pdf name
No, actually I don't think so. This particular problem is apparently caused by the bug. Somewhat strangely it seems to be no problem under PHP 5.4 (I haven't checked other versions yet). Could you please test the fixed code from the other thread?

Post by cmb » Tue Oct 09, 2012 11:01 pm

cmb wrote:Somewhat strangely it seems to be no problem under PHP 5.4 (I haven't checked other versions yet).
Now I have tested with PHP 5.2.17, and indeed "öm.pdf" is reported as being "malformed UTF-8". But that's caused by the bug that doesn't cater for POST arrays. What happens? The request contains $_POST['linkaddr'][1] == 'öm.pdf'. Then the routine calls utf8_is_valid($_POST['linkaddr']). Inside of utf8_is_valid() a loop runs for strlen($_POST['linkaddr']). In versions before PHP 5.3 this returns 5 (as the array is cast to 'Array' :shock:), so the loop runs five times and tries to access the respective character of the array, which obviously fails, and so utf8_is_valid() returns FALSE. Since PHP 5.3 strlen($_POST['linkaddr']) returns NULL, and everything is fine (well, actually it's not fine, as the input doesn't get checked, but the function at least doesn't return nonsense in this case).

Changing the routine to the suggested one solves the problem.
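
The fixed routine from the other thread is not quoted here. As a rough sketch of the idea only, a check that copes with POST arrays could recurse into nested values instead of handing an array to a string function. The helper name is_valid_utf8_deep() is hypothetical, and preg_match() with the /u modifier merely stands in for the library's utf8_is_valid():

```php
<?php
// Hypothetical helper, not part of CMSimple_XH or its UTF-8 library.
// Returns true only if every string in $value (nested arrays included)
// is well-formed UTF-8.
function is_valid_utf8_deep($value)
{
    if (is_array($value)) {
        foreach ($value as $key => $item) {
            // Keys come from the request too, so check string keys as well.
            if (is_string($key) && !is_valid_utf8_deep($key)) {
                return false;
            }
            if (!is_valid_utf8_deep($item)) {
                return false;
            }
        }
        return true;
    }
    if (!is_string($value)) {
        return true; // nothing to validate
    }
    // With the /u modifier PCRE rejects subjects that are not valid UTF-8,
    // so preg_match() returns 1 only for well-formed input.
    return preg_match('//u', $value) === 1;
}

// Same shape as the request described above.
$post = array('linkaddr' => array(1 => "\xC3\xB6m.pdf", 2 => "test/\xC3\xB6m.pdf"));
var_dump(is_valid_utf8_deep($post));              // bool(true)
var_dump(is_valid_utf8_deep(array("\xF6m.pdf"))); // bool(false): lone ISO-8859-1 byte
```

Because the function never calls a string function on an array, the result does not depend on how a particular PHP version treats strlen() of an array.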

But anyway, having files with non-ASCII characters in their names is not without problems on a web server. For example, try to upload öm.pdf with the standard filebrowser from a Windows PC (CP 1252) and have a look at the resulting filename in downloads/. A reasonable solution to this problem is transliteration (see the "Transliteration" section in the UTF-8 lib developer manual).
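
To illustrate what transliteration of a filename could look like, here is a minimal sketch. It does not use the UTF-8 library's own transliteration functions; ascii_filename() and its small character map are hypothetical, and the source file is assumed to be saved as UTF-8:

```php
<?php
// Hypothetical helper for illustration; not the UTF-8 library's API.
function ascii_filename($name)
{
    // Small hand-written map; a real table would cover far more characters.
    $map = array(
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ß' => 'ss',
    );
    $name = strtr($name, $map);
    // Replace every remaining byte outside a conservative ASCII set.
    return preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
}

echo ascii_filename('öm.pdf'), "\n"; // oem.pdf
```

A name reduced to plain ASCII this way is stored identically regardless of the filesystem encoding.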

Post by svasti » Wed Oct 10, 2012 8:43 am

Yes, changing the checking routine in cms.php solves the problem -- not only of strange pdf names but also of input of accented characters, which is the real problem, because all the inputs are in array format.

So this is a real bug in CMSimple_XH 1.5.4. I hope 1.5.5 will come out soon.

svasti

P.S. Interesting that uploading files with non-ASCII names with the standard filebrowser mangles the name. I used to upload these strange files via ftp, and calendar could link to them correctly. Not that I would encourage such file names, but pdfs or doc files often have non-ASCII filenames, so I was curious to see whether Calendar could link to them, and Calendar even applies rawurlencode to the links to these names. The todo list of Calendar is getting longer.

Post by cmb » Wed Oct 10, 2012 10:39 am

svasti wrote:Interesting that uploading files with non-ASCII names with the standard filebrowser mangles the name.
Actually it's not the filebrowser that mangles the name, but the browser. It sends the name "öm.pdf" URL-encoded as "%C3%B6m.pdf" to the server. This is URL-decoded by PHP to a string with the hex value "C3B66D2E706466". When this string is written to the HTML output (which is UTF-8 encoded), the "correct" result is displayed: "öm.pdf". But when this string is given as a filename to one of the filesystem routines, it is interpreted with regard to the filesystem encoding (which is usually CP 1252 on a German Windows OS), so the result is "Ã¶m.pdf".
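
The byte-level steps can be reproduced with a short sketch. It assumes the iconv extension is available and that it accepts the charset name CP1252; the variable names are only illustrative:

```php
<?php
// Bytes as they arrive after the request value has been URL-decoded.
$name = rawurldecode('%C3%B6m.pdf');
echo bin2hex($name), "\n";      // c3b66d2e706466

// Read those bytes as CP 1252 and convert them to UTF-8 for display:
// this is how the name looks when the filesystem encoding is CP 1252.
$mangled = iconv('CP1252', 'UTF-8', $name);
echo $mangled, "\n";            // Ã¶m.pdf

// Encoding the original name for a link gives the form seen in the request.
echo rawurlencode($name), "\n"; // %C3%B6m.pdf
```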
svasti wrote:I used to upload these strange files via ftp, and calendar could link to them correctly.
Yes, I noticed this too. But it's quite strange, as "öm.pdf" is linked as "%C3%B6m.pdf". I suppose the web server actually does the conversion from UTF-8 to the system charset if the file couldn't be found. But that might well depend on the server's configuration. So IMO non-ASCII characters in filenames should be avoided altogether.
svasti wrote:So this is a real bug in CMSimple_XH 1.5.4. I hope 1.5.5 will come out soon.
ACK. The question is, which of the items on the roadmap for 1.5.5 should actually be addressed in this version.
