UTF-8 search doesn't recognise danish characters SOLVED?

jerry
Posts: 177
Joined: Fri Jul 25, 2008 8:54 pm
Location: Denmark

UTF-8 search doesn't recognise danish characters SOLVED?

Post by jerry » Tue Oct 19, 2010 11:19 am

The search function in CMSimple 1.2-utf8 doesn't find anything containing non-ASCII characters (at least the special Danish characters). Changing the encoding to ISO-8859-1 makes search work, but that defeats the purpose of the UTF-8 version of CMSimple. Does anybody have an idea how to solve this?
The AdvancedSearch plugin has the same problem and is currently not UTF-8 compatible.
jerry
Last edited by jerry on Wed Oct 20, 2010 2:58 pm, edited 1 time in total.
jerry/simplesolutions

leenm
Posts: 116
Joined: Wed Dec 09, 2009 12:33 pm
Location: Kloetinge, Netherlands

Re: UTF-8 search doesn't recognise danish characters

Post by leenm » Tue Oct 19, 2010 7:58 pm

It probably has to do with the search routine. You can try replacing line 17 in search.php with the following (I didn't test it, because I don't speak Danish ;) ):

Code:

if (!hide($i)) if (@preg_match('/'.preg_quote($search, '/').'/i', (function_exists('html_entity_decode')?html_entity_decode($c[$i], ENT_QUOTES, 'UTF-8'):$c[$i])))$ta[] = $i;        
 
Reason:
http://php.net/manual/en/function.html-entity-decode.php wrote: "The ISO-8859-1 character set is used as default for the optional third charset. This defines the character set used in conversion."
Please let us know if this works!
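
To illustrate why the third argument matters, here is a minimal sketch that decodes the same entity into both charsets and then runs a byte-wise search against each result. The sample string and variable names are made up for the illustration and are not taken from CMSimple.

```php
<?php
// Hypothetical page content containing an HTML entity for "ø".
$html = 'S&oslash;g';

// Decode the same text into two explicitly named charsets.
$latin1 = html_entity_decode($html, ENT_QUOTES, 'ISO-8859-1');
$utf8   = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// "ø" is one byte (0xF8) in ISO-8859-1 but two bytes (0xC3 0xB8) in UTF-8.
echo bin2hex($latin1), "\n"; // 53f867
echo bin2hex($utf8), "\n";   // 53c3b867

// A search term typed on a UTF-8 page arrives as UTF-8 bytes.
$search  = "S\xC3\xB8g"; // "Søg"
$pattern = '/' . preg_quote($search, '/') . '/i';

// Only the UTF-8 decoded content contains the same byte sequence.
$foundInLatin1 = preg_match($pattern, $latin1) === 1;
$foundInUtf8   = preg_match($pattern, $utf8) === 1;
var_dump($foundInLatin1, $foundInUtf8); // bool(false), bool(true)
```

So if the page content is decoded to ISO-8859-1 while the search term is UTF-8, the two byte sequences can never match.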

jerry
Posts: 177
Joined: Fri Jul 25, 2008 8:54 pm
Location: Denmark

Re: UTF-8 search doesn't recognise danish characters

Post by jerry » Tue Oct 19, 2010 10:02 pm

Yes, it's the search routine that fails to match non-ASCII characters, and changing the encoding to ISO-8859-1 doesn't help (except when it is done for the whole site). Matching non-ASCII UTF-8 encoded characters has always been a problem, so does anybody have an idea how to solve it? I spent some hours trying without success, and right now I have no more ideas.
jerry
jerry/simplesolutions

jerry
Posts: 177
Joined: Fri Jul 25, 2008 8:54 pm
Location: Denmark

Re: UTF-8 search doesn't recognise danish characters. SOLVED?

Post by jerry » Wed Oct 20, 2010 2:50 pm

Some more testing:
Specifying UTF-8 as the encoding helps, but the search for non-ASCII characters then becomes, as expected, case-sensitive: searching for Ümlaut will not find ümlaut. So it’s necessary to add case conversion to make it work. Passing the codepage as a variable seems to be a solution for anybody using ISO-8859-1 or another encoding.

Code:

if (!hide($i)) if (@preg_match('/'.preg_quote( mb_strtolower($search, $tx['meta']['codepage']), '/').'/i', (function_exists('html_entity_decode')?html_entity_decode( mb_strtolower($c[$i], $tx['meta']['codepage']), ENT_QUOTES, $tx['meta']['codepage']): mb_strtolower($c[$i], $tx['meta']['codepage']))))$ta[] = $i;   
I will try to implement a similar solution in Advanced Search.
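
The case problem can be reproduced outside CMSimple with a small standalone sketch. The helper name contains_ci and the sample text are hypothetical, and the mbstring extension is assumed to be available. The last variant, using the PCRE u modifier, is an alternative I have not tried inside search.php.

```php
<?php
// Hypothetical helper, not part of CMSimple: lowercase both sides in the
// given encoding, then do a plain (case-sensitive) regex match.
function contains_ci($needle, $haystack, $encoding = 'UTF-8')
{
    $needle   = mb_strtolower($needle, $encoding);
    $haystack = mb_strtolower($haystack, $encoding);
    return preg_match('/' . preg_quote($needle, '/') . '/', $haystack) === 1;
}

$page   = "Ein \xC3\x9Cmlaut im Text"; // contains "Ümlaut" as UTF-8 bytes
$search = "\xC3\xBCmlaut";             // "ümlaut" as UTF-8 bytes

// Byte-wise /i only folds ASCII letters, so "ü" and "Ü" stay different.
$plain = preg_match('/' . preg_quote($search, '/') . '/i', $page) === 1;

// Lowercasing both sides first, as in the line above, makes them equal.
$lowered = contains_ci($search, $page);

// Alternative (assumption, not from this thread): let PCRE treat the
// pattern and subject as UTF-8 so that /i folds non-ASCII letters too.
$unicode = preg_match('/' . preg_quote($search, '/') . '/iu', $page) === 1;

var_dump($plain, $lowered, $unicode);
```

Note that the u modifier only works when the content really is valid UTF-8, whereas the mb_strtolower approach can follow whatever codepage is configured.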
jerry
jerry/simplesolutions
