UTF-8 search doesn't recognise Danish characters. SOLVED?

Post by **jerry** » Tue Oct 19, 2010 11:19 am

The search function in CMSimple 1.2-utf8 doesn't find anything containing non-ASCII characters (at least not the special Danish characters æ, ø and å). Switching to ISO-8859-1 makes searching possible, but that defeats the purpose of the UTF-8 version of CMSimple. Does anybody have an idea how to solve this?
The AdvancedSearch plugin has the same problem and is currently not UTF-8 compatible.
jerry

Post by **leenm** » Tue Oct 19, 2010 7:58 pm

It probably has to do with the search routine. You can try replacing line 17 in search.php with the following (I didn't test it, because I don't speak Danish):


if (!hide($i)) if (@preg_match('/'.preg_quote($search, '/').'/i', (function_exists('html_entity_decode')?html_entity_decode($c[$i], ENT_QUOTES, 'UTF-8'):$c[$i])))$ta[] = $i;

Reason:

http://php.net/manual/en/function.html-entity-decode.php wrote: "The ISO-8859-1 character set is used as default for the optional third charset. This defines the character set used in conversion."

Please let us know if this works!
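A minimal sketch of why the third argument matters. At the time of this thread `html_entity_decode` defaulted to ISO-8859-1 (PHP 5.4 and later default to UTF-8), so `&aelig;` decoded to the single byte E6, while a visitor on a UTF-8 page searches for the two-byte sequence C3 A6. The variable names are illustrative only.

```php
<?php
// Decode the same named entity into two different target charsets.
// Passing the charset explicitly makes the result independent of PHP's default.
$latin1 = html_entity_decode('&aelig;', ENT_QUOTES, 'ISO-8859-1'); // one byte: E6
$utf8   = html_entity_decode('&aelig;', ENT_QUOTES, 'UTF-8');      // two bytes: C3 A6

// The search term "æ" as a UTF-8 page submits it (written as bytes so the
// example does not depend on this file's own encoding).
$search  = "\xC3\xA6";
$pattern = '/' . preg_quote($search, '/') . '/i';

// Without the u modifier, preg_match compares raw bytes.
var_dump(preg_match($pattern, $latin1)); // int(0): E6 is not C3 A6
var_dump(preg_match($pattern, $utf8));   // int(1)
```

This is why passing 'UTF-8' (or the site's codepage) as the third argument lets the decoded page text and the search term use the same bytes.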

Post by **jerry** » Tue Oct 19, 2010 10:02 pm

Yes, it's the search routine that fails to match non-ASCII characters, and changing the encoding to ISO-8859-1 doesn't help (except when the whole site is switched to it). Matching non-ASCII UTF-8 characters has always been a problem, so does anybody have an idea how to solve it? I spent some hours trying without success, and right now I have no more ideas.
jerry

Post by **jerry** » Wed Oct 20, 2010 2:50 pm

Some more testing:
Specifying UTF-8 as the encoding helps, but, as expected, the search for non-ASCII characters becomes case-sensitive: "Ümlaut" will not find "ümlaut". So a case conversion (mb_strtolower) is needed to make it work. Using the codepage as a variable ($tx['meta']['codepage']) instead of a hard-coded 'UTF-8' should also make this work for anybody using ISO-8859-1 or another encoding.


if (!hide($i)) if (@preg_match('/'.preg_quote(mb_strtolower($search, $tx['meta']['codepage']), '/').'/i', mb_strtolower((function_exists('html_entity_decode') ? html_entity_decode($c[$i], ENT_QUOTES, $tx['meta']['codepage']) : $c[$i]), $tx['meta']['codepage']))) $ta[] = $i;
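The search line above can be wrapped in a self-contained function to test it outside CMSimple. This is a sketch, not CMSimple's code: `page_matches()` is a hypothetical name, `$codepage` stands in for `$tx['meta']['codepage']`, and the mbstring extension is assumed to be installed. It decodes entities before lowercasing, so numeric references such as `&#216;` (Ø) are folded as well.

```php
<?php
// Hypothetical helper mirroring the search line: does $content contain $search,
// ignoring case, for text in the site's codepage (e.g. 'UTF-8' or 'ISO-8859-1')?
function page_matches(string $search, string $content, string $codepage): bool
{
    // Turn entities such as &Oslash; or &#216; into real characters first,
    // using the site's codepage rather than html_entity_decode's default.
    $text = html_entity_decode($content, ENT_QUOTES, $codepage);

    // Lowercase both sides with a multibyte-aware function. Without the u
    // modifier, /i in preg_match works on single bytes and cannot fold
    // two-byte UTF-8 letters such as Ü/ü, so /i is not relied on here.
    $needle   = mb_strtolower($search, $codepage);
    $haystack = mb_strtolower($text, $codepage);

    return preg_match('/' . preg_quote($needle, '/') . '/', $haystack) === 1;
}

var_dump(page_matches('Ümlaut', 'Et ümlaut her', 'UTF-8'));   // bool(true)
var_dump(page_matches('øst', 'Vej mod &Oslash;st', 'UTF-8')); // bool(true)
var_dump(page_matches('Øst', 'Vej mod &#216;st', 'UTF-8'));   // bool(true)
var_dump(page_matches('Ærø', 'Vej mod Øst', 'UTF-8'));        // bool(false)
```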

I will try to implement a similar solution in the AdvancedSearch plugin.
jerry
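An alternative that was not proposed in the thread: PCRE's `u` modifier makes `preg_match` treat the pattern and subject as UTF-8, and combined with `i` it also folds case for non-ASCII letters, so no `mb_strtolower` call is needed. A minimal sketch, assuming a UTF-8 site (this does not cover ISO-8859-1 content) and PHP's bundled PCRE; `page_matches_utf8()` is a hypothetical name.

```php
<?php
// Hypothetical UTF-8-only variant: let PCRE do the case folding.
function page_matches_utf8(string $search, string $content): bool
{
    // Decode entities into UTF-8 so &Aring; and Å compare as the same character.
    $text = html_entity_decode($content, ENT_QUOTES, 'UTF-8');

    // u: pattern and subject are UTF-8; i: case-insensitive, including non-ASCII
    // letters. preg_match returns false (not 1) if either string is invalid UTF-8.
    return preg_match('/' . preg_quote($search, '/') . '/iu', $text) === 1;
}

var_dump(preg_match('/Ümlaut/i', 'ümlaut'));            // int(0): the bytes differ
var_dump(page_matches_utf8('Ümlaut', 'Et ümlaut her')); // bool(true)
var_dump(page_matches_utf8('århus', '&Aring;rhus'));    // bool(true)
```

The trade-off is that the `u` modifier only helps UTF-8 sites, whereas the `mb_strtolower` approach with a codepage variable also covers other encodings.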

CMSimple_XH–Forum
