The search function in CMSimple 1.2-utf8 doesn't find anything containing non-Latin characters (at least not the special Danish characters). Changing the encoding to ISO-8859-1 makes search work, but that defeats the purpose of the UTF-8 version of CMSimple. Does anybody have an idea how to solve it?
The AdvancedSearch plugin has the same problem and is currently not UTF-8 compatible.
jerry
UTF-8 search doesn't recognise danish characters SOLVED?
Last edited by jerry on Wed Oct 20, 2010 2:58 pm, edited 1 time in total.
jerry/simplesolutions
Re: UTF-8 search doesn't recognise danish characters
The problem is probably in the search routine. You can try replacing line 17 in search.php with the following (I didn't test it, because I don't speak Danish):
Code: Select all
if (!hide($i)) if (@preg_match('/'.preg_quote($search, '/').'/i', (function_exists('html_entity_decode')?html_entity_decode($c[$i], ENT_QUOTES, 'UTF-8'):$c[$i])))$ta[] = $i;
Reason:
http://php.net/manual/en/function.html-entity-decode.php wrote: The ISO-8859-1 character set is used as default for the optional third charset. This defines the character set used in conversion.
Please let us know if this works!
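To illustrate why the third argument matters, here is a minimal sketch. It assumes the page content stores Danish letters as HTML entities and that the search term arrives as UTF-8 from the form; the sample text and variable names are made up for the example and are not taken from CMSimple.

```php
<?php
// Page content as stored (assumption): Danish letters written as entities.
$content = 'R&oslash;dgr&oslash;d med fl&oslash;de';
// Search term as typed into a UTF-8 form: "rød" (ø is the bytes C3 B8).
$search = "r\xC3\xB8d";

// Decoded to ISO-8859-1, each ø becomes the single byte F8,
// which can never equal the two-byte UTF-8 sequence in $search.
$latin1 = html_entity_decode($content, ENT_QUOTES, 'ISO-8859-1');
// Decoded to UTF-8, each ø becomes C3 B8, the same bytes as in $search.
$utf8 = html_entity_decode($content, ENT_QUOTES, 'UTF-8');

$pattern = '/' . preg_quote($search, '/') . '/i';
var_dump(preg_match($pattern, $latin1)); // no match expected
var_dump(preg_match($pattern, $utf8));   // match expected
```

The only difference between the two calls is the charset passed to html_entity_decode, which is what the changed line 17 is meant to fix.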
Re: UTF-8 search doesn't recognise danish characters
Yes, it's the search routine that fails to match non-Latin characters, and changing the encoding to ISO-8859-1 doesn't help unless it is done for the whole site. Matching non-Latin UTF-8 encoded characters has always been a problem, so does anybody have an idea how to solve it? I spent some hours trying without success, and right now I have no more ideas.
jerry
jerry/simplesolutions
Re: UTF-8 search doesn't recognise danish characters. SOLVED?
Some more testing:
Specifying UTF-8 as the encoding helps, but the search for non-Latin characters then becomes case-sensitive, as might be expected: "Ümlaut" will not find "ümlaut". So it is necessary to add a case conversion to both the search term and the content to make it work. Passing the codepage as a variable instead of hard-coding it also seems to be a solution for anybody using ISO-8859-1 or another encoding.
I will try to implement a similar solution in Advanced Search.
The modified line in search.php:
Code: Select all
if (!hide($i)) if (@preg_match('/'.preg_quote( mb_strtolower($search, $tx['meta']['codepage']), '/').'/i', (function_exists('html_entity_decode')?html_entity_decode( mb_strtolower($c[$i], $tx['meta']['codepage']), ENT_QUOTES, $tx['meta']['codepage']): mb_strtolower($c[$i], $tx['meta']['codepage']))))$ta[] = $i;
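For comparison, a possible alternative to lowercasing both strings. This is only a sketch, assuming the site is UTF-8 and that PHP's PCRE was built with UTF-8 and Unicode property support. The u modifier makes preg_match read the pattern and the subject as UTF-8, so the i modifier can also fold case for characters such as Ü. utf8_search_match is a made-up helper name, not a CMSimple function, and unlike the line above it hard-codes UTF-8 instead of reading $tx['meta']['codepage'].

```php
<?php
// Hypothetical helper, not part of CMSimple: case-insensitive search in
// UTF-8 text that may still contain HTML entities.
function utf8_search_match($needle, $haystack)
{
    // Decode entities first, so "&Uuml;" and a literal "Ü" are treated alike.
    $haystack = html_entity_decode($haystack, ENT_QUOTES, 'UTF-8');
    // "u" = treat pattern and subject as UTF-8, "i" = ignore case.
    $pattern = '/' . preg_quote($needle, '/') . '/iu';
    // preg_match returns false on invalid UTF-8; treat that as "not found".
    return preg_match($pattern, $haystack) === 1;
}

$upper = "\xC3\x9Cmlaut"; // "Ümlaut" as UTF-8 bytes
$lower = "\xC3\xBCmlaut"; // "ümlaut" as UTF-8 bytes

// Byte-wise matching: "i" alone only folds the ASCII letters.
var_dump(preg_match('/' . $lower . '/i', $upper));
// UTF-8 aware matching, with literal characters and with an entity.
var_dump(utf8_search_match($lower, $upper));
var_dump(utf8_search_match($lower, '&Uuml;mlaut'));
```

The mb_strtolower approach has the advantage of working with any codepage the mbstring extension knows, while the u modifier only applies to UTF-8 sites.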
jerry
jerry/simplesolutions