Hello Community,
It seems reasonable to improve the search function wrt. Unicode equivalence, which has traditionally been completely neglected.
PHP's intl extension offers grapheme_strpos(), which could be used instead of the current strpos() if available (otherwise we'd have to fall back to strpos() anyway).
We should furthermore consider using a case-insensitive comparison, i.e. grapheme_stripos(), and otherwise fall back to the current algorithm, which uses utf8_strtolower() and strpos(). This step in particular might improve performance and give better results.
Christoph
Search function and Unicode equivalence
Christoph M. Becker – Plugins for CMSimple_XH
Re: Search function and Unicode equivalence
cmb wrote:
It seems reasonable to improve the search function wrt. Unicode equivalence, which has traditionally been completely neglected.
PHP's intl extension offers grapheme_strpos(), which could be used instead of the current strpos() if available (otherwise we'd have to fall back to strpos() anyway).

As I found out, grapheme_strpos() doesn't cater for Unicode equivalence; it merely reports the position of the needle within the haystack in grapheme units, which is not helpful for our purpose.
However, there is Normalizer::normalize(), which is also part of the intl extension. I have used it instead.
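The idea behind normalizing can be illustrated with a minimal Python sketch. This is not the Utf8_XH code: Python's unicodedata.normalize() stands in for Normalizer::normalize() (whose default form is also NFC), and the helper name normalized_find() is hypothetical. Both haystack and needle are brought into the same normalization form before an ordinary substring search, so canonically equivalent spellings match.

```python
import unicodedata


def normalized_find(haystack: str, needle: str, form: str = "NFC") -> int:
    """Hypothetical helper: find needle in haystack after bringing both
    into the same Unicode normalization form.

    Returns the index within the *normalized* haystack, or -1 if there is
    no match. That index may differ from one in the original string,
    because normalization can compose or decompose characters.
    """
    return unicodedata.normalize(form, haystack).find(unicodedata.normalize(form, needle))


# "é" written as one precomposed code point (U+00E9) and as "e" + U+0301 COMBINING ACUTE ACCENT
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

print(decomposed.find(precomposed))              # -1: canonically equivalent, yet not equal
print(normalized_find(decomposed, precomposed))  # 0: found once both are in NFC
```

Normalizing both sides is what makes the comparison independent of how the text was entered; the plain search afterwards stays unchanged.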
Furthermore, I have added utf8_stripos() to Utf8_XH. It tries to use mb_stripos() and otherwise falls back to utf8_strtolower() and strpos(). I have used this new function for the search.
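The "prefer a native function, otherwise lowercase and search" pattern might look roughly like the following Python sketch. It is an illustration under assumptions, not the actual utf8_stripos(): str.lower() stands in for utf8_strtolower(), str.find() for strpos(), and the optional `native` parameter is a hypothetical stand-in for mb_stripos() being available. Both function names are hypothetical as well.

```python
from typing import Callable, Optional

# Signature shared by both search variants: (haystack, needle) -> index or -1
Search = Callable[[str, str], int]


def lowercase_find(haystack: str, needle: str) -> int:
    """Hypothetical fallback: lowercase both strings, then do a plain
    substring search (the counterpart of utf8_strtolower() + strpos())."""
    # Caveat: for a few characters lowercasing changes the length
    # (e.g. "\u0130".lower() has two code points), so the index refers
    # to the lowercased haystack.
    return haystack.lower().find(needle.lower())


def stripos_with_fallback(haystack: str, needle: str, native: Optional[Search] = None) -> int:
    """Hypothetical helper: prefer a native case-insensitive search when one
    is available (the role mb_stripos() plays), otherwise fall back."""
    if native is not None:
        return native(haystack, needle)
    return lowercase_find(haystack, needle)


print(stripos_with_fallback("Ärger im Büro", "BÜRO"))  # 9
print(stripos_with_fallback("Ärger im Büro", "xyz"))   # -1
```

Keeping the fallback behind the same signature means the search code does not need to know which implementation is in use.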
(r1349-r1352)