Search function and Unicode equivalence

Post by cmb » Tue May 13, 2014 12:52 pm

Hello Community,

It seems reasonable to improve the search function with respect to Unicode equivalence, which has traditionally been completely neglected.

PHP's intl extension offers grapheme_strpos(), which could be used instead of the current strpos() if it is available (otherwise we'd have to fall back to strpos() anyway).

Furthermore, we should consider using a case-insensitive comparison, i.e. grapheme_stripos(), and otherwise fall back to the current algorithm that uses utf8_strtolower() and strpos(). This step in particular might provide a performance improvement and better results.

Christoph
Christoph M. Becker – Plugins for CMSimple_XH

Post by manu » Tue May 13, 2014 1:25 pm

+1

Post by cmb » Mon Aug 18, 2014 6:46 pm

cmb wrote:
It seems reasonable to improve the search function with respect to Unicode equivalence, which has traditionally been completely neglected.

PHP's intl extension offers grapheme_strpos(), which could be used instead of the current strpos() if it is available (otherwise we'd have to fall back to strpos() anyway).

As I found out, grapheme_strpos() doesn't cater for Unicode equivalence; actually, it only reports the position of the needle within the haystack in grapheme units, which is not helpful for our purpose.

However, there is Normalizer::normalize() which is also part of the intl extension. I have used this instead.
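
To illustrate the idea, here is a minimal sketch of normalizing both strings before searching. It is not the code from the commits mentioned in this post; the helper name normalize_for_search() and the choice of normalization form C are assumptions made for the example. If the intl extension is missing, the helper simply returns the string unchanged.

```php
<?php
// Hypothetical helper for illustration only, not the actual CMSimple_XH code.
function normalize_for_search($string)
{
    // Normalizer is only available if the intl extension is loaded.
    if (class_exists('Normalizer', false)) {
        // Form C is an assumption here; any form works as long as
        // haystack and needle are normalized the same way.
        $normalized = Normalizer::normalize($string, Normalizer::FORM_C);
        // On failure (e.g. invalid UTF-8) no string is returned.
        if (is_string($normalized)) {
            return $normalized;
        }
    }
    // Without intl we keep the string as is (no equivalence handling).
    return $string;
}

// "é" as precomposed U+00E9 vs. "e" followed by combining acute U+0301
$haystack = "Caf\xC3\xA9 au lait";
$needle = "Cafe\xCC\x81";

// The raw byte sequences differ, so a plain search fails.
var_dump(strpos($haystack, $needle)); // bool(false)

// After normalizing both sides the search can succeed,
// provided the intl extension is available.
var_dump(strpos(normalize_for_search($haystack), normalize_for_search($needle)));
```

Both the haystack and the needle have to go through the same normalization, otherwise canonically equivalent strings still won't match.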

Furthermore I have added utf8_stripos() to Utf8_XH, which tries to use mb_stripos(), and falls back to utf8_strtolower() and strpos(). I have used this new function for the search.
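
The fallback chain described above could look roughly like the following sketch. The function name sketch_utf8_stripos() is made up for this example, and the real utf8_stripos() in Utf8_XH may well differ in signature and details. utf8_strtolower() is assumed to come from the UTF-8 library shipped with CMSimple_XH; where it is not defined, the sketch uses strtolower() as a stand-in, which only handles ASCII reliably.

```php
<?php
// Hypothetical sketch; the real utf8_stripos() in Utf8_XH may differ.
function sketch_utf8_stripos($haystack, $needle)
{
    if (function_exists('mb_stripos')) {
        // mbstring is available: case-insensitive search,
        // result is a character offset or false.
        return mb_stripos($haystack, $needle, 0, 'UTF-8');
    }
    // Fallback: lowercase both strings, then search byte-wise.
    // utf8_strtolower() is assumed to exist in CMSimple_XH;
    // strtolower() is only a stand-in for this standalone example.
    $lower = function_exists('utf8_strtolower') ? 'utf8_strtolower' : 'strtolower';
    // Note: strpos() yields a byte offset, not a character offset.
    return strpos($lower($haystack), $lower($needle));
}

var_dump(sketch_utf8_stripos('Hello World', 'WORLD')); // int(6)
var_dump(sketch_utf8_stripos('Hello World', 'xyz'));   // bool(false)
```

For the search it only matters whether the result is false or not, so the different offset units of the two branches should not be a problem there.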

(r1349-r1352)
Christoph M. Becker – Plugins for CMSimple_XH
