
Addon: Cleaner URLs

Posted: Sun Dec 02, 2012 2:28 pm
by cmb
Hello Community,

This is not about the so-called clean URLs, i.e. URLs without the ? at the beginning. If you want those, you can try the respective tip in the CMSimple Wiki. Good luck ;)

IMO such clean URLs are overrated. The ? is no problem for search engines, and IMO only a minor issue for humans. What really matters is the following:
  1. The URL shouldn't contain any URL-encoded characters (such as umlauts), as these are hard to recognize and remember: http://www.example.com/?Fahrvergn%FCgen. I'm well aware that modern browsers are capable of handling many special characters in the URL, but IMO it's better to avoid them altogether.
  2. The underscore should be avoided, as many users are not familiar with this character and it is hard to spot when a link is underlined: http://www.example.com/?Hard_to_spot. phpBB handles this reasonably well here, but see my signature for an underscore that is really hard to spot.
  3. The slash should be used as the delimiter between headings of different levels, which is quite common.
  4. The URL shouldn't contain mixed-case characters, as these are hard to remember. It's quite common to use lowercase letters only.
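As a rough illustration, criteria (1), (2) and (4) can be checked mechanically on the query part of a URL (the part after the ?). The function meets_url_criteria() is a hypothetical helper, not part of CMSimple_XH; criterion (3) concerns the page structure and is not checked here.

```php
<?php
// Hypothetical checker for criteria (1), (2) and (4); it expects only the
// query part of the URL, e.g. "products/widgets" for ?products/widgets.
function meets_url_criteria($query)
{
    if (strpos($query, '%') !== false) {   // (1) URL-encoded characters
        return false;
    }
    if (strpos($query, '_') !== false) {   // (2) underscores
        return false;
    }
    return $query === strtolower($query);  // (4) lowercase letters only
}
```

For example, 'Fahrvergn%FCgen', 'Hard_to_spot' and 'Products/Widgets' all fail, while 'products/widgets' passes.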
(2) and (4) can be solved by slightly modifying the function uenc() in cmsimple/cms.php. (3) can be solved by setting the configuration option uri_seperator to a slash.
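A minimal sketch of such a modification, assuming that the stock uenc() applies urichar_org/new, urlencode()s the heading (which turns spaces into +) and then maps + to _. The name uenc_lower_dash() and its array parameters are hypothetical, chosen so that the sketch runs outside CMSimple_XH.

```php
<?php
// Hypothetical variant of uenc() addressing points (2) and (4):
// spaces end up as '-' instead of '_', and the result is lowercase.
function uenc_lower_dash($s, array $search = array(), array $replace = array())
{
    // Optional urichar_org/new style replacement, passed in as arrays here.
    if ($search) {
        $s = str_replace($search, $replace, $s);
    }
    $s = urlencode($s);              // spaces become '+'
    $s = str_replace('+', '-', $s);  // (2) minus sign instead of underscore
    return strtolower($s);           // (4) safe: only ASCII is left after encoding
}
```

Note that 'Über uns' still yields '%c3%9cber-uns' unless Ü is listed in the replacement arrays, which is exactly why point (1) needs a different approach.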

(1) can be solved by using urichar_org/new. But that requires either catering for all potential special characters in headings (which is practically impossible) or adjusting urichar_org/new whenever a new special character is used in a heading. Note that it's not possible to replace a comma with urichar_org/new, since the entries of these options are themselves separated by commas.
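To illustrate why this is laborious, here is a sketch of applying such a pair of options. It assumes both are stored as comma-separated strings that are split with explode(); apply_urichar() is a hypothetical name.

```php
<?php
// Sketch: apply a comma-separated urichar_org/new pair to a heading.
// Because the lists are split on ',', a comma can never be an entry itself.
function apply_urichar($s, $org, $new)
{
    $search = explode(',', $org);
    $replace = explode(',', $new);
    return str_replace($search, $replace, $s);
}

// Every special character must be listed explicitly; anything missing
// (here: 'é') is left for URL encoding to deal with.
echo apply_urichar('Fahrvergnügen Café', 'ä,ö,ü,ß', 'ae,oe,ue,ss'), "\n";
// Fahrvergnuegen Café
```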

So I've come up with an automatic solution, which works as follows:
  1. decode all HTML entities
  2. apply some kind of transliteration (e.g. é -> e, ä -> ae)
  3. replace all characters that would be URL-encoded with a minus sign
  4. collapse runs of consecutive minus signs into a single minus sign
  5. convert the characters to lowercase
The result is the following replacement for the function uenc() in cmsimple/cms.php (please note that in this case it is not possible to move uenc() to userfuncs.php). The modification requires PHP 5 (as html_entity_decode() doesn't support UTF-8 in PHP 4) and the Utf8_XH plugin (which is already distributed with CMSimple_XH 1.5.4 and later):

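```php
function uenc($s)
{
    // Optionally replace with a better transliteration library.
    require_once UTF8 . '/utils/ascii.php';

    // 1. Decode all HTML entities (needs PHP 5 for UTF-8 support).
    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    // 2. Transliterate accented characters (e.g. é -> e, ä -> ae);
    //    optionally replace with a better transliteration function.
    $s = utf8_accents_to_ascii($s);
    // 3. URL-encode, then replace every encoded byte (%xx) with a minus sign.
    $s = rawurlencode($s);
    $s = preg_replace('/%[a-f0-9]{2}/i', '-', $s);
    // 4. Collapse runs of minus signs into a single one.
    $s = preg_replace('/-+/', '-', $s);
    // 5. Convert to lowercase (only ASCII is left at this point).
    $s = strtolower($s);
    return $s;
}
```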
Please note that this doesn't use urichar_org/new at all, and that the transliteration only caters for Western and Central European languages, so you can't use it for, e.g., Russian or Chinese (all characters of such scripts would simply be replaced by minus signs). If a better or more appropriate transliteration library is available, you can use it instead of Utf8_XH (see the comments in the code).
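For experimenting without CMSimple_XH, the five steps can be reproduced in a self-contained sketch. The small $map array is a hypothetical stand-in for utf8_accents_to_ascii() and covers only a handful of characters; cleaner_slug() is likewise an illustrative name. The last example shows the limitation for other scripts: a Cyrillic heading collapses to a single minus sign, so different headings in such scripts would all get the same URL.

```php
<?php
// Self-contained sketch of the five steps. The tiny $map is a hypothetical
// stand-in for utf8_accents_to_ascii() and covers only a few characters.
function cleaner_slug($s)
{
    $map = array(
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'é' => 'e', 'è' => 'e', 'á' => 'a', 'č' => 'c',
    );
    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8'); // 1. decode entities
    $s = strtr($s, $map);                             // 2. transliterate
    $s = rawurlencode($s);                            // 3. encode, then replace
    $s = preg_replace('/%[a-f0-9]{2}/i', '-', $s);    //    each %xx with '-'
    $s = preg_replace('/-+/', '-', $s);               // 4. collapse runs of '-'
    return strtolower($s);                            // 5. lowercase
}

echo cleaner_slug('Fahrvergnügen'), "\n";         // fahrvergnuegen
echo cleaner_slug('Caf&eacute; &amp; Bar'), "\n"; // cafe-bar
echo cleaner_slug('Привет'), "\n";                // -
```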

And please note that this has only been roughly tested, and that it will change your URLs, so existing backlinks and bookmarks might break.

Christoph

Re: Addon: Cleaner URLs

Posted: Tue Jun 24, 2014 2:07 pm
by cmb
Another solution for this issue would be to use URLify, which should work fine for several languages, including German (de), Russian (ru), French (fr) and Czech (cs).

Just download the ZIP archive and move URLify.php to cmsimple/classes/.

Then replace XH_uenc() in cmsimple/functions.php with the following:

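```php
function XH_uenc($s, $search, $replace)
{
    global $pth, $sl;

    // $search and $replace (urichar_org/new) are deliberately unused;
    // the parameters are only kept so that existing callers keep working.
    require_once $pth['folder']['classes'] . 'URLify.php';
    // Transliterate for the current language $sl, capping the result
    // at 60 characters.
    return URLify::filter($s, 60, $sl);
}
```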
Note that this solution completely ignores urichar_org/new and uri_word_separator.

Re: Addon: Cleaner URLs

Posted: Tue Jul 14, 2015 2:54 pm
by svasti
What about putting it into 1.7?
With a hidden/advanced option to switch it on, or to use the old solution instead.

Re: Addon: Cleaner URLs

Posted: Tue Jul 14, 2015 9:56 pm
by cmb
svasti wrote: "What about putting it into 1.7? With a hidden/advanced option to switch it on, or to use the old solution instead."
I'm not against it (even though I have some concerns about adding third-party libraries to the core). Please put it on the roadmap (if not already done). If you like, I'll add it to sprint #5 (which we really should push forward; I'd like to see patches before voting, but currently I don't have much time to work on these).

Re: Addon: Cleaner URLs

Posted: Tue Jul 14, 2015 10:51 pm
by svasti
I had a quick look at the code ... not so sure we need to take over the complete package. What about something smaller for XH which we control ourselves? After all, it's just a list of characters to be replaced by other characters, not much different from what XH already has, only much bigger.
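A minimal sketch of what such an in-house solution might look like, assuming one small replacement table per language plus a generic fallback table; xh_translit() and all sample entries are hypothetical. Language-specific entries take precedence over the fallback, so German ä becomes ae while the fallback turns it into a.

```php
<?php
// Hypothetical XH-controlled transliteration: per-language tables
// (only a few sample entries shown) merged over a generic fallback.
function xh_translit($s, $lang)
{
    static $maps = array(
        'de' => array('ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'),
        'ru' => array('п' => 'p', 'р' => 'r', 'и' => 'i',
                      'в' => 'v', 'е' => 'e', 'т' => 't'),
    );
    // Generic fallback used for every language.
    static $common = array('ä' => 'a', 'é' => 'e', 'á' => 'a', 'č' => 'c');

    // Array union keeps the language-specific keys when both define one.
    $map = isset($maps[$lang]) ? $maps[$lang] + $common : $common;
    return strtr($s, $map);
}
```

For example, 'Mädchen' becomes 'Maedchen' for de but 'Madchen' for fr, and 'привет' becomes 'privet' for ru.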

Re: Addon: Cleaner URLs

Posted: Tue Jul 14, 2015 11:35 pm
by cmb
svasti wrote: "I had a quick look at the code ... not so sure we need to take over the complete package. What about something smaller for XH which we control ourselves?"
I'm not against this either. In the end, it comes down to what's easier for us to maintain. I'm not sure what that would be.

Re: Addon: Cleaner URLs

Posted: Wed Jul 15, 2015 12:49 am
by cmb
svasti wrote: "After all, it's just a list of characters to be replaced by other characters, not much different from what XH already has, only much bigger."
Without closer investigation: it seems to me that URLify works rather like a hard-coded, comprehensive urichar_org/new for several languages. Whatever we change for XH 1.7, we should investigate the details thoroughly (among other things, we should measure the actual performance).
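Measuring that could start with a rough micro-benchmark like the following sketch. time_per_call() is a hypothetical helper that averages the cost of any slug function over a list of headings; the built-in strtolower() only serves as a placeholder for the candidates to compare. Absolute numbers depend on PHP version and hardware, so only comparisons between candidates run on the same machine are meaningful.

```php
<?php
// Rough micro-benchmark sketch: average seconds per call of $fn over
// $rounds passes through a non-empty list of headings.
function time_per_call($fn, array $headings, $rounds = 1000)
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        foreach ($headings as $h) {
            call_user_func($fn, $h);
        }
    }
    $elapsed = microtime(true) - $start;
    return $elapsed / ($rounds * count($headings));
}

$headings = array('Fahrvergnügen', 'Über uns', 'Café & Bar');
printf("%.2e s per call\n", time_per_call('strtolower', $headings));
```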