This is not about the so-called clean URLs, i.e. URLs without the ? at the beginning. If you want those, you can try the respective tip in the CMSimple Wiki. Good luck.
IMO such clean URLs are overrated. The ? is no problem for search engines, and IMO only a minor issue for humans. What matters is the following:
- The URL shouldn't contain any URL-encoded characters (which is what umlauts, for example, turn into), as one cannot easily recognize or remember them: http://www.example.com/?Fahrvergn%FCgen. I'm well aware that modern browsers can handle many special characters in the URL, but IMO it's better to avoid them altogether.
- The underscore should be avoided, as many users are not familiar with this character and it is hard to spot when a link is underlined: http://www.example.com/?Hard_to_spot. phpBB handles this quite okay here, but see my signature for an underscore that is really hard to spot.
- The slash should be used as the delimiter between page headings of different levels, which is quite common.
- The URL shouldn't contain mixed case characters, as these are hard to remember. It's quite common to have lower case letters only.
The first point can be solved by using urichar_org/new. But that requires either catering for all potential special characters in headings up front (which is practically impossible) or adjusting urichar_org/new whenever a new special character is used in a heading. Note that it's not possible to replace a comma with urichar_org/new.
So I've thought about an automatic solution, which works as follows:
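To illustrate the comma limitation, here is a minimal sketch. It assumes that urichar_org and urichar_new are stored as comma-separated lists which are split and fed to str_replace(); that is my assumption about the mechanism, not something taken from the CMSimple source. The function name replace_uri_chars() is made up for this example.

```php
<?php
// Hypothetical helper, not CMSimple's actual code.
// Assumption: both lists are comma-separated strings of equal length.
function replace_uri_chars($s, $org, $new)
{
    // Split both lists at the comma and replace pairwise.
    return str_replace(explode(',', $org), explode(',', $new), $s);
}

// Works for characters that are explicitly listed:
echo replace_uri_chars('Fahrvergnügen', 'ä,ö,ü,ß', 'ae,oe,ue,ss'), "\n";
// A character that is not listed is left untouched:
echo replace_uri_chars('Café', 'ä,ö,ü,ß', 'ae,oe,ue,ss'), "\n";
```

Under that assumption the comma itself can never be an entry in the list, because it is already used as the delimiter, and every heading with an unlisted character (like the é above) still ends up URL encoded.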
- replace all HTML entities
- apply some kind of transliteration (e.g. é -> e, ä -> ae)
- replace all characters that would be URL encoded with a minus sign
- replace all occurrences of more than one consecutive minus sign with a single minus sign
- convert the characters to lower case
Code:
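function uenc($s)
{
    // Load the transliteration helper (optionally replace with a better transliteration library).
    require_once UTF8 . '/utils/ascii.php';

    // Replace all HTML entities with the characters they stand for.
    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    // Transliterate accented characters (optionally replace with a better transliteration function).
    $s = utf8_accents_to_ascii($s);
    // URL encode, then replace every %XX sequence with a minus sign.
    $s = rawurlencode($s);
    $s = preg_replace('/%[a-f0-9]{2}/i', '-', $s);
    // Collapse runs of consecutive minus signs into a single one.
    $s = preg_replace('/-+/', '-', $s);
    // Convert to lower case.
    return strtolower($s);
}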
Please note that this is only roughly tested, and that it will change your URLs, so existing backlinks and bookmarks might break.
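If you want to try the idea outside of CMSimple, the following self-contained sketch may help. It follows the same five steps, but swaps utf8_accents_to_ascii() for a small hand-written map, so it is not identical to the function above. The names transliterate_sketch() and uenc_sketch() and the contents of the map are made up for this example; characters missing from the map simply end up as a minus sign.

```php
<?php
// Hypothetical stand-in for a real transliteration function.
// The map is deliberately tiny; extend it for the characters you need.
function transliterate_sketch($s)
{
    static $map = array(
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'é' => 'e', 'è' => 'e', 'à' => 'a', 'ç' => 'c',
    );
    return strtr($s, $map);
}

function uenc_sketch($s)
{
    // 1. Replace HTML entities with the actual characters.
    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    // 2. Transliterate (here: only what the map above knows).
    $s = transliterate_sketch($s);
    // 3. Everything that would be URL encoded becomes a minus sign.
    $s = rawurlencode($s);
    $s = preg_replace('/%[a-f0-9]{2}/i', '-', $s);
    // 4. Collapse consecutive minus signs.
    $s = preg_replace('/-+/', '-', $s);
    // 5. Lower case only.
    return strtolower($s);
}

echo uenc_sketch('Fahrvergn&uuml;gen'), "\n";    // fahrvergnuegen
echo uenc_sketch('Caf&eacute; &amp; Bar'), "\n"; // cafe-bar
echo uenc_sketch('Hard to spot'), "\n";          // hard-to-spot
```

Note that rawurlencode() leaves the underscore alone, so a heading that already contains an underscore keeps it, and that a heading starting or ending with a special character gets a leading or trailing minus sign. Both could be handled with one more replacement step if that bothers you.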
Christoph