uri_seperator and url encoding


Post by cmb » Sun Nov 27, 2011 5:40 pm

Hello Community,

a user pointed out a problem with URLs to his CMSimple site posted on Facebook. If he posts e.g. the URL http://www.example.com/?page/subpage, Facebook replaces it with http://www.example.com/?page%2Fsubpage, but this link results in a "404: Not found".

Wikipedia explains that:
Reserved characters that have no reserved purpose in a particular context may also be percent-encoded but are not semantically different from those that are not.
This holds for the commonly used uri_seperators ":" and "/", and probably also for all other reasonable uri_seperators (as otherwise clashes with page titles could be expected).
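
To make the problem concrete, here is a small PHP sketch. It only uses the example path from above and is not CMSimple code. It shows that the two forms differ as plain strings but are identical after percent-decoding:

```php
<?php
// The query string as posted, and as rewritten by Facebook.
$original = 'page/subpage';
$encoded  = 'page%2Fsubpage';

// As plain strings the two forms differ.
var_dump($original === $encoded);               // bool(false)

// After percent-decoding, both forms are identical.
var_dump($original === rawurldecode($encoded)); // bool(true)

// urlencode() produces exactly the form Facebook generates for "/".
var_dump(urlencode('/') === '%2F');             // bool(true)
```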

So IMO it's up to CMSimple to cater for URLs in which the uri_seperator has been URL-encoded. AFAIK this could be done technically either by a rewrite rule (e.g. with mod_rewrite) or in CMSimple's source code; the latter is more generally useful and more convenient, as the actual uri_seperator is known there. It can be done in /cmsimple/cms.php around line 130 (the exact line depends on the version) by inserting:

Code:

$su = substr($su, 0, $cf['uri']['length']); // after this line
// insert this one; preg_quote() keeps characters such as "." from being treated as regex syntax
$su = preg_replace('/'.preg_quote(urlencode($cf['uri']['seperator']), '/').'/i', $cf['uri']['seperator'], $su);
 
I've chosen preg_replace() here instead of the faster str_ireplace(), as the latter is only available as of PHP 5. This substitution should work with any uri_seperator that is an ASCII character, as all bytes of multi-byte characters in a UTF-8 byte stream have their highest bit set (see e.g. Wikipedia).
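
The UTF-8 argument can be checked with a short sketch. The heading "Ü/Ö" is a made-up example, written as explicit bytes so that the result does not depend on the file encoding:

```php
<?php
// "Ü/Ö" as explicit UTF-8 bytes: "Ü" is C3 9C, "Ö" is C3 96.
$heading = "\xC3\x9C/\xC3\x96";

// Collect the offsets of all bytes below 0x80 (plain ASCII).
$asciiPositions = array();
for ($i = 0; $i < strlen($heading); $i++) {
    if (ord($heading[$i]) < 0x80) {
        $asciiPositions[] = $i;
    }
}

// Only the "/" at byte offset 2 is ASCII; every byte of "Ü" and "Ö"
// has its highest bit set, so none of them can be mistaken for "/".
var_dump($asciiPositions);

// Consequently the urlencode()d form contains "%2F" exactly once.
var_dump(substr_count(urlencode($heading), '%2F')); // int(1)
```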

If anybody sees problems with this approach, or has a better way to solve the issue, I'm looking forward to reading about it.

Christoph
Christoph M. Becker – Plugins for CMSimple_XH

Post by Holger » Sun Nov 27, 2011 9:52 pm

cmb wrote: If anybody sees problems
No problem! That change is more than welcome, because I have had the same problem in some cases with other social media stuff.

KR
Holger

Post by johnjdoe » Mon Nov 28, 2011 11:30 am

Really nice and helpful modification! IMHO it should be in the core of the next release.

Post by cmb » Sat Jan 21, 2012 11:55 pm

Hi Community,

a big problem with this solution was found: headings must no longer contain the chosen uri_seperator; otherwise navigation to the page is not possible. This is because CMSimple urlencode()s the uri_seperator if it is contained in a heading, so that it is not interpreted as a separator between the headings of different menu levels. With my proposed change, however, it can no longer be distinguished whether an encoded uri_seperator was a separator in the first place or part of a heading. Currently I see the following possibilities:
  1. Removing the change from CMSimple_XH 1.5.1 and dropping support for Facebook links, which might not be a good idea, as posted links to CMSimple_XH sites might help to increase their popularity. (Well, that may change if SOPA/PIPA are passed ;))
  2. Avoiding the uri_seperator in headings, which could be done through urichar_org/urichar_new, or perhaps even automatically. But that would change the URLs of already existing pages.
  3. Using the HTTP_REFERER to determine how to interpret the encoded uri_seperator. Definitely not an option: the information given by the browser might be incorrect, the detection would require knowing all sites that encode the uri_seperator (and these will probably change over time), and there could be ambiguities.
  4. Trying to interpret the encoded uri_seperator both ways and checking whether a corresponding page can be found. IMO not an option, as more than one page could fit.
  5. Submitting a petition to Facebook to change the way they encode URLs. ;)
So it seems to come down to choosing the lesser evil between (1) and (2).
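
The ambiguity can be reproduced with a simplified sketch. Both functions are hypothetical stand-ins: buildPageUrl() only approximates how page URLs are composed (the real code also applies urichar_org/urichar_new), and decodeSeparator() mimics the originally proposed change, using str_ireplace() for brevity:

```php
<?php
// Hypothetical, simplified stand-in for how a page URL is built:
// each heading is urlencode()d, then the levels are joined by the separator.
function buildPageUrl(array $headings, $separator)
{
    return implode($separator, array_map('urlencode', $headings));
}

// Mimics the proposed change: turn encoded separators back into real ones.
function decodeSeparator($su, $separator)
{
    return str_ireplace(urlencode($separator), $separator, $su);
}

$sep    = '/';
$nested = buildPageUrl(array('News', '2012'), $sep); // "News/2012"
$single = buildPageUrl(array('News/2012'), $sep);    // "News%2F2012"

// Before decoding, the two pages have different URLs ...
var_dump($nested === $single);                        // bool(false)

// ... after decoding, a request for the single page looks like the nested one.
var_dump(decodeSeparator($single, $sep) === $nested); // bool(true)
```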

Please note that it doesn't matter whether the decoding of encoded uri_seperators happens through mod_rewrite or anywhere in CMSimple_XH.

Does anybody see a better solution? What should we do?

Christoph
Christoph M. Becker – Plugins for CMSimple_XH

Post by Martin » Sun Jan 22, 2012 10:20 am

Hi Christoph,

maybe there is a 6th possibility: leave $su as it was before, but accept a urlencoded version from outside when trying to figure out the page index in rfc(), around line 624:

Code:

if ($su == $u[$i] || $su == urlencode($u[$i])) {
    $s = $i;
}
:?:

Martin

Post by cmb » Sun Jan 22, 2012 12:44 pm

Hi Martin,

indeed, that's smart and simple! Existing URLs don't have to be changed, no ambiguities are possible, and links from Facebook will work as long as the URL doesn't contain any already urlencoded characters (i.e. %XX), which might be best practice anyway. So if users care about backlinks from Facebook, they can use urichar_org/urichar_new to keep their URLs "clean".
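
A minimal sketch of this lookup, including the caveat about already encoded characters. findPageIndex() and the page list are hypothetical; only the comparison itself is taken from Martin's snippet, and the last request assumes that the linking site encodes nothing but the separator:

```php
<?php
// Hypothetical helper built around the suggested comparison:
// $su is the requested URL, $u the list of known page URLs.
// Returns the index of the first matching page, or -1 if none matches.
function findPageIndex($su, array $u)
{
    foreach ($u as $i => $url) {
        if ($su == $url || $su == urlencode($url)) {
            return $i;
        }
    }
    return -1;
}

$u = array('Home', 'News/2012', 'Caf%C3%A9/Menu');

// A normal link matches directly.
var_dump(findPageIndex('News/2012', $u));        // int(1)

// A link with an encoded separator matches via urlencode().
var_dump(findPageIndex('News%2F2012', $u));      // int(1)

// The page URL already contains %XX sequences: urlencode() turns their
// "%" into "%25", so the externally encoded link is not found.
var_dump(findPageIndex('Caf%C3%A9%2FMenu', $u)); // int(-1)
```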

To be honest: I haven't tested it yet, but I'm quite convinced that this should have been the solution for the given problem in the first place.

Christoph
Christoph M. Becker – Plugins for CMSimple_XH

Post by johnjdoe » Tue Jan 24, 2012 11:29 am

cmb wrote: Hi Martin,

indeed, that's smart and simple! Existing URLs don't have to be changed, no ambiguities are possible, and links from Facebook will work as long as the URL doesn't contain any already urlencoded characters (i.e. %XX), which might be best practice anyway. So if users care about backlinks from Facebook, they can use urichar_org/urichar_new to keep their URLs "clean".

To be honest: I haven't tested it yet, but I'm quite convinced that this should have been the solution for the given problem in the first place.

Christoph
Do you already know in which version this will be implemented?

Post by cmb » Tue Jan 24, 2012 12:09 pm

Hi Gerd,

I've already put it on the roadmap for 1.5.2, but it is not approved (yet).

Christoph
Christoph M. Becker – Plugins for CMSimple_XH

Post by johnjdoe » Wed Jan 25, 2012 6:08 am

cmb wrote:Hi Gerd,

I've already put it on the roadmap for 1.5.2, but it is not approved (yet).

Christoph
Thanks, hope it will be approved soon.

Post by manu » Fri Feb 10, 2012 5:50 pm

cmb wrote:Hi Community,

a big problem with this solution was found: headings must not contain the chosen uri_seperator anymore. Otherwise navigation to the page is not possible. This is due to the fact, that CMSimple urlencode()s the uri_seperator, if it's contained in a heading, so that it's not interpreted as a separator between the headings of different menu levels. But with my proposed change it cannot be distinguished anymore, if the encoded uri_seperator was a separator in the first place, or if it's part of heading. ...//...
Does anybody see a better solution? What should we do?

Christoph
To prevent problems with upgrades, it would be nice to have this mentioned in the 1.5.1 release notes.
regards
manu
