Linkchecker in 1.6.5

A place to report and discuss bugs - please mention CMSimple-version, server, platform and browser version
svasti
Posts: 1659
Joined: Wed Dec 17, 2008 5:08 pm

Linkchecker in 1.6.5

Post by svasti » Thu Jan 22, 2015 1:56 pm

Hi all,

just trying out the coming 1.6.5 (btw phantastic).
The Linkchecker gave some false alarms:
Claimed as faulty an internal link to a pdf (which is actually fine)
Linkchecker wrote:Link: hier
Linkziel: ./userfiles/downloads/Obstbaumschnitt-Termine%202015.pdf
Fehler: Fehlerhafter interner Link, Seite existiert nicht.
and an external link (which also is fine).
Linkchecker wrote:Link: 'Netzwerk Blühende Landschaft'
Linkziel: http://www.bluehende-landschaft.de
Fehler: Fehlerhafter externer Link, Seite nicht erreichbar.
http Statuscode: 404

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE
Contact:

Re: Linkchecker in 1.6.5

Post by cmb » Thu Jan 22, 2015 2:31 pm

svasti wrote:The Linkchecker gave some false alarms:
Well, the link checker is flawed (to say the least). You can get an overview by looking at tests/unit/LinkCheckerTest.php function dataForLinkStatus() where working as well as faulty behavior is tested.
svasti wrote:
Linkchecker wrote:Link: hier
Linkziel: ./userfiles/downloads/Obstbaumschnitt-Termine%202015.pdf
Fehler: Fehlerhafter interner Link, Seite existiert nicht.
That is because the link checker doesn't urldecode() filenames. It seems the following patch should fix this bug:

Code: Select all

Index: cmsimple/classes/LinkChecker.php
===================================================================
--- cmsimple/classes/LinkChecker.php	(revision 1468)
+++ cmsimple/classes/LinkChecker.php	(working copy)
@@ -200,7 +200,7 @@
         global $c, $u, $cl, $pth, $cf;
 
         if (isset($test['path']) && !isset($test['query'])) {
-            $filename = $test['path'];
+            $filename = urldecode($test['path']);
             if (is_file($filename) && is_readable($filename)) {
                 return '200';
             }
(A unit test should be also added.)
svasti wrote:
Linkchecker wrote:Link: 'Netzwerk Blühende Landschaft'
Linkziel: http://www.bluehende-landschaft.de
Fehler: Fehlerhafter externer Link, Seite nicht erreichbar.
http Statuscode: 404
That's rather interesting! While `curl http://www.bluehende-landschaft.de/` works as expected `curl -I http://www.bluehende-landschaft.de/` (i.e. a HTTP HEAD request) fails with 404 Not Found. It seems to me that is an erroneous response, because according to RFC 7231, section 4.3.2:
The HEAD method is identical to GET except that the server MUST NOT send a message body in the response (i.e., the response terminates at the end of the header section). The server SHOULD send the same header fields in response to a HEAD request as it would have sent if the request had been a GET, except that the payload header fields (Section 3.3) MAY be omitted.
Well, the spec says "SHOULD", so the response is not erroneous, but still I can't see any reason for this behavior. Actually, they should be happy that a HEAD request is made instead of a GET, if only the headers are needed. Do you know the site operators; can you contact them?
Christoph M. Becker – Plugins for CMSimple_XH

svasti
Posts: 1659
Joined: Wed Dec 17, 2008 5:08 pm

Re: Linkchecker in 1.6.5

Post by svasti » Thu Jan 22, 2015 3:00 pm

cmb wrote:Do you know the site operators; can you contact them?
No, I've no connection to them. I was simply updating an XH 1.2 website for somebody and playing around a bit.

manu
Posts: 1090
Joined: Wed Jun 04, 2008 12:05 pm
Location: St. Gallen - Schweiz
Contact:

Re: Linkchecker in 1.6.5

Post by manu » Thu Jan 22, 2015 6:09 pm

internal download links with parameters like "./userfiles/downloads/annual_report.pdf?v=2015" also throws an error "Fehlerhafter interner Link, Seite existiert nicht.". I use this often to force a new download.
Anchors inside table elements seems to error also: ?actitivies#activities2015. But this is not back checked.

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE
Contact:

Re: Linkchecker in 1.6.5

Post by cmb » Thu Jan 22, 2015 7:23 pm

manu wrote:internal download links with parameters like "./userfiles/downloads/annual_report.pdf?v=2015" also throws an error "Fehlerhafter interner Link, Seite existiert nicht.". I use this often to force a new download.
Anchors inside table elements seems to error also: ?actitivies#activities2015. But this is not back checked.
See also http://cmsimpleforum.com/viewtopic.php?t=7566. I'm still confused why the second link would fail.

Anyhow, IMO worse than the false positives are the false negatives (i.e. broken links that are not reported as such).
Christoph M. Becker – Plugins for CMSimple_XH

svasti
Posts: 1659
Joined: Wed Dec 17, 2008 5:08 pm

Re: Linkchecker in 1.6.5

Post by svasti » Fri Jan 23, 2015 4:32 pm

cmb wrote:It seems the following patch should fix this bug:
Will this be implmented in 1.6.5? Haven't seen it in the svn.
The fix is working fine here.

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE
Contact:

Re: Linkchecker in 1.6.5

Post by cmb » Fri Jan 23, 2015 4:33 pm

svasti wrote:
cmb wrote:It seems the following patch should fix this bug:
Will this be implmented in 1.6.5? Haven't seen it in the svn.
The fix is working fine here.
Then I'll add it. :)
Christoph M. Becker – Plugins for CMSimple_XH

manu
Posts: 1090
Joined: Wed Jun 04, 2008 12:05 pm
Location: St. Gallen - Schweiz
Contact:

Re: Linkchecker in 1.6.5

Post by manu » Sat Jan 24, 2015 10:27 am

manu wrote:Anchors inside table elements seems to error also: ?actitivies#activities2015. But this is not back checked.
rubbish, the anchors are set inside a plugin(calendar). That's why the test fails.
manu wrote:internal download links with parameters like "./userfiles/downloads/annual_report.pdf?v=2015" also throws an error "Fehlerhafter interner Link, Seite existiert nicht.".
this on might be solved as follows:

Code: Select all

Index: LinkChecker.php
===================================================================
--- LinkChecker.php	(Revision 1464)
+++ LinkChecker.php	(Arbeitskopie)
@@ -199,7 +199,7 @@
     {
         global $c, $u, $cl, $pth, $cf;
 
-        if (isset($test['path']) && !isset($test['query'])) {
+        if (isset($test['path'])) {
             $filename = $test['path'];
             if (is_file($filename) && is_readable($filename)) {
                 return '200';
I don't see any reason to disallow a query append here. Unit test case added, unit tests passed.

EDIT:
But the case of file-not-found && query is not caught. Have to investigate - after the Hahnenkamm Downhill Race.

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE
Contact:

Re: Linkchecker in 1.6.5

Post by cmb » Sat Jan 24, 2015 10:45 am

manu wrote:I don't see any reason to disallow a query append here. Unit test case added, unit tests passed.
I believe the original reason to check for no query string has been another CMSimple installation on the same server or a second language. If they're linked with .../index.php?... the page won't be checked now. The question is what is more likely; I tend to believe your use case is.
manu wrote:But the case of file-not-found && query is not caught.
Maybe there was another reason to check for no query string. :?

PS: FWIW: I've found the first suggestion for checking for files: http://cmsimpleforum.com/viewtopic.php? ... 130#p23248.

PPS: For reference: XH 1.6: Linkchecker overhaul
manu wrote:But the case of file-not-found && query is not caught. Have to investigate - after the Hahnenkamm Downhill Race.
I suggest you a add unit test in any way. If it can't be easily solved, then put it in the limitations section. We have "Fix link checker" already on the 1.7 roadmap, and I still suppose that we can't fix all cases with the current design. Besides my idea to check all links via HTTP requests (cURL would allow further protocols as well), at least we should make the code more legible by introducing objects encapsulating the link, the page on which it is used, the status etc. This "requires" PHP 5, though.
Christoph M. Becker – Plugins for CMSimple_XH

manu
Posts: 1090
Joined: Wed Jun 04, 2008 12:05 pm
Location: St. Gallen - Schweiz
Contact:

Re: Linkchecker in 1.6.5

Post by manu » Sat Jan 24, 2015 2:14 pm

Ok, I see, to improve the linkchecker is not so easy. This patch would make the trick for the moment, I think. I try to better distinguish the cases wether the link target is a file or not.

Code: Select all

Index: LinkChecker.php
===================================================================
--- LinkChecker.php	(Revision 1464)
+++ LinkChecker.php	(Arbeitskopie)
@@ -199,10 +199,12 @@
     {
         global $c, $u, $cl, $pth, $cf;
 
-        if (isset($test['path']) && !isset($test['query'])) {
+        if (isset($test['path']) && substr($test['path'], -1) != '/') {
             $filename = $test['path'];
             if (is_file($filename) && is_readable($filename)) {
                 return '200';
+            } else {
+                return 'file not found';
             }
         }
         if (!isset($test['query'])) {
Its a good idea to do a refactoring on linkchecker in the near future.

Post Reply