Pages in search: &print and &page=0

A place to report and discuss bugs - please mention CMSimple-version, server, platform and browser version
Termin
Posts: 101
Joined: Thu Jan 27, 2011 8:55 am

Post by Termin » Thu Jul 07, 2011 9:03 am

Yandex search results include pages like these:

Code: Select all

http://mysite/?hom&print
http://mysite/?link&print
http://mysite/?link2&print
http://mysite/?link3&pic=0&page=0
http://mysite/?link4&page=0
Please help.

cmb
Posts: 14225
Joined: Tue Jun 21, 2011 11:04 am
Location: Bingen, RLP, DE

Post by cmb » Thu Jul 07, 2011 9:59 am

Hello Termin,

Sorry, but I don't understand what your problem is. Could you please explain in more detail what you expect and what is actually happening?

Perhaps you can provide a link to your website.

Christoph
Christoph M. Becker – Plugins for CMSimple_XH

Post by Termin » Fri Jul 08, 2011 12:00 pm


Post by cmb » Fri Jul 08, 2011 12:46 pm

Hello Termin,

If I understand you correctly, your problem is that yandex.ua lists links to pages that you do not want to appear there, such as the print links (e.g. http://pixelcom.crimea.ua/?O_KOMPANII&print).

I don't know exactly about the current status of CMSimple in this regard, but I found an old thread about this topic: http://www.cmsimple.dk/forum/viewtopic.php?t=75

A workaround could be to adapt robots.txt or to provide a sitemap.xml manually.

I would not consider this a bug. But perhaps you would like to make a feature request in the Open Development forum. It should not be a problem for a future version of CMSimple to create a sitemap.xml automatically.

Christoph

Post by Termin » Fri Jul 08, 2011 6:41 pm

Christoph, yes, you understood correctly. robots.txt does not solve this problem: you would need to list every page of the site with &print appended, and they will not all fit, because robots.txt has limits. Yandex bans many pages. I need a working website, so I will consider other CMS options. Thank you for your attention; I hope the error gets fixed.

Post by cmb » Sat Jul 09, 2011 12:54 am

Hello Termin,

I'm no expert on search engines, but IMO CMSimple's behaviour is not an error. If a page contains a link such as 'http://mysite/?page1&print', a search engine will follow it. Indeed, AFAIK it is not practical to exclude a large number of such links individually in robots.txt. And I don't know whether it's possible to exclude all pages in robots.txt and provide only a sitemap.xml for Yandex.

But if you don't need the print link, a simple solution to your problem would be to remove the

Code: Select all

echo printlink()
call from your /templates/???/template.htm. Then neither your visitors nor Yandex (or any other search engine) will see it. Perhaps this is a viable solution for you?

Another problem might be the &page=??? links. These are used by the register plugin. You should use Register_mod_XH, if you don't already, and you might try contacting its author, Gert, who is a member of this forum, for further details.

You might also consider contacting vadim (another member of this forum), who operates http://www.cmsimple-xh.ru. He could probably provide more details about CMSimple_XH and Yandex.

Christoph

Post by Termin » Sat Jul 09, 2011 10:53 am

I removed the function

Code: Select all

function printlink() {
	global $f, $search, $file, $sn, $tx;
	$t = amp().'print';
	if ($f == 'search')$t .= amp().'function=search'.amp().'search='.htmlspecialchars(stsl($search));
	else if($f == 'file')$t .= amp().'file='.$file;
	else if($f != '' && $f != 'save')$t .= amp().$f;
	else if(sv('QUERY_STRING') != '')$t = str_replace('&','&amp;',sv('QUERY_STRING')).$t; // str_replace by GE 09-06-22
	return '<a href="'.$sn.'?'.$t.'">'.$tx['menu']['print'].'</a>';
}
to see what happens.

Post by cmb » Sat Jul 09, 2011 11:03 am

Hello Termin,

It might be safer to replace printlink() with:

Code: Select all

function printlink() {
    return '';
}
 
If you remove printlink() but it is still called, AFAIK PHP throws a fatal error (call to an undefined function), so your visitors will see only an empty page. The code given above just returns an empty string, so the rest of the page is displayed without any print link.

Thinking further about the problem of search engines indexing &print pages, I came across a possible solution using <meta name="robots" content="noindex, nofollow">. I posted about it at http://www.cmsimpleforum.com/viewtopic.php?f=29&t=3253
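
A minimal sketch of that idea, assuming the template can run PHP in its <head> section. The helper name robots_meta_tag() is hypothetical and not part of CMSimple; it only checks whether the request carries the print parameter and returns the meta tag in that case.

```php
<?php
// Hypothetical helper, NOT part of CMSimple: returns a robots meta tag
// for print views and an empty string for every other request.
function robots_meta_tag(array $query)
{
    // A URL like ?O_KOMPANII&print yields a 'print' key with an empty value,
    // so test for the key rather than for a non-empty value.
    if (array_key_exists('print', $query)) {
        return '<meta name="robots" content="noindex, nofollow">';
    }
    return '';
}

// Assumed usage inside the <head> of template.htm:
// echo robots_meta_tag($_GET);
```

With this in place the print link stays available to visitors, while crawlers that honour the robots meta tag should drop the &print pages from their index.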

Christoph

Post by Termin » Sat Jul 09, 2011 5:56 pm

Thank you, Christoph. I will test the code and report the result.

Viktor

Post by cmb » Sat Jul 09, 2011 7:23 pm

Hello Viktor,

I took a look at the &page links. I'm not entirely sure, but I guess they come from the register plugin. I suppose you use Register_mod_XH. If so, look at the top of /plugins/register/index.php. Directly below the long comment you will find:

Code: Select all

// Start session unless a robot is accessing the page
if (preg_match('/Googlebot/i',$_SERVER['HTTP_USER_AGENT']));
else if (preg_match('/MSNbot/i',$_SERVER['HTTP_USER_AGENT']));
else if (preg_match('/slurp/i',$_SERVER['HTTP_USER_AGENT']));
else if(session_id() == "") 
  session_start();
 
You should add another line for the Yandex bot. Under http://www.user-agents.org/index.shtml?t_z I found only the yandex.ru bot, but perhaps it's the same one. That might keep Yandex from indexing your &page links.
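
In the style of the existing code, the extra line might look like `else if (preg_match('/Yandex/i',$_SERVER['HTTP_USER_AGENT']));`. The same check can also be sketched as a single pattern, as below. The function names are hypothetical and not part of Register_mod_XH, and the 'Yandex' pattern is an assumption that should be checked against the User-Agent string the bot really sends.

```php
<?php
// Hypothetical helper, NOT part of Register_mod_XH: true when the
// User-Agent string matches one of the listed crawlers.
function is_known_robot($userAgent)
{
    // 'Yandex' is an assumed pattern; the other three come from the plugin.
    return preg_match('/Googlebot|MSNbot|slurp|Yandex/i', (string) $userAgent) === 1;
}

// Start a session unless a robot is accessing the page.
function start_session_unless_robot()
{
    // The header can be missing, so fall back to an empty string.
    $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if (!is_known_robot($userAgent) && session_id() == '') {
        session_start();
    }
}

// Assumed usage at the top of /plugins/register/index.php:
// start_session_unless_robot();
```

Keeping all bot names in one pattern means a further crawler can be added by extending the alternation instead of adding another else-if branch.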

Christoph