netamismb Posted March 13 In Google Search Console, Google is crawling a lot of pages (more than 1k) with index.php?controller=trigger&ts=1710225603, which only return {"status":"failed","error":"Forbidden"}. Any idea how I can get rid of them?
the.rampage.rado Posted March 13 Probably some module is generating them. You can remove them from the sitemap on the sitemap module's settings page - check everything you DON'T want to be shown in the sitemap.
wakabayashi Posted March 14 I have similar warnings. Google is trying to crawl these controller=trigger URLs. If I remember right, they come from the core. I added Disallow: /*controller=trigger to robots.txt.
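For reference, a minimal robots.txt with that rule would look roughly like this (only the Disallow line comes from this thread; the surrounding User-agent line is the usual boilerplate and you can scope it to Googlebot instead if you prefer):

    User-agent: *
    Disallow: /*controller=trigger

Note that the * wildcard inside a path is an extension honoured by Google and Bing but not guaranteed to be understood by every crawler.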
wakabayashi Posted July 17 As the robots.txt solution is also not perfect, I have opened an issue on GitHub: https://github.com/thirtybees/thirtybees/issues/1844
DRMasterChief Posted July 18 It is closed on GitHub right now; can anyone share the solution here? Thanks
the.rampage.rado Posted July 18 You can update to bleeding edge (simplest) or patch your files as in this commit: https://github.com/thirtybees/thirtybees/commit/e89c365731d095d971038d21d2e0310833e469b9 Regarding the discovered pages in the Search Console - I'm curious if they can be removed or should we leave them as is?
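If you just want the idea rather than the exact diff: this is not the actual code from that commit, only a rough PHP sketch of the general technique of marking a cron/trigger endpoint as non-indexable by sending an X-Robots-Tag header along with the JSON response (the file and its placement are made up for illustration; the Forbidden payload is the one quoted at the top of the thread):

    <?php
    // Illustrative only - not the actual thirty bees patch.
    // Idea: a trigger/cron endpoint should never end up in a search index,
    // so mark the response as non-indexable and return a proper status code.
    header('X-Robots-Tag: noindex, nofollow'); // tell crawlers not to index this URL
    header('Content-Type: application/json');
    http_response_code(403);                   // the "Forbidden" answer the endpoint already gives
    echo json_encode(['status' => 'failed', 'error' => 'Forbidden']);
    exit;

The header approach has the advantage over robots.txt that Google is still allowed to fetch the URL, sees the noindex, and can then drop it from the index instead of keeping it as "discovered but blocked".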
nickz Posted July 18 Search engines ask for sitemaps, but then push everything into the index, even pages the engine creates itself. Bing, for example, still has pages in its index that were outdated in 2019, and since DDG took over Bing's index, those URLs are in there too. They are so in love with AI that they have forgotten all about cleaning their indexes.
wakabayashi Posted July 19 (edited) 15 hours ago, the.rampage.rado said: Regarding the discovered pages in the Search Console - I'm curious if they can be removed or should we leave them as is? You can't remove them; they should go away over time. But I have entries in Search Console that were fixed a year ago (with canonicals and so on) and Google still tries to crawl the old versions. Search Console even says there is no referring page for the URL it wants to crawl 😂 🤦♂️ It certainly helps if you are on bleeding edge AND have the robots.txt disallow. Personally I think this has no real influence on SEO, but it helps to clean up Search Console, and with a clean Search Console it's easier to find real issues... Edited July 19 by wakabayashi
datakick Posted July 19 16 hours ago, DRMasterChief said: It is closed on GitHub right now; can anyone share the solution here? Thanks The solution is to demand that Google fix this bug on their side. If your robots.txt instructs Google to ignore some URL, then Google Search Console should not display a warning that the URL is not crawlable.
DRMasterChief Posted July 20 Thank you @datakick. I have had Disallow: /*controller=trigger since the first day, and honestly I don't see these URLs being crawled by Google, but it is an interesting topic for me.