netamismb Posted March 13 In Google Search Console, Google is crawling a lot of pages (more than 1k) with index.php?controller=trigger&ts=1710225603, which only return {"status":"failed","error":"Forbidden"}. Any idea how I can get rid of them?
the.rampage.rado Posted March 13 Probably some module is generating them. You can remove them from the sitemap on the sitemap module's settings page - check everything you DON'T want to be shown in the sitemap.
wakabayashi Posted March 14 I have similar warnings. Google is trying to crawl these controller=trigger URLs. If I remember right, they come from the core. I added Disallow: /*controller=trigger to robots.txt.
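For reference, a minimal robots.txt with that rule would look roughly like this (only the Disallow line comes from this thread; the surrounding User-agent line is the usual boilerplate and you can scope it to Googlebot instead if you prefer):

    User-agent: *
    Disallow: /*controller=trigger

Note that the * wildcard inside a path is an extension honoured by Google and Bing but not guaranteed to be understood by every crawler.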
wakabayashi Posted July 17 As the robots.txt solution is also not perfect, I have opened an issue on GitHub: https://github.com/thirtybees/thirtybees/issues/1844
DRMasterChief Posted July 18 It is closed on GitHub right now; can anyone share the solution here? Thanks
the.rampage.rado Posted July 18 You can update to bleeding edge (simplest) or patch your files as in this commit: https://github.com/thirtybees/thirtybees/commit/e89c365731d095d971038d21d2e0310833e469b9 Regarding the discovered pages in the Search Console - I'm curious if they can be removed or should we leave them as is?
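If you just want the idea rather than the exact diff: this is not the actual code from that commit, only a rough PHP sketch of the general technique of marking a cron/trigger endpoint as non-indexable by sending an X-Robots-Tag header along with the JSON response (the file and its placement are made up for illustration; the Forbidden payload is the one quoted at the top of the thread):

    <?php
    // Illustrative only - not the actual thirty bees patch.
    // Idea: a trigger/cron endpoint should never end up in a search index,
    // so mark the response as non-indexable and return a proper status code.
    header('X-Robots-Tag: noindex, nofollow'); // tell crawlers not to index this URL
    header('Content-Type: application/json');
    http_response_code(403);                   // the "Forbidden" answer the endpoint already gives
    echo json_encode(['status' => 'failed', 'error' => 'Forbidden']);
    exit;

The header approach has the advantage over robots.txt that Google is still allowed to fetch the URL, sees the noindex, and can then drop it from the index instead of keeping it as "discovered but blocked".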
nickz Posted July 18 Search engines ask for sitemaps, but then push everything into the index, even pages the engine creates itself. Bing, for example, still has pages in its index that were outdated in 2019, and since DDG took over Bing's index, those URLs are in there too. They are so in love with AI that they have forgotten all about cleaning their indexes.
wakabayashi Posted July 19 (edited) 15 hours ago, the.rampage.rado said: Regarding the discovered pages in the Search Console - I'm curious if they can be removed or should we leave them as is? You can't remove them; they should go away over time. But I have entries in Search Console that were fixed a year ago (with canonicals and so on) and Google still tries to crawl the old versions. Search Console even says there is no referring page for the URL it wants to crawl 😂 🤦♂️ It certainly helps if you are on bleeding edge AND have the robots.txt disallow. Personally I think this has no real influence on SEO, but it helps to clean up Search Console, and with a clean Search Console it's easier to find real issues... Edited July 19 by wakabayashi
datakick Posted July 19 16 hours ago, DRMasterChief said: It is closed on GitHub right now; can anyone share the solution here? Thanks The solution is to demand that Google fix this bug on their side. If your robots.txt instructs Google to ignore some URL, then Google Search Console should not display a warning that the URL is not crawlable.
DRMasterChief Posted July 20 Thank you @datakick. I have had Disallow: /*controller=trigger since the first day, and honestly I don't see these URLs being crawled by Google, but it is an interesting topic for me.