papagino Posted October 1
I just found out that when I search for my business on Bing and DuckDuckGo, the page https://www.mysite.com/blackhole/ comes up in the search results. If a visitor clicks on this link, he is greeted with the blackhole page and blocked from my site. Google, however, doesn't crawl this page. How can I fix this?

x97wehner Posted October 1
1 hour ago, papagino said: "I just found out that when I search for my business on Bing and DuckDuckGo, the page https://www.mysite.com/blackhole/ comes up in the search results. If a visitor clicks on this link, he is greeted with the blackhole page and blocked from my site. Google, however, doesn't crawl this page. How can I fix this?"
I see the same issue with DuckDuckGo. Not with Bing. @datakick
Edited October 1 by x97wehner

datakick Posted October 1
You should ask DuckDuckGo this question, not me. If the robots.txt explicitly blocks the URL, there shouldn't be any reason for them to index it.

wakabayashi Posted October 1
What does your robots.txt look like? And please check the actual file on the server, not what any tool is saying...
Edited October 1 by wakabayashi

papagino (Author) Posted October 1
17 minutes ago, wakabayashi said: "What does your robots.txt look like? And please check the actual file on the server, not what any tool is saying..."
The robots.txt file on the server does have Disallow: /blackhole/ at the bottom...

datakick Posted October 1
2 minutes ago, papagino said: "The robots.txt file on the server does have Disallow: /blackhole/ at the bottom..."
Did you add that entry to the robots.txt from the very beginning, at the same time you installed the blackholebots module? If you added it later, then Bing might have already indexed the page.

papagino (Author) Posted October 1
1 hour ago, datakick said: "Did you add that entry to the robots.txt from the very beginning, at the same time you installed the blackholebots module? If you added it later, then Bing might have already indexed the page."
Yes I did... and that was a very long time ago, a year maybe...

30knees Posted October 1
Take a look here: https://www.bing.com/webmasters/help/how-can-i-remove-a-url-or-page-from-the-bing-index-37c07477

the.rampage.rado Posted October 1
I don't have this issue with my domains. They even show up on the first page when I search for the addon shops (something that Google does not do because reasons)... Shame that nobody uses those search engines in Europe... 🙂 They say that robots.txt is respected, but who knows...

datakick Posted October 2
I've released a new version of the module that allows you to change the trap URL. If Bing is already indexing your trap URL for any reason, you can change it from https://domain.com/blackhole to something new like https://domain.com/my-honey-trap (and change robots.txt accordingly). This way, when Bing sends traffic to your website at the /blackhole address, it will not be blocked. To prevent a 404, I suggest you also add a redirect from /blackhole to the homepage in your .htaccess file.
Hopefully, Bing will not add the new trap URL to its index again. I've added an extra precaution to prevent this as well: if a known good bot (Google, Bing, etc.) somehow makes it to your trap URL (even though the robots.txt blocks it), then the content of the trap page will be mostly empty, and the page head will contain <meta name="robots" content="noindex">, which instructs the bot not to index the page.

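
To confirm that a page really carries the noindex instruction described above, one option is to inspect its HTML for a robots meta tag. The following is only a minimal sketch in Python: the class and function names (`RobotsMetaParser`, `has_noindex`) and the sample HTML strings are made up for illustration and are not part of the blackholebots module. It works on an HTML string you have already saved, and it does not reproduce how the module decides which bots count as "known good".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Hypothetical page bodies, for illustration only.
    trap_page = '<html><head><meta name="robots" content="noindex"></head></html>'
    normal_page = '<html><head><meta name="description" content="shop"></head></html>'
    print(has_noindex(trap_page))
    print(has_noindex(normal_page))
```

Keep in mind that, going by the post above, the tag is served to known good bots, so a page saved from an ordinary browser session may not show it.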
wakabayashi Posted October 2
@datakick great work! IMO "my-honey-trap" should be the default 🤣 😜

DRMasterChief Posted October 2
Great, I hope this works with the new version of the module. Should there be two lines in robots.txt?
Disallow: /blackholenew/
Disallow: /modules/blackholebots/blackholenew/
Edited October 2 by DRMasterChief

datakick Posted October 2
1 minute ago, DRMasterChief said: "Great, I hope this works with the new version of the module. Should it be like this in robots.txt? Disallow: /blackholenew/ Disallow: /modules/blackholebots/blackholenew/"
The other Disallow directive is for stores without friendly URLs enabled. The blackhole name does not change there.
The robots.txt should look like this:
User-agent: *
Disallow: */blackholenew/
Disallow: /modules/blackholebots/blackhole/*
Note the * before /blackholenew/. It's there to block language variants as well.

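
The effect of the leading * can be checked locally with a rough sketch like the one below. It is written in Python and assumes the wildcard semantics that the major search engines document for robots.txt: a rule is matched from the start of the URL path, * matches any run of characters, and a trailing $ anchors the end. The function names (`rule_to_regex`, `is_disallowed`) and the /en/ language prefix are made up for illustration. Only Disallow rules are modelled, so Allow rules and rule precedence are ignored. It is not a replacement for testing against the real crawlers.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters and a
    trailing '$' anchors the end of the path. Everything else is literal.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any non-empty Disallow rule matches the path prefix."""
    for rule in disallow_rules:
        if not rule:
            # An empty Disallow line blocks nothing.
            continue
        # re.match anchors at the start of the path, i.e. prefix matching.
        if rule_to_regex(rule).match(path):
            return True
    return False


# Rules without and with the leading wildcard.
PLAIN_RULES = ["/blackholenew/"]
WILDCARD_RULES = ["*/blackholenew/", "/modules/blackholebots/blackhole/*"]

if __name__ == "__main__":
    for path in ["/blackholenew/", "/en/blackholenew/", "/contact"]:
        print(path, is_disallowed(path, PLAIN_RULES), is_disallowed(path, WILDCARD_RULES))
```

Under these assumptions, the plain rule /blackholenew/ does not cover a language-prefixed path such as /en/blackholenew/, while */blackholenew/ covers both forms.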
datakick Posted October 2
Also, you should use a tool like https://technicalseo.com/tools/robots-txt/ to check that it works properly.

papagino (Author) Posted October 2
3 hours ago, datakick said: "The other Disallow directive is for stores without friendly URLs enabled. The blackhole name does not change there. The robots.txt should look like this: User-agent: * Disallow: */blackholenew/ Disallow: /modules/blackholebots/blackhole/* Note the * before /blackholenew/. It's there to block language variants as well."
Thanks datakick for the updates. My site is bilingual, so maybe the missing "*" in the robots.txt was the problem in my case... I will try the new version and investigate further... Cheers