papagino Posted October 1
I just found out that when I search for my business on Bing and DuckDuckGo, the page https://www.mysite.com/blackhole/ comes up in the search results. If a visitor clicks on this link, they are greeted with the blackhole page and blocked from my site. Google, however, doesn't crawl this page. How can I fix this?
x97wehner Posted October 1
1 hour ago, papagino said:
I just found out that when I search for my business on Bing and DuckDuckGo, the page https://www.mysite.com/blackhole/ comes up in the search results. If a visitor clicks on this link, they are greeted with the blackhole page and blocked from my site. Google, however, doesn't crawl this page. How can I fix this?

I see the same issue with DuckDuckGo, but not with Bing. @datakick
datakick Posted October 1
You should ask DuckDuckGo this question, not me. If robots.txt explicitly blocks the URL, there shouldn't be any reason for them to index it.
wakabayashi Posted October 1
What does your robots.txt look like? And please check the actual file on the server, not what some tool reports...
papagino Posted October 1
17 minutes ago, wakabayashi said:
What does your robots.txt look like? And please check the actual file on the server, not what some tool reports...

The robots.txt file on the server does have Disallow: /blackhole/ at the bottom...
datakick Posted October 1
2 minutes ago, papagino said:
The robots.txt file on the server does have Disallow: /blackhole/ at the bottom...

Did you add that entry to robots.txt from the very beginning, at the same time you installed the blackholebots module? If you added it later, Bing might have already indexed the page.
papagino Posted October 1
1 hour ago, datakick said:
Did you add that entry to robots.txt from the very beginning, at the same time you installed the blackholebots module? If you added it later, Bing might have already indexed the page.

Yes, I did... and that was a very long time ago, maybe a year...
30knees Posted October 1
Take a look here: https://www.bing.com/webmasters/help/how-can-i-remove-a-url-or-page-from-the-bing-index-37c07477
the.rampage.rado Posted October 1
I don't have this issue with my domains. They even show up on the first page when I search for the addon shops (something that Google does not do, because reasons)... Shame that nobody uses those search engines in Europe... 🙂 They say that robots.txt is respected, but who knows...
datakick Posted October 2
I've released a new version of the module that allows you to change the trap URL.

If Bing is already indexing your trap URL for any reason, you can change it from https://domain.com/blackhole to something new like https://domain.com/my-honey-trap (and change robots.txt accordingly). This way, when Bing sends traffic to the /blackhole address on your website, the visitor will not be blocked. To prevent a 404, I suggest you also add a redirect from /blackhole to the homepage in your .htaccess file.

Hopefully, Bing will not add the new trap URL to its index again. I've added an extra precaution against this as well: if a known good bot (Google, Bing, etc.) somehow makes it to your trap URL (even though robots.txt blocks it), the content of the trap page will be mostly empty, and the page head will contain <meta name="robots" content="noindex">, which instructs the bot not to index the page.
wakabayashi Posted October 2
@datakick great work! IMO "my-honey-trap" should be the default 🤣 😜
DRMasterChief Posted October 2
Great, I hope this works with the new version of the module. Should there be two lines in robots.txt, like this?
Disallow: /blackholenew/
Disallow: /modules/blackholebots/blackholenew/
datakick Posted October 2
1 minute ago, DRMasterChief said:
Great, I hope this works with the new version of the module. Should there be two lines in robots.txt, like this?
Disallow: /blackholenew/
Disallow: /modules/blackholebots/blackholenew/

The second Disallow directive is for stores without friendly URLs enabled. The blackhole name does not change there.

The robots.txt should look like this:
User-agent: *
Disallow: */blackholenew/
Disallow: /modules/blackholebots/blackhole/*

Note the * before /blackholenew/: it blocks the language variants of the URL as well.
datakick Posted October 2
Also, you should use a tool like https://technicalseo.com/tools/robots-txt/ to check that it works properly.
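For a quick local sanity check of the wildcard rules discussed above, here is a minimal Python sketch. It is not the logic of any particular search engine or of the blackholebots module. It assumes the commonly documented wildcard semantics (a rule is a prefix match, * matches any run of characters, a trailing $ anchors the end), and the function name robots_pattern_matches and the /en/ language prefix are hypothetical, chosen only for illustration.

```python
import re


def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed semantics (not taken from the thread): prefix match,
    '*' matches any sequence of characters, trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' become '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    # Hypothetical paths: a plain trap URL and a language-prefixed variant.
    for rule in ("/blackhole/", "*/blackhole/"):
        for path in ("/blackhole/", "/en/blackhole/"):
            print(rule, path, robots_pattern_matches(rule, path))
```

Under these assumptions, the rule without the leading * does not cover the language-prefixed path, while the rule with it covers both. Real crawlers may interpret patterns differently, so an online tester is still worth using.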
papagino Posted October 2
3 hours ago, datakick said:
The second Disallow directive is for stores without friendly URLs enabled. The blackhole name does not change there.

The robots.txt should look like this:
User-agent: *
Disallow: */blackholenew/
Disallow: /modules/blackholebots/blackhole/*

Note the * before /blackholenew/: it blocks the language variants of the URL as well.

Thanks for the updates, datakick. My site is bilingual, so maybe the missing "*" in robots.txt was the problem in my case... I will try the new version and investigate further... Cheers