thirty bees forum

Major issue: blackhole trap URL being crawled by Bing and DuckDuckGo even though "Disallow: /blackhole/" is in robots.txt; losing lots of customers because of it.


papagino

Recommended Posts

1 hour ago, papagino said:

I just found out that when I search for my business on Bing and DuckDuckGo, the page https://www.mysite.com/blackhole/ comes up in the search results.

If a visitor clicks on this link, he is greeted with the blackhole page and blocked from my site.

Google, however, doesn't crawl this page.

How can I fix this?

I see the same issue with DuckDuckGo, but not with Bing. @datakick

Edited by x97wehner

17 minutes ago, wakabayashi said:

What does your robots.txt look like?

And please check the file on the server itself, not what some tool reports...

The robots.txt file on the server does have Disallow: /blackhole/ at the bottom...


2 minutes ago, papagino said:

The robots.txt file on the server does have Disallow: /blackhole/ at the bottom...

Did you add that entry to robots.txt from the very beginning, at the same time you installed the blackholebots module? If you added it later, Bing might have already indexed the page.


1 hour ago, datakick said:

Did you add that entry to robots.txt from the very beginning, at the same time you installed the blackholebots module? If you added it later, Bing might have already indexed the page.

Yes I did... and that was a very long time ago, maybe a year...


I've released a new version of the module that allows you to change the trap URL.

If Bing is already indexing your trap URL for any reason, you can change it from https://domain.com/blackhole to something new, like https://domain.com/my-honey-trap (and change robots.txt accordingly).

This way, when Bing sends traffic to your website's /blackhole address, visitors will not be blocked. To prevent a 404, I suggest you also add a redirect from /blackhole to the homepage in your .htaccess file.
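A minimal sketch of such a redirect, assuming Apache with mod_alias enabled and the old trap at /blackhole (adjust the path to your actual trap URL):

```apache
# Permanently redirect the old trap URL (and anything under it) to the
# homepage, so visitors arriving from stale search results see the shop
# instead of a 404 or a block page.
RedirectMatch 301 ^/blackhole(/.*)?$ /
```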

Hopefully, Bing will not add the new trap URL to the index again.

I've also added an extra precaution to prevent this -- if a known good bot (Google, Bing, etc.) somehow makes it to your trap URL (even though robots.txt blocks it), the content of the trap page will be mostly empty, and the page headers will contain <meta name="robots" content="noindex">, which instructs the bot not to index the page.
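As a rough illustration of that precaution (a hedged Python sketch, not the module's actual PHP code; the bot list and markup are assumptions):

```python
# Hypothetical sketch of the described behaviour: known good bots that
# reach the trap URL get a near-empty page carrying a noindex directive,
# instead of being blocked like a bad bot would be.
KNOWN_GOOD_BOTS = ("googlebot", "bingbot", "duckduckbot")

def trap_response(user_agent: str) -> str:
    ua = user_agent.lower()
    if any(bot in ua for bot in KNOWN_GOOD_BOTS):
        # Mostly empty page; the meta tag tells the bot not to index it.
        return ('<html><head><meta name="robots" content="noindex">'
                '</head><body></body></html>')
    # Everyone else hitting the trap gets the usual block page.
    return '<html><body>You have been blocked.</body></html>'
```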


1 minute ago, DRMasterChief said:

Great, I hope this works with the new version of the module.

Should it be like this in robots.txt:

Disallow: /blackholenew/
Disallow: /modules/blackholebots/blackholenew/

The other disallow directive is for stores without friendly URLs enabled - there is no change to the blackhole name there.

The robots.txt should look like this:

User-agent: *
Disallow: */blackholenew/
Disallow: /modules/blackholebots/blackhole/*

Note the * before /blackholenew/ -- it's there to block language variants as well.
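To see why the leading * matters for a multilingual store, here is a small sketch of Googlebot/Bingbot-style wildcard matching (the helper function and example paths are illustrative, not part of the module):

```python
import re

def matches_disallow(rule: str, path: str) -> bool:
    """Googlebot/Bingbot-style rule matching: '*' matches any run of
    characters, '$' anchors the end; rules match from the path start."""
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(pattern, path) is not None

# Without the leading *, language-prefixed URLs slip through:
print(matches_disallow("/blackholenew/", "/en/blackholenew/"))   # False
print(matches_disallow("*/blackholenew/", "/en/blackholenew/"))  # True
print(matches_disallow("*/blackholenew/", "/blackholenew/"))     # True
```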


3 hours ago, datakick said:

The other disallow directive is for stores without friendly URLs enabled - there is no change to the blackhole name there.

The robots.txt should look like this:

User-agent: *
Disallow: */blackholenew/
Disallow: /modules/blackholebots/blackhole/*

Note the * before /blackholenew/ -- it's there to block language variants as well.

Thanks datakick for the updates. My site is bilingual; maybe the missing "*" in robots.txt was the problem in my case...

Will try the new version and investigate further...

Cheers
