thirty bees forum

Major issue: blackhole trap being crawled by Bing and DuckDuckGo even though "Disallow: /blackhole/" is in robots.txt; losing lots of customers because of it.


Recommended Posts

Posted

I just found out that when I search for my business on Bing and DuckDuckGo, the page https://www.mysite.com/blackhole/ comes up in the search results.

If a visitor clicks on this link, they are greeted with the blackhole page and blocked from my site.

Google, however, doesn't crawl this page.

How can I fix this?

Posted (edited)
1 hour ago, papagino said:

I just found out that when I search for my business on Bing and DuckDuckGo, the page https://www.mysite.com/blackhole/ comes up in the search results.

If a visitor clicks on this link, they are greeted with the blackhole page and blocked from my site.

Google, however, doesn't crawl this page.

How can I fix this?

I see the same issue with DuckDuckGo, not with Bing. @datakick

Edited by x97wehner
Posted

You should ask DuckDuckGo this question, not me. If the robots.txt explicitly blocks the URL, there shouldn't be any reason for them to index it.

  • Like 1
  • Haha 1
Posted (edited)

What does your robots.txt look like?

And please check the actual file on the server, not what some tool reports...

Edited by wakabayashi
Posted
17 minutes ago, wakabayashi said:

What does your robots.txt look like?

And please check the actual file on the server, not what some tool reports...

The robots.txt file on the server does have Disallow: /blackhole/ at the bottom...

Posted
2 minutes ago, papagino said:

The robots.txt file on the server does have Disallow: /blackhole/ at the bottom...

Did you add that entry to robots.txt from the very beginning, at the same time you installed the blackholebots module? If you added it later, then Bing might have already indexed the page.

Posted
1 hour ago, datakick said:

Did you add that entry to robots.txt from the very beginning, at the same time you installed the blackholebots module? If you added it later, then Bing might have already indexed the page.

Yes I did... and that was a very long time ago, maybe a year...

 

Posted

I don't have this issue with my domains. They even show up on the first page when I search for the addon shops (something Google does not do, for whatever reason)... A shame that nobody uses those search engines in Europe... 🙂


They say robots.txt is respected, but who knows...

Posted

I've released a new version of the module that allows you to change the trap URL.

If Bing is already indexing your trap URL for any reason, you can change it from https://domain.com/blackhole to something new like https://domain.com/my-honey-trap (and change robots.txt accordingly).

This way, when Bing sends traffic to the /blackhole address on your website, visitors will not be blocked. To prevent a 404, I suggest you also add a redirect from /blackhole to the homepage in your .htaccess file.
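As a rough sketch, such a redirect in .htaccess could look like this (assuming Apache with mod_alias enabled; the domain and the /blackhole path are placeholders for your own values):

```apache
# Send stale search-engine links for the old trap path to the
# homepage with a permanent redirect, instead of a 404 or the trap.
RedirectMatch 301 ^/blackhole/?$ https://domain.com/
```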

Hopefully, Bing will not add the new trap URL to the index again.

I've also added an extra precaution to prevent this: if a known good bot (Google, Bing, etc.) somehow makes it to your trap URL (even when robots.txt blocks it), the content of the trap page will be mostly empty, and the page headers will contain <meta name="robots" content="noindex">, which instructs the bot not to index the page.
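That behaviour can be checked with a short sketch using only Python's standard library (the sample markup below is hypothetical, not the module's actual output):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> tags from an HTML page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives.append(a.get("content", "").lower())

# Hypothetical trap-page markup, illustrating what a noindex header looks like.
sample = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
parser = RobotsMetaParser()
parser.feed(sample)
print("noindex" in parser.directives)  # True
```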

Posted (edited)

Great, I hope this works with the new version of the module.

Should there be two lines in robots.txt?

Disallow: /blackholenew/
Disallow: /modules/blackholebots/blackholenew/

Edited by DRMasterChief
Posted
1 minute ago, DRMasterChief said:

Great, I hope this works with the new version of the module.

Should it be like this in robots.txt?

Disallow: /blackholenew/
Disallow: /modules/blackholebots/blackholenew/

The other disallow directive is for stores without friendly URLs enabled; the blackhole name does not change there.

The robots.txt should look like this:

User-agent: *
Disallow: */blackholenew/
Disallow: /modules/blackholebots/blackhole/*

Note the * before /blackholenew/ -- it's there to block language-prefixed variants as well.
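As an illustration, the Google/Bing-style wildcard matching these patterns rely on can be sketched as a small standalone matcher (a simplified model of the crawlers' matching rules, not their actual code):

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Compile a robots.txt path pattern using Google-style wildcards:
    '*' matches any character sequence, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

rule = robots_pattern_to_regex("*/blackholenew/")
for path in ("/blackholenew/", "/en/blackholenew/", "/en/blackholenew/page"):
    print(path, bool(rule.match(path)))  # all True: the leading * also matches ""
```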

  • Like 1
Posted
3 hours ago, datakick said:

The other disallow directive is for stores without friendly URLs enabled; the blackhole name does not change there.

The robots.txt should look like this:

User-agent: *
Disallow: */blackholenew/
Disallow: /modules/blackholebots/blackhole/*

Note the * before /blackholenew/ -- it's there to block language-prefixed variants as well.

Thanks datakick for the updates. My site is bilingual, so maybe the missing "*" in robots.txt was the problem in my case...

Will try the new version and investigate further...

Cheers
