Pedalman Posted October 5, 2017

Hi, I had some trouble with the generated robots.txt and Google's own testing tool, which checks Googlebot's access against the allow/disallow rules. Two lines in particular made me wonder:

User-agent: *
Disallow: /*

I thought the first line would grant access to all bots, which would be nice SEO-wise :) The second I took to be a general disallow, but since it sits at the end of the file I assumed it would be fine, because bots are already allowed at the top. In fact, with these settings Google could not crawl my site optimally. I had to comment the rule out (# Disallow: /*) and add

User-agent: *
Disallow:

at the end of robots.txt to give Google's bots and crawlers free access. In a nutshell: since we are in ecommerce, I wonder what the optimal robots.txt is to boost SEO/Google.
Pedalman (Author) Posted October 19, 2017

Nice that you found time to check it. I renamed the file and generated a new one on the live host. The difference is that I had to add, at the end:

User-agent: *
Disallow:

Otherwise Google is not happy. I also added a rule for my blog, which I am going to bury:

Allow: */wordpress/
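For reference, the end of my robots.txt now looks roughly like this (just a sketch; the generated shop rules above this section are omitted, and the Allow path is specific to my WordPress blog):

```
User-agent: *
# ... generated shop rules ...
# Disallow: /*   (commented out: this pattern blocked the whole site)
Allow: */wordpress/

User-agent: *
Disallow:
```

An empty Disallow value disallows nothing, so once the catch-all Disallow: /* is commented out, crawlers get free access.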
lesley Posted October 19, 2017

It is. Actually, to be honest, I have quit using robots.txt files unless I am specifically trying to hide something. Robots crawl everything unless they are told specifically not to.
DRMasterChief Posted May 18, 2018

Hello, as far as I understand, the original robots.txt standard does not support the asterisk (*) wildcard. I would like to add some URLs of my own to the robots.txt, as I don't want crawlers to see our privacy policy, terms and conditions, shipping information, and pages like that. The URLs for the English and German languages look like this, e.g.

myABCDEdomain.com/en/info/Terms-Conditions
myABCDEdomain.com/de/info/allgemeine-geschaeftsbedingungen

so I think I have to include them like this:

```
Disallow: /en/info/Terms-Conditions
Disallow: /de/info/allgemeine-geschaeftsbedingungen
```

Is this correct? Does this directive mean that any URL starting with the given path is blocked from crawling? To exclude the whole directory, it would be like this: Disallow: /de/info

Thank you
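PS: with the User-agent line the rules have to sit under, I assume the full block would be:

```
User-agent: *
Disallow: /en/info/Terms-Conditions
Disallow: /de/info/allgemeine-geschaeftsbedingungen
```

and to block the whole directory per language instead:

```
User-agent: *
Disallow: /en/info/
Disallow: /de/info/
```

(with a trailing slash, so that a path like /de/infothek would not be caught by the same prefix match).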
lesley Posted May 18, 2018

Why even disallow them? Google has suggested moving to a setup with minimal disallow rules.
DRMasterChief Posted May 19, 2018

Maybe you know that in Germany lawyers and consumer advocates can send warnings and penalties to online merchants. They often use scripts that query search engines, or do it manually with a snippet of text, to find pages such as the ToS, data protection policy and so on. We want to protect ourselves from that, so we do not want those pages indexed by search engines.
Beeta Posted May 21, 2018

In the CMS there is already an option to prevent CMS articles from being indexed... is that not working?
DRMasterChief Posted May 21, 2018

Yes, I have activated this in the BO, but I am not sure whether it works. Even with the option active in the BO, these pages are not included in the robots.txt in the root folder (and robots.txt files outside the root are ignored by search engines). So my idea was to add them manually.
lesley Posted May 21, 2018

That's because the option does not add anything to the robots.txt file; it uses the noindex, nofollow meta tag instead. http://www.robotstxt.org/meta.html
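For reference, the tag is placed in each page's <head> and looks like this (the standard form described at that link; the shop's exact output may differ slightly):

```
<meta name="robots" content="noindex, nofollow">
```

Crawlers that honor it will neither index the page nor follow its links. Note that the page must remain crawlable for the tag to be seen at all, which is exactly why this mechanism never shows up in robots.txt.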