thirty bees forum

thirtybees Litespeed LSCache website crawler configuration step by step


This is a short guide for those who run their thirtybees shop on LiteSpeed servers. I'm running my shops on shared hosting.

1. First of all, you need a LiteSpeed server with a paid license that includes the cache, and the crawler enabled (on the server side). If something is not working, contact your host and ask whether the crawler is enabled, etc.

2. Install the free LiteSpeed Cache module. With it you can configure various settings. The current version is 1.4.1 and it should work with thirtybees 1.5.1 with the following settings:

(The module is written for PrestaShop, of course, and is not as well supported as the ones for Magento or WordPress, but as of now it works fine with thirtybees.)

At this point you can test whether LSCache is working: open your front office (FO) and load a page. If you are loading this page for the first time, the response headers for the generated HTML should show "miss":


After a page reload it should show "hit":

(How to check this: right-click anywhere on the page, click 'Inspect', and go to the 'Network' tab. In the request list on the left, scroll to the top and look for your generated page, click it, and under the 'Headers' tab on the right you will see a lot of information. Near the end of the first section, look for 'X-Litespeed-Cache:', which will be either 'miss' or 'hit'. If you loaded the current page for the first time it should show 'miss'; on reload it should show 'hit', which means your LSCache is configured and working properly.)
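If you prefer the command line, the same check can be sketched with curl; the URL in the comment is a placeholder for one of your own shop pages. To keep the snippet self-contained, the header parsing below runs against a sample response instead of a live request:

```shell
# Fetch only the headers from a page (placeholder URL):
#   curl -sI https://www.myshop.com/ | grep -i '^x-litespeed-cache:'
# Sample response headers, standing in for the curl output above:
headers='HTTP/1.1 200 OK
Content-Type: text/html
X-Litespeed-Cache: hit'

# Extract the cache status ("miss" on first load, "hit" after a reload):
status=$(printf '%s\n' "$headers" | grep -i '^x-litespeed-cache:' | awk '{print tolower($2)}')
echo "$status"   # prints "hit" for this sample
```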

3. Download the LiteSpeed LSCache crawler script and place it in your website root (next to index.php). Some documentation for the plugin and the crawler can be found here. Make the script executable with 0711 permissions.
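Setting the 0711 permissions from a shell (e.g. over SSH) might look like this; the filename matches the crawler script used in the cron entry in step 5:

```shell
# 0711 = owner read/write/execute, group and others execute only.
chmod 0711 cachecrawler.sh

# Verify: the listing should start with -rwx--x--x
ls -l cachecrawler.sh
```

Shared hosts usually let you do the same from the cPanel File Manager's "Permissions" dialog.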

4. You need the thirtybees Sitemap module installed and running. If you don't know how to configure it, just ask in a comment and somebody will hop in to help, but the module is pretty self-explanatory. Generate your sitemap and leave this page open, as you will need to copy the sitemap link.

5. As I said, I'm using shared hosting, so I go to cPanel -> Cron Jobs and create a new entry with the following command:

public_html/cachecrawler.sh https://www.myshop.com/1_en_0_sitemap.xml

and just for testing whether everything works, you can run this job every 5 minutes (my shop with ~250 products takes ~120 seconds to crawl every page at the default crawl interval).

(Please note that bash must be installed and available in order to run this script.)
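For reference, a full crontab line for the 5-minute test schedule could look like the following; the home-directory path and sitemap URL are placeholders for your own:

```shell
# Test schedule: run the crawler every 5 minutes (placeholder path and URL):
*/5 * * * * /bin/bash /home/youruser/public_html/cachecrawler.sh https://www.myshop.com/1_en_0_sitemap.xml
```

cPanel builds this line for you from the "Common Settings" dropdown and the command field.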

6. If you configured the cron to run every 5 minutes, wait 2-3 more minutes and visit your shop again. This time open the Developer tools in advance and load any link that you know was not visited recently. If the crawler did its job, you should see a direct 'hit'.

7. To troubleshoot the crawler, it is recommended to take a look at the email that is sent after the cron job finishes. Information on how to configure this can be found online in many places. In this email you can see which pages were already cached and therefore skipped, which were being cached, etc.

8. Once everything works, it is recommended to edit your cron job and set it to run every 12 or 24 hours.
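In crontab syntax, a twice-a-day schedule would look like this (again, path and URL are placeholders):

```shell
# Production schedule: re-crawl at 00:00 and 12:00 every day:
0 0,12 * * * /bin/bash /home/youruser/public_html/cachecrawler.sh https://www.myshop.com/1_en_0_sitemap.xml
```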

And that's all, enjoy your faster first page loads!


Troubleshooting (in addition to the tips on the LSCache documentation page):

1. I'm using the Warehouse theme and the module comes with a preconfigured profile for it. If your cart module or any other module acts funny, you can play around with ESI hole punching; more info is in the documentation.

2. If you are using the Blackhole for Bots module from DataKick, keep in mind that the server running the cron job can end up on the blacklist if the sitemap settings are not correct. Every blackholed folder must be excluded from the sitemap (I had problems with /modules/; exclude everything you don't need in the sitemap). If you are locked out, delete the latest IP from the blackholeforbots table, regenerate the sitemap and test again.

3. If you're using URLs with accented characters (or Cyrillic letters, as I am), you have to use the -e switch, like so:

public_html/cachecrawler.sh https://www.myshop.com/1_en_0_sitemap.xml -e

Otherwise the crawler will ignore every accented character and will only crawl the first couple of pages of your sitemap.

4. If some pages cannot be crawled because they are forbidden or unavailable, the crawler script will stop after 15 such pages. To override this, you have to edit the script: look for PROTECTOR='ON' at the very beginning and turn it OFF. In general this should not be needed, but it can help when troubleshooting together with the Blackhole for Bots module.
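If you prefer to flip it from the shell, a sed one-liner can do it; this assumes the variable is spelled exactly PROTECTOR='ON' in your copy of the script, as described above:

```shell
# Turn the error protector off (remember to turn it back ON afterwards):
sed -i "s/^PROTECTOR='ON'/PROTECTOR='OFF'/" cachecrawler.sh

# Confirm the change:
grep "^PROTECTOR=" cachecrawler.sh
```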

5. In general, don't keep pages in your sitemap that are blocked by robots.txt when you are using the Blackhole for Bots module.

6. If, like me, you are using Multistore, be extra careful to remove all module pages from your sitemap (leaving reviews and other front-facing pages) PER SHOP. The All Shops context currently does not affect the per-shop settings, so you can end up in an edge case with two different versions of your sitemap, depending on whether you generate it manually or leave it to the cron. So, as of today: make all per-shop Sitemap settings comply with robots.txt too.
