thirty bees forum

The Caching Mystery


wakabayashi


Hello

I know that we have some server experts here. Maybe somebody could explain how caching works. It confuses me that there are so many different caching methods. As far as I know, I have the following:

  • nginx cache
  • redis cache
  • smarty cache
  • full page cache

Additionally, there is the browser cache, right? Which one does what? Where are the risks?

I am always afraid that visitors could see outdated information. Let's say I change the price of a product. Is there a risk that a customer sees the old price? What if I change tpl files?


Sure. Hope this quick explanation suffices:

  • Nginx cache/Varnish cache = proxy server cache. It caches HTTP responses (200, 302) to GET requests.
  • Redis cache = database cache in RAM. Normally, your database relies on its own buffers and the filesystem cache; with Redis, the results of database queries are kept in RAM.
  • Smarty cache. It caches the HTML output produced from your Smarty templates.
  • Full page cache. It caches the complete HTML of the current page.
  • OPcache. It caches the compiled PHP code.
  • Browser cache. It caches the requested data (images, CSS, JS, HTML, etc.) locally on the client.

If you understand the flow of page rendering, you'll know which cache works in what step.
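
One way to picture that flow is as a chain of lookups, where a layer is only asked when the layer in front of it misses. The sketch below is a toy model in Python, not thirty bees code; the layer names and the `render` callback are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Layers ordered from closest to the visitor to closest to the PHP code.
# The names are illustrative labels, not thirty bees configuration keys.
LAYER_ORDER = ["browser", "proxy", "full_page"]


def fetch(url: str,
          layers: Dict[str, Dict[str, str]],
          render: Callable[[str], str]) -> Tuple[str, str]:
    """Return (html, served_by) for a URL, filling caches on the way back."""
    missed: List[str] = []
    for name in LAYER_ORDER:
        cache = layers[name]
        if url in cache:
            html, served_by = cache[url], name
            break
        missed.append(name)
    else:
        # Every layer missed, so the application has to render the page.
        html, served_by = render(url), "application"
    # Store the response in every layer that missed.
    for name in missed:
        layers[name][url] = html
    return html, served_by
```

On a first visit every layer misses and the page is rendered; a repeat visit is answered by the first layer that holds a copy. In this picture the Smarty cache, the Redis cache and OPcache all work inside the "application" step, where they make the rendering itself cheaper.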

What are the risks? Well, there are just way too many to list. For example, you wouldn't want to cache a PUT or POST request, data with cookies, or a dynamic webpage that changes on every render; caching the latter would be a waste of resources.
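
Those rules of thumb can be written down as a small check. This is a simplified sketch, not how nginx or Varnish actually decide; real proxies also honour headers such as `Cache-Control`. The function name and parameters are hypothetical.

```python
from typing import Mapping

CACHEABLE_METHODS = {"GET"}
CACHEABLE_STATUSES = {200, 302}  # the status codes named in the list above


def is_cacheable(method: str, status: int,
                 request_headers: Mapping[str, str],
                 response_headers: Mapping[str, str]) -> bool:
    """Rough rule of thumb for whether a proxy should store a response."""
    if method.upper() not in CACHEABLE_METHODS:
        return False  # PUT, POST etc. change state and must reach the shop
    if status not in CACHEABLE_STATUSES:
        return False
    # Cookies usually mean per-visitor content (cart, login, currency).
    if "cookie" in {name.lower() for name in request_headers}:
        return False
    if "set-cookie" in {name.lower() for name in response_headers}:
        return False
    return True
```

For example, `is_cacheable("GET", 200, {}, {})` is true, while the same request with a `Cookie` header, or any `POST`, is rejected.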


As far as I understand, the greatest risk comes with the full page cache, because it is not clear what we should or should not add to it. At least a basic tutorial for non-tech guys would be nice to have ;) The other caching systems, I believe, we (merchants) can't mess up.


Thanks for the explanation, @moy2010. There's also a "CCC-cache" and 30bz code has kind of an internal caching system, too. Quite often, database queries are stored in some cache file and retrieved from there on the next read.

My gut feeling says there are so many cache mechanisms that handling all of them is slower than going without them.

For example, what is the point of a database cache? A DB server should know best how to use available RAM for maximum performance. Accordingly, instructing the DB server to grab more RAM is better than putting a caching system on top of it.

Also this Smarty cache: its output is PHP, which is yet again cached by OPcache. And a third time if it's a full page cache hit.

@MockoB: I think all this caching stuff should melt down into a single on/off button. Perhaps also a selector on where to apply the full page cache, but that's it.

Also, to make extra sure a price change is recognized, simply clear the cache.
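
To illustrate why clearing helps, here is a minimal sketch of a read-through cache that keeps serving an old price until it is cleared. It is a toy model in Python; the class name and the dict standing in for the product table are assumptions, not thirty bees internals.

```python
class CachedPrices:
    """Read-through price cache in front of a 'database' dict."""

    def __init__(self, database):
        self._database = database  # hypothetical stand-in for the product table
        self._cache = {}

    def price(self, product_id):
        # Only ask the database when the cache has no copy yet.
        if product_id not in self._cache:
            self._cache[product_id] = self._database[product_id]
        return self._cache[product_id]

    def clear(self):
        self._cache.clear()


database = {42: 1990}        # price in cents
shop = CachedPrices(database)
first = shop.price(42)       # read from the database, then cached
database[42] = 2490          # merchant changes the price behind the cache
stale = shop.price(42)       # the cached copy is still the old price
shop.clear()
fresh = shop.price(42)       # after clearing, the new price is read
```

A cache that is invalidated automatically when the price is saved would avoid the stale read in the first place; clearing by hand is the fallback when you are not sure that happened.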

All this said, cleaning up all these caching mechanisms is likely a lot of work, and I can almost hear all the complaints about "30bz doesn't support caching, it's slow!" :-)


@Traumflug

"@MockoB: I think all this caching stuff should melt down into a single on/off button. Perhaps also a selector on where to apply the full page cache, but that’s it."

Excellent points. I especially like having the choice to turn something on or off rather than having to configure specific details I know nothing about.


I'm not a cache expert, I know just enough to be dangerous. (A great expression my mother used to use.)

For example, what is the point of a database cache? A DB server should know best how to use available RAM for maximum performance.

A database cache like Redis turns a general-purpose database like MySQL or MariaDB into a high-performance monster capable of feats that far exceed what the database itself is capable of. This is important in scenarios where a system has far more "reads" than "updates", a web server for example. There are also databases designed mostly for performance, but they have other trade-offs which make them less desirable. Using MySQL/MariaDB + Redis is a "best of both worlds" scenario: the well-understood MySQL structure and better data integrity & consistency, but still the performance of far faster systems. It's great that 30bz is offering Redis.
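
The read-heavy argument can be made concrete with a small cache-aside sketch that counts how often the database is actually queried. A plain dict stands in for Redis here so the example runs without a server; both classes are hypothetical and only model the pattern.

```python
class CountingDatabase:
    """Stand-in for MySQL/MariaDB that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def select(self, key):
        self.reads += 1
        return self.rows[key]

    def update(self, key, value):
        self.rows[key] = value


class CacheAside:
    """Reads go to the cache first; writes go to the database and drop the key."""

    def __init__(self, database):
        self.database = database
        self.store = {}  # a dict standing in for Redis

    def read(self, key):
        if key in self.store:
            return self.store[key]
        value = self.database.select(key)
        self.store[key] = value
        return value

    def write(self, key, value):
        self.database.update(key, value)
        self.store.pop(key, None)  # invalidate so the next read is fresh


db = CountingDatabase({"product:1": "Widget"})
cache = CacheAside(db)
for _ in range(500):
    cache.read("product:1")
cache.write("product:1", "Widget v2")
for _ in range(500):
    cache.read("product:1")
```

With 1000 reads and one update, the database is queried only twice (`db.reads == 2`): once at the start and once after the write invalidated the cached entry.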

Trying to use a single "on/off" button to manage all cache settings seems like an ideal option but it removes all flexibility. Some modules don't play well with some types of caching. Some combinations of modules will have the same issue. Being able to turn on & off different types of cache is important when you are troubleshooting problems on your website.

Rather than a single toggle perhaps some information on a recommended standard configuration for shops that don't have a lot of modules installed would be nice. Maybe, when the devs have time, a 30bz module that can be installed with a toggle switch for the default caching configuration.

In the end the topic of caching comes down to the same thing I said in the "Let's talk about Search" thread: The advantage of SaaS systems like Shopify is that they do manage all of this stuff for users. User-managed webshops like 30bz require the user to learn more and then apply that knowledge. That gives us all tremendous power but it does require an investment of time & effort. The benefit to this is a more flexible site, the ability to scale to multiple sites if desired, and the lack of a big SaaS bill every month.


This flexibility isn't of much point if there's no help to decide which settings fit best. Accordingly one needs some performance measurement system. I'm not aware of one other than turning on profiling, playing around and looking at the numbers.

If some caching system always improves performance, there's not much point in allowing to turn it off individually.

That's why I think there should be a single on/off button. Developers can test each cache strategy and do the individual decision for or against a system at development time already. SaaS providers can do it, too, after all.

If there are decisions which depend on the hardware in use, such tests can be run automated. Like running some phantomjs measurement for an hour or two, spitting out a configuration file with the best combination found.
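
A rough sketch of what such an automated run could look like: try every on/off combination, measure each, and write out the fastest one. The cache names and the `fake_measure` function are assumptions for illustration; a real run would replace it with an actual page-load measurement.

```python
import itertools
import json
from typing import Callable, Dict, Iterable


def best_cache_config(cache_names: Iterable[str],
                      measure: Callable[[Dict[str, bool]], float]) -> Dict[str, bool]:
    """Try every on/off combination and return the one with the lowest time."""
    names = list(cache_names)
    configs = [dict(zip(names, flags))
               for flags in itertools.product([False, True], repeat=len(names))]
    return min(configs, key=measure)


def fake_measure(config: Dict[str, bool]) -> float:
    """Made-up page load time in milliseconds, for demonstration only."""
    ms = 400.0
    if config["opcache"]:
        ms -= 150
    if config["full_page"]:
        ms -= 120
    if config["memcached"]:
        ms += 300  # models a badly configured backend that slows things down
    return ms


best = best_cache_config(["opcache", "full_page", "memcached"], fake_measure)
config_file_text = json.dumps(best, indent=2, sort_keys=True)
```

Note that the number of combinations doubles with every additional switch, so with many cache types a real tool would probably test them one at a time instead of exhaustively.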

At this point I want to emphasize that I am not pointing fingers at developers. No mistakes happened. The work is just not complete yet, and what I write here hopefully helps a bit to find a sensible guideline for future development.

To give an example from embedded development (3D printer controller): early rule of thumb there was to make the binary as small as possible. Not because "disk" space is limited, but because each instruction has to be loaded and executed, which costs time. This alone brought performance some 20% over the competition, which uses all kinds of tricks (like writing in assembler) to meet buzzword performance improvements.

The next step was to introduce a performance measurement system: a simulator running the firmware on a PC. Now we could measure the number of clock cycles between two waypoints and had standard tests to get comparable results. This way we could try and experiment, shaving off microsecond by microsecond from what we had already, and we had hard facts on which variant of the code performed better, instead of guessing. Each commit got a performance report. Developers enjoyed experimenting and came up with things like this: "look, if I multiply before the division, it's 0.2% faster!" And these 0.2% gains happened often and eventually added up. The end result is a 3D printer firmware that is not 20%, but 100% faster, has 15% of the binary size of these competitors, and is known to have the most readable code.


A single on/off button is not possible, I think, because one server will support a caching option and another will not. For example, in my experience with PS 1.5, I tried the memcached caching server and my site was dead slow. It's nice to have an "off" button just for that. Likewise, with the smart cache for JavaScript, "off" actually gives better performance than "on", etc. @Traumflug, knowing assembler deserves respect!


This flexibility isn’t of much point if there’s no help to decide which settings fit best. Accordingly one needs some performance measurement system. I’m not aware of one other than turning on profiling, playing around and looking at the numbers.

Website performance is generally measured using third-party tools that check page speed, the number of requests needed to load a page, the types of caching used, image types & compression, how cookies are set up, etc. Turning different types of caching on and off will generally impact the results from these tools in repeatable ways.

If some caching system always improves performance, there’s not much point in allowing to turn it off individually.

Some hosts don't offer some cache types, as @MockoB mentioned. Some hosts offer a cache type but only in an older version. Sometimes there are also multiple options for the same type of caching so a site will have to choose the one that a) their host offers b) is compatible with the modules they are using and c) that gives the best overall performance. Sometimes using a given type of cache would be ideal but the server being used doesn't have enough memory to use it.

Sometimes a module will conflict with a certain type of caching as well, in which case the shop owner will have to decide between using the module or using the cache.
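
The selection logic described above (what the host offers, what the installed modules tolerate, what fits in memory, and then what is fastest) could be sketched like this. All names and fields are hypothetical; nothing here is taken from thirty bees.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Backend:
    name: str
    offered_by_host: bool
    memory_needed_mb: int
    measured_ms: float                      # page time measured with this backend
    conflicts_with: FrozenSet[str] = frozenset()


def choose_backend(backends: Iterable[Backend],
                   installed_modules: Iterable[str],
                   free_memory_mb: int) -> Optional[Backend]:
    """Pick the fastest backend that the host, the RAM and the modules allow."""
    installed = set(installed_modules)
    usable = [b for b in backends
              if b.offered_by_host
              and b.memory_needed_mb <= free_memory_mb
              and not (b.conflicts_with & installed)]
    return min(usable, key=lambda b: b.measured_ms, default=None)


candidates = [
    Backend("redis", True, 256, 120.0, frozenset({"examplemodule"})),
    Backend("memcached", False, 128, 130.0),
    Backend("filesystem", True, 0, 210.0),
]
choice = choose_backend(candidates, ["examplemodule"], free_memory_mb=512)
```

In this made-up example the fastest backend is ruled out by a module conflict and the second one is not offered by the host, so the slower filesystem cache wins. That is exactly the kind of trade-off a shop owner ends up deciding.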

That’s why I think there should be a single on/off button. Developers can test each cache strategy and do the individual decision for or against a system at development time already. SaaS providers can do it, too, after all.

This is possible on a SaaS platform because the business running that platform has complete control over the platform. You can't write a Shopify module that screws with the fundamentals of the Shopify platform because Shopify doesn't give module developers that level of access. With an open source solution any module developer, theme developer, or even shop owner can do anything they want to any of the code. They shouldn't, of course, but they can. This leads to many of the problems and incompatibilities. These incompatibilities often don't show up in every configuration but only in certain configurations. Sometimes one version of the cart will be fine but a later version will have problems. This mostly comes from module developers (and sometimes theme developers) not following the published guidelines for what they are building. Sometimes they know they can get better performance if they do things a certain way, sometimes it's just bad coding, and sometimes there may be no other way to do what they want to do. The upside of being an open source cart is complete access and total flexibility. The downside of being an open source cart is also complete access and total flexibility...

To give an example from embedded development (3D printer controller)

Again, you guys could do this because you had complete control over the system. If you had 10,000 random developers also making changes, additions, deletions, etc in ways they thought best you would never have been able to do what you did.


Some hosts don’t offer some cache types, as @MockoB mentioned. Some hosts offer a cache type but only in an older version.

Such stuff can be detected automatically. No need to put that burden onto the user.

Sometimes a module will conflict with a certain type of caching as well

This is a problem the module developer should deal with. Again, no need to bother the user with it.

you guys could do this because you had complete control over the system.

Actually we didn't. This software runs on some 50 distinct controllers, including different architectures. Still, this strategy worked out, because what makes software faster on an 8-bit architecture usually makes it faster on a 32-bit architecture, too. If not, the code detects this and offers distinct code paths for both.

OK, I'll stop now. Web software was always complicated in the details, so it'll take time until users get used to simplicity.


@Traumflug said in The Caching Mystery:

Some hosts don’t offer some cache types, as @MockoB mentioned. Some hosts offer a cache type but only in an older version.

Such stuff can be detected automatically. No need to put that burden onto the user.

Sometimes yes, sometimes no. The advantage of a SaaS is that the software developers know exactly what the system will be running on, and exactly how the system will be configured. They know what services will be available, the addresses the services are on, and what port numbers will be used. They might not control the bare-metal hardware, but they will control everything from the OS layer through to the SaaS software the user is paying for. This makes everything immeasurably easier to manage. There are no "special cases" or oddball configurations.

How would 30bz detect that there is an ES server available on a different IP? How would they know if redis was configured and available but on a non-standard port? There are too many options.
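
To be fair, a probe can check a list of candidate addresses, as in the sketch below. It sends Redis' plain-text `PING` command and looks for the `+PONG` reply. The function names and the candidate list are assumptions, and the limitation is obvious: it can only find a server at an address somebody already put on the list, and a password-protected Redis will not answer `+PONG`.

```python
import socket
from typing import Iterable, Optional, Tuple


def looks_like_redis(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(b"PING\r\n")   # Redis accepts this inline command form
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timed out, host unreachable, and so on.
        return False
    return reply.startswith(b"+PONG")


def find_redis(candidates: Iterable[Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Return the first (host, port) that behaves like Redis, or None."""
    for host, port in candidates:
        if looks_like_redis(host, port):
            return host, port
    return None
```

A call such as `find_redis([("127.0.0.1", 6379), ("127.0.0.1", 6380)])` covers the default port and one guess, but a Redis on another machine or an unusual port still has to be entered by hand.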

Sometimes a module will conflict with a certain type of caching as well

This is a problem the module developer should deal with. Again, no need to bother the user with it.

Sure, in theory. In reality a shop owner just bought a module and now they have to figure out why it isn't working properly. Some developers will be helpful, some will not. Especially since 30bz is a fork some developers won't be super interested in helping out.

you guys could do this because you had complete control over the system.

Actually we didn't.

You did though. You knew which hardware you were willing to support and you had control over the codebase that would be run on that hardware. It's not SaaS level control but it's much more control than any open source shopping cart has.

OK, I'll stop now. Web software was always complicated in the details, so it'll take time until users get used to simplicity.

It's complicated because of the nature of the Internet and the nature of open source software. This is unlikely to change any time soon due to the desire for variety and flexibility.

For those that have a need for a highly standardized shopping cart with little config required and no tech knowledge then Shopify exists. I know I keep mentioning Shopify but it's because they have done an incredible job. They have an excellent platform with strong performance and a good API. It's expensive but for what they provide it's a good value for many people, especially if you do not have to pay their transaction fees. IMO there is no other shopping cart SaaS system that comes close to what they offer. It's not surprising that they have grown as quickly as they have.


You knew which hardware you were willing to support and you had control over the codebase that would be run on that hardware.

That's exactly what 30bz has, too.

Shopify but it’s because they have done an incredible job.

Shopify did what a thirty bees user should be able to do within an hour. I simply don't buy this "it's all too complicated". PHP has a well defined API, Apache has the same, as has MySQL. Many users demonstrate here that setting up a shop can be done in short time. So let's make this even easier!


@Traumflug said in The Caching Mystery:

You knew which hardware you were willing to support and you had control over the codebase that would be run on that hardware.

That's exactly what 30bz has, too.

30bz doesn't even know what OS the system will run on -- Linux? BSD? Windows? OSX? They don't know what services will be available. They don't even know what web server will be used. Beyond that they have no idea what versions will be on a given host, and hardware varies wildly from shared hosts like Dreamhost to multiple dedicated servers. People also modify the codebase all the time and stores use multiple modules that may or may not have been developed following any sort of reasonable guidelines. It's massively more complex than any embedded system. Embedded development presents its own challenges but from a complexity of possible configurations standpoint I'm not sure how you can even compare the two, to be honest.

Shopify but it’s because they have done an incredible job.

Shopify did what a thirty bees user should be able to do within an hour.

If it was possible for a non-technical user to emulate what Shopify provides with an hour of work then Shopify wouldn't exist. I don't mean this in any sort of insulting way but I think you are greatly simplifying the complexity of what is running on the typical web hosting platform, and how much work goes into making all of that simple to use. 10% of the work is in putting a functional base platform together, 90% of the work is in making it easy to use and difficult to screw up.

I simply don't buy this "it's all too complicated". PHP has a well defined API, Apache has the same, as has MySQL. Many users demonstrate here that setting up a shop can be done in short time. So let's make this even easier!

Who says a server will be running Apache or MySQL? nginx and MariaDB are just as likely and could be better choices. LiteSpeed is an option on some hosts too, and of course IIS. Who knows which version any of this will be? Some hosts update all the time, some hosts patch old versions instead of updating. Even PHP could be one of a bunch of versions and can be installed in multiple different ways. There is no "standard" web configuration.

As far as ease of use is concerned I think 30bz already makes all the underlying complexity easy to use. Yes, some sort of add-on to help with cache configuration would be nice but there is no getting around that the user still needs to have some understanding of how all this works. At some point something will go wrong. Corrupt cache, a cache that needs to be manually refreshed, a compatibility issue... These types of things happen to all stores and require a degree of active management. Unless the store owner pays someone else to take care of their store then they are going to have to learn how to do these types of things.


30bz doesn’t even know what OS the system will run on – Linux? BSD? Windows? OSX?

It doesn't matter; 30bz has zero points of contact with the operating system.

Who says a server will be running Apache or MySQL? [...] Who knows which version any of this will be?

The server knows and sends this information with every page request. PHP knows it, too; see phpinfo().
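
As a sketch of what reading that information could look like, the function below splits an HTTP `Server` response header into product names and versions. The function name is made up for this example. Keep in mind that the header is optional and often trimmed on purpose (for example with `server_tokens off` in nginx or `ServerTokens Prod` in Apache), and behind a reverse proxy it may describe the proxy rather than the backend.

```python
import re
from typing import List, Optional, Tuple

# One "product" or "product/version" token, e.g. "nginx" or "Apache/2.4.41".
_PRODUCT = re.compile(r"([^\s/]+)(?:/(\S+))?")


def parse_server_header(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split a Server header into (product, version) pairs."""
    # Drop parenthesised comments such as "(Ubuntu)".
    without_comments = re.sub(r"\([^)]*\)", " ", value)
    return [(m.group(1), m.group(2))
            for m in _PRODUCT.finditer(without_comments)]
```

For a header value of `Apache/2.4.41 (Ubuntu)` this yields `[("Apache", "2.4.41")]`; for a bare `nginx` it yields `[("nginx", None)]`, which shows how little may be left to detect.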

Corrupt cache, a cache that needs to be manually refreshed

Yet another reason to cut down this cache mess.

This discussion pretty clearly shows why setting up a shop is considered to be hard. People expect it to be hard, a self-fulfilling prophecy. Accordingly, these people have zero vision of how to make it a couple-of-clicks experience. It clearly can be done.


@Traumflug said in The Caching Mystery:

This discussion pretty clearly shows why setting up a shop is considered to be hard. People expect it to be hard, a self fulfilling prophecy. Accordingly these people have zero vision on how to make it a couple-of-clicks experience. It clearly can be done.

If you really believe it is so easy to do what Shopify has done I suggest you set up a business that competes with Shopify.

Shopify had gross revenues of $390 million in 2016 but a net loss of $37 million. However since it's so easy to build a system as simple to use as Shopify you should be able to do it with very few staff and reap massive profits. You could undercut them slightly on price and your growth would be stratospheric. Imagine how much you could spend on marketing with such a low tech overhead.

Perhaps I shouldn't be so sarcastic, you clearly know much more about this than me. Me and the combined developers of every open source shopping cart on the planet. How could they not see how easy it would be to make their systems super easy to use? Someone should tell them!

I look forward to your future shopping cart SaaS business success. Don't forget about us plebes while you're relaxing on your future private island.

/sarcasm-off

Edit: Added top quote for context, added minor clarification to last paragraph.
