thirty bees forum

Indiegogo ElasticSearch project


vincentdenkspel


  • Replies 172

I doubt that any shop can fulfill customer expectations without filters and search.

I see only two reasons not to support it: 1. You have no budget, in which case you (hopefully) also don't have the server specs to use it. 2. You are happy with the existing modules.


A few remarks from my developer view:

These have to be done in thirty bees anyways or are part of native search already:

Possibility to do:
  • Ajax search
  • Normal search
  • Instant search
  • The possibility to completely disable the module and fall back to standard search during index updates
  • Support for indexation defined by cron
  • Manual indexation fired in the background with slow-server timeout protection
  • Indexation cron links ready to use with ‘Cron task manager’
  • Support for auto indexation on product add/update/duplicate
  • The ability to not index certain fields
  • ElasticSearch log in all available log levels
  • Filter by: categories, features, manufacturer
  • Ajax filter
  • Price slider
  • Filter on pages: home, categories, manufacturer, supplier, special pages, best sales, new products, search pages
  • Display selected filters with a ‘cancel’ button
  • Possibility to hook the module to left, right and center
  • Possibility to display the filter result in grid or list view
  • Possibility to select infinite scroll

These are easily doable with PHP/SQL onsite code:

Possibility to:
  • Search within words
  • Select a minimum word length
  • Blacklist words
  • Combine searches (AND, OR, etc.)
  • Indexing
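To make the "doable with on-site code" point concrete, here is a minimal Python sketch of an in-memory inverted index that covers minimum word length, blacklisted words, AND/OR combination and search within words. The setting names, blacklist and sample products are hypothetical; a real thirty bees version would be PHP with the index stored in MySQL.

```python
import re
from collections import defaultdict

MIN_WORD_LENGTH = 3                # hypothetical setting: shorter words are ignored
BLACKLIST = {"the", "and", "for"}  # hypothetical blacklisted words


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop short and blacklisted words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= MIN_WORD_LENGTH and w not in BLACKLIST]


def build_index(products):
    """Inverted index: map each word to the set of product ids containing it."""
    index = defaultdict(set)
    for pid, text in products.items():
        for word in tokenize(text):
            index[word].add(pid)
    return index


def lookup(index, term, within_word=False):
    """Exact word match, or substring match when searching within words."""
    if not within_word:
        return set(index.get(term, set()))
    return set().union(*(ids for word, ids in index.items() if term in word))


def search(index, query, mode="AND", within_word=False):
    """Combine the per-term result sets with AND (intersection) or OR (union)."""
    terms = tokenize(query)
    if not terms:
        return set()
    sets = [lookup(index, t, within_word) for t in terms]
    return set.intersection(*sets) if mode == "AND" else set.union(*sets)


# Hypothetical sample catalogue
products = {1: "Blue cotton T-shirt", 2: "Red cotton hoodie", 3: "Blue denim jacket"}
index = build_index(products)
print(search(index, "blue cotton"))                    # {1}
print(search(index, "blue cotton", mode="OR"))         # {1, 2, 3}
print(search(index, "den", within_word=True))          # {3}
```

Note that the "within word" lookup scans every indexed word, which is exactly the kind of cost that grows with catalogue size.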

Leaves these, which either require ElasticSearch or more advanced code:

Field weight

Seeing the long list of what a search provider does not provide, and that the short list of what it does provide can still be done without a provider, I wonder whether it's a good idea to become dependent on such a provider.

Maybe the list simply isn't complete yet.

In the earlier discussion, response time was mentioned a few times. As far as I know (maybe I'm wrong on this), 30bz runs even Ajax requests through a pretty complex dispatcher, domain name verification and a lot more. Part of the project work could be to simplify the handling of such requests, making them faster. By the time a search happens, a connection to the shop domain is already established, so chances are good that it's actually faster than connecting to another server just for searches.

So far my $0.02.


I am certainly not a developer, but as I understand it, the list you present of what can be done seems to me to mean a lot of core changes. There is a working module that is just not complete. I think ES has proven many times how much time it saves on searches. Why make it more complicated and reinvent the wheel one more time? Why not use the knowledge of people who have probably forgotten more about how search engines should be built than any of us will ever learn?


Tight integration of external services is complicated. At least more complicated than using a local service. If this external service is faster than a local service, there's something wrong with the local service. Fixing that benefits not only searches, but all communications between shop and customer.


Native database full text search sucks, and that's why Elasticsearch exists and several other open source search solutions exist. It's why Algolia exists, has received ~$75mil in funding, and can charge customers so much for their services.

Native db text searching does not scale well, does not provide features like synonym search, does not handle spelling mistakes well (really not at all), and there is no way to easily weight the indexes. "Instant" search results in 30bz using native db searching are too slow to be useful and really are just frustrating. They're slow enough that I didn't even realize they existed as I could type my entire query and hit enter to search before any "instant" results appeared. Filtering search results using native db search is likewise slow.
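Typo tolerance is a good example of what a search engine adds. Elasticsearch's fuzzy matching is based on Damerau-Levenshtein edit distance, and its default `fuzziness: AUTO` allows 0 edits for terms of 1-2 characters, 1 edit for 3-5 characters and 2 edits for longer terms. The Python sketch below only mimics that rule term by term to show the idea; internally Lucene uses a far more efficient approach than comparing against every indexed word.

```python
def osa_distance(a, b):
    """Edit distance counting insertions, deletions, substitutions and
    adjacent transpositions (the "optimal string alignment" variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def auto_fuzziness(term):
    """Edits allowed under Elasticsearch's default AUTO (AUTO:3,6) rule."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def fuzzy_match(query_term, indexed_term):
    """True if the indexed term is within the allowed number of edits."""
    return osa_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


print(fuzzy_match("shrit", "shirt"))    # True: one transposition
print(fuzzy_match("jakcet", "jacket"))  # True: one transposition
```

A plain SQL `LIKE '%shrit%'` would find nothing for the same misspelling.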

Besides being incredibly fast, Elasticsearch provides an easy to use query API which is far easier to work with and modify than complicated SQL queries full of joins. It scales very well and is the search platform of choice for many well known sites. It's also being actively developed and is improving even more with each release.
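For a feel of the query API, here is a minimal sketch that builds an Elasticsearch query DSL body as a Python dict: a full-text `multi_match` plus optional `term` and `range` filters inside a `bool` query. The field names (`name`, `description`, `category_ids`, `price`) are assumptions about how products might be indexed; the body would be sent as JSON to the index's `_search` endpoint.

```python
import json


def build_search_body(text, category_id=None, max_price=None, size=20):
    """Full-text match on name/description with optional category and price filters.
    Field names are hypothetical and depend on the index mapping."""
    filters = []
    if category_id is not None:
        filters.append({"term": {"category_ids": category_id}})
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": text,
                                          "fields": ["name", "description"]}}],
                # Filter clauses narrow the results without affecting relevance scores
                "filter": filters,
            }
        },
    }


body = build_search_body("blue shirt", category_id=12, max_price=30)
print(json.dumps(body, indent=2))
```

Adding or removing a filter is just adding or removing a dict entry, rather than rewriting joins in a SQL string.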

IMO if 30bz aspires to compete with PS and Magento then a powerful search module is an absolute must. Next to fast caching I'd say that fast & accurate search is the most important thing an eCommerce site can do well. I'm happy to contribute to this project and will put in 250 Euro as soon as I can after the Indiegogo project goes live.

Edit: Totally forgot to mention, ES supports autocomplete, auto-suggest, and the ability to highlight certain results if desired. It's endlessly flexible and extremely powerful.
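As an illustration of autocomplete with highlighting, here is a minimal sketch using a `match_phrase_prefix` query and the `highlight` section of a search request, plus a helper that turns a search response into suggestion strings. The `name` field and the sample response are hypothetical; the response only mirrors the `hits.hits[]._source` / `highlight` shape Elasticsearch returns.

```python
def build_autocomplete_body(prefix, size=5):
    """Suggest product names starting with what the user has typed so far."""
    return {
        "size": size,
        "_source": ["name"],  # only fetch what the dropdown needs
        "query": {"match_phrase_prefix": {"name": {"query": prefix}}},
        "highlight": {
            "pre_tags": ["<strong>"],
            "post_tags": ["</strong>"],
            "fields": {"name": {}},
        },
    }


def suggestions_from_response(response):
    """Prefer the highlighted fragment, fall back to the plain stored name."""
    out = []
    for hit in response["hits"]["hits"]:
        highlighted = hit.get("highlight", {}).get("name")
        out.append(highlighted[0] if highlighted else hit["_source"]["name"])
    return out


# Hand-made sample response for the query "blu" (hypothetical data)
sample_response = {"hits": {"hits": [
    {"_source": {"name": "Blue cotton shirt"},
     "highlight": {"name": ["<strong>Blue</strong> cotton shirt"]}},
    {"_source": {"name": "Blue denim jacket"}},
]}}
print(suggestions_from_response(sample_response))
```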


“Instant” search results in 30bz using native db searching are too slow to be useful and really are just frustrating. [...] Besides being incredibly fast, Elasticsearch provides

OK, you're an ES addict and don't even try to think about a great module without it.


@Traumflug said in Indiegogo ElasticSearch project:

I'm not a developer. I made this initiative from a site owner perspective.

These have to be done in thirty bees anyways or are part of native search already:

I have TB installed, but apparently I'm missing a lot. Which of these features are part of native search already? Not the filter options.

Seeing the long list of what a search provider does not provide, and that the short list of what it does provide can still be done without a provider, I wonder whether it’s a good idea to become dependent on such a provider.

I was told that with a decent VPS/dedicated server you could run Elasticsearch yourself, so you won't need a provider for that.


For my part, I try to think along with merchants and about the future of thirty bees:

  • ES is free now, will it be free in 5 years?
  • What if this API changes, who does the migration?
  • Merchants don't care about the technology used behind the scenes, they want a well working search function.
  • For merchants, a requirement to hook up with another service just to get basic functionality is a burden and entry barrier. How can this be avoided?
  • Per the provided feature list, ES search requires a fallback search engine. Which means duplicate code, more code maintenance work.
  • If it's just about great search algorithms, what makes ES better than an on-site Google search?
  • Why are such on-site Google searches used so rarely, despite being free, available for many years and backed with powerful algorithms?
  • How well does the current engine work with state-of-the-art Ajax callbacks?
  • What about privacy?
  • What about other search engines, like Brad, what makes ES better?
  • Can ES searches be integrated into the page at page load time? With a native engine one can prepare search results even before sending the page.
  • Can ES search results be reported back to the next page request? Like featuring products similar to the ones a user searched for before. Like showing a "you recently searched for ..." selection.
  • Can page rendering take advantage of recent search results, like adjusting prices for often searched products or highlighting products which that particular user has searched before?

All of this should probably get well-founded answers and careful weighing before jumping to conclusions about what works best for merchants and thirty bees.

P.S.: What I mean is: if a concerted effort on a search engine/module happens, it's a good idea not to build a current-generation search field, but to prepare a next-generation one.


@Traumflug I think perhaps you make ES into something it isn't. From a license, deployment and technological point of view it's basically comparable to MySQL. Meaning you could more or less take all your questions above and replace ES with MySQL. In other words we're not talking about ES as a cloud service, but as a locally installed piece of open source software, just like MySQL.


@roband7 said in Indiegogo ElasticSearch project:

@Traumflug I think perhaps you make ES into something it isn't. From a license, deployment and technological point of view it's basically comparable to MySQL. Meaning you could more or less take all your questions above and replace ES with MySQL. In other words we're not talking about ES as a cloud service, but as a locally installed piece of open source software, just like MySQL.

MySQL, Apache, Redis, PHP, Imagemagick, Linux... Even 30bz itself for that matter. These (including Elasticsearch, of course) are all under strong open source licenses, though, so I'm not concerned.


Okay, I'll take a crack at answering these:

ES is free now, will it be free in 5 years?

Elasticsearch is open source software licensed under the Apache license. It's based on other open source projects (like most things in the OSS movement are) and it is very unlikely that the licensing situation will change. However if the licensing situation did change the existing versions would still remain available as OSS under the Apache license and the community would fork the code and ES would live on. Much like we see with MariaDB and MySQL, except MySQL also still exists as OSS too. (Edit: Also much like we see with PS and 30bz.)

What if this API changes, who does the migration?

If the API changes or evolves with a future version of ES then 30bz can continue to use compatible versions of ES, the versions of ES that the module was originally designed to work with.

If the newer versions of ES are dramatically better than existing versions then the 30bz module can be updated to be compatible with newer versions of ES. Who does this update and how it is done can be tackled at that time. I would assume the 30bz community will be far larger by this time so it should be much less of a concern.

The same question could be asked of MySQL or any other technology that 30bz makes use of, and the answer would be the same.

Merchants don’t care about the technology used behind the scenes,

Of course, and that's the way it should be!

they want a well working search function.

ES is the best existing way to give this to merchants who need/want high performance eCommerce search but don't want to spend huge money on a hosted solution like Algolia. (And everyone with more than a few pages of products needs high performance search!)

For merchants, a requirement to hook up with another service just to get basic functionality is a burden and entry barrier. How can this be avoided?

30bz already needs multiple services just to function. No PHP? No MySQL/MariaDB? No Apache? No website.

Additionally the proposed ES solution would be a module to use on top of 30bz, replacing the standard search function on websites that use the module. The standard search function works fine for small websites, it just doesn't scale and doesn't provide an ideal user experience.

Per the provided feature list, ES search requires a fallback search engine. Which means duplicate code, more code maintenance work.

ES does not require a fallback search option but if a large site needs to reindex everything it can be a better user experience for customers if ES is disabled during this time. It's somewhat unusual to need to reindex everything, generally only changes need to be indexed and they are done as the changes happen.
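Indexing only the changes is typically done with the Elasticsearch Bulk API, which takes newline-delimited JSON: an action line (`index` or `delete`) followed, for `index`, by the document itself. Below is a minimal sketch of building such a payload for products changed or deleted since the last run. The `products` index name and document fields are assumptions, and actually sending it (a POST to `_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
import json


def build_bulk_payload(changed, deleted_ids, index="products"):
    """NDJSON body for the _bulk endpoint: re-index changed products, delete removed ones.
    The index name and document shape are hypothetical."""
    lines = []
    for product in changed:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(product["id"])}}))
        lines.append(json.dumps(product))
    for pid in deleted_ids:
        lines.append(json.dumps({"delete": {"_index": index, "_id": str(pid)}}))
    if not lines:
        return ""  # nothing to send; an empty bulk request would be rejected
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


payload = build_bulk_payload([{"id": 42, "name": "Blue shirt", "price": 19.9}], [7])
print(payload)
```

Hooking this to product add/update/delete events keeps the index current without a full reindex.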

If ES is disabled then the site would fall back to using the standard 30bz search.

An analogy to this would be if you need to rebuild your Redis cache or if you are doing some work on the site and need to disable caching temporarily. 30bz falls back to running with no cache until the cache is turned back on and can rebuild itself. The site will be slower but will still work.
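The fallback behaviour described above can be sketched as a small dispatcher: use ES when it is enabled and reachable, otherwise run the standard search. This is a Python sketch with hypothetical stand-in backends; in the module, the enabled flag would be a configuration value and the native search would be thirty bees' built-in search.

```python
from typing import Callable, List

SearchFn = Callable[[str], List[str]]


def make_search(es_enabled: Callable[[], bool],
                es_search: SearchFn,
                native_search: SearchFn) -> SearchFn:
    """Return a search function that prefers ES and degrades to native search."""
    def search(query: str) -> List[str]:
        if es_enabled():
            try:
                return es_search(query)
            except ConnectionError:
                pass  # ES unreachable: slower native search instead of an error page
        return native_search(query)
    return search


# Hypothetical stand-in backends, for illustration only
def native_search(query: str) -> List[str]:
    return [f"native:{query}"]


def es_search_ok(query: str) -> List[str]:
    return [f"es:{query}"]


def es_search_down(query: str) -> List[str]:
    raise ConnectionError("Elasticsearch unreachable")


search = make_search(lambda: True, es_search_ok, native_search)
print(search("shirt"))
```

Like running without cache, the site stays functional, only slower, while ES is off or being reindexed.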

If it’s just about great search algorithms, what makes ES better than an on-site Google search?

What makes Piwik better than using Google Analytics? The answers are pretty similar:

  1. Using ES is way faster than using onsite Google Search because ES is hosted locally, either on the same server or on a separate server in the same datacenter, depending on site owner preference and skill level. (With Cloudways you can use ES on your VPS with two mouse clicks, so that's how easy it can be to set up and use.)

  2. As with Piwik, when you use ES you control your own data and you can decide exactly how you want the system to work. Want to index descriptions? Great! Don't want to index descriptions? No problem! Want to index descriptions but give them a very low weight in the search results? Just change the weighting number and make it lower. You don't have this type of control with onsite Google Search. In fact you really have no control at all, Google just gives you search results.
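The weighting idea can be shown with the `^` boost syntax that `multi_match` accepts in its `fields` list. Here is a minimal sketch that turns a per-field weight table into that list; the field names and weights are hypothetical. A weight of 0 simply leaves the field out of the query (not indexing a field at all would be a mapping decision instead).

```python
def weighted_fields(weights):
    """Turn {"field": weight} into multi_match field strings such as "name^3".
    Fields with weight 0 are left out of the search."""
    return [f"{field}^{weight}" for field, weight in weights.items() if weight > 0]


# Hypothetical weights: descriptions count, but much less than names
FIELD_WEIGHTS = {"name": 3, "reference": 2, "description": 0.5, "meta_keywords": 0}

body = {
    "query": {
        "multi_match": {
            "query": "cotton shirt",
            "fields": weighted_fields(FIELD_WEIGHTS),
        }
    }
}
print(body["query"]["multi_match"]["fields"])  # ['name^3', 'reference^2', 'description^0.5']
```

Changing how much descriptions matter is then a one-number change in the weight table.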

Additionally using ES allows high speed faceted search. The more data you add about your items to your website the finer the control customers will have over the results they see. If customers want only blue widgets they can select Blue. If they want only widgets from a certain brand they can select that too. It's basically similar to how Amazon's search functions. You can narrow down your search results dramatically with a few clicks.
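Faceted search in Elasticsearch is usually built from aggregations: the same request that returns the hits also returns counts per value, e.g. how many results are blue or come from each brand. Below is a minimal sketch, assuming hypothetical `color` and `brand` fields mapped as `keyword` (which `terms` aggregations need), plus a helper that reads bucket counts out of a response. The sample response is hand-made to show the shape.

```python
def build_faceted_body(text, selected=None):
    """Search plus facet counts; `selected` holds facet choices such as {"color": "blue"}."""
    selected = selected or {}
    filters = [{"term": {field: value}} for field, value in selected.items()]
    return {
        "query": {"bool": {"must": [{"match": {"name": text}}], "filter": filters}},
        "aggs": {
            "colors": {"terms": {"field": "color", "size": 10}},
            "brands": {"terms": {"field": "brand", "size": 10}},
        },
    }


def facet_counts(response, agg_name):
    """Map each facet value to its document count from a terms aggregation."""
    return {b["key"]: b["doc_count"]
            for b in response["aggregations"][agg_name]["buckets"]}


# Hand-made sample response (hypothetical data)
sample = {"aggregations": {"colors": {"buckets": [
    {"key": "blue", "doc_count": 12},
    {"key": "red", "doc_count": 5},
]}}}
print(facet_counts(sample, "colors"))  # {'blue': 12, 'red': 5}
```

Each click on a facet adds one `term` filter and re-runs the same request.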

Why are such on-site Google searches used so rarely, despite being free, available for many years and backed with powerful algorithms?

Because onsite Google search looks terrible, has Google branding, is comparatively slow, and it can take quite a lot of time between adding new products and Google reindexing your site. Additionally you can't filter results easily and quickly with onsite Google search and you really have no flexibility or control over one of the most important functions on your website.

How well does the current engine work with state-of-the-art Ajax callbacks?

The problem with the current search system is that there is no search engine. It's simply making calls to the database using SQL. SQL databases are great at storing large quantities of data in stable ways but they are not optimized for high speed full text searches. They do a very poor job of it and do not scale well to larger numbers of products or larger numbers of users. Results are slow and they aren't all that accurate unless the user exactly nails the search terms.

It's like asking why do we need to use cache on the website when Apache can just serve everything directly to the visitor? Of course Redis isn't absolutely necessary for basic website operation but the site is going to work a lot better if Redis is turned on and functioning.

What about privacy?

With ES hosted on your webserver or on another server you control there are no additional concerns about privacy. You control the server, you control the service, you control exactly how your data is used.

What about other search engines, like Brad, what makes ES better?

Brad is a search module that uses ES for search. That's why it's so fast. I would expect the existing Brad code to be a source of inspiration for the 30bz module. Unfortunately, while Brad is open source, its license is not a standardized one, which makes it less than ideal to lift code from directly for reuse. I think we might want to contact the author and ask them to release it under a standardized license that would give users the same rights but be better from a legal standpoint for everyone.

Can ES searches be integrated into the page at page load time? With a native engine one can prepare search results even before sending the page.

I don't really understand this question. However to my knowledge there is nothing that can be done with the existing 30bz search system that can't be done better and faster with an ES module.

Can ES search results be reported back to the next page request? Like featuring products similar to the ones a user searched for before. Like showing a “you recently searched for …” selection.

This is a feature that would generally be provided by the module itself rather than the search engine. The module keeps track of what users have searched for and can display these results in a "you recently searched for..." section. This would work much the same way as the existing 30bz search works, perhaps even exactly the same way.
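Since this lives in the module rather than in the search engine, it is independent of ES. Here is a minimal Python sketch of a per-visitor "recently searched for" list that keeps the last few distinct queries, newest first. The class and its in-memory storage are hypothetical; a real module would persist this in a cookie or the database.

```python
from collections import OrderedDict, defaultdict


class RecentSearches:
    """Keep the last `limit` distinct queries per visitor, newest first."""

    def __init__(self, limit=5):
        self.limit = limit
        self._by_user = defaultdict(OrderedDict)

    def record(self, user_id, query):
        q = query.strip().lower()
        if not q:
            return
        history = self._by_user[user_id]
        history.pop(q, None)  # searching again moves the query to the front
        history[q] = None
        while len(history) > self.limit:
            history.popitem(last=False)  # drop the oldest query

    def recent(self, user_id):
        return list(reversed(self._by_user.get(user_id, OrderedDict())))


r = RecentSearches(limit=2)
r.record("visitor-1", "Shirt")
r.record("visitor-1", "jacket")
r.record("visitor-1", "shirt")
print(r.recent("visitor-1"))  # ['shirt', 'jacket']
```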

Can page rendering take advantage of recent search results, like adjusting prices for often searched products or highlighting products which that particular user has searched before?

Rendering is managed by 30bz as always. Therefore anything that can be done at the render stage with the current search could be done with Elasticsearch. Of course adding these types of features will slow down page rendering, but that isn't specific to ES or any other search engine.

Regarding highlighting, it is possible with ES to highlight search results or even to provide special search result sorts. For example if you have your own in-house brand you can make sure that those items always appear at the top of search results for that type of product. You can also highlight products that are on sale, or highlight newly arrived products. Endless flexibility.
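Boosting an in-house brand or sale items can be expressed with a `function_score` query, which adjusts the text relevance score with extra weights for documents matching given filters. Here is a minimal sketch; the `brand` and `on_sale` fields and the weight values are hypothetical, and a boost strongly favours those products rather than strictly guaranteeing they come first (a hard guarantee would need a sort).

```python
def build_boosted_body(text, house_brand, brand_weight=10, sale_weight=2):
    """Favour the in-house brand and on-sale products in the relevance ranking.
    Field names and weights are hypothetical."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"name": text}},
                "functions": [
                    {"filter": {"term": {"brand": house_brand}}, "weight": brand_weight},
                    {"filter": {"term": {"on_sale": True}}, "weight": sale_weight},
                ],
                "score_mode": "sum",       # add up the weights of the matching functions
                "boost_mode": "multiply",  # multiply the text relevance score by that result
            }
        }
    }


body = build_boosted_body("widget", "HouseBrand")
print(body["query"]["function_score"]["functions"][0])
```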

I hope this answers your questions and provides some clarity. Feel free to ask followup questions or new questions and I'll do my best to answer. (Of course if someone else has questions, answers, corrections, or additional info please chime in!)

Edit: A bunch of small edits for minor corrections and clarifications. Should be finished editing now, 2017-06-28 03:45 UTC. Edit: And another small edit for clarification, 04:20 UTC.
