thirty bees forum

Posted

Rather than adding still more to my reply above I'll make a separate post.

There do exist other search engine options besides Elasticsearch, so why do I like ES best? Well in short it's the best open source search engine, and it's the one developing the fastest.

There are three main OSS search engines: Elasticsearch, Solr, and Sphinx. ES and Solr both run on top of Apache Lucene. Sphinx is its own separate system.

All have advantages and disadvantages but right now Elasticsearch has the best balance of features and is the fastest growing option with the most mindshare. No option is absolutely perfect of course but there are some problems with the other two main options that ES fixes:

  • Solr is very large and somewhat heavy for installation & use
  • Solr is more difficult to configure
  • Solr is backed by the Apache Foundation which is awesome but does have somewhat limited resources to push it forward. Progress happens but a bit slower.

  • Sphinx uses a proprietary protocol for queries, not JSON or REST.

  • Sphinx does not support server side scripts
  • Sphinx does not support triggers

Elasticsearch has brought a few very nice things to the table, with many of the best features of other OSS search engines and fixing some of the problems & limitations:

  • It's much lighter weight than Solr, both for installation and on server resources
  • It's very easy to configure. A standard installation will work well even without any configuration.
  • Offers standardized APIs for interaction (so does Solr but not Sphinx)
  • Is the fastest growing option at the moment
  • Open source license (same Apache 2 license as Solr) but being pushed forward by a business and a community so advancements are happening quite quickly

Elasticsearch has become the search engine of choice to power eCommerce sites. The most popular Magento search module is powered by Elasticsearch. The best high performance PS search module is powered by Elasticsearch, as is the only free PS high performance search module (Brad).

Many large eCommerce websites use Elasticsearch including eBay and Dell. Other large scale users that most people would recognize include Netflix, Tinder, Facebook, and Microsoft Azure.

So I don't think there is any doubt that Elasticsearch is the best choice to power the 30bz high performance search module. The only real question is how many of us will pony up a bit of cash to make it happen. I know not everyone has a budget to contribute but I hope enough of our small community can come up with something to contribute so we can move forward.

Edit: Another thing I meant to include is that Cloudways, so far the only host recommended by 30bz, provides Elasticsearch as a free service on all their VPS plans. It only takes two mouseclicks to turn it on and start using it.

Posted

I think there was some misunderstanding about what ES is and what was being proposed for this project. I hope the recent replies have helped shed a bit more light on the subject and made things clearer.

Posted

I have just posted a new blog post about the crowdfunding. You can see it here, https://thirtybees.com/blog/crowdfunding-modules/

The reason we went with options is that we are trying to gauge how successful the idea is and also trying to cover the largest number of users.

Posted

Regardless of the module chosen I will contribute some funds to the cause. My preference, of course, is for an ES-powered search module but I can certainly see the benefits of the other options as well. Thanks @lesley for taking the time to set up the poll.

Posted

Awesome, thanks. Likely what we are doing is ordering the modules. Like the winning module will be the goal, then the second place module will be a stretch goal. All depending on how the whole campaign goes.

Posted

@dynambee

Okay, I’ll take a crack at answering these:

Thanks for the effort, even if @roband7's short note answered most of this already. I just looked up above again, there is no mention that this should be a local service. And looking at the ES site, one sees subscription plans and remote requests as code samples. Looks like a few people forgot the basics in all this buzzword euphoria :-)

Posted

Not all will be able to use it as a local service. I doubt that those on shared hosting are allowed to install it or get it installed. But then there is a possibility to use a hosted alternative. And never forget all the merchants that don't want to or can't manage the tech stuff. They are probably not on this forum, but they need to be tb users, otherwise tb will be a very, very tiny player.

Posted

Now being clear it should become a local service, I'd like to add these to the specifications sheet:

  1. Use it as a local service (well, obviously).
  2. Detect presence of this service automatically. At most, ask for entering a single address/port/name.
  3. Abstract the service connector, to make a change to the next generation search engine possible.
  4. Make it an extension to the standard search module, so merchants don't have to configure this (choose hooks, define blacklist, define fields to be indexed/searched) twice. Alternatively, provide a common configuration page for all search engines.
  5. Keep search engine choice transparent to the theme. Front office themes shouldn't have to care about which search engine is currently at work.
  6. Implement features like infinite scroll, cancel button, price slider in an engine-agnostic way. That probably means outside the module.
  7. If functionality requires additional hooks, or extension of existing hooks, or new Ajax callbacks, implement that for standard search, too. With dummy answers if not applicable. In order to keep the search engine choice away from the theme and other parts of the shop software.
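
To make points 3 and 5 a bit more concrete, here is a rough sketch of what an abstracted connector could look like. It is only an illustration in Python, not thirty bees code: `SearchBackend`, `SearchResult`, `StandardSearch` and `render_results` are made-up names, and the substring match merely stands in for the built-in search.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Engine-neutral result a theme could render without knowing the backend."""
    product_id: int
    name: str
    score: float


class SearchBackend(ABC):
    """Hypothetical connector interface; every engine implements the same method."""

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> list[SearchResult]:
        ...


class StandardSearch(SearchBackend):
    """Stand-in for the built-in search: a naive substring match over a catalog."""

    def __init__(self, catalog: dict[int, str]):
        self.catalog = catalog

    def search(self, query: str, limit: int = 10) -> list[SearchResult]:
        needle = query.lower()
        hits = [
            SearchResult(pid, name, 1.0)
            for pid, name in sorted(self.catalog.items())
            if needle in name.lower()
        ]
        return hits[:limit]


def render_results(backend: SearchBackend, query: str) -> list[str]:
    """What a theme would call; it never needs to know which engine answered."""
    return [f"{r.product_id}: {r.name}" for r in backend.search(query)]
```

An Elasticsearch connector would be a second `SearchBackend` subclass, and `render_results` would stay unchanged when the engine is swapped.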

Posted

I will just react to the first point. It must be up to the shop owner to freely choose what kind of service it will be: a local one running on the same server as the shop, a service running on a second server, or a connection to one of the hosted ones. And perhaps the merchant sells worldwide. Then the nodes should also be close to the customer. It is definitely not for the tb developers to decide.

Posted

Not all will be able to use it as a local service. [...] But then there is a possibility to use a hosted alternative.

Writing a module which supports both, local and remote services, sounds like a non-trivial change. For example, remote requests have to handle timeouts gracefully and requests asynchronously. With local requests one can rely on a timely answer, which is much simpler.

It should be clear whether remote requests are part of the specification or not, before that specification is finalized.

Posted

There is one already. The PS ElasticSearch connector can use three nodes. Local is the default, then you can set two others. But the first one can also be changed.
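
Roughly, trying several configured nodes in order could look like the sketch below. This is not how the PS connector is actually written, just an illustration of the idea in Python; `query_first_available` and the `send` callable are hypothetical, and the addresses in the usage note are examples.

```python
from typing import Callable, Sequence


def query_first_available(
    nodes: Sequence[str],
    send: Callable[[str], dict],
) -> dict:
    """Try each node in order and return the first successful response.

    `send` is a hypothetical callable that performs the request against one
    node and raises OSError (connection refused, timeout, ...) on failure.
    """
    errors = []
    for node in nodes:
        try:
            return send(node)
        except OSError as exc:
            # Remember why this node failed, then move on to the next one.
            errors.append(f"{node}: {exc}")
    raise ConnectionError("no search node reachable: " + "; ".join(errors))
```

With a node list such as `["127.0.0.1:9200", "10.0.0.2:9200"]`, the local node is tried first and the second one is only used if the first cannot be reached.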

Posted

I forgot one point for the specification:

  1. All current search API (hooks, template variables, etc.) should continue to be supported, so current themes continue to work, too.

Posted

What happens when you install the PS connector? Searches go via ES. If you uninstall it? Search is back to PS. What is the problem with that?

Posted

@Traumflug said in Indiegogo ElasticSearch project:

Not all will be able to use it as a local service. [...] But then there is a possibility to use a hosted alternative.

Writing a module which supports both, local and remote services, sounds like a non-trivial change. For example, remote requests have to handle timeouts gracefully and requests asynchronously. With local requests one can rely on a timely answer, which is much simpler.

It should be clear whether remote requests are part of the specification or not, before that specification is finalized.

For someone who seemed to have no idea what Elasticsearch was 24 hours ago and who even seemed confused about how open source licensing works, you sure seem to have a lot of "ideas" today about exactly how everything related to this module should be done.

It doesn't matter if ES is on the local server or a remote server it is still accessed through the API via IP & port. As such timeouts are possible even on a local server. If the server is doing heavy indexing, has run out of memory, has crashed, or for some reason was just shut down then local timeouts will happen. As such there is fundamentally no difference between a local instance and a remote instance from the point of view of the module. Obviously the further the ES server is from the webserver the slower search results will be but if the two servers are relatively close geographically it will be fine.
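
As a small illustration of that point, here is a sketch in Python using only the standard library. The `_search` endpoint and the `match` query are part of the Elasticsearch REST API; the index name `products`, the field `name`, the addresses and the two-second timeout are assumptions for the example.

```python
import json
import urllib.request


def build_search_request(host: str, port: int, index: str, text: str) -> urllib.request.Request:
    """Build a request for Elasticsearch's _search endpoint (nothing is sent here)."""
    body = {"query": {"match": {"name": text}}}  # the "name" field is an assumption
    return urllib.request.Request(
        url=f"http://{host}:{port}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request: urllib.request.Request, timeout: float = 2.0) -> dict:
    """Send the request; the same timeout applies to a local or a remote node."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

The request for `127.0.0.1` and the request for a node elsewhere in the datacenter differ only in the host part of the URL, so the calling code and its timeout handling are the same either way.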

For me personally I plan to run five to ten 30bz sites on one VPS and then use a shared ES instance on a separate VPS in the same datacenter. Likewise I will run Piwik+Redis (for Piwik) on a separate server in the same datacenter. VPS servers are so cheap now it just makes sense to split things up a little. However in many cases it would be no problem at all to run ES on the same server as the webserver & db are running on.

I just looked up above again, there is no mention that this should be a local service. And looking at the ES site, one sees subscription plans and remote requests as code samples. Looks like a few people forgot the basics in all this buzzword euphoria :-)

This thread is a continuation of the "Let's talk about Search!" thread which you also participated in. Running ES on the same server as the web & db servers was discussed at length in that thread, as was running ES on a separate server in the same datacenter. Using ES instead of Algolia because ES is free and Algolia is expensive was also discussed. There should be no secrets or surprises here for anyone who has been following along.

Posted

who even seemed confused about how open source licensing works

Me? lol I don't think there are many people on this planet who have studied all the flavors of "open source" more than me. I've been doing this for some 30 years now and have participated in more projects than I can count.

you sure seem to have a lot of “ideas” today about exactly how everything related to this module should be done.

Ah. Pointing out flaws in the above specification upsets you. Good to know.

It doesn’t matter if ES is on the local server or a remote server it is still accessed through the API via IP & port. As such timeouts are possible even on a local server.

If this is your assumption, please put it into the specification. MySQL is a local server, too, and queries to/from there are expected to be fast and reliable. If they're not, an exception happens, which means a blank page in production mode or this new encrypted error message. Handling such stuff more gracefully needs more code and if this isn't part of the specification, it won't happen.
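
A sketch of what "more gracefully" could mean, assuming the spec asks for it: if the search service fails, log it and fall back to the standard search instead of showing an error page. `search_with_fallback`, `primary` and `fallback` are hypothetical names, and the example is Python rather than module code.

```python
import logging
from typing import Callable

logger = logging.getLogger("search")


def search_with_fallback(
    query: str,
    primary: Callable[[str], list],
    fallback: Callable[[str], list],
) -> list:
    """Use the primary engine, but degrade to the fallback instead of failing the page."""
    try:
        return primary(query)
    except (OSError, ValueError) as exc:
        # OSError covers timeouts and refused connections.
        # ValueError covers a malformed JSON reply (json.JSONDecodeError is a ValueError).
        logger.warning("primary search failed, using fallback: %s", exc)
        return fallback(query)
```

The extra code is small, but it has to be asked for, and someone has to decide what the fallback is.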

Posted

But Markus, why can't you admit that there are modules out there, for many different open source e-commerce platforms, that already do this, and with success? Why should it be harder for tb?

Posted

Traumflug's points are just logical. If you don't ask for A in the specifications, A won't be there in the module. He's not saying if it's easy or hard.

Posted

@Traumflug said in Indiegogo ElasticSearch project:

who even seemed confused about how open source licensing works

Me? lol I don't think there are many people on this planet who have studied all the flavors of "open source" more than me. I've been doing this for some 30 years now and have participated in more projects than I can count.

You specifically asked if an open source project would be free in 5 years. That's a mighty strange question for anyone to ask who has even a basic understanding of how open source works.

you sure seem to have a lot of “ideas” today about exactly how everything related to this module should be done.

Ah. Pointing out flaws in the above specification upsets you. Good to know.

You aren't pointing out flaws, you're adding requests, some of which make little sense. If you want to make requests that's fine but it would be nice if you learned a little about how ES works before you post random things. Even learning the very basics would be good.

It doesn’t matter if ES is on the local server or a remote server it is still accessed through the API via IP & port. As such timeouts are possible even on a local server.

If this is your assumption, please put it into the specification. MySQL is a local server, too, and queries to/from there are expected to be fast and reliable. If they're not, an exception happens, which means a blank page in production mode or this new encrypted error message. Handling such stuff more gracefully needs more code and if this isn't part of the specification, it won't happen.

Right, and when websites get too busy for their hardware the website stops responding. This will be no different for ES than it is currently for MySQL or even for Apache in extreme situations. It's nothing unusual or unexpected.

Posted

@moy2010 said in Indiegogo ElasticSearch project:

Traumflug's points are just logical. If you don't ask for A in the specifications, A won't be there in the module. He's not saying if it's easy or hard.

The problem is @Traumflug writes like he has an authoritative knowledge on the subject of Elasticsearch when it's clear he has little idea what he's talking about or asking for. When this is pointed out he claims that things that have been clearly explained (like ES being free, being able to run on local servers, being able to run on remote servers...) haven't been explained and that people (everyone else in this thread) should do a better job of explaining things instead of getting caught up in "buzzword euphoria". The man seems incapable of admitting error.

Regarding his requests:

@Traumflug said in Indiegogo ElasticSearch project:

Now being clear it should become a local service, I'd like to add these to the specifications sheet:

  1. Use it as a local service (well, obviously).

Not necessary, and actually a bad idea that would make the module inflexible. Also directly conflicts with the need for high performance search on multi-node clusters for a very large site, as listed in the specs already.

  2. Detect presence of this service automatically.

Not possible as the service could be anywhere. Local, remote, multi-node cluster.

At most, ask for entering a single address/port/name.

All that is necessary to use ES is to enter the IP address(es) and port(s) of the ES node(s) the module is to use. ES has no authentication requirements and will accept queries from any IP that is granted access to it.

  3. Abstract the service connector, to make a change to the next generation search engine possible.

This would add an extra level of overhead that would slow down performance. It's an Elasticsearch module, let it work with Elasticsearch. The whole idea is to be no-holds-barred fast.

  4. Make it an extension to the standard search module, so merchants don't have to configure this (choose hooks, define blacklist, define fields to be indexed/searched) twice. Alternatively, provide a common configuration page for all search engines.

This is a search module, now the suggestions are to modify the way 30bz itself works? That seems very over the top to me. Make it a good and fast search module for those who want fast & effective search.

  5. Keep search engine choice transparent to the theme. Front office themes shouldn't have to care about which search engine is currently at work.

This is unlikely to be possible because ES can do so much more than standard search can. Doing this would either cripple the ES module or would require modifying 30bz itself extensively.

  6. Implement features like infinite scroll, cancel button, price slider in an engine-agnostic way. That probably means outside the module.

This is a good idea but again would require considerable modifications to 30bz to work. Perhaps in a later version of 30bz this could be added.

  7. If functionality requires additional hooks, or extension of existing hooks, or new Ajax callbacks, implement that for standard search, too. With dummy answers if not applicable. In order to keep the search engine choice away from the theme and other parts of the shop software.

Again, requires modifying the way 30bz itself works. Extensive modifications like this would probably result in breaks in compatibility with PS 1.6. Maybe this could be considered for a later version of 30bz but I'm not sure it would be worth it. Standard search is slow and quite resource intensive already, those who want more features and better search can use....the ES module.

/rant off

Posted

He had a wrong idea, indeed, but he has already made that clear.

His responses, for me, are from someone trying to get a clear idea of what ES is and how it works, not from someone who knows everything and comes to share his wisdom. Just try to relax and realise that he has already accepted that he had a wrong notion about ES.

Yes, an ideal discussion is where everyone has the same knowledge or notion of concepts, but that's simply impossible. Just point to what you think is wrong about his assumptions, just as he does with everybody else's. This, in the end, will help with the development of the module because, as he himself stated:

Don't take anything for granted when dealing with software development. If we don't specify what we need for TB's ES module, MDekker won't guess it for us.

Posted

@dynambee

The problem is @Traumflug writes like he has an authoritative knowledge on the subject of Elasticsearch

As far as I understand matters, thirty bees isn't about demo'ing ElasticSearch, it's about a good experience for merchants and even more for their customers. Nota bene: for shops having this service available and for shops not having it. Just waving "Yay, we have an ES module!" isn't sufficient to achieve that.

Discussing and refining a spec sheet not only expresses user/merchant/donor expectations, it also helps align these expectations with the developer's ideas and helps him lay out a plan for how to write the code. What's possible to do from within a module, what requires touching general code, what's better done in a slightly different way, what requires additional budget, what can't be done at all. Right now this sheet reads much like "we barely know what we want, but we want everything". A nice recipe for disappointment.

Looking at the current discussion style, such specification refinement is apparently no longer possible, so I'll stop discussing here. Good luck!

Posted

@Traumflug said in Indiegogo ElasticSearch project:

Looking at the current discussion style, such specification refinement is apparently no longer possible, so I'll stop discussing here. Good luck!

Please don't do it. We need you here as well, because we need people who ask (critical) questions. This thread will help to make a better module. We really should define what the coder has to do and what not. I appreciate the way you see merchants. Unfortunately not all coders think like that.

So let's calm down and go back to constructive discussion ;)
