thirty bees forum

Let's talk about Search!


dynambee


Can any of you experts explain what 10,000 records means?

I make no claims of being an expert, but I had the same question and managed to dig up some information.

On Algolia a basic record is one item, and you can add as many attributes to that item as you like (Source) as long as the total size doesn't surpass 10KB of minified JSON (Source).
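If you want to check whether a product fits under that limit, one quick way is to serialize it to minified JSON and count the bytes. Here is a minimal sketch in Python. It assumes "10KB" means 10,000 bytes, so check Algolia's docs for your plan's exact limit and how they count it. The product fields are made up for illustration.

```python
import json

# Assumption: 10KB taken as 10,000 bytes; adjust to your plan's documented limit.
MAX_RECORD_BYTES = 10_000


def record_size_bytes(record: dict) -> int:
    """Size of the record as minified JSON (no spaces after separators), UTF-8 encoded.

    ensure_ascii=False keeps Japanese text as real UTF-8 bytes instead of \\u escapes.
    """
    minified = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
    return len(minified.encode("utf-8"))


def fits_in_one_record(record: dict, limit: int = MAX_RECORD_BYTES) -> bool:
    return record_size_bytes(record) <= limit


# Hypothetical product record; real attributes depend on what your module sends.
product = {
    "objectID": "1095",
    "name": "Example fishing line",
    "price": 2980,
    "categories": ["Fishing", "Line"],
    "description": "Braided PE line.",
}

print(record_size_bytes(product), fits_in_one_record(product))
```

Long product descriptions are usually what pushes a record over the limit, so they're the first thing to trim if a record is too big.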

However, it's not quite as easy as that. Each sort order you add is another index, and each index multiplies the number of records. So if you have 10,000 products that you want customers to be able to sort by price ascending and by price descending, that is two extra indexes: 10,000 records for the products + 10,000 for price ascending + 10,000 for price descending, a total of 30,000 records. Every new sort index you add (and each direction counts separately) adds another full set of records. (Source).
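The arithmetic above boils down to one multiplication. Here is a small Python sketch that follows the model described in this post, where every sort order is a full copy of the product records. Algolia's pricing rules may have changed since this was written, so treat the result as an estimate.

```python
def total_records(products: int, sort_replicas: int) -> int:
    """Records used when each sort order is a full copy of the primary index.

    Model from the post: the primary index plus one full copy per sort order,
    with ascending and descending counted as separate sort orders.
    """
    return products * (1 + sort_replicas)


# 10,000 products, sortable by price ascending and price descending.
print(total_records(10_000, sort_replicas=2))  # 30000
```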

This pre-sorting is one reason why Algolia is so fast, but it does create a lot of records in a hurry.

100,000 operations?

Each time a search or sort is performed it is an operation. Also when you are adding or updating records it causes indexing, and each indexing event is an operation. There will be one operation per item you add or update. (Source).
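A rough monthly operation count follows directly from that: searches plus sorts plus record writes. Here is a sketch in Python. The traffic numbers are hypothetical, the quota is passed in as a parameter rather than taken from any plan, and each record write is counted once.

```python
def monthly_operations(searches: int, sorts: int, records_written: int) -> int:
    """Per the post: each search or sort is one operation, and each record
    added or updated triggers one indexing operation."""
    return searches + sorts + records_written


def within_quota(ops: int, quota: int) -> bool:
    return ops <= quota


# Hypothetical month: 40,000 searches, 5,000 sort clicks, one full reindex of 10,000 products.
ops = monthly_operations(40_000, 5_000, 10_000)
print(ops, within_quota(ops, quota=100_000))  # 55000 True
```

Frequent full reindexes add up quickly with a large catalog, so updating only changed products keeps the indexing side of the count down.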

If you have 10,000 to 20,000 items and don't get too crazy with sort options, then the $49/month plan should be sufficient unless you have a very high-traffic site. Multiply those numbers by 10 for the $249/month plan.

You can also share one Algolia account between multiple websites. So if you have 5 websites each with 20,000 records they should all fit into one $249/month plan.
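To check whether several shops fit in one account, add up each shop's records, including any sort indexes. Here is a Python sketch using the five-shop example. The shop names are placeholders, and no plan limit is hard-coded, so compare the result with your own plan's record limit.

```python
def account_records(sites: dict, sort_replicas: int = 0) -> int:
    """Total records for several shops sharing one account.

    Each shop contributes its products once, plus one full copy per sort order.
    """
    return sum(products * (1 + sort_replicas) for products in sites.values())


shops = {f"shop{i}": 20_000 for i in range(1, 6)}  # five hypothetical shops

print(account_records(shops))  # 100000
print(account_records(shops, sort_replicas=2))  # 300000
```

The second line is a reminder that adding price-ascending and price-descending sorting to every shop triples the total.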


You can put your Elasticsearch server anywhere you want, but the further it is from your website the more latency there is going to be for those search results. You can also share one Elasticsearch server between as many databases as you like, as long as the server has the hardware to run the number of searches that are being sent its way.

For best results you would want the Elasticsearch server in the same datacenter as your web and database servers, but having it somewhere nearby (the same general geographic region) really shouldn't cause a very noticeable slowdown. Algolia works the same way: its search servers are separate from your shop, so the distance between them adds latency there too.
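If you want to see what the distance actually costs, you can time a few round trips from the web server to Elasticsearch. Here is a minimal Python sketch using only the standard library. It hits the cluster root URL, which returns a small JSON document with node and version info. The ES address is hypothetical.

```python
import statistics
import time
import urllib.request
from typing import Callable


def median_latency_ms(call: Callable[[], object], runs: int = 5) -> float:
    """Run `call` several times and return the median wall-clock time in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def ping_elasticsearch(base_url: str) -> bytes:
    """GET the cluster root, which returns basic node and version info as JSON."""
    with urllib.request.urlopen(base_url, timeout=5) as resp:
        return resp.read()
```

Run it from the web server itself, for example `median_latency_ms(lambda: ping_elasticsearch("http://10.0.0.5:9200/"))`, where `10.0.0.5` stands in for your ES server's address. Each call opens a fresh HTTP connection, so the number is a bit pessimistic compared with a kept-alive connection, but it is good enough for comparing "same datacenter" with "same region".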

You can also install Elasticsearch on the same server as your web and/or database server. If you have enough cores and enough memory it should be fine, but two smaller servers are probably going to be better and may even be cheaper.

Finally, about hardware, Elasticsearch performs at its best with 32GB of memory. More than 32GB can be problematic. 16GB seems to be okay for less busy installations and people do run it with 8GB. However servers with very small amounts of memory (<8GB) seem to have problems with running out of memory and/or crashing. Elasticsearch can be run in a cluster if more than 32GB is needed but somehow I don't think any of us are going to have a need for that.

Edit: 32GB is the amount ES ideally wants to use. The best overall total for the machine seems to be 64GB, ~30GB for ES and the remainder for the OS & a very big disk cache. It also seems to be very important to make sure the correct I/O scheduler settings are used for SSD vs HDD. Lots of good information here, directly from Elastic.
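The memory split above can be turned into a rule of thumb: give the ES heap about half the machine's RAM, but cap it a little below 32GB, and leave the rest for the OS and its disk cache. Here is a Python sketch of that rule that also prints the matching `-Xms`/`-Xmx` lines for Elasticsearch's `jvm.options` file. The 30GB cap is an assumption that matches the ~30GB figure above.

```python
def suggested_heap_gb(total_ram_gb: int, cap_gb: int = 30) -> int:
    """Half the machine's RAM for the ES heap, capped below the ~32GB limit.

    The other half is left to the OS and its disk cache, which Lucene relies on.
    cap_gb=30 is an assumption matching the ~30GB figure in the post.
    """
    return max(1, min(total_ram_gb // 2, cap_gb))


def jvm_heap_lines(heap_gb: int) -> str:
    # Min and max heap set to the same value so the JVM doesn't resize the heap at runtime.
    return f"-Xms{heap_gb}g\n-Xmx{heap_gb}g\n"


for ram in (8, 16, 64):
    print(ram, "GB RAM ->", suggested_heap_gb(ram), "GB heap")

print(jvm_heap_lines(suggested_heap_gb(64)), end="")
```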


Yes, you could probably have a shop anywhere in western Europe use that server without performance problems. I suspect it would happily support quite a few shops, but I don't have enough experience to say how many or how many records.

I'm still in the very early days of learning about search. I had hoped that bradsearch would be a mostly turnkey solution that could be used with 30bz but I was seeing some very odd results. I would search for a term and completely unrelated items would be returned, items that I know don't contain that term anywhere at all.

Elasticsearch is used by some very large websites (Facebook, Wikipedia, eBay, etc) so I know it's a capable search platform. There are a bunch of potential reasons it isn't working:

  • Something is wrong with the way bradsearch is configured.
  • A problem with how Elasticsearch 2.4.5 is configured on my Cloudways server.
  • For some reason my server is corrupting the indexes (lack of memory would be my guess).
  • My sad 2GB dev box is causing it to freak out (very possible IMO).
  • There is an incompatibility between brad and 30bz (I doubt this, as the db design is very similar to PS 1.6.1.x).
  • I've screwed something else up along the way.

I suspect it's a memory issue. I might set up a couple of 16GB Vultr boxes in Tokyo for a month, put ES on both of them in a two-node cluster config, and see how that goes. My ultimate goal is to have one ES setup per region where I have websites, and to serve 10 to 20 sites from each ES box. I'm hoping this can be done with a single 16GB ES server, but that may be wishful thinking. I'm a long, long way from that though!


They seem to work on bradsearch in bursts. A few months ago they did a fairly major upgrade, and it now supports ES 5 as well as ES 2.4.

I suspect the problems I am having with brad are of my own making, mostly a very under-spec'd server. Less than 8GB of memory is not recommended for a dedicated ES VPS; my dev VPS has only 2GB, and I'm running apache, nginx, redis, mysql, etc. on it. Adding ES to that mix seems to be asking for trouble! That said, the documentation for bradsearch is pretty thin and I really don't understand wtf I am doing yet. I'm always happy to learn, and search is something that has intrigued me for a long time, but I'm not sure how long it will take me to get things straight.


I think for basic "type things in and get results" search that would be fine.

Eventually it would be great to support more of what bradsearch does, where results can be filtered by price, color, size, or whatever else is useful for the site.
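In Elasticsearch that kind of filtering is usually done with a `bool` query: the search text goes in a `must` clause and the facets go in `filter` clauses. Here is a Python sketch that only builds the JSON body. The field names (`name`, `color`, `size`, `price`) are hypothetical, and it assumes `color` and `size` are mapped as keyword fields so a `term` filter matches them exactly. This is not how bradsearch builds its queries, just an illustration.

```python
import json


def filtered_search_body(text, color=None, size=None, min_price=None, max_price=None):
    """Build an ES bool query: full-text match on the name, exact filters on facets."""
    filters = []
    if color is not None:
        filters.append({"term": {"color": color}})
    if size is not None:
        filters.append({"term": {"size": size}})
    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        filters.append({"range": {"price": price_range}})

    bool_query = {"must": {"match": {"name": text}}}
    if filters:
        # Filter clauses narrow the results without affecting relevance scoring.
        bool_query["filter"] = filters
    return {"query": {"bool": bool_query}}


print(json.dumps(filtered_search_body("fishing line", color="blue", max_price=3000), indent=2))
```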

The biggest issue is that everything still needs to be sent to Elasticsearch to be indexed. It works much like Algolia: all data is sent to ES as JSON to be indexed, queries are also sent as JSON, and results come back as JSON. The results then need to be pulled out of the JSON and displayed to the user. Reindexes can be scheduled as cron jobs.
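Here is a Python sketch of both ends of that JSON round trip. The first function builds a body for Elasticsearch's `_bulk` endpoint, which is newline-delimited JSON with an action line followed by the document. The second pulls the documents back out of a `_search` response. It assumes ES 7 or later, which dropped mapping types. ES 5.x, as discussed here, also expects a `_type` in the action line. The index name and products are hypothetical.

```python
import json


def bulk_index_payload(index, products):
    """Build the newline-delimited JSON (NDJSON) body for Elasticsearch's _bulk endpoint.

    Each product becomes two lines: an action line naming the index and document id,
    followed by the document itself. The body must end with a newline.
    """
    lines = []
    for product in products:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(product["id"])}}))
        lines.append(json.dumps(product))
    return "\n".join(lines) + "\n"


def hits_from_response(response_text):
    """Pull the stored product documents out of a _search response body."""
    data = json.loads(response_text)
    return [hit["_source"] for hit in data["hits"]["hits"]]


# Hypothetical index name and products.
payload = bulk_index_payload("shop1_products", [
    {"id": 1, "name": "Braided line"},
    {"id": 2, "name": "Spinning reel"},
])
print(payload, end="")

sample_response = '{"hits": {"hits": [{"_id": "1", "_source": {"id": 1, "name": "Braided line"}}]}}'
print(hits_from_response(sample_response))
```

In practice the payload would be POSTed to `/_bulk` on the ES server with a `Content-Type: application/x-ndjson` header, for example from the cron job that does the reindex. Query bodies go to `/<index>/_search`.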

I'm going to try contacting Invertus to see if they have any idea why things didn't work with my development system. I just had a look at the brad page and my post was tagged as spam by Disqus, not sure why! I clicked on the "this is not spam!" button and hopefully they clear it and post it soon.


It worked really well for me too, until I tried to index ~1,100 items on a 2GB server. I think the underpowered server is probably causing the problems.

Did you do anything in particular to set it up or just let it rip?


@mdekker said in Let's talk about Search!:

Would you guys like to help with coding?

I cannot express how much I wish I could code in PHP so that I could help with this.

I'm a business automation guy and most of my work today is in C#, VBA, and VBdotNET. In a past miserable life I did some SAP work in ABAP and many, many years ago I did work on mainframes and minicomputers. (IBM and DEC VAX, mostly COBOL but a little PL/I). I can basically read PHP but I wouldn't trust myself to actually write any code. :(

Edit: System wanted to turn vb . net into a link so had to edit.


Domino's closes here at midnight, 1am in some places. McD's tends to be 24 hours, and of course there are places like Yoshinoya that are open 24 hours. Lots of options open but at this time of night none of them deliver.


Osaka is a city with a lot of character. The people are much more outgoing than most Japanese, and more willing to be individuals. It's the most "asia-like" city in Japan really. It's the center of food and comedy for Japan, and the Kansai Region (Osaka, Kobe, Kyoto, Nara and some other surrounding places) pretty much was Japan until the government was moved to Tokyo in 1868. It's an interesting place to live and there's always lots of mental stimulation.


I set up an 8GB VPS in the same datacenter as my web/db server, installed Elasticsearch 5.4 on it, and set up the firewall, etc., so it would all work.

Then I uninstalled BRAD from my 30bz shop, followed the directions to use Composer to change the BRAD module install files for ES 5+, and reinstalled BRAD using the newly created zip file. Everything went fine, and I was able to connect to the other ES service and reindex the 1095 items on my test site.

EDIT: Okay, I have figured a couple of things out after some more testing. Being well rested seems to help one think clearly. Whoda thought?

  1. The problem with searching for hardcore is that it actually was mentioned in the description of the items being returned. 100% my fault, should've looked at the item description in more detail.

  2. Turning off the fuzzy search option in the BRAD module dramatically improved the search results. In fact it works very, very well now. I'm not sure why fuzzy search causes so many problems on my site, maybe it is something that can be improved or tweaked or maybe it doesn't work well in ES yet. It might be because my items include words that would be otherwise uncommon in English since all products are sourced from Japan.

  3. It seems that BRAD (or ES?) puts the same weight on text found in item titles as on text found in item descriptions. Personally I would rather the weight be put on the item titles and on other structured data such as manufacturer, category, attributes, and features. Being able to turn off description indexing altogether would be nice. An alternative (and maybe even better!) would be to give the customer the option to search descriptions or not, and let the shop owner decide whether this option defaults to "on" or "off". Perhaps these last two ideas aren't possible with ES; I'm not sure how it works internally.
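For what it's worth, plain Elasticsearch does support both ideas. A `multi_match` query can boost some fields over others with the `field^N` syntax, turn fuzzy matching on or off with the `fuzziness` parameter, and simply leave the description field out of the list. Here is a Python sketch that builds such a query body. The field names and boost values are hypothetical, and this is not how BRAD actually builds its queries.

```python
def product_query(text, fuzzy=False, search_descriptions=True):
    """Build a multi_match query that weights titles and structured fields above descriptions.

    Field names and boost values are hypothetical; "name^3" boosts name matches 3x.
    """
    fields = ["name^3", "manufacturer^2", "category^2", "attributes", "features"]
    if search_descriptions:
        # Descriptions still count, but only with the default weight of 1.
        fields.append("description")
    match = {"query": text, "fields": fields}
    if fuzzy:
        # "AUTO" lets ES pick the allowed edit distance based on term length.
        match["fuzziness"] = "AUTO"
    return {"query": {"multi_match": match}}


# Exact matching, titles and structured data only: the "descriptions off" option.
print(product_query("hardcore", search_descriptions=False))
```

A customer-facing "search descriptions" checkbox could map straight onto the `search_descriptions` flag, with the shop owner choosing its default.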

I turned "fuzzy search" off on my old 2GB dev box as well and it still had problems, but maybe I was too focused on why hardcore was causing so many troubles and didn't try enough other terms.

Now I'm stoked to get 10,000+ items onto a site and see how ES compares to the standard PS search! If the chart at the top of the http://bradsearch.io/ page is to be believed, then ~10,000 products is where the difference in speed becomes very visible.

~~Unfortunately when testing nothing has changed.~~

~~If you visit https://www.inxonline.com you can try the search yourself, in the smaller search box with the black search button. (The top search is still using the standard 30bz search and it works fine.)~~

~~Try searching for:~~

~~1. hardcore: This should bring up a list of fishing lines. They are the only items in the product database that contain the word hardcore. However, what comes back is some fishing reels.~~

~~2. g-soul x8 line: This should also bring up some fishing line but instead returns Casio watches. I can kind of understand the similarities between "g-shock" and "g-soul", but with the extra terms it seems like it should work?~~

~~3. casio g-shock: This does return some Casio watches, but most of them aren't G-Shocks.~~

~~I'm sure there are other examples as well but these are the first ones I tried.~~

~~I guess there is either a problem with the way the data is being sent for index or with how the searches are being sent.~~

~~Thoughts?~~
