Jump to content
thirty bees forum

Succed a search even with spelling errors


Lathaneo

Recommended Posts

Hi,
I just ask a merge request from a implementation of the Levenshtein algorithm in the search controller.

What is Levenshtein algorithm : https://en.wikipedia.org/wiki/Levenshtein_distance

Levenshtein algorithm with PHP https://www.php.net/manual/en/function.levenshtein.php

The merge request page : https://github.com/thirtybees/thirtybees/pull/970

 

As I writed in comment of the merge request, the goal is to find some products with the search functionality. Even if you get some miss spelling in the search input.
That works perfectly with the vanilla ThirtyBees shop with default product.

For example, you search "chonney" instead of "honey", search controller will be able to find the close word, here honey, and will display honey product in the result.
Tooltip_020

Some precisions about the changes:

a. In case there are many close words, we take the first of the list.
b. The algorithm is really fast, only 0.005 - 0.010 second to check 2000 words
b. The algorithm could find some words, really far for the original (schlzcolad -> chocolat).
c. Later in a another commit, we could choose the better word following some requirements.

How to try:

  1. Just copy/past the code in controllers/front/SearchController.php . Here the code  https://github.com/thirtybees/thirtybees/pull/970/files
  2. Test it.

Regards,
Lathanao

Link to comment
Share on other sites

Here a script a play with Levenshtein :

 

<?php
header('Content-type: text/html; charset=utf-8');

include('../config/config.inc.php');
include('../init.php');

ini_set("display_errors", 1);
ini_set("track_errors", 1);
ini_set("html_errors", 1);
ini_set("realpath_cache_size", '5M');
ini_set('max_execution_time', 60000);
error_reporting(E_ALL);

$s = "chonney";

$s1 = "niche";
$s2 = "chien";

$s3 = "nicha";
$s4 = "niche";


$timeStart = microtime_float();
print("a. Distance : " . levenshtein_utf8($s1, $s2) . "\n");
$timeEnd   = microtime_float();
print($timeEnd - $timeStart . "\n\n");

$timeStart = microtime_float();
print("b. Distance : " . levenshtein($s1, $s2) . "\n");
$timeEnd   = microtime_float();
print($timeEnd - $timeStart . "\n\n");

$timeStart = microtime_float();
print("c. Distance : " . levenshtein_utf8($s3, $s4) . "\n");
$timeEnd   = microtime_float();
print($timeEnd - $timeStart . "\n\n");

$timeStart = microtime_float();
print("d. Distance : " . levenshtein($s3, $s4) . "\n");
$timeEnd   = microtime_float();
print($timeEnd - $timeStart . "\n\n");

$timeStart = microtime_float();

$words = getAllWords();

foreach ( $words as &$item ) {
    $item['leven'] = levenshtein_utf8($item['word'], $s);
    echo $item['word'] ." :: ". $item['leven'] . "\n";
}

uasort($words, function ($a, $b) {
    return $a['leven'] > $b['leven'] ? 1 : -1;
});

foreach ( $words as &$item ) {
    $item['leven'] = levenshtein_utf8($item['word'], $s);
    echo $item['word'] ." :: ". $item['leven'] . "\n";
}

$timeEnd   = microtime_float();
print($timeEnd - $timeStart);



print(var_dump($words));

print(array_shift($words)['word']);



function utf8_to_extended_ascii($str, &$map)
{
    // find all multibyte characters (cf. utf-8 encoding specs)
    $matches = array();
    if (!preg_match_all('/[\xC0-\xF7][\x80-\xBF]+/', $str, $matches))
        return $str; // plain ascii string

    // update the encoding map with the characters not already met
    foreach ($matches[0] as $mbc)
        if (!isset($map[$mbc]))
            $map[$mbc] = chr(128 + count($map));

    // finally remap non-ascii characters
    return strtr($str, $map);
}


function levenshtein_utf8($s1, $s2)
{
    $charMap = array();
    $s1 = utf8_to_extended_ascii($s1, $charMap);
    $s2 = utf8_to_extended_ascii($s2, $charMap);

    return levenshtein($s1, $s2);
}

function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}


function getAllWords()
{
    $sql = "SELECT `word` FROM `tb_search_word` WHERE `id_lang` = 1;";

    $result = Db::getInstance()->executeS($sql);

    return $result;
}

 

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...