Lathaneo Posted June 15, 2019 Posted June 15, 2019 Hi, I just ask a merge request from a implementation of the Levenshtein algorithm in the search controller. What is Levenshtein algorithm : https://en.wikipedia.org/wiki/Levenshtein_distance Levenshtein algorithm with PHP : https://www.php.net/manual/en/function.levenshtein.php The merge request page : https://github.com/thirtybees/thirtybees/pull/970 As I writed in comment of the merge request, the goal is to find some products with the search functionality. Even if you get some miss spelling in the search input. That works perfectly with the vanilla ThirtyBees shop with default product. For example, you search "chonney" instead of "honey", search controller will be able to find the close word, here honey, and will display honey product in the result. Some precisions about the changes: a. In case there are many close words, we take the first of the list. b. The algorithm is really fast, only 0.005 - 0.010 second to check 2000 words b. The algorithm could find some words, really far for the original (schlzcolad -> chocolat). c. Later in a another commit, we could choose the better word following some requirements. How to try: Just copy/past the code in controllers/front/SearchController.php . Here the code https://github.com/thirtybees/thirtybees/pull/970/files Test it. Regards, Lathanao
Lathaneo Posted June 15, 2019 Author Posted June 15, 2019 Here a script a play with Levenshtein : <?php header('Content-type: text/html; charset=utf-8'); include('../config/config.inc.php'); include('../init.php'); ini_set("display_errors", 1); ini_set("track_errors", 1); ini_set("html_errors", 1); ini_set("realpath_cache_size", '5M'); ini_set('max_execution_time', 60000); error_reporting(E_ALL); $s = "chonney"; $s1 = "niche"; $s2 = "chien"; $s3 = "nicha"; $s4 = "niche"; $timeStart = microtime_float(); print("a. Distance : " . levenshtein_utf8($s1, $s2) . "\n"); $timeEnd = microtime_float(); print($timeEnd - $timeStart . "\n\n"); $timeStart = microtime_float(); print("b. Distance : " . levenshtein($s1, $s2) . "\n"); $timeEnd = microtime_float(); print($timeEnd - $timeStart . "\n\n"); $timeStart = microtime_float(); print("c. Distance : " . levenshtein_utf8($s3, $s4) . "\n"); $timeEnd = microtime_float(); print($timeEnd - $timeStart . "\n\n"); $timeStart = microtime_float(); print("d. Distance : " . levenshtein($s3, $s4) . "\n"); $timeEnd = microtime_float(); print($timeEnd - $timeStart . "\n\n"); $timeStart = microtime_float(); $words = getAllWords(); foreach ( $words as &$item ) { $item['leven'] = levenshtein_utf8($item['word'], $s); echo $item['word'] ." :: ". $item['leven'] . "\n"; } uasort($words, function ($a, $b) { return $a['leven'] > $b['leven'] ? 1 : -1; }); foreach ( $words as &$item ) { $item['leven'] = levenshtein_utf8($item['word'], $s); echo $item['word'] ." :: ". $item['leven'] . "\n"; } $timeEnd = microtime_float(); print($timeEnd - $timeStart); print(var_dump($words)); print(array_shift($words)['word']); function utf8_to_extended_ascii($str, &$map) { // find all multibyte characters (cf. utf-8 encoding specs) $matches = array(); if (!preg_match_all('/[\xC0-\xF7][\x80-\xBF]+/', $str, $matches)) return $str; // plain ascii string // update the encoding map with the characters not already met foreach ($matches[0] as $mbc) if (!isset($map[$mbc])) $map[$mbc] = chr(128 + count($map)); // finally remap non-ascii characters return strtr($str, $map); } function levenshtein_utf8($s1, $s2) { $charMap = array(); $s1 = utf8_to_extended_ascii($s1, $charMap); $s2 = utf8_to_extended_ascii($s2, $charMap); return levenshtein($s1, $s2); } function microtime_float() { list($usec, $sec) = explode(" ", microtime()); return ((float)$usec + (float)$sec); } function getAllWords() { $sql = "SELECT `word` FROM `tb_search_word` WHERE `id_lang` = 1;"; $result = Db::getInstance()->executeS($sql); return $result; }
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now