\FUZZYMATCH

User-friendly, powerful fuzzy-matching function for Excel. It’s like VLOOKUP that finds the most similar text.

Syntax: =\FUZZYMATCH(lookup_value, lookup_array, [threshold], [return_distances], [max_match_count], [faster]) => text / array
=LAMBDA(lookup_value,lookup_array,[threshold],[return_distances],[max_match_count],[faster],
    LET(
        len_2,LEN(lookup_value),
        bigram, OR(ISOMITTED(faster),faster<>FALSE),
        seq, IF(bigram, MID("_" & lookup_value & "_",SEQUENCE(len_2+1),2), SEQUENCE(len_2+1,1,0)),
        seq_2, IF(bigram, SEQUENCE(len_2+1,1,1,0)),
        array_1, UNIQUE(lookup_array),
        unique_lookup_array, FILTER(array_1,array_1<>""),
        distance_array, 
            MAP(unique_lookup_array,
                IF(bigram,
                    LAMBDA(text1,
                        LET(
                            t1pad, "_"&text1&"_",
                            b, VSTACK(MID(t1pad,SEQUENCE(LEN(t1pad)-1),2),seq),
                            c, VSTACK(SEQUENCE(LEN(t1pad)-1,1,-1,0),seq_2),
                        1-REDUCE(0,UNIQUE(b),LAMBDA(acc,val,acc+ABS(SUM(FILTER(c,b=val)))))/ROWS(c)
                        )
                    ),
                    LAMBDA(text1,
                        LET(len_1,LEN(text1),
                        1-TAKE(
                            REDUCE(
                                seq,
                                HSTACK(
                                    SEQUENCE(len_1,1),
                                    MAKEARRAY(len_1,len_2,
                                        LAMBDA(x,y,
                                            if(MID(text1,x,1)=MID(lookup_value,y,1),0,-1)
                                        )
                                    )
                                ),
                                LAMBDA(acc,val,
                                    TAKE(
                                        VSTACK(
                                            acc,
                                            IF(
                                                val>0,
                                                    val,
                                                MIN(
                                                    INDEX(acc,1)-val,
                                                    INDEX(acc,2)+1,
                                                    TAKE(acc,-1)+1
                                                )
                                            )
                                        ),
                                        -len_2-2
                                    )
                                )
                            ),
                            -1
                        )/MAX(len_1,len_2)
                        )
                    )
                )
            ),
        combined_array,
            HSTACK(unique_lookup_array,distance_array),
        filtered_combined,
            IF(ISOMITTED(threshold),combined_array,
                FILTER(combined_array,distance_array>threshold)    
            ),
        trimmed_combined,
            TAKE(SORT(filtered_combined,2,-1),IF(ISOMITTED(max_match_count),1,max_match_count)),
IF(ROWS(trimmed_combined)<=0,NA(),TOROW(IF(OR(ISOMITTED(return_distances),return_distances<>TRUE),CHOOSECOLS(trimmed_combined,1),trimmed_combined)))
    )
)

Documentation

Function takes a text value and a lookup array and returns a value from the lookup array that is closest match by edit distance to the lookup value.

In default form will return a single value with just the closest match.

Optional Arguments:

  • [threshold] – Fraction between 0 and 1. When supplied, will only return matches with similarity above the threshold. Default: 0.
  • [return_distances] – TRUE/FALSE. If true, will return the similarity metric for each returned match. Default: FALSE.
  • [max_match_count] – Integer, >=1. Maximum number of results returned. Default: 1.
  • [faster] – TRUE/FALSE. If TRUE, will use bigram matching. If FALSE, will use Levenshtein distance (slower). Default: TRUE.

Matching is case-insensitive.

When [return_distances] and/or [max_match_count] are supplied with values other than default, function will return a horizontal vector (1-D array) of either {match1, match2, match3,….} when return_distances is FALSE or {match1, distance1, match2, distance2, match3, distance3, …} when return_distances is TRUE. Matches are sorted by distance, with closest one first.


Blog

The holy grail of lambdas.

Hopefully, the function is written in such a way, so as to be approachable for everyone, and useful also for more advanced use cases.

Performance is… well, OK.

TODO: Performance testing.

I’m wondering if this function would benefit from an array version. I mean – it is an array function of course, but if a use case calls for autofilling a table of, say, 30k records, there might be performance gains from running the whole thing in one pass, so to speak.

I’m not sure if Excel does any result caching, or if it will calculate the function multiple times for identical inputs on each recalculation.

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Scroll to Top