First efficient Excel formula implementation of the Levenshtein distance function for measuring text similarity
Syntax: =\LEVENSHTEIN(text1, text2, [char_compare_function])
=LAMBDA(text1,text2,[char_compare_function],
LET(
len_1,LEN(text1),
len_2,LEN(text2),
comp_f,
IF(ISOMITTED(char_compare_function),
LAMBDA(char1,char2,
IF(char1=char2,0,1)
),
LAMBDA(a,b,
LET(
comp, char_compare_function(a,b),
IF(ISLOGICAL(comp),
IF(comp,0,1),
comp
)
)
)
),
IFS(
OR(len_1=0,len_2=0),
MAX(len_1,len_2),
comp_f(text1,text2)=0,
0,
TRUE,
TAKE(
REDUCE(
SEQUENCE(len_2+1,1,0),
HSTACK(
SEQUENCE(len_1,1),
MAKEARRAY(len_1,len_2,
LAMBDA(x,y,
-comp_f(MID(text1,x,1),MID(text2,y,1))
)
)
),
LAMBDA(acc,val,
TAKE(
VSTACK(
acc,
IF(
val>0,
val,
MIN(
INDEX(acc,1)-val,
INDEX(acc,2)+1,
TAKE(acc,-1)+1
)
)
),
-len_2-2
)
)
),
-1
)
))
)
Documentation
Calculates the Levenshtein distance (edit distance) between two texts.
Returns 0 for identical texts. For differing texts calculates minimum number of insertions, deletions and substitutions between the two texts.
Case insensitive by default.
Can receive a string comparison lambda function that should return either TRUE for identical texts and FALSE for non-matching texts, or 0 for identical texts and 1 for non-matching texts with fractions for partial matches accepted.
Example of a character comparison function that will perform a case-sensitive comparison:
LAMBDA(a,b,EXACT(a,b))
Example of a character comparison function that will give a partial match when the case does not match:
LAMBDA(a,b,IF(EXACT(a,b),0,IF(a=b,0.5,1))).
The comparison function must be capable of comparing full texts, not just individual characters.
Blog
Now, this is an analysts favorite. This is the first function I thought about when I discovered that Lambda functions can be called recursively. Only after implementing a recursive version did I realize that such an approach is hugely inefficient and simply will not work in Excel for strings of any appreciable length. I believe Sergey Grashenko of Baeldung.com called it “trivial”.
Luckily, that article was a huge help in understanding the iterative algorithm for the function and adapting it to Excel. I settled on a version of “Iterative with One Row”, but instead of passing the previous row of the matrix, I implemented a queue passed as an array in the accumulator between REDUCE iterations, to be written to with VSTACK, read from with INDEX, and kept at fixed length with TAKE(,-x).
Another twist is that instead of REDUCEing on a placeholder array of sequences, the array that REDUCE is iterating on is an array of comparisons between pairs of characters in the two strings. Those comparisons would have had to be performed anyway, by performing them ahead of time we are saving ourselves from dimensioning a useless array and from having to figure out which row and column of the array we are on within the REDUCE loop (we can’t pass an index in the accumulator as that is already taken up by the queue).
That particular lambda was a lot of fun to write, and the article on Baeldung.com was super helpful. If you are interested in understanding this function, definitely read the article first.