\STRIPACCENTS

Probably the most complete accent-stripping function for Excel

Syntax: =\STRIPACCENTS(text) => text
LAMBDA(text,
    LET(
        c_a, "ƼↃÁÀȦÂÄǍĂĀÃÅĄȺẤẦẮẰǠǺǞẪẴẢȀȂẨẲẠḀẬẶÆ ǼǢḂɃƁḄḆƂƄĆĊĈČÇȻḈƇƆḊĎḐĐƋƊḌḒḎÐDZ DŽ ƉÉÈĖÊËĚĔĒẼĘȨɆẾỀḖḔỄḜẺȄȆỂẸḘḚỆƎƏƐȜḞƑǴĠĜǦĞḠĢǤƓƔḢĤḦȞḨĦḤḪⱧǶ ǶⱵIÍÌİÎÏǏĬĪĨĮƗḮỈȈȊỊḬƖIJ ĴɈḰǨĶƘḲḴⱩĹĿĽⱢⱠĻȽŁḶḼḺḸLJ ḾṀṂƜŃǸṄŇÑŅƝȠṆṊṈNJ ŊÓÒȮÔÖǑŎŌÕǪŐỐỒƟØṒṐȰȪỖṎǾȬǬỎȌȎƠỔỌỚỜỠỘƢ ỞỢŒ Ȣ ṔṖⱣƤɊƦ ƦŔṘŘŖɌⱤȐȒṚṞṜ" & "ŚṠŜŠŞṦṢȘṨƩƧṪŤŢƬṬƮȚṰṮȾÞ ŦÚÙÛÜǓŬŪŨŮŲŰɄǗǛǙǕỦȔȖƯỤṲỨỪṶṴỮƱỬỰṼṾƲɅẂẀẆŴẄẈǷẊẌÝỲẎŶŸȲỸɎỶƳỴŹŻẐŽƵȤẒẔⱫƷǮƸɁΆΒΈΉΘΊΪΪΚΌΠΡΎΫΫϒϓϔΦΏӒӐЃҒӺҐҔӶЀЁӖҼҾӘӚӜӁҖӞҘЍӤЙӢҊЇҞӃҜҠЌҚԒҢҤӇӦӨҨӪҦҪҬӰЎӮӲҮҰӼӾҲҶҸӴӋӸҌӬѶҎӅӉӍӔ Ӡ",
        c_na, "5CAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEÆÆBBBBBBBCCCCCCCCODDDDDDDDDDDZDZDEEEEEEEEEEEEEEEEEEEEEEEEEEEEEYFFGGGGGGGGGGHHHHHHHHHHVHHIIIIIIIIIIIIIIIIIIIIJJJKKKKKKKLLLLLLLLLLLLLJMMMMNNNNNNNNNNNNJNOOOOOOOOOOOOOOOOOOOOOØOOOOOOOOOOOOOIOOOEOUPPPPQYRRRRRRRRRRRRR" & "SSSSSSSSSSSTTTTTTTTTTTHTUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUVVVVWWWWWWPXXYYYYYYYYYYYZZZZZZZZZZƷZSΑΒΕΗΘΙΙΙΚΟΠΡΥΥΥΥΥΥΦΩААГГГГГГЕЕЕЧЧEӘЖЖЖЗЗИИИИИІККККККЛНННОOХӨПСТУУУУUUХХХЧЧЧЧЫЬЭѴРЛНМAEZ",
        nc_a, "ԤԨԮꚘ ꞏꞒꞕꞖꞘꞚ Ꞝ Ꞟ ꞫꞬꞭꞮꞯꞰꞱꞲꞳꞸꞺꟂꟅꟆꟇꟋꟓ ꟕꬺꬻꭅꭐ ꭣ ꭦ ꭧ ʹ;`΅῭῁´ǃ῎῍῏῞῝῟ǀǁǂ·ƻᴀᶏẚ ɐɑɒⱰᶐꜲ Ꜵ Ꜷ Ꜹ Ꜽ ᴥʙᵬᴃᶀᴄꜾᴐᶗɕʗᴅᵭᶁᶑᴆʤ ƍꝹȸ ʣ ʥ ȡꝱẟᴇᶒⱸⱻɚɘᶕᶓᴈɜᶔɝɞʚꜰꟻꝻᵮᶂʩ ɢᵷꝾꞠʛᶃɤɡᵹʜɦẖɥꞍʮʯꜦɧɪꟾᴉᵻᶖᵼᴊȷǰɟʄʝᴋĸꞢᶄʞꝀꝂꝄʟᴌƛᶅɭꝆꝈꞀɬɮ Ỻ ʪ ʫ ȴꝲᴍᵯɱᶆꝳɰꟽᴟɴᴎᵰꞤʼnᶇɳȵꝴᴏᴑᴒᴓꝌⱺꝊɶ ɷᴕ ᴖᴗꝎ ᴔ ᴘᵱᶈꝐꝒꝔꟼⱷ ɸ ʠꝖꝘȹ ᴙᵲᵳᶉɹᴚɺɻ" & "ɼɾɿʁꞂꝚⱹꝵꝶꝜꜱᵴʂᶊȿᶋᶘʆʅƪßᴤſẛẜẝꞄᴛẗᵵƫꞆʇʧ ʨ Ꜩ ꝷȶᵺ ʦ ᴜᵾᶙᵿᴝᴞᵫ ᴠᶌⱴꝞⱱỼꝠ ᴡẘⱲʍᶍʏẙỾʎᴢᵶᶎʐɀʑꝢᴣᶚƺʓƾʕʖʡʢꜪꜬꜮꞋʘʬʭꟿꝨꝪ Ꝭ ꝮꝸꞎꞐꞦꞨꟺὰᾰᾱᾷἀἁἄἂἃᾆᾇᾶᾳᾴᾲἆἇᾀᾁᾄᾂᾃϵὲἐἑἔἒἓὴῇἠἡἤἢἣᾖᾗῆῃῄῂἦἧᾐᾑᾔᾒᾓιὶῐῑῒἰἱἴἲἳῖῗἶἷὸὀὁὄὂὃῤῥϲὺῠῡῢὐὑὔὒὓῦῧὖὗὼῷὠὡὤὢὣᾦᾧ",
        nc_na, "ПНЛOO.CHBFAEOEUEEgLIQKTJXUAWSZDythpmnruiuodztsʹ;`¨¨¨´!᾿᾿᾿῾῾῾III·2AaaʾaaaDaAAAOAUAVAYaBbBbCCOocCDdddDdzdDdbdzdzdddEeeEeeeEEeEeeEFFFfffnGgGGGgyggHhhhHhhHhIIIIIIJjjJsJKkKkKKKKLLlllLLLllzlLlslzllMmmmmmMmNNnNnnnnnOOOOOoOOEoouooOOoePppPPPPphphQQQqpRrrrrRrr" & "rrrRRRrrRRSssssSSsssSsssSzSTtttTttsTCTzttthtsUuuuUUUEvvvVVVVYwwWwxYyYyzzzZzZZzzzzsssss344'OwnMVetisCULNRSmααααααααααααααααααααααεεεεεεεηηηηηηηηηηηηηηηηηηηηιιιιιιιιιιιιιιοοοοοορρςυυυυυυυυυυυυυωωωωωωωωω",
        a, CONCAT(LOWER(c_a),c_a,nc_a),
        na, CONCAT(LOWER(c_na),c_na,nc_na),
    REDUCE("",SEQUENCE(LEN(text)),
        LAMBDA(acc,val,
            LET(
                l, MID(text,val,1),
                ul, UNICODE(l),
                IF(AND(ul>767,ul<880),
                    acc,
                    IF(l=" ",acc&l,
                        LET(
                            f, FIND(l,a),
                            acc & IF(NOT(ISERROR(f)),MID(na,f,IF(MID(a,f+1,1)=" ",2,1)),l)
                        )
                    )
                )
            )
        )
    )
    )
)

Documentation

Removes accents from over 850 Latin, Greek, and Cyrillic Unicode characters, converting them to their unaccented base forms.
It follows Unicode canonical decomposition and drops the combining characters, but also extends the mapping, somewhat arbitrarily, to some characters for which no decomposition is defined.
It also removes any characters from the Unicode block Combining Diacritical Marks (U+0300 to U+036F).
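
For comparison, here is a minimal Python sketch of the plain "decompose, then drop combining marks" idea described above. It is not the Excel function: it relies on Python's standard `unicodedata` module, and the name `strip_accents_nfd` is made up for this illustration.

```python
import unicodedata

def strip_accents_nfd(text: str) -> str:
    """Canonically decompose (NFD), then drop combining diacritical marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # U+0300..U+036F is the Combining Diacritical Marks block,
    # the same range the Excel formula tests with ul>767 and ul<880.
    return "".join(ch for ch in decomposed if not 0x0300 <= ord(ch) <= 0x036F)

print(strip_accents_nfd("Crème brûlée"))
# "Ł" has no canonical decomposition, so this approach leaves it untouched.
print(strip_accents_nfd("Łódź"))
```

Characters such as "Ł" pass through unchanged here, which is the kind of gap the hand-built character sets in the Excel function are meant to cover.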


Blog

The sets of characters defined in the function are:

  • c_a: case, accented – accented characters with two cases,
  • nc_a: no case, accented – accented characters non-convertible to upper/lower case,
  • c_na: case, not accented – non-accented versions of the c_a characters,
  • nc_na: no case, not accented – non-accented versions of the nc_a characters.

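Some “interesting” characters decompose into two characters (e.g. the single-character digraph Ǳ -> DZ). These are right-padded with a space in the accented sets, so that the accented and unaccented lists stay aligned position by position.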
After defining the character sets, the function concatenates them into two lists of equal length. It then uses a REDUCE/SEQUENCE loop over the text length, looking up each character and appending its unaccented version, if found.
A noteworthy takeaway from this function is that Excel's FIND function is case-sensitive, which is why the lookup lists contain both the lowercase and the uppercase versions of the cased sets.
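
The lookup logic can be sketched in Python with a deliberately tiny pair of tables. The table contents and the names `CASED_ACCENTED`, `CASED_PLAIN` and `strip_accents_lookup` are invented for this illustration; only the mechanism (two aligned strings, space padding for two-character replacements, a case-sensitive search) mirrors the Excel formula.

```python
# Hypothetical mini tables; the real function uses far longer strings.
# Accented side: É È Ç Ǳ <padding space> Ñ
CASED_ACCENTED = "\u00c9\u00c8\u00c7\u01f1 \u00d1"
# Plain side: "DZ" fills the digraph's slot plus the padding slot.
CASED_PLAIN = "EECDZN"

# Mirror the CONCAT(LOWER(...), ...) step: lowercase half first, then uppercase.
ACCENTED = CASED_ACCENTED.lower() + CASED_ACCENTED
PLAIN = CASED_PLAIN.lower() + CASED_PLAIN
assert len(ACCENTED) == len(PLAIN)  # the two lists must stay aligned

def strip_accents_lookup(text: str) -> str:
    out = []
    for ch in text:
        if 0x0300 <= ord(ch) <= 0x036F:
            continue  # drop combining diacritical marks
        if ch == " ":
            out.append(ch)  # spaces are padding in the table; never look them up
            continue
        pos = ACCENTED.find(ch)  # case-sensitive, like Excel's FIND
        if pos == -1:
            out.append(ch)  # not in the table: keep the character as-is
            continue
        # Take two characters when the table entry is followed by a padding space.
        width = 2 if ACCENTED[pos + 1 : pos + 2] == " " else 1
        out.append(PLAIN[pos : pos + width])
    return "".join(out)

print(strip_accents_lookup("\u00c9l\u00e8ve"))
print(strip_accents_lookup("\u01f1 and \u01f3"))
```

Because the search is case-sensitive, the uppercase digraph resolves to "DZ" and its lowercase form to "dz", each from its own half of the concatenated table.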

Now, if you want to do this differently or better, the best resources are the complete Unicode CSV specs published by the Unicode Consortium, in particular:
The Folder,
The File,
and The Metadata.
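
As a rough starting point, a mapping table could be derived from such a file with a few lines of Python. The links above are not reproduced here, so this sketch assumes the semicolon-separated layout of the Unicode Character Database file `UnicodeData.txt`, where the first field is the code point and the sixth field holds the decomposition mapping. The function name `canonical_base_map` and the two sample records are for illustration only.

```python
# Two sample records written in the assumed UnicodeData.txt layout.
SAMPLE = """\
00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;
0141;LATIN CAPITAL LETTER L WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER L SLASH;;;0142;
"""

def canonical_base_map(lines):
    """Map each character that has a canonical decomposition to its first code point."""
    mapping = {}
    for line in lines:
        fields = line.strip().split(";")
        # Skip records with no decomposition, and tagged (compatibility)
        # mappings, which start with a marker such as "<compat>".
        if len(fields) < 6 or not fields[5] or fields[5].startswith("<"):
            continue
        char = chr(int(fields[0], 16))
        # Only one level is resolved here; a full table would apply the
        # mapping repeatedly until the base character no longer decomposes.
        base = chr(int(fields[5].split()[0], 16))
        mapping[char] = base
    return mapping

table = canonical_base_map(SAMPLE.splitlines())
print(table)
```

The second record has an empty decomposition field, so it produces no entry; characters like that are the ones that need a hand-written mapping.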
