How would I go about changing éëíïñÑ (etc.) to their unaccented counterparts, i.e. eeiinN?

I was thinking about doing regex matching against é -> &eacute; and replacing both & and acute/grave; with empty strings, but I can't seem to find an AS3 function that encodes accents to their non-numerical entities (&ecirc; and the like). I've already tried using an associative array, a la entities["À"] = "A";, but AS3 seems to dislike the Unicode keys.

Any suggestions would be greatly appreciated.

Thanks!

+3  A: 

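This is related to "Unicode decomposition" (splitting an accented letter into a base letter plus combining marks, which can then be dropped), so you may want to Google for that. However, if you are dealing with languages other than your own, don't do this.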

I know the idea seems reasonable to native English speakers who know no other languages, but to people for whom those characters are letters it makes as much sense as replacing "W" with "VV", "d" with "cl" and "Q" with "O," would to an English speaker.

P.S. Since you asked:

You could loop through the string with charCodeAt() and key your associative array on the integer character codes instead of on the characters themselves. But I still don't recommend it.
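
A minimal sketch of that loop, in case it helps. The class name `AccentStripper`, the method `stripAccents` and the table contents are made up for illustration, and the table only covers the Spanish accented letters; treat it as a starting point, not a complete mapping.

```actionscript
package {
    public class AccentStripper {
        // Hypothetical lookup table, keyed on character codes rather than
        // on the accented characters themselves. Built lazily on first use.
        private static var accentMap:Object = null;

        private static function buildMap():Object {
            var map:Object = {};
            map[0x00C1] = "A"; // Á
            map[0x00C9] = "E"; // É
            map[0x00CD] = "I"; // Í
            map[0x00D1] = "N"; // Ñ
            map[0x00D3] = "O"; // Ó
            map[0x00DA] = "U"; // Ú
            map[0x00DC] = "U"; // Ü
            map[0x00E1] = "a"; // á
            map[0x00E9] = "e"; // é
            map[0x00ED] = "i"; // í
            map[0x00F1] = "n"; // ñ
            map[0x00F3] = "o"; // ó
            map[0x00FA] = "u"; // ú
            map[0x00FC] = "u"; // ü
            return map;
        }

        public static function stripAccents(input:String):String {
            if (accentMap == null) {
                accentMap = buildMap();
            }
            var out:String = "";
            for (var i:int = 0; i < input.length; i++) {
                var code:Number = input.charCodeAt(i);
                // A missing entry yields undefined, which "as String" turns into null.
                var repl:String = accentMap[code] as String;
                // Characters with no entry in the table pass through unchanged.
                out += (repl != null) ? repl : input.charAt(i);
            }
            return out;
        }
    }
}
```

With this table, `AccentStripper.stripAccents("José Núñez")` would give `"Jose Nunez"`. Because the keys are plain integers, the source file itself needs no non-ASCII characters outside of comments.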

"Doña" means "lady" but "dona" means "doughnut". "De" means "from" and "dé" means "give". And so on and so forth.

They aren't just normal letters with annoying flyspecks, they are actually as distinct as "E" and "F" or "P" and "R".

MarkusQ
Yeah, I'm only using it for Spanish accents. Would you happen to have some code for this?
hb
I'm well aware of that; however, some government ID algorithms depend on Á being A and the like. I'm actually working around it using escape() and keeping a database of the escaped sequences - hackish, but it works.
hb
+2  A: 

Here's some code I adapted from http://www.actionscript.org/forums/showthread.php3?p=900420, filling in all the Latin-1 characters that were missing from that sample, and getting things into Unicode order:

 /**
  * Helper arrays for unicode decomposition
  */
 private static var pattern:Array = new Array(29);
 pattern[0] = new RegExp("Š", "g");
 pattern[1] = new RegExp("Œ", "g");
 pattern[2] = new RegExp("Ž", "g");
 pattern[3] = new RegExp("š", "g");
 pattern[4] = new RegExp("œ", "g");
 pattern[5] = new RegExp("ž", "g");
 pattern[6] = new RegExp("[ÀÁÂÃÄÅ]","g");
 pattern[7] = new RegExp("Æ","g");
 pattern[8] = new RegExp("Ç","g");
 pattern[9] = new RegExp("[ÈÉÊË]","g");
 pattern[10] = new RegExp("[ÌÍÎÏ]", "g");
 pattern[11] = new RegExp("Ð", "g");
 pattern[12] = new RegExp("Ñ","g");
 pattern[13] = new RegExp("[ÒÓÔÕÖØ]","g");
 pattern[14] = new RegExp("[ÙÚÛÜ]","g");
 pattern[15] = new RegExp("[ŸÝ]", "g");
 pattern[16] = new RegExp("Þ", "g");
 pattern[17] = new RegExp("ß", "g");
 pattern[18] = new RegExp("[àáâãäå]","g");  
 pattern[19] = new RegExp("æ","g");
 pattern[20] = new RegExp("ç","g");
 pattern[21] = new RegExp("[èéêë]","g");
 pattern[22] = new RegExp("[ìíîï]","g");
 pattern[23] = new RegExp("ð", "g");
 pattern[24] = new RegExp("ñ","g");
 pattern[25] = new RegExp("[òóôõöø]","g");
 pattern[26] = new RegExp("[ùúûü]","g");
 pattern[27] = new RegExp("[ýÿ]","g");
 pattern[28] = new RegExp("þ", "g");

 private static var patternReplace:Array = [
  "S",
  "Oe",
  "Z",
  "s",
  "oe",
  "z",
  "A",
  "Ae",
  "C",
  "E",
  "I",
  "D",
  "N",
  "O",
  "U",
  "Y",
  "Th",
  "ss",
  "a",
  "ae",
  "c",
  "e",
  "i",
  "d",
  "n",
  "o",
  "u",
  "y",
  "th"];

 /**
  * Strips accents from a run of text by replacing each character matched
  * by pattern[i] with the corresponding entry in patternReplace[i].
  * @param str The original string
  * @return The string without accents
  */
 private static function decomposeUnicode(str:String):String
 {
  for (var i:int = 0; i < pattern.length; i++)
  {
   str = str.replace(pattern[i], patternReplace[i]);
  }
  return str;
 }
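
The loop above makes one pass over the string per pattern. If that ever matters, a single-pass variant is possible, since String.replace() also accepts a function as its second argument. This is only a sketch under assumptions: the class name `AccentFolder`, the method `foldAccents` and the short Spanish-only table are hypothetical, and it handles one-character replacements only (so no "Æ" to "Ae").

```actionscript
package {
    public class AccentFolder {
        // Hypothetical lookup table keyed on character codes, plus one
        // character-class RegExp that matches every character in the table.
        private static var table:Object = null;
        private static var matcher:RegExp = null;

        private static function init():void {
            var from:String = "ÁÉÍÑÓÚÜáéíñóúü";
            var to:String   = "AEINOUUaeinouu";
            table = {};
            for (var i:int = 0; i < from.length; i++) {
                table[from.charCodeAt(i)] = to.charAt(i);
            }
            matcher = new RegExp("[" + from + "]", "g");
        }

        public static function foldAccents(input:String):String {
            if (table == null) {
                init();
            }
            // The callback receives the matched text first; the remaining
            // arguments (match index, whole string) are not needed here.
            return input.replace(matcher,
                function(match:String, ...rest):String {
                    return table[match.charCodeAt(0)];
                });
        }
    }
}
```

With this table, `AccentFolder.foldAccents("canción española")` would give `"cancion espanola"`. As with the code above, the source file has to be saved in an encoding the compiler reads correctly (UTF-8) for the accented literals to survive.
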
A: 
    // Parallel arrays: sdiakA[i] is a global RegExp for one accented character,
    // bdiakA[i] is its unaccented replacement.
    private var sdiakA:Array;
    private var bdiakA:Array;

    // Builds the lookup arrays; call once before replaceDiacritic().
    private function initReplaceDiacritic():void {
        var sdiak:String = "áäčďéěíĺľňóôöŕšťúůüýřžÁÄČĎÉĚÍĹĽŇÓÔÖŔŠŤÚŮÜÝŘŽ";
        var bdiak:String = "aacdeeillnooorstuuuyrzAACDEEILLNOOORSTUUUYRZ";
        sdiakA = new Array();
        bdiakA = new Array();

        for (var i:int = 0; i < sdiak.length; i++) {
            sdiakA.push(new RegExp(sdiak.charAt(i), "g"));
            bdiakA.push(bdiak.charAt(i));
        }
    }

    // Replaces each accented character listed in sdiak with its plain counterpart.
    private function replaceDiacritic(text:String):String {
        for (var i:int = 0; i < sdiakA.length; i++)
            text = text.replace(sdiakA[i], bdiakA[i]);
        return text;
    }

    // Usage:
    initReplaceDiacritic();
    var str:String = replaceDiacritic("šžřáíéééíčšřčš"); // "szraieeeicsrcs"