How do I create an array of strings from a string? For example, "hello world" would return ["hello", "world"]. This would need to take punctuation marks, etc. into account.

There's probably a great RegEx solution for this; I'm just not capable of finding it.

+1  A: 

Any reason that:

var myString:String = "hello world";
var reg:RegExp = /\W/i;
var stringAsArray:Array = myString.replace(reg, "").split(" ");

won't work?

Justin Niessner
That doesn't strip out full stops or commas, but does strip out apostrophes. So we get "doesnt." instead of "doesn't", etc. Essentially, I'd like to take a paragraph of text and end up with an array of the words in it, minus the spaces and full stops. Good effort though.
dr_tchock
@dr_tchock - Just keep working on the RegEx. \W is supposed to match all non-word characters (which would include all punctuation except for the underscore character).
Justin Niessner
RegEx blows my tiny little mind into a billion pieces. I shall try though.. thanks.
dr_tchock
+1  A: 

How about AS3's String.split?

var text:String = "hello world";
var split:Array = text.split(" "); // this will give you ["hello", "world"]
// then iterate and strip out any redundant punctuation like commas, colons and full stops
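
The iterate-and-strip step mentioned in the comment above might look something like the following. This is only a sketch: the helper name stripPunctuation is made up for illustration, it assumes apostrophes should be kept, and it is written as a bare function (as in a frame script) rather than inside a class.

```actionscript
// Hypothetical helper, not part of the original answer.
function stripPunctuation(text:String):Array
{
    var pieces:Array = text.split(" ");
    var words:Array = [];

    for (var i:int = 0; i < pieces.length; i++)
    {
        // Trim any run of characters other than word characters and
        // apostrophes from the start and the end of each piece.
        var word:String = pieces[i].replace(/^[^\w']+|[^\w']+$/g, "");

        // Skip pieces that were nothing but punctuation.
        if (word.length > 0)
        {
            words.push(word);
        }
    }
    return words;
}

trace(stripPunctuation("Hello, world: it doesn't stop.")); // Hello,world,it,doesn't,stop
```

Punctuation in the middle of a piece (for example "one,two" with no space) is left alone by this approach, since only the ends of each piece are trimmed.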
danyal
It's stripping out the punctuation that I'm interested in. I know how to do this with a rather clunky if/else, but I'm looking for a more elegant solution (enter RegExp...)
dr_tchock
A: 

This seems to do what you want:

package
{
    import flash.display.Sprite;

    public class WordSplit extends Sprite
    {
        public function WordSplit()
        {
            // Sample text built by concatenation rather than backslash line continuations
            var inText:String = "This is a Hello World example.\nIt attempts, " +
                "to simulate! what splitting\" words ' using: punctuation\tand " +
                "invisible ; characters ^ & * yeah.";

            // \w+ matches each run of word characters; the g flag collects every match
            var regExp:RegExp = /\w+/g;
            var wordList:Array = inText.match(regExp);

            trace(wordList);
        }
    }
}

If not, please provide a sample input and output specification.

James Fassett
Nearly works! Splits at an apostrophe though, unfortunately.
dr_tchock
As I said you need to provide a complete input and output specification. I can't keep guessing what you believe does and doesn't constitute a word.
James Fassett
It's pretty obvious what does and doesn't constitute a word, no guessing required. Thanks for the help though.
dr_tchock
Not as obvious as you think. You need an apostrophe now. How about hyphenated words? Do you consider currency ($100) a word? Your regular expression will become your specification.
James Fassett
You are right of course, thanks for pointing that out. For now, I don't think that'll be a problem though, hopefully.
dr_tchock
A: 

Think I've cracked it, here is the function in full:

public static function getArrayFromString(str:String):Array {
    return str.split(/\W | ' | /gi);
}

Basically, it uses the 'not a word' condition but excludes apostrophes, is global and ignores case. Thanks to everyone who pointed me in the right direction.

dr_tchock
It's good you have something you are happy with. Like a newborn child, take a photo of this, because I think it is the last time you will see it so small. Other unwanted characters are on their way, such as the other species of apostrophe. Feel free to post the regex back here for interest's sake if it becomes particularly frightening...
danyal
A: 

I think you might want something like this:

public static function getArrayFromString(str:String):Array {
    return str.split(/[\W']+/gi);
}

Basically, you can add any characters that you want to be considered delimiters into the square brackets. Here's how the pieces work:

  1. The brackets define a set of characters.
  2. The things in the brackets are the characters in the set (with \W meaning "any non-word character")
  3. The plus sign means "one or more of the previous item"—in this case, the character set. That way, if you have something with several of the characters in a row, you won't get empty items in your array.
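
If apostrophes should stay inside words (as with "doesn't" earlier in the thread), the same character-set idea can be turned around to describe the words themselves instead of the delimiters. A minimal sketch, assuming only straight apostrophes matter; the function name getWordsKeepingApostrophes is made up for illustration:

```actionscript
// Hypothetical variation, not part of the original answer.
function getWordsKeepingApostrophes(str:String):Array
{
    // [\w']+ matches each run of word characters and apostrophes, so
    // everything else acts as a separator. Using match() with the g flag
    // instead of split() avoids empty entries when the text starts or
    // ends with punctuation.
    return str.match(/[\w']+/g);
}

trace(getWordsKeepingApostrophes("It doesn't split here, does it?")); // It,doesn't,split,here,does,it
```

Other characters that should count as part of a word, such as a hyphen, could be added to the set in the same way.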
matthew
Thanks, I will try that out, too.
dr_tchock