Hi guys,

I need to get the company name and its ticker symbol into separate arrays. Here is my data, which is stored in a txt file:

3M Company      MMM
99 Cents Only Stores    NDN
AO Smith Corporation    AOS
Aaron's, Inc.   AAN

and so on

How would I do this using regex or some other techniques? Thanks

+1  A: 

Try this regular expression:

(.+)\s*([A-Z]{3})$

Perhaps someone with more PHP experience could flesh out a code example using preg_split or something similar.

Andrew Hare
+2  A: 

Iterate over each line, and collect the data with a regular expression:

^(.+?)\s+([A-Z]+)$

The first capture group ($1) will contain the company name, and the second ($2) will contain the ticker symbol.
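As a rough illustration of the line-by-line approach, here is a minimal sketch in Python (chosen only for illustration; PHP's preg_match accepts the same pattern). The function name parse_lines and the sample list are made up for this example, and the data is assumed to look like the four lines in the question.

```python
import re

# Pattern from the answer above: lazy company name, a run of whitespace,
# then an all-uppercase ticker at the end of the line.
LINE_RE = re.compile(r"^(.+?)\s+([A-Z]+)$")

def parse_lines(lines):
    """Return two parallel lists: company names and ticker symbols."""
    names, symbols = [], []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or malformed lines
        names.append(match.group(1))
        symbols.append(match.group(2))
    return names, symbols

# Hypothetical sample, copied from the question's data.
sample = [
    "3M Company      MMM",
    "99 Cents Only Stores    NDN",
    "AO Smith Corporation    AOS",
    "Aaron's, Inc.   AAN",
]
names, symbols = parse_lines(sample)
```

Note that [A-Z]+ only accepts plain uppercase tickers; a symbol containing a dot, such as AAN.A, would not match and the line would be skipped.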

You can also split the string in two on a two- or three-space delimiter and trim the two resulting strings. This only works if you are sure the company name and ticker symbol are always separated by enough spaces, and the company name itself never contains that many consecutive spaces.
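The split-based alternative might look like the sketch below, again in Python for illustration. The helper name split_on_gap is hypothetical, and it assumes the gap is at least two whitespace characters wide; a line that does not split into exactly two parts is rejected instead of guessed at.

```python
import re

def split_on_gap(line):
    """Split a line into (name, symbol) on a run of 2+ whitespace characters.

    Returns None when the line does not contain exactly one such gap,
    which covers the caveat about names that contain wide gaps themselves.
    """
    parts = re.split(r"\s{2,}", line.strip())
    if len(parts) != 2:
        return None
    name, symbol = parts
    return name, symbol

pair = split_on_gap("99 Cents Only Stores    NDN")
```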

molf
+2  A: 

Is the format of the text file imposed on you? If you have the choice, I'd suggest you don't use spaces to separate the fields in the text file. Instead, use | or $$ or something you can be assured won't appear in the content, then just split it to an array.
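If the file can be regenerated with an explicit delimiter, the parsing reduces to a plain string split. The sketch below is in Python for illustration and assumes a hypothetical layout of one "name|symbol" pair per line; the function name parse_delimited is made up for the example.

```python
def parse_delimited(lines, delimiter="|"):
    """Return two parallel lists from lines shaped like 'name|symbol'."""
    names, symbols = [], []
    for line in lines:
        # Split on the LAST delimiter, so a stray one inside the name
        # does not end up in the symbol.
        name, sep, symbol = line.strip().rpartition(delimiter)
        if not sep:
            continue  # no delimiter on this line (blank or malformed)
        names.append(name.strip())
        symbols.append(symbol.strip())
    return names, symbols

# Hypothetical re-delimited version of the question's data.
sample = ["3M Company|MMM", "Aaron's, Inc.|AAN", ""]
names, symbols = parse_delimited(sample)
```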

Flubba
A: 

Thanks for your answers, I've converted the data into the following form. Is it now simpler to create the arrays?

<name>3M Company</name> <symbol>MMM</symbol>
<name>99 Cents Only Stores</name> <symbol>NDN</symbol>
<name>AO Smith Corporation</name> <symbol>AOS</symbol>
<name>Aaron's, Inc.</name> <symbol>AAN</symbol>
<name>Aaron's, Inc.</name> <symbol>AAN.A</symbol>
<name>ABB Ltd (ADR)</name> <symbol>ABB</symbol>

</name> and <symbol> are separated with a tab.
Baha
Definitely. /^<name>(.+?)<\/name>\s*<symbol>(.+?)<\/symbol>$/im where $1 will contain the name and $2 will contain your ticker symbol.
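For illustration, the same pattern can be run over the whole file contents at once. This is a Python sketch (the i and m modifiers correspond to re.IGNORECASE and re.MULTILINE); the sample text is assumed to use a tab between the two tags and plain "\n" line endings, as described in the post above.

```python
import re

# Same pattern as in the comment above; "/" needs no escaping in Python.
TAGGED_RE = re.compile(
    r"^<name>(.+?)</name>\s*<symbol>(.+?)</symbol>$",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical file contents, tab-separated as described in the post.
text = (
    "<name>3M Company</name>\t<symbol>MMM</symbol>\n"
    "<name>Aaron's, Inc.</name>\t<symbol>AAN.A</symbol>\n"
    "<name>ABB Ltd (ADR)</name>\t<symbol>ABB</symbol>\n"
)

pairs = TAGGED_RE.findall(text)  # list of (name, symbol) tuples
names = [name for name, _ in pairs]
symbols = [symbol for _, symbol in pairs]
```

Because the symbol is now captured with (.+?) between tags, dotted tickers such as AAN.A are picked up as well.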
Chris
If you're going to go that far, why not go a little further and make it a well-formed XML file, then use SimpleXML to read it into an object. You just need an outer containing element and an XML prolog by the look of things.
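A sketch of the well-formed XML route follows. SimpleXML is a PHP extension; Python's standard xml.etree.ElementTree stands in for it here purely for illustration. The outer element name "companies" is an assumption, since the comment only says that some containing element is needed.

```python
import xml.etree.ElementTree as ET

# The question's tagged data wrapped in a prolog and a hypothetical
# outer <companies> element to make it well-formed XML.
xml_text = """<?xml version="1.0"?>
<companies>
<name>3M Company</name>\t<symbol>MMM</symbol>
<name>Aaron's, Inc.</name>\t<symbol>AAN.A</symbol>
<name>ABB Ltd (ADR)</name>\t<symbol>ABB</symbol>
</companies>"""

root = ET.fromstring(xml_text)

# <name> and <symbol> are direct children of the root, in document order,
# so the two lists stay aligned as long as every name has a symbol.
names = [el.text for el in root.findall("name")]
symbols = [el.text for el in root.findall("symbol")]
```

One thing to watch for with this approach: a company name containing "&" or "<" would have to be escaped (for example as "&amp;") before the file counts as well-formed.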
Flubba
Flubba's right, this is exactly what XML was invented for. But right now you've got the main disadvantage of XML--more than double the number of characters--without the chief advantage--the ability to use an existing library to do the heavy lifting.
Alan Moore
thanks for the comments
Baha