I am calling a web service, and all I get back is a giant blob of text that I am left to process myself. The problem is that not all lines are necessarily the same. Each has 2 or 3 sections, and the sections are similar. Here are the most common examples:

text1 [text2] /text3/
text1/text3
text1[text2]/text3
text1 [text2] /text /3 here/

I am not exactly sure how to approach this problem. I am not too good at anything advanced when it comes to manipulating strings.

I was thinking a regular expression might work, but I am not too sure about that either. If I can get each of these 3 sections broken up, it is easier from there to do the rest. It's just that there doesn't seem to be any uniformity to the 3 main sections that I know how to work with.

EDIT: Thanks for pointing out that I didn't actually say what I wanted to do.

Basically, I want to split these 3 sections of text into separate strings, i.e. go from one single string to an array of 3 strings:

string[0] = text1
string[1] = text2
string[2] = text3

Here is some of the text I get back from a call, as an example:

スルホ基 [スルホき] /(n) sulfo group/
鋭いナイフ [するどいナイフ] /(n) sharp knife/
鋭い批判 [するどいひはん] /(n) sharp criticism/
スルナーイ /(n) (See ズルナ) (obsc) surnay (Anatolian woodwind instrument) (per:)/zurna/
スルピリン /(n) sulpyrine/
スルファミン /(n) sulfamine/
剃る [そる(P);する] /(v5r,vt) to shave/(P)/

Taking the first line as an example, I want to pull it out into an array:

string[0] = スルホ基
string[1] = [スルホき]
string[2] = /(n) sulfo group/
A:

Those examples seem a bit random; there has to be some kind of order. Isn't there a spec for the service? If not, I suggest adding more examples so that we can understand the rules.

Paul Creasey
Alright, I'll add a few lines of what I am getting back.
percent20
A: 

Read up on finite state machines (FSMs), and see if you can apply some of the concepts to your input-parsing problem.

If there is some order to the groups on each line, then maybe you can use a regex to separate the groups out.

Edit: after seeing your samples, you may get by with a regex that breaks on some of those specific delimiters. It will take maybe half an hour to test the theory: pick up a free regex tester, write a regex that isolates just one of those groups, and pump a few sample lines through it. If it performs reliably on the real data you have, expand it and see if you can also isolate the other groups.
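
As a rough illustration of that regex approach, here is a minimal Python sketch (Python is used only for illustration; the pattern relies on ordinary constructs such as lazy quantifiers and non-capturing groups, which .NET's regex engine also supports). It assumes every line has the shape `text1 [text2] /text3/`, where the bracketed part is optional, text1 never contains `[` or `/`, and the first `/` starts text3. The names `LINE_RE` and `split_line` are hypothetical. The brackets and the outer slashes are dropped; slashes inside text3 are kept.

```python
import re

# Hypothetical pattern for lines shaped like "text1 [text2] /text3/":
#   group 1: headword, everything before the first "[" or "/"
#   group 2: optional reading inside square brackets
#   group 3: everything after the first "/" (closing slash removed below)
LINE_RE = re.compile(r"^\s*([^\[/]+?)\s*(?:\[([^\]]*)\])?\s*/(.*)$")


def split_line(line):
    """Return [text1, text2, text3]; text2 is "" when there is no [...] part.

    Returns None if the line does not fit the assumed shape.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    headword, reading, rest = m.groups()
    # Drop only the closing slash; inner slashes stay part of text3.
    if rest.endswith("/"):
        rest = rest[:-1]
    return [headword, reading or "", rest]


for sample in ["スルホ基 [スルホき] /(n) sulfo group/",
               "スルピリン /(n) sulpyrine/",
               "剃る [そる(P);する] /(v5r,vt) to shave/(P)/"]:
    print(split_line(sample))
```

Feeding every line of a real response through `split_line` and printing the ones that come back as `None` is a quick way to run the "pump sample lines through it" test against actual data.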

I should mention, though, that your regexes will break, or just become a nightmare, if there are any vagaries in your data (and frequently there are). So test long and hard before settling on them. If you start to find exceptions in your data, you will need to choose some sort of parsing algorithm (the FSM I mentioned above is a pattern you can follow if you implement a parsing mechanism).
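
If the data turns out to be too irregular for a regex, the state-machine approach might look something like the following minimal Python sketch. The states (`HEADWORD`, `READING`, `AFTER_READING`, `GLOSS`) and the function name `parse_line_fsm` are hypothetical, and the transitions assume the same `text1 [text2] /text3/` shape as the samples in the question. Instead of silently failing to match, it raises an error that names the state a line broke in, which makes exceptions in the data easier to spot.

```python
from enum import Enum, auto


class State(Enum):
    HEADWORD = auto()       # reading text1, before any "[" or "/"
    READING = auto()        # inside "[...]", reading text2
    AFTER_READING = auto()  # after "]", waiting for the "/" that starts text3
    GLOSS = auto()          # after the first "/", reading text3 to end of line


def parse_line_fsm(line):
    """Split one line into [text1, text2, text3] with a small state machine.

    Raises ValueError when the line breaks the assumed shape.
    """
    state = State.HEADWORD
    parts = {State.HEADWORD: [], State.READING: [], State.GLOSS: []}
    for ch in line.strip():
        if state is State.HEADWORD:
            if ch == "[":
                state = State.READING
            elif ch == "/":
                state = State.GLOSS
            else:
                parts[State.HEADWORD].append(ch)
        elif state is State.READING:
            if ch == "]":
                state = State.AFTER_READING
            else:
                parts[State.READING].append(ch)
        elif state is State.AFTER_READING:
            if ch == "/":
                state = State.GLOSS
            elif not ch.isspace():
                raise ValueError(f"unexpected {ch!r} after ']' in {line!r}")
        else:  # State.GLOSS: everything up to end of line belongs to text3
            parts[State.GLOSS].append(ch)

    if state is not State.GLOSS:
        raise ValueError(f"line ended in state {state.name}: {line!r}")

    gloss = "".join(parts[State.GLOSS])
    if gloss.endswith("/"):
        gloss = gloss[:-1]  # drop the closing slash only
    return ["".join(parts[State.HEADWORD]).strip(),
            "".join(parts[State.READING]),
            gloss]
```

When a new kind of line shows up, handling it usually means adding a state or a transition, rather than growing a single regex.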

slugster
Wow, I read the first little bit of that, and it looks like it might be worth looking into.
percent20
Heh, it is a bit of an eyeful when you first see it, but you can simplify it down considerably.
slugster
A: 

The most stupid answer is "use regex", but more information is needed for a better one.

Stremlenye
Why is regex stupid and what more information is needed?
percent20
With a large number of rows, parsing them with a regex is very slow (at least if you get all these rows as one single string). If the rows have some standard split symbols, use the string.Split() function. A kind of hack is to write your own (de)serialization Formatter for these rows, which will deserialize the text into an array-based object.
Stremlenye
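
For the string.Split() suggestion, a minimal Python sketch of the same idea follows. Here `str.split` with a maximum split count stands in for .NET's `string.Split()` with a count limit, and the function name `split_with_delimiters` is hypothetical. It assumes the first `/` on a line always begins text3 and that text1 and text2 never contain `/`.

```python
def split_with_delimiters(line):
    """Split a "text1 [text2] /text3/" line with plain string splitting.

    Returns [text1, text2, text3], or None if the line contains no "/".
    Assumes the first "/" starts text3 and text1/text2 never contain "/".
    """
    # maxsplit=1: only the first "/" separates the head from text3,
    # so slashes inside text3 (e.g. "to shave/(P)") are left alone.
    pieces = line.strip().split("/", 1)
    if len(pieces) < 2:
        return None
    head, gloss = pieces
    if gloss.endswith("/"):
        gloss = gloss[:-1]  # drop the closing slash only

    reading = ""
    if "[" in head:
        # Everything before "[" is text1; everything after is text2 plus "]".
        head, reading = head.split("[", 1)
        reading = reading.strip()
        if reading.endswith("]"):
            reading = reading[:-1]
    return [head.strip(), reading, gloss]


print(split_with_delimiters("鋭いナイフ [するどいナイフ] /(n) sharp knife/"))
```

This avoids a regex entirely, but it silently accepts some malformed lines (for example, a missing `]`), so it fits best when the delimiters really are as regular as the samples suggest.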