What do I want?

I want to display weather information on my page, with the result shown in the browser's culture.

What am I doing?

I use MSN RSS for this purpose. MSN returns the report in XML format. I parse the XML and display results.

What problem am I facing?

When displaying the report, I have to parse an XML node, <data>, whose value differs from culture to culture.

For example:

en-US: "Lo: 46°F. Hi: 67°F. Chance of precipitation: 20%"

de-DE: "Niedrig: 46°F. Höchst: 67°F. Niederschlag %: 20%"

I want to read only the low, high, and chance-of-precipitation values, i.e. 46, 67, and 20%.

Can somebody please give me a solution for this?

A regex or some other method would also be fine with me :-)

Thanks in advance!

+2  A: 

If you only want the numbers, you can use a regular expression, for example the following:

(\d+).*?(\d+).*?(\d+%)

A quick test in PowerShell shows that it does work at least for your input data:

PS Home:\> function test ($re) {
>>   $a -match $re; $Matches
>>   $b -match $re; $Matches
>> }
>>
PS Home:\> $a = "Lo: 46°F. Hi: 67°F. Chance of precipitation: 20%"
PS Home:\> $b = "Niedrig: 46°F. Höchst: 67°F. Niederschlag %: 20%"
PS Home:\> test "(\d+).*?(\d+).*?(\d+%)"
True

Name                           Value
----                           -----
3                              20%
2                              67
1                              46
0                              46°F. Hi: 67°F. Chance of precipitation: 20%
True
3                              20%
2                              67
1                              46
0                              46°F. Höchst: 67°F. Niederschlag %: 20%

However, it will break if any locale uses digits in the label text.

You can add other constraints, like requiring a colon before every match:

: (\d+).*?: (\d+).*?: (\d+%)

This should deal with spurious numbers elsewhere in the string. But the best way overall would be to get your data from a source that provides it for machine reading, not for human consumption.
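A sketch in C# (the language used in the other answers here) of how the two patterns behave when a label contains a digit. The "Lo 24h" string is a hypothetical label made up for this demonstration, not real MSN output, and `Extract` is an illustrative helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// The two patterns from the answer above.
const string Plain = @"(\d+).*?(\d+).*?(\d+%)";
const string ColonAnchored = @": (\d+).*?: (\d+).*?: (\d+%)";

// Hypothetical helper: returns the three captured groups, or an empty array if there is no match.
static string[] Extract(string data, string pattern)
{
    Match m = Regex.Match(data, pattern);
    if (!m.Success) return Array.Empty<string>();
    return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
}

// Hypothetical label containing a digit ("24h"); not real MSN output.
string tricky = "Lo 24h: 46°F. Hi: 67°F. Chance of precipitation: 20%";

Console.WriteLine(string.Join(", ", Extract(tricky, Plain)));         // 24, 46, 20%
Console.WriteLine(string.Join(", ", Extract(tricky, ColonAnchored))); // 46, 67, 20%
```

The plain pattern picks up the 24 from the label and skips the 67, while the colon-anchored pattern still returns 46, 67, and 20%.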

Joey
The regex worked. I combined your and Tor Haugen's answers to get my problem solved. Thanks!
Vijay
A: 

Use a regex (but I don't know the exact pattern ;) )

You can also loop over the sentence and check whether each character is a digit. Append each digit you encounter to a string; when you hit a non-digit, parse the string to an int and voilà. Do this three times.
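The loop described above might look like this in C#; a sketch only. `FirstNumbers` is an illustrative name, it accumulates each value directly instead of building a string and parsing it, and it checks for ASCII digits '0' to '9' explicitly rather than using `Char.IsDigit`, which also accepts digits from other scripts.

```csharp
using System;
using System.Collections.Generic;

// Walks the string once and returns up to `count` runs of ASCII digits as integers.
static List<int> FirstNumbers(string text, int count)
{
    var result = new List<int>();
    int current = 0;
    bool inNumber = false;
    foreach (char c in text)
    {
        if (c >= '0' && c <= '9')
        {
            // Still inside a run of digits: accumulate the value.
            current = current * 10 + (c - '0');
            inNumber = true;
        }
        else if (inNumber)
        {
            // The run just ended: store it and reset.
            result.Add(current);
            if (result.Count == count) return result;
            current = 0;
            inNumber = false;
        }
    }
    // A run that reaches the end of the string has not been stored yet.
    if (inNumber && result.Count < count) result.Add(current);
    return result;
}

var us = FirstNumbers("Lo: 46°F. Hi: 67°F. Chance of precipitation: 20%", 3);
var de = FirstNumbers("Niedrig: 46°F. Höchst: 67°F. Niederschlag %: 20%", 3);
Console.WriteLine(string.Join(", ", us)); // 46, 67, 20
Console.WriteLine(string.Join(", ", de)); // 46, 67, 20
```

Like the plain regex, this assumes the labels themselves contain no digits.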

PoweRoy
Certainly! I can do this. But, is that the best solution?
Vijay
I would use a regex like Johannes's; it's definitely cleaner, but harder to read.
PoweRoy
By the way, there's no single best solution; each has its pros and cons. Regex pros: cleaner, shorter, and probably faster. Regex cons: harder to read, and regexes are hard to master.
PoweRoy
Ok. Let me check it out. Thanks :)
Vijay
A: 

It's quite weird that you are not getting XML with the values in separate nodes, which would make more sense to me (then you could pick which values to use for different locales).

But if you want to extract the data from the given strings, try this or something similar if you are not a fan of regexes:

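using System;
using System.Collections.Generic;
using System.Text;

string dataUS = "Lo: 46°F. Hi: 67°F. Chance of precipitation: 20%";
string dataDE = "Niedrig: 46°F. Höchst: 67°F. Niederschlag %: 20%";
// Split on ": " into at most 4 parts; parts 1-3 each start with one of the numbers.
// (Use dataDE instead of dataUS for the German text.)
string[] stringValues = dataUS.Split(new string[] {": "}, 4, StringSplitOptions.None);
List<int> values = new List<int>();
for (int i = 1; i < 4; i++)
{
    // Collect the leading digits of this part, stopping at the first non-digit.
    StringBuilder sb = new StringBuilder();
    foreach (char c in stringValues[i].Trim())
    {
        if (Char.IsDigit(c))
            sb.Append(c);
        else
            break;
    }
    // Adding after the loop also covers a number that ends the part.
    if (sb.Length > 0)
        values.Add(Convert.ToInt32(sb.ToString()));
}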

(I'm splitting on ": " instead of on digits.)

Mike Nowak
A: 

I suggest using a separate regex for each value you want, chosen according to the UI culture. That is, one regex for the low temperature, "(Lo|Niedrig): (\d+)", one for the high temperature, "(Hi|Höchst): (\d+)", one for the chance of precipitation, and so on. In each of these patterns the number is in the second capturing group of the match (match.Groups[2] in .NET).

Beatles1692
You can also use non-grouping parentheses for the literal parts: `(?:Lo|Niedrig)` to avoid having groups you don't do anything with.
Joey
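Combining the per-value regexes from this answer with the non-capturing groups suggested in the comment, a C# sketch might look like this. Only the en-US and de-DE labels from the question are included; labels for other locales would have to be added, and `Read` is an illustrative helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// One regex per value. The label alternatives are non-capturing, so the number
// is always group 1. Only the en-US and de-DE labels from the question are listed.
var patterns = new (string Name, Regex Re)[]
{
    ("low", new Regex(@"(?:Lo|Niedrig): (\d+)")),
    ("high", new Regex(@"(?:Hi|Höchst): (\d+)")),
    ("precipitation", new Regex(@"(?:Chance of precipitation|Niederschlag %): (\d+)%")),
};

// Returns the captured number, or null if this label is not present.
static int? Read(Regex pattern, string data)
{
    Match m = pattern.Match(data);
    return m.Success ? int.Parse(m.Groups[1].Value) : (int?)null;
}

string dataDE = "Niedrig: 46°F. Höchst: 67°F. Niederschlag %: 20%";
foreach (var (name, regex) in patterns)
    Console.WriteLine($"{name}: {Read(regex, dataDE)}");
// prints "low: 46", "high: 67", "precipitation: 20" on separate lines
```

Returning null for a missing label makes it easy to notice a locale whose labels are not in the list yet.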
+2  A: 

You should consider always fetching the RSS using the same culture. That way, you'll have an easier task parsing the content. If you'll only be using the numbers, it shouldn't stop you from emitting culture-specific content to the end user.

So if you go for the en-US version, you could do it like this:

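using System.Text.RegularExpressions;

// forecast is the en-US <data> text
string forecast = "Lo: 46°F. Hi: 67°F. Chance of precipitation: 20%";
int lo = 0, hi = 0, prec = 0;
// Dots are escaped so they match only a literal "."
Regex re = new Regex(@"Lo: (\d+)°F\. Hi: (\d+)°F\. Chance of precipitation: (\d+)%");
var match = re.Match(forecast);
if (match.Success)
{
    var groups = match.Groups;
    lo = int.Parse(groups[1].Value);   // low temperature
    hi = int.Parse(groups[2].Value);   // high temperature
    prec = int.Parse(groups[3].Value); // chance of precipitation, in percent
}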
Tor Haugen
I combined your and Johannes Rössel's answers to get my problem solved. Thanks!
Vijay
+1  A: 

The following should extract the low and high temperatures and the chance of precipitation, as well as the temperature units used (in case the units depend on the culture):

(?<lo>\d+°.).*?(?<hi>\d+°.).*?(?<precipitation>\d+)

If you don't want units extracted, then you can use

(?<lo>\d+)°.*?(?<hi>\d+)°.*?(?<precipitation>\d+)
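In C#, named groups like these can be read by name from `Match.Groups`. A short sketch applying both patterns to the German string from the question:

```csharp
using System;
using System.Text.RegularExpressions;

string dataDE = "Niedrig: 46°F. Höchst: 67°F. Niederschlag %: 20%";

// Pattern that keeps the unit character after the degree sign.
var withUnits = new Regex(@"(?<lo>\d+°.).*?(?<hi>\d+°.).*?(?<precipitation>\d+)");
Match m = withUnits.Match(dataDE);
if (m.Success)
{
    // Groups are looked up by name instead of by position.
    Console.WriteLine(m.Groups["lo"].Value);            // 46°F
    Console.WriteLine(m.Groups["hi"].Value);            // 67°F
    Console.WriteLine(m.Groups["precipitation"].Value); // 20
}

// Pattern without units: the captured values can be parsed directly.
var numbersOnly = new Regex(@"(?<lo>\d+)°.*?(?<hi>\d+)°.*?(?<precipitation>\d+)");
Match n = numbersOnly.Match(dataDE);
if (n.Success)
{
    int lo = int.Parse(n.Groups["lo"].Value);
    int hi = int.Parse(n.Groups["hi"].Value);
    int precipitation = int.Parse(n.Groups["precipitation"].Value);
    Console.WriteLine($"{lo} {hi} {precipitation}"); // 46 67 20
}
```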
ICR