Hi all!

Below is a set of log data from a text file:

**********************************************************************************
2008/04/06 00:35:35 193111               1008                O          9448050132# 74
2008/04/06 00:35:35 193116               1009                 O          9448050132# 74
 12/15/2008   8:36AM 106  01 090788573                             00:01'23" ..06
10/10/2008 14:32:32 4400 4653  00:00:56 26656            0    0           OG AL#
 &       0000    0000
N 124 00 8630    T001045 10/16 05:04 00:01:02 A 34439242360098
***************************************************************************************

I need to extract only the dates (which may look like 2008/04/06 or 10/16) from all of the above lines and display them in a textbox.

I know how to pick out the date when every line has the same layout, like this:

***************************************************************************************
10/10/2008 14:32:32 4400 4653  00:00:56 26656            0    0           OG AL#

10/10/2008 14:33:29 4400 4653  00:00:02 26656434         0    0           OG LL#

10/10/2008 14:33:31 4400 4653  00:00:11 26656434         0    0           OG LL#
***************************************************************************************

The code for it is:

        StreamReader rr = File.OpenText("C:/1.txt");
        string input = null;
        while ((input = rr.ReadLine()) != null)
        {                
            char[] seps = { ' ' };
            string[] sd = input.Split(seps, StringSplitOptions.RemoveEmptyEntries);

            string[] l = new string[1000];

            for (int i = 0; i < sd.Length; i++)
            {
                l[i] = sd[i];
                textBox4.AppendText(l[i] + "\r\n");

                //The date is 10 characters in length. ex:06/08/2008
                if (l[i].Length == 10)                    
                textBox1.AppendText(l[i]+"\r\n");

                //The time is of 8 characters in length. ex:00:04:09
                if (l[i].Length == 8)
                textBox2.AppendText(l[i] + "\r\n");

                //The phone is of 11 characters in length. ex:9480455302#
                if (l[i].Length == 11)
                textBox3.AppendText(l[i] + "\r\n");                    
            }                
         }

Can you please help me with this?

A: 

It seems the dates have a / in them. You could use that to get an index, then go back until you hit the start of the line or a space, and go forward until you hit a space (or the end of the line).

Pseudocode:

position = index of first '/' in line
startpos = position
while startpos > 0 and line[startpos - 1] != ' '
    startpos--   // walk back to the start of the line or the space in front of the date
endpos = position
while endpos < length(line) and line[endpos] != ' '
    endpos++     // walk forward to the space after the date (or the end of the line)

date = substring(startpos, endpos - startpos)
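Translated into C#, the pseudocode above might look like the sketch below. SlashScanner and ExtractDateAroundSlash are hypothetical names chosen for the example, and it assumes fields are separated by space characters (a tab would not be treated as a separator). It returns null when a line has no '/'.

```csharp
using System;

public static class SlashScanner
{
    // Hypothetical helper: returns the space-delimited token around the first '/'
    // in the line, or null if the line contains no '/'.
    public static string ExtractDateAroundSlash(string line)
    {
        int position = line.IndexOf('/');
        if (position < 0)
            return null; // no '/' on this line, so no date to extract

        // Walk back until the start of the line or the space in front of the date.
        int startPos = position;
        while (startPos > 0 && line[startPos - 1] != ' ')
            startPos--;

        // Walk forward until the space after the date or the end of the line.
        int endPos = position;
        while (endPos < line.Length && line[endPos] != ' ')
            endPos++;

        return line.Substring(startPos, endPos - startPos);
    }

    public static void Main()
    {
        Console.WriteLine(ExtractDateAroundSlash("2008/04/06 00:35:35 193111"));
        Console.WriteLine(ExtractDateAroundSlash(" 12/15/2008   8:36AM 106"));
        Console.WriteLine(ExtractDateAroundSlash("N 124 00 8630    T001045 10/16 05:04"));
    }
}
```

Because the bounds checks come first in each while condition, the scan never reads past either end of the string, which is what the "start of line" and "end of line" cases in the pseudocode need.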

P.S. I suck at RegEx...

Colin
A:

The best option in this context is to use a regular expression, which is more accurate and doesn't depend on the columns being laid out in any particular way. A general regex would be "[0-9]{2}[/]{1}[0-9]{2}[/]{1}[0-9]{4}"; you can tweak it to fit your needs. For each match, the match's Value is the exact date. I happened to see a good regex evaluator built in Silverlight: http://regexhero.net/

Usman Masood
\d{2,4}/\d{2}/\d{2,4} is matching correctly
idursun
It's not correct. It will also match when the day or the year has 3 digits, and it won't match 10/16.
Rashmi Pandit
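The two patterns in this thread can be compared side by side with a small sketch. The class name is hypothetical; the first three sample strings come from the question's log and the fourth (123/45/678) is made up to show the 3-digit case. It illustrates what the comments describe: the answer's pattern finds only dd/dd/dddd, the looser \d{2,4}/\d{2}/\d{2,4} also accepts 2008/04/06 but lets 123/45/678 through, and neither matches 10/16.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Pattern from the answer: exactly dd/dd/dddd ([/]{1} is just a single '/').
    public static readonly Regex Strict = new Regex("[0-9]{2}[/]{1}[0-9]{2}[/]{1}[0-9]{4}");

    // Pattern from the first comment: 2-4 digits, '/', 2 digits, '/', 2-4 digits.
    public static readonly Regex Loose = new Regex(@"\d{2,4}/\d{2}/\d{2,4}");

    public static void Main()
    {
        string[] samples = { "12/15/2008", "2008/04/06", "10/16", "123/45/678" };
        foreach (string s in samples)
        {
            // Match.Value holds the exact matched text; an empty string means no match.
            Console.WriteLine("{0,-12} strict={1,-12} loose={2}",
                s, Strict.Match(s).Value, Loose.Match(s).Value);
        }
    }
}
```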
A: 
Regex is the best choice, but if you would consider an iterative approach, this works:

while ((input = rr.ReadLine()) != null)
{
    foreach (var item in input.Split(' '))
    {
        if (item.Contains("/"))
            textBox4.AppendText(item + "\r\n");
    }
}
Rony
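The split-and-filter idea from the answer above can be wrapped in a small function so it can be tried outside a form. TokensWithSlash is a hypothetical name, and like the loop it assumes that only dates contain a '/'. It passes StringSplitOptions.RemoveEmptyEntries so runs of spaces don't produce empty tokens; in the original loop those empty tokens are harmless (they contain no '/'), so this is only tidiness.

```csharp
using System;
using System.Collections.Generic;

public static class SlashTokenFilter
{
    // Hypothetical helper: splits a line on spaces and keeps every token that
    // contains a '/', assuming only dates contain one.
    public static List<string> TokensWithSlash(string line)
    {
        var result = new List<string>();
        foreach (string item in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (item.Contains("/"))
                result.Add(item);
        }
        return result;
    }

    public static void Main()
    {
        string[] lines =
        {
            "2008/04/06 00:35:35 193111 1008 O 9448050132# 74",
            "N 124 00 8630    T001045 10/16 05:04 00:01:02 A 34439242360098"
        };
        foreach (string line in lines)
            foreach (string date in TokensWithSlash(line))
                Console.WriteLine(date);
    }
}
```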
A:

I tried a regex in a console app with the text you provided. This works:

        Regex reg = new Regex(@"\d{4}/\d{2}/\d{2}|\d{2}/\d{2}/\d{4}|\d{2}/\d{2}");

        string str = @"2008/04/06 00:35:35 193111 1008 O 9448050132# 74
           2008/04/06 00:35:35 193116 1009 O 9448050132# 74
           12/15/2008 8:36AM 106 01 090788573 00:01'23' ..06
           10/10/2008 14:32:32 4400 4653 00:00:56 26656 0 0 OG AL# & 0000 0000
           N 124 00 8630 T001045 10/16 05:04 00:01:02 A 34439242360098";

        MatchCollection mc = reg.Matches(str);

        foreach (Match m in mc)
        {
            Console.WriteLine(m.Value);
        }

I think you can read the file line by line, get the matches from each line, and keep them in a list or array to use later.

TheVillageIdiot
This won't match 12/15/2008
Rashmi Pandit
Thanks @rashmi for pointing this out. I've updated the regex string!
TheVillageIdiot
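A sketch of the line-by-line approach suggested in this answer, collecting every match into a List<string>. It uses the answer's final pattern; the CollectDates name and taking the log path as a command-line argument are assumptions made for the example. File.ReadLines reads the file lazily, so the whole log never has to be held in memory at once.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LineByLineDates
{
    // Same pattern as in this answer: yyyy/mm/dd, dd/mm/yyyy, or dd/mm.
    private static readonly Regex DateRegex =
        new Regex(@"\d{4}/\d{2}/\d{2}|\d{2}/\d{2}/\d{4}|\d{2}/\d{2}");

    // Collects every date match from each line, in file order.
    public static List<string> CollectDates(IEnumerable<string> lines)
    {
        var dates = new List<string>();
        foreach (string line in lines)
        {
            foreach (Match m in DateRegex.Matches(line))
                dates.Add(m.Value);
        }
        return dates;
    }

    public static void Main(string[] args)
    {
        // Assumption: the log file path is passed as the first argument.
        if (args.Length == 0)
        {
            Console.WriteLine("usage: LineByLineDates <logfile>");
            return;
        }
        foreach (string date in CollectDates(File.ReadLines(args[0])))
            Console.WriteLine(date);
    }
}
```

Keeping CollectDates independent of the file makes it easy to feed it a few sample lines directly, or to hand the resulting list to a textbox later.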
A:

There are a few oddities in your code. Most notably, the following line inside the while loop:

string[] l = new string[1000];

This creates a new 1000-element string array on every pass through the while loop. You then only ever write element i and immediately read it back, so the array adds nothing; judging from the rest of the code, you could simply use sd[i] directly.

Also, I am guessing that textBox1, textBox2 and textBox3 should never contain the same value: if a value goes into one of them, it should never go into another (textBox4, which seems to collect all the data, is the exception). In that case there is no need to keep testing the value once the correct textbox is found.

Finally, there is this line inside the while loop:

char[] seps = { ' ' };

This creates an identical character array on every pass through the while loop; you can move it outside the loop and reuse the same array.

As for the date detection: in the data you present, the date is the only value that contains a / character, so you could test for that rather than for the length.

You can try the following:

// "using" closes the reader when the block ends, even if an exception is thrown
using (StreamReader rr = File.OpenText("C:/1.txt"))
{
    string input = null;
    char[] seps = { ' ' };
    while ((input = rr.ReadLine()) != null)
    {
        string[] sd = input.Split(seps, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < sd.Length; i++)
        {
            textBox4.AppendText(sd[i] + "\r\n");

            if (sd[i].Contains("/"))
            {
                // The string contains a / character; assume it is a date
                textBox1.AppendText(sd[i] + "\r\n");
            }
            else if (sd[i].Length == 8)
            {
                // The time is 8 characters long, e.g. 00:04:09
                textBox2.AppendText(sd[i] + "\r\n");
            }
            else if (sd[i].Length == 11)
            {
                // The phone number is 11 characters long, e.g. 9480455302#
                textBox3.AppendText(sd[i] + "\r\n");
            }
        }
    }
}
Fredrik Mörk
Thanks Fredrik for your suggestion. I have used your method.
Srikanth V M
Dude! Thanks for your answer. It was superb! You got my query exactly right! Hats off to you! Have a nice time!
Srikanth V M
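If the textboxes are only there for display, the if / else-if chain from Fredrik Mörk's answer can be pulled out into a plain function that decides which box a token belongs in. This is a sketch: the Kind enum and the Classify name are hypothetical, and the rules ('/' means date, length 8 means time, length 11 means phone) are the same assumptions the answer makes about this particular log.

```csharp
using System;

public static class TokenClassifier
{
    // Hypothetical categories mirroring textBox1 / textBox2 / textBox3.
    public enum Kind { Date, Time, Phone, Other }

    // Same order of tests as the if / else-if chain: '/' first, then length 8, then length 11.
    public static Kind Classify(string token)
    {
        if (token.Contains("/"))
            return Kind.Date;   // assumed date, e.g. 10/10/2008
        if (token.Length == 8)
            return Kind.Time;   // e.g. 00:04:09
        if (token.Length == 11)
            return Kind.Phone;  // e.g. 9480455302#
        return Kind.Other;
    }

    public static void Main()
    {
        foreach (string token in "10/10/2008 14:32:32 9448050132# 4400".Split(' '))
            Console.WriteLine(token + " -> " + Classify(token));
    }
}
```

Separating the decision from the AppendText calls means the rules can be checked against sample tokens without running the form.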
A: 

You should use a regex to find the dates in your log file. A regex that covers your date formats is:

@"\b(\d{4}|\d{2})/\d{2}(/(\d{4}|\d{2}))?\b"

This handles yyyy/mm/dd, dd/mm/yyyy (or mm/dd/yyyy), dd/mm and yyyy/mm.

This is the C# code you can use. It needs using directives for System, System.Collections.Generic, System.IO and System.Text.RegularExpressions.

Calling function:

private static void RegexGetDates()
{
    string fileText = File.ReadAllText("..\\..\\Data\\RegexSample2.txt");

    List<string> matchesList = MyRegEx.GetMatchedDates(fileText);
    foreach (string s in matchesList)
        Console.WriteLine(s);
}

Function to get dates from the input string:

/// <returns>All dates found in logString.</returns>
public static List<string> GetMatchedDates(string logString)
{
    List<string> dateList = new List<string>();
    // Four-digit alternatives are tried first so a year such as 2008 is not cut short;
    // \b stops a match from starting or ending inside a longer run of digits
    Regex r = new Regex(@"\b(\d{4}|\d{2})/\d{2}(/(\d{4}|\d{2}))?\b", RegexOptions.Compiled);
    for (Match m = r.Match(logString); m.Success; m = m.NextMatch())
    {
        dateList.Add(m.Value);
    }

    return dateList;
}
Rashmi Pandit
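A self-contained check of a date pattern for these formats, run over sample lines copied from the question. One detail matters here: a trailing group written as (/\d{2}|\d{4})* tries the two-digit branch first and has no slash before the four-digit branch, so on 12/15/2008 it stops at 12/15/20. The version below tries four digits first, keeps the slash outside the alternation, and adds \b word boundaries. The DatePatternCheck class and Find method are hypothetical names for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DatePatternCheck
{
    // Four-digit alternatives come before two-digit ones, the slash sits outside
    // the trailing alternation, and \b keeps the match from starting or ending
    // in the middle of a longer run of digits.
    private static readonly Regex Dates =
        new Regex(@"\b(\d{4}|\d{2})/\d{2}(/(\d{4}|\d{2}))?\b");

    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in Dates.Matches(text))
            found.Add(m.Value);
        return found;
    }

    public static void Main()
    {
        string sample =
            "2008/04/06 00:35:35 193111 1008 O 9448050132# 74\n" +
            " 12/15/2008   8:36AM 106  01 090788573\n" +
            "10/10/2008 14:32:32 4400 4653  00:00:56 26656\n" +
            "N 124 00 8630    T001045 10/16 05:04 00:01:02 A 34439242360098";
        foreach (string d in Find(sample))
            Console.WriteLine(d);
    }
}
```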