ansaurus

Question

extracting first letter of a String with Regex

Answer 1

+1 A:

I'm not sure you need to use the lookahead or lookbehind assertions in your regexp:

 sarnold@haig:/tmp$ cat date.pl
 #!/usr/bin/perl -w

 while(<>) {
     print "$1\n" if /^(\[\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d\d\d\])/;
 }
 sarnold@haig:/tmp$ cat data
 [2010-01-15 06:18:10.203] [0x00001388] [SHDNT] Shutdown Count Down = 2/5
 [2010-01-15 06:18:11.203] [0x00001388] [SHDNT] Shutdown Count Down = 3/5
 sarnold@haig:/tmp$ ./date.pl data
 [2010-01-15 06:18:10.203]
 [2010-01-15 06:18:11.203]

I couldn't tell from your description if you do want the [ and ] around your date, or if you don't want them. If you don't want the square brackets, move them outside the parens:

     /^\[(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d\d\d)\]/;

 sarnold@haig:/tmp$ ./date.pl data
 2010-01-15 06:18:10.203
 2010-01-15 06:18:11.203

Note that I've also anchored the regexp at the beginning of the line, in case the output includes a bracketed date-time somewhere else. I also over-specified the date-time compared to your example; consider it paranoia. You could replace \d\d\d\d with \d{4}, but in this example I find the longer form more readable.
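For readers without Perl, here is a rough Python translation of the script above (a sketch, not part of the original answer). It uses the \d{4} shorthand mentioned above and only reads the captured group when the line actually matched. The names `WITH_BRACKETS`, `WITHOUT_BRACKETS`, and `extract_timestamps` are illustrative.

```python
import re

# Same pattern as the Perl script, written with \d{4} / \d{3} shorthands.
WITH_BRACKETS = re.compile(r"^(\[\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d{3}\])")
# Brackets moved outside the capturing group, so group 1 is the bare date-time.
WITHOUT_BRACKETS = re.compile(r"^\[(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d{3})\]")


def extract_timestamps(lines, pattern):
    """Return group 1 of every line that matches; non-matching lines are skipped."""
    found = []
    for line in lines:
        m = pattern.search(line)
        if m:  # only read the group when the match succeeded
            found.append(m.group(1))
    return found


data = [
    "[2010-01-15 06:18:10.203] [0x00001388] [SHDNT] Shutdown Count Down = 2/5",
    "[2010-01-15 06:18:11.203] [0x00001388] [SHDNT] Shutdown Count Down = 3/5",
]
for ts in extract_timestamps(data, WITHOUT_BRACKETS):
    print(ts)
```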

sarnold 2010-07-20 04:25:08

@sarnold Thanks, but all I am trying to do is remove the brackets around the date in several files at once, while leaving the rest of the data unchanged. I don't want to do any coding, just a simple one-line regex.

Precious 2010-07-20 05:01:40

Answer 2

+2 A:

Keep it simple. There's no need to use a regular expression. If the date/time part is all you want, use fields and field delimiters. Here's an awk expression that just prints the first field, using the closing square bracket as the field delimiter (and adds the ] back):

$ cat file
[2010-01-15 06:18:10.203] [0x00001388] [SHDNT] Shutdown Count Down = 2/5
[2010-01-15 06:18:11.203] [0x00001388] [SHDNT] Shutdown Count Down = 3/5

$ awk -F"]" '{print $1"]"}' file
[2010-01-15 06:18:10.203]
[2010-01-15 06:18:11.203]

Or just print fields 1 and 2, using whitespace as the delimiter:

$ awk '{print $1,$2}' file
[2010-01-15 06:18:10.203]
[2010-01-15 06:18:11.203]

Update: To remove the square brackets, simply use gsub() or sub() on fields 1 and 2:

$ awk '{gsub(/^\[/,"",$1);gsub(/\]$/,"",$2)}1' file
2010-01-15 06:18:10.203 [0x00001388] [SHDNT] Shutdown Count Down = 2/5
2010-01-15 06:18:11.203 [0x00001388] [SHDNT] Shutdown Count Down = 3/5
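The same field-based idea, sketched in Python for comparison (an approximation of the awk one-liner, not a drop-in replacement; the function name is hypothetical). Like awk with the default field separator, it splits on runs of whitespace and rejoins with single spaces.

```python
def strip_date_brackets_by_fields(line):
    """Approximate the awk one-liner: remove a leading '[' from field 1 and a
    trailing ']' from field 2, then rejoin all fields with single spaces
    (awk also rebuilds the line with single spaces once a field is changed)."""
    fields = line.split()  # split on runs of whitespace, like awk's default FS
    if len(fields) >= 2:
        if fields[0].startswith("["):
            fields[0] = fields[0][1:]
        if fields[1].endswith("]"):
            fields[1] = fields[1][:-1]
    return " ".join(fields)


line = "[2010-01-15 06:18:10.203] [0x00001388] [SHDNT] Shutdown Count Down = 2/5"
print(strip_date_brackets_by_fields(line))
```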

ghostdog74 2010-07-20 04:28:28

@ghostdog74 Thanks for your response, but I don't want to print out the date. I just want to remove the square brackets around the date: [2010-01-15 06:18:10.203] should become 2010-01-15 06:18:10.203. I want to view the log file in log4view, and it reports an unknown format because of the [] around the date. So I was searching for [ and ] and replacing them with empty space.

Precious 2010-07-20 04:35:01

@Precious, see my edit

ghostdog74 2010-07-20 05:58:30

Answer 3

A:

I agree with ghostdog that you should keep it simple, but you can keep it simple with regular expressions too:

^ matches the beginning of a line.
. matches any single character.
*? matches the previous item zero or more times NON-GREEDILY: it takes as little as possible while still letting the rest of the regex match.

Put this together and you get ^.*?\] which matches from the beginning of the line to the first ] that it sees.
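To see why the non-greedy *? matters here, a small Python comparison (illustrative only) of ^.*?\] against the greedy ^.*\] on one of the sample lines:

```python
import re

line = "[2010-01-15 06:18:10.203] [0x00001388] [SHDNT] Shutdown Count Down = 2/5"

lazy = re.match(r"^.*?\]", line).group()   # stops at the first ']'
greedy = re.match(r"^.*\]", line).group()  # backtracks only as far as the last ']'
print(lazy)
print(greedy)
```

The greedy version swallows the whole line and then gives back characters only until it reaches the last ], so it also grabs the [0x00001388] and [SHDNT] fields.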

EDIT: I just saw your reply to ghostdog, which clarified the problem. It's still easiest to match the entire date together with its brackets. Once you have that, replace the match with itself minus its first and last characters. I don't know what language you're using, but in Python it would be something like this (re.M makes ^ match at the start of every line when the string holds a whole file):

import re
new_string = re.sub(r'^.*?\]', lambda m: m.group()[1:-1], original_string, flags=re.M)

kerkeslager 2010-07-20 04:39:13

Thanks. But this search highlights the whole match. Here is what I have tried: ^\[(.?[0-9]) highlights the [ at the beginning of the date, but it also includes the digit. What I want is to highlight only the '[', with the digit used only as a condition that is not included in the result. Am I making any sense? For example, the lookaround q(?=u) matches 'q' only when it is followed by 'u', but omits the 'u' from the result.

Precious 2010-07-20 04:46:02

This is good, but I am not really using any programming language, just a simple text editor with a search-and-replace function.

Precious 2010-07-20 09:25:39
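The comments above ask for a pattern that matches only the [ (and ]) while the date acts purely as a condition, like q(?=u). That is what lookahead and lookbehind do, provided the editor's regex engine supports them (many do, but not all, and some support only lookahead). Below is a sketch in Python with an illustrative pattern (an assumption, not from the thread); the pattern string itself is what you would paste into an editor's find box, with an empty replacement.

```python
import re

# Illustrative pattern: match only the bracket characters themselves.
#   ^\[(?=\d{4}-)   a '[' at line start, only if a year and '-' follow (lookahead)
#   (?<=\.\d{3})\]  a ']' only if '.' plus three digits precede it (lookbehind)
BRACKETS_ONLY = re.compile(r"^\[(?=\d{4}-)|(?<=\.\d{3})\]", re.MULTILINE)

text = (
    "[2010-01-15 06:18:10.203] [0x00001388] [SHDNT] Shutdown Count Down = 2/5\n"
    "[2010-01-15 06:18:11.203] [0x00001388] [SHDNT] Shutdown Count Down = 3/5\n"
)
print(BRACKETS_ONLY.findall(text))  # the matches are just the bracket characters
print(BRACKETS_ONLY.sub("", text))  # replacing them with nothing leaves the date bare
```

Python's lookbehind must be fixed-width, which is why it checks exactly "." plus three digits; a log with a variable number of millisecond digits would need a different condition.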

Answer 4

A:

Because your input format is so rigid, take the really simple way and cut out columns 2 through 24:

$ cut -c 2-24 <<EOF
[2010-01-15 06:18:10.203] [0x00001388] [SHDNT] Shutdown Count Down = 2/5
[2010-01-15 06:18:11.203] [0x00001388] [SHDNT] Shutdown Count Down = 3/5
EOF

2010-01-15 06:18:10.203
2010-01-15 06:18:11.203
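The column arithmetic behind cut -c 2-24, sketched in Python (the helper name is hypothetical): cut counts columns from 1 and includes both ends, so columns 2 to 24 are the 0-based slice [1:24]. As the answer says, this only works while every timestamp has exactly the same width.

```python
def cut_columns(line, start, end):
    """Mimic `cut -c start-end`: columns are 1-based and inclusive,
    so they map to the 0-based, end-exclusive slice [start-1:end]."""
    return line[start - 1:end]


data = [
    "[2010-01-15 06:18:10.203] [0x00001388] [SHDNT] Shutdown Count Down = 2/5",
    "[2010-01-15 06:18:11.203] [0x00001388] [SHDNT] Shutdown Count Down = 3/5",
]
for line in data:
    print(cut_columns(line, 2, 24))
```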

msw 2010-07-20 04:41:05

Answer 5

A:

I'm not entirely sure you need a regular expression here, if it's a matter of finding the first character or determining the text within the square brackets. Perhaps I've misunderstood your question?

C# example:

LINQ:

// Requires: using System.IO; using System.Linq;  (myFile holds the file path)
char[] firsts = File.ReadAllLines(myFile).Select(f => f[0]).ToArray();

Looping with foreach:

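// Requires: using System; using System.IO;  (myFile holds the file path)
string[] allLines = File.ReadAllLines(myFile);
foreach (string line in allLines)
{
    if (line.Length == 0) continue;   // skip blank lines; line[0] would throw

    char firstChar = line[0];
    Console.WriteLine("First char: " + firstChar);

    if (firstChar == '[')
    {
        int closing = line.IndexOf(']');
        if (closing > 0)
        {
            // Text between the '[' at index 0 and the first ']'
            string textWithin = line.Substring(1, closing - 1);
            Console.WriteLine("Found this text within the square brackets: " + textWithin);
        }
    }
}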

p.campbell 2010-07-20 04:44:07

Answer 6

+2 A:

The following regular expression:

^\[([^\]]+)\]

will match the date at the beginning of the string together with its square brackets, and will capture the text between the brackets into a group that can be extracted by itself.

Note that your text editor may have a slightly different syntax. Here's how this breaks down:

^ = beginning of line/string
\[, \] = literal [ and ] characters
() = signifies a group to capture
[^\]] = matches any character _except_ a close bracket
        (this keeps the match from being too greedy)
+ = one or more of the previous

EDIT: This assumes your regex facility supports groups (which most do). The easiest way to explain groups is just to show you how they work with one such engine. In the Python interpreter:

>>> import re
>>> s = '[2010-01-15 06:18:10.203] [0x00001388] [SHDNT] ...'
>>> r = re.compile(r'^\[([^\]]+)\]')
>>> m = r.search(s)

This creates a regular expression object and searches the string for the first set of text that matches it. The result is returned in a match object:

>>> m
<_sre.SRE_Match object at 0x1004d9558>

To get the entire set of text that was matched, the Python convention is to invoke group() on the match object:

>>> m.group()
'[2010-01-15 06:18:10.203]'

and to get just the stuff in parentheses, I pass the number of the group I want (in this case there's just one set of parens, so just one group):

>>> m.group(1)
'2010-01-15 06:18:10.203'

If I perform a replacement instead of a search, I use the sub method. It takes the string I want to replace the full match with, followed by the input string, and returns the string with the replacement performed if a match was found:

>>> r.sub('spam spam spam', s)
'spam spam spam [0x00001388] [SHDNT] ...'

The replacement string also supports escape sequences that refer to the text captured by specific groups. A group substitution is written \N, where N is the number of the group. Hence:

>>> r.sub(r' \1 ', s)
' 2010-01-15 06:18:10.203  [0x00001388] [SHDNT] ...'

which is what you want.
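Since the question involves several files at once, here is a minimal sketch of applying the same pattern to a list of files in place. Assumptions: the files are readable in Python's default text encoding, the function name is hypothetical, and the replacement is r"\1", which drops the brackets entirely (use r" \1 " instead if you want spaces in their place). It rewrites the files, so back them up first.

```python
import re
from pathlib import Path

# The pattern from this answer; re.MULTILINE makes ^ match at every line start.
DATE_IN_BRACKETS = re.compile(r"^\[([^\]]+)\]", re.MULTILINE)


def strip_date_brackets_in_files(paths):
    """Rewrite each file in place, turning '[date]' at line start into 'date'.
    Returns a dict mapping each path to the number of replacements made."""
    counts = {}
    for path in map(Path, paths):
        text = path.read_text()
        new_text, n = DATE_IN_BRACKETS.subn(r"\1", text)
        if n:  # only touch files that actually changed
            path.write_text(new_text)
        counts[str(path)] = n
    return counts
```

Usage would be something like strip_date_brackets_in_files(["first.log", "second.log"]), with your own file names.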

Owen S. 2010-07-20 04:59:09

Okay, great. It captures the whole date. So how can I include only the [ and ] in the result, while using the date only as a condition? In the example q(?=u), the result is 'q', and the lookahead makes sure it is followed by 'u' without adding 'u' to the result. How can I achieve this? In the end I only want to replace [ and ] with empty space.

Precious 2010-07-20 05:07:28

Well, this will match everything including the square brackets, but extract the part within parentheses into a group that you can stick into the replacement string. So, assuming that your text editor's regex engine handles replacements like this, you can write as your replacement something like ' \1 ' (with spaces on either side of \1) to replace the whole match (brackets included) with just the date in group 1 and a space on either side.

Owen S. 2010-07-20 07:25:42

@Owen Thanks a lot for your time. I like the idea; it's exactly what I am trying to accomplish. But can you explain how I do this: "extract the part within parentheses into a group that you can stick into the replacement string"?

Precious 2010-07-20 09:11:12

You don't do the extraction, the regex engine does when you use parens. I've expanded my answer to show you how it works in Python. Consult your editor's documentation to see if/how it can be done with your particular engine.

Owen S. 2010-07-20 15:28:40

Answer 7

A:

Ah, thanks for your additional comment in one of the answers.

In vim, I'd probably use the visual selection tool: put the cursor on the first [, type ^V, G (to get to the end of the file), then x to delete the column. Then repeat with the first ] character, ^V, G (but G will put the cursor on the wrong character -- so use l or the right-arrow-key to move over to the ]) and then type x to delete the column.

If it didn't line up perfectly in columns (perhaps the .203 could be fewer characters, say .2), then I would do this:

:%s/^\[//
:%s/\(\d\)] /\1 /

Note, of course, that the second regex is much more brittle: on every line it deletes the first ] that sits between a digit and a space. Regex flavors other than vim's usually don't require escaping ( and ) to form a group.

Of course, if you're not using a vi-clone, hopefully this can translate well enough. :)
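A rough Python equivalent of the two vim substitutions (a sketch; the function name is hypothetical). It also shows the brittleness mentioned above: if the first digit-bracket-space sequence on a line is not the end of the date, the wrong ] is removed.

```python
import re


def strip_brackets_like_vim(line):
    # :%s/^\[//            -> remove a '[' at the start of the line
    line = re.sub(r"^\[", "", line)
    # :%s/\(\d\)] /\1 /    -> drop the first ']' preceded by a digit and followed
    # by a space; count=1 mirrors vim's one substitution per line (no /g flag)
    return re.sub(r"(\d)\] ", r"\1 ", line, count=1)


print(strip_brackets_like_vim("[2010-01-15 06:18:10.2] [0x00001388] [SHDNT] Shutdown"))
print(strip_brackets_like_vim("[SHDNT] 5] x"))  # brittle case: the wrong ']' goes
```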

sarnold 2010-07-20 04:59:45

Thanks, but I'm not using Vim.

Precious 2010-07-20 05:10:22
