ansaurus

Question

Answer 1

A:

You can repeat a group by putting it in parentheses and following it with a repetition count, like this:

([^-\t]*\t){2,2}

And the full pattern to match the title would be this:

([^-\t]*\t){2,2}([^-\t]+).*

You said you tried it. I'm not sure what is different, but the above worked for me on your sample data.
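As a minimal sketch of how the full pattern might be used in a substitution: the sample line below is hypothetical (four tab-separated fields with the title third), since the original data is not shown here. The tab is injected through a shell variable rather than `\t`, and `-E` is assumed to be available (older GNU sed spells it `-r`).

```shell
# Hypothetical sample line: number, tag, title, trailing field, separated by tabs.
tab=$(printf '\t')
line=$(printf '1\tfoo\tShutter Island (2010)\tbar')

# Skip two tab-terminated fields, capture the third, and discard the rest.
title=$(printf '%s\n' "$line" | sed -E "s/([^-$tab]*$tab){2}([^-$tab]+).*/\2/")
echo "$title"
```

Because the bracket expressions also exclude `-`, this sketch only behaves as intended when the first two fields contain no hyphens.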

Sam 2010-10-20 16:41:24

I was trying things out myself, and just used what you wrote here, and it's not working for me either. I tried plain parens as you typed them, `( ... )` (not expecting that to work), and escaped parens, `\\( ... \\)`, also escaping the `\+`. My `sed --version` says `GNU sed version 4.1.5`, and it's on RedHat Enterprise 5.1. [Oh look, the backslash didn't show in the comment until I doubled it: `\\\(`]

Stephen P 2010-10-20 16:54:23

Answer 2

+1 A:

I think you might be going about this the wrong way. If you simply want to extract the name of the film and its release year, then you could try this regex:

(?:\t)[\w ()]+(?:\t)

As seen in place here:

http://regexr.com?2sd3a

Note that it matches a tab character at the beginning and end of the desired string. The tabs are in non-capturing groups, so they are not captured, although they are still part of the overall match.

andy matthews 2010-10-20 16:52:16

It might also help to explain what you want the result to be.

andy matthews 2010-10-20 16:53:36

Cheers, this works perfectly. Thanks for the link; it will help with debugging and learning better regex.

akd5446 2010-10-20 17:06:30

I like how concise this is and see in your link how it *matches*, but how is it used, and with what command, to *extract* the name/date from the line? I don't see using it with `sed` since it doesn't have a capturing group and replacement. I'll upvote if you add an example of using it in a command to actually produce output that lists the name(s) from a file.

Stephen P 2010-10-20 23:48:39

I'd love to give you a real life example, but I don't know sed. I'm sure this could be rewritten without using non-capturing groups but I'll have to do some research on it.

andy matthews 2010-10-23 17:20:19

That's a Perl regular expression, which `sed` doesn't understand.

Dennis Williamson 2010-10-26 00:58:48

Answer 3

A:

Why are you doing things the hard way?

$ awk '{$1=$2=$NF=""}1' file
  Shutter Island (2010)
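A rough sketch of what the one-liner does, run on a hypothetical line (the original sample data is not shown here): awk splits on whitespace, the first, second, and last fields are blanked, and the trailing `1` prints the rebuilt record. The blanked fields leave their separators behind, so the second command, which is an addition and not part of the answer, trims them.

```shell
# Hypothetical sample line; awk's default field splitting treats spaces and tabs alike.
line='1 foo Shutter Island (2010) bar'

# Blank fields 1, 2, and the last one; "1" is an always-true pattern that prints the record.
raw=$(printf '%s\n' "$line" | awk '{$1=$2=$NF=""}1')

# Same idea, but strip the leftover leading and trailing spaces before printing.
trimmed=$(printf '%s\n' "$line" | awk '{$1=$2=$NF=""; gsub(/^ +| +$/, "")}1')

printf '[%s]\n' "$raw" "$trimmed"
```

This approach assumes the first, second, and last fields never contain whitespace themselves.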

ghostdog74 2010-10-20 17:05:04

Thanks, this also works. Will go and learn some more Linux.

akd5446 2010-10-20 17:14:22

Answer 4

+1 A:

If this is a tab-separated file with a regular format, I'd use cut instead of sed:

cut -d' ' -f3 films.txt

Note there's a single tab between the quotes after the -d, which can be typed at the shell prompt by pressing Ctrl+V first, i.e. Ctrl+V then Ctrl+I. Tab is also cut's default delimiter, so the -d option can be omitted entirely.
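For illustration, here are two ways to pick the third tab-separated field without typing a literal tab. The sample line is hypothetical, and the awk variant is an alternative sketch, not part of the answer itself.

```shell
# Hypothetical sample line with four tab-separated fields.
line=$(printf '1\tfoo\tShutter Island (2010)\tbar')

# cut splits on tabs by default, so no -d option is needed.
by_cut=$(printf '%s\n' "$line" | cut -f3)

# awk with an explicit tab field separator keeps the spaces inside the title intact.
by_awk=$(printf '%s\n' "$line" | awk -F'\t' '{print $3}')

printf '%s\n' "$by_cut" "$by_awk"
```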

Stephen P 2010-10-20 17:05:52

There are spaces within the movie names.

ghostdog74 2010-10-20 17:10:03

Thank you, this also works. I would up-vote but can't yet.

akd5446 2010-10-20 17:12:11

@ghostdog: according to the OP's regex there are tabs, not spaces.

Stephen P 2010-10-20 17:20:00

Answer 5

A:

This works for me:

sed 's/\([^\t]*\t\)\{2\}\([^\t]*\).*/\2/' films.txt

If your sed supports -r you can get rid of most of the escaping:

sed -r 's/([^\t]*\t){2}([^\t]*).*/\2/' films.txt

Change the first 2 to select different fields (0-3).

This will also work:

sed 's/[^\t]\+/\n&/3;s/.*\n//;s/\t.*//' films.txt

Change the 3 to select different fields (1-4).
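The first command above can be wrapped so the field number becomes a parameter. This is only a sketch: `nth_field` is a made-up helper name, the sample line is hypothetical, and a literal tab held in a shell variable stands in for `\t` in case the local sed does not treat `\t` as a tab.

```shell
# Literal tab character for use inside the sed expression.
tab=$(printf '\t')

# nth_field N: print the Nth (1-based) tab-separated field of each input line.
# It skips N-1 tab-terminated fields, captures the next one, and drops the rest.
nth_field() {
    sed "s/\([^$tab]*$tab\)\{$(( $1 - 1 ))\}\([^$tab]*\).*/\2/"
}

# Hypothetical sample line with four tab-separated fields.
line=$(printf '1\tfoo\tShutter Island (2010)\tbar')

third=$(printf '%s\n' "$line" | nth_field 3)
fourth=$(printf '%s\n' "$line" | nth_field 4)
printf '%s\n' "$third" "$fourth"
```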

Dennis Williamson 2010-10-26 01:20:42


Repeating a regex pattern
