How do I match the URL in this string? I have other code that matches text and it seems to work, but when I try it here it doesn't; it keeps saying "No such file or directory". I didn't know grep -o only worked on files.

matchString='url={"urlPath":"http://www.google.com/","thisIsOtherText"'
array=($(grep -o 'url={"urlPath":"([^"]+)"' "$matchString"))
grep: url={"urlPath":"http://www.google.com/","thisIsOtherStuff": No such file or directory

Anyway, could you please help me match the URL in the "matchString" variable? (It doesn't have to use grep.)

Preferred output: http://www.google.com/

A: 

I am not familiar with grep, but have knowledge of regex.

You may have to escape the " characters:

 array=($(grep -o 'url\=\{\"urlPath\"\:\"([^\"]*)\"' "$matchString"))
JerA
user:~# array=($(grep -o 'url\=\{\"urlPath\"\:\"([^\"]*)\"' "$matchString")); echo "$array"
grep: Unmatched \{
user:~# array=($(grep -o 'url\={\"urlPath\"\:\"([^\"]*)\"' "$matchString")); echo "$array"
grep: : No such file or directory
Mint
+3  A: 

You need to echo the string through a pipe to grep:

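array=($(echo "$matchString" | grep -Eo 'url=\{"urlPath":"([^"]+)"'))

Grep reads from a file or standard input. It doesn't accept a string argument to search within. The -E option enables extended regular expressions, so the parentheses and + act as operators instead of literal characters, and \{ keeps the brace literal.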
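
As a minimal sketch of feeding a variable to grep on standard input, assuming bash (here-strings are not available in plain POSIX sh), the pipe can also be replaced with a here-string. The variable name matched is only illustrative.

```shell
matchString='url={"urlPath":"http://www.google.com/","thisIsOtherText"'

# A here-string sends the variable's contents to grep on standard input,
# so grep never tries to open the string as a file name.
# -E: extended regex; -o: print only the matched part of the line.
matched=$(grep -Eo 'url=\{"urlPath":"([^"]+)"' <<< "$matchString")

printf '%s\n' "$matched"
# prints: url={"urlPath":"http://www.google.com/"
```

Note that the output still contains the literal prefix, because -o prints the whole match and not just the parenthesized part.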

Also, grep is going to output the entire match, not the part in parentheses. You probably need to use sed.

array=($(echo "$matchString" | sed 's/url={"urlPath":"\([^"]\+\).*"/\1/'))

The sed command works like this:

  • s/// is the substitute command and its delimiters. You can use another delimiter for convenience if it makes the expression more readable or helps eliminate having to do some escapes. Between the first two delimiters is what we want to change. Between the middle one and the last one is what we want to change it to.

  • url={"urlPath":" is just the literal text we are using to help make the match

  • \( \) encloses a capture group. What falls between here is what we want to snag.

  • [^"] matches any character that's not a double-quote

  • \+ matches one or more of the preceding pattern (this is a GNU extension to basic regular expressions). So, in this case, that's one or more characters that are not quotation marks.

  • .* matches zero or more of any character. In this case, together with the final ", it starts at the quote after google.com/ and runs through the last quote, which here is the end of the string.

  • \1 outputs what was captured by the first (and only in this case) capture group.

Visually:

url={"urlPath":"       http://www.google.com/       ","thisIsOtherText"
-----literal----       -------non-quote------       ---any character---
url={"urlPath":"   \(  [^"]                    \)   .*
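
The breakdown above can also be sketched with extended regular expressions, which avoid most of the backslashes. This is a variant and not the command given above: it assumes a sed that supports -E (GNU and BSD sed both do), adds a leading .* so that any text before url= is dropped too, and uses -n with the p flag so that nothing is printed when the pattern does not match. The variable name url is only illustrative.

```shell
matchString='url={"urlPath":"http://www.google.com/","thisIsOtherText"'

# -E: extended regex, so ( ) and + need no backslashes; \{ is a literal brace.
# -n plus the trailing p flag: print only lines where the substitution happened.
# The leading and trailing .* consume everything outside the captured URL.
url=$(printf '%s\n' "$matchString" | sed -nE 's/.*url=\{"urlPath":"([^"]+)".*/\1/p')

printf '%s\n' "$url"
# prints: http://www.google.com/
```

Because of -n and p, a string with no urlPath leaves url empty, which is easier to test for than the unchanged input that a plain s/// would pass through.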
Dennis Williamson
Cheers, the sed one works. Not sure how my other code works with grep, though; I think it might be using a file.
Mint
Also, could you please explain how the regex in there works, including the \1?
Mint
Thanks! Very detailed. I would give you two ticks if I could :)
Mint