I don't know regular expressions at all. Can anybody help me with a very simple one:

extracting 'word:word' from a sentence, e.g. "Java Tutorial Format:Pdf With Location:Tokyo Javascript"?

  • A small modification: the first 'word' comes from a list, but the second can be anything, i.e. word1 is in [ABC, FGR, HTY].
  • The situation demands a little more modification: the matching form can be "word11:word12 word13 ..", running until the next "word21: ...".

Things are becoming complex with sec..... I have to learn regex :(

Thanks in advance.

+1  A: 

Here is a decent introductory tutorial for regex in JS.

David Dorward
+2  A: 

You can use the regex:

\w+:\w+

Explanation:
\w - a single character that is a letter (uppercase or lowercase), a digit, or an underscore (_).
\w+ - one or more of the above characters; basically a word.

So \w+:\w+ matches a pair of words separated by a colon.
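
As a rough illustration of the pattern above, here is a minimal JavaScript sketch. The helper name findPairs is hypothetical and only for demonstration; it relies on the standard String.prototype.match with the 'g' flag.

```javascript
// Find every "word:word" pair in a sentence using \w+:\w+.
// findPairs is a hypothetical helper name chosen for this sketch.
function findPairs(sentence) {
    // With the 'g' flag, match() returns an array of all matches.
    // It returns null when nothing matches, so fall back to an empty array.
    return sentence.match(/\w+:\w+/g) || [];
}

var pairs = findPairs("Java Tutorial Format:Pdf With Location:Tokyo Javascript");
console.log(pairs); // logs the two pairs, Format:Pdf and Location:Tokyo
```

Note that \w does not cover characters such as '-' or '.', so a value like "Size:1.5" would be cut off at the dot.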

codaddict
+1  A: 

Try \b(\S+?):(\S+?)\b. Group 1 will capture "Format" and group 2, "Pdf".

A working example:

<html>
<head>
<script type="text/javascript">
function test() {
    var re = /\b(\S+?):(\S+?)\b/g; // 'g' is required: without it exec() keeps returning the first match and the loop never ends
    var text = "Java Tutorial Format:Pdf With Location:Tokyo  Javascript";

    var match = null;
    while ( (match = re.exec(text)) != null) {
        alert(match[1] + " -- " + match[2]);
    }

}
</script>
</head>
<body onload="test();">

</body>
</html>

A good reference for regexes is https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp

Jaú
+1 for the complete example, but -1 for a regex that's too complicated for its own good. ;)
Alan Moore
A: 
([^:]+):(.+)

Meaning: (everything except ':', one or more times), ':', (any character, one or more times)

You'll find good manuals on the net... Maybe it's time for you to learn...

Macmade
Does not work. Take this simple input: "ab cd:ef gh". You'll match 'ab cd' and 'ef gh' instead of 'cd' and 'ef'.
codaddict
This regexp is very wrong, you might make good use of the manuals too.
kemp
Didn't understand that, sorry. But the regexp works; you just have to adjust it as follows: ([^:\s]+):([^\s]+)
Macmade
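
To make the disagreement in these comments concrete, the following sketch runs both patterns, ([^:]+):(.+) and ([^:\s]+):([^\s]+), against the input "ab cd:ef gh" from the comment above. The helper name firstMatch is hypothetical and exists only for this comparison.

```javascript
// Return the two captured groups of the first match, or null if none.
// firstMatch is a hypothetical helper used only for this comparison.
function firstMatch(re, text) {
    var m = re.exec(text);
    return m ? [m[1], m[2]] : null;
}

var input = "ab cd:ef gh";

// [^:]+ and .+ both cross spaces, so the groups swallow neighbouring words.
var original = firstMatch(/([^:]+):(.+)/, input);       // ["ab cd", "ef gh"]

// Excluding whitespace on both sides keeps each group to a single word.
var adjusted = firstMatch(/([^:\s]+):([^\s]+)/, input); // ["cd", "ef"]

console.log(original, adjusted);
```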
A: 

Use this snippet:

$str = " this is pavun:kumar hello world bk:systesm";
if (preg_match_all('/(\w+\:\w+)/', $str, $val)) {
    print_r($val);
} else {
    print "Not matched \n";
}
pavun_cool
A: 

Here's the non-regex way: in your favourite language, split on whitespace, go through the elements, check each one for ":", and print it if found. E.g. in Python:

>>> s="Java Tutorial Format:Pdf With Location:Tokyo Javascript"
>>> for i in s.split():
...     if ":" in i:
...         print(i)
...
Format:Pdf
Location:Tokyo

You can do further checks to make sure it's really "someword:someword" by splitting again on ":" and checking that the resulting list has 2 elements, e.g.

>>> for i in s.split():
...     if ":" in i:
...         a=i.split(":")
...         if len(a) == 2:
...             print(i)
...
Format:Pdf
Location:Tokyo
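
For readers working in JavaScript, the same split-and-check idea might look like the sketch below. The function name pairsBySplitting is hypothetical, and the extra check that both halves are non-empty is an assumption added here; it is not part of the Python session above.

```javascript
// Regex-free extraction: split on spaces, keep tokens of the form "a:b".
// pairsBySplitting is a hypothetical name used for this sketch.
function pairsBySplitting(sentence) {
    return sentence.split(" ").filter(function (token) {
        var parts = token.split(":");
        // Exactly one colon, with something on both sides (added assumption).
        return parts.length === 2 && parts[0] !== "" && parts[1] !== "";
    });
}

console.log(pairsBySplitting("Java Tutorial Format:Pdf With Location:Tokyo Javascript"));
```

Splitting on a single space leaves empty tokens when there are repeated spaces, but those contain no colon and are filtered out anyway.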
ghostdog74
A: 

Continuing Jaú's function with your additional requirement:

function test() {
    var words = ['Format', 'Location', 'Size'],
        text = "Java Tutorial Format:Pdf With Location:Tokyo Language:Javascript",
        match = null;
    var re = new RegExp( '(' + words.join('|') + '):(\\w+)', 'g');
    while ( (match = re.exec(text)) != null) {
        alert(match[1] + " = " + match[2]);
    }
}
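
The question's last modification (a value that runs on until the next "word:") is not covered by the function above. One possible way to handle it, sketched here under assumptions, is a lazy match followed by a lookahead for the next known key or the end of the input. The names escapeRegExp and extractFields are hypothetical, and the sketch assumes single-line input and that keys come from a known list.

```javascript
// Escape regex metacharacters so list entries are matched literally.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Map each known key to the text that follows it, up to the next known key
// or the end of the input. extractFields is a hypothetical helper name.
function extractFields(text, keys) {
    var alternation = keys.map(escapeRegExp).join("|");
    var re = new RegExp(
        // key, colon, then as little text as possible ...
        "\\b(" + alternation + "):(.*?)" +
        // ... until whitespace plus another key and colon, or end of input
        "(?=\\s+(?:" + alternation + "):|$)",
        "g"
    );
    var result = {}, match;
    while ((match = re.exec(text)) !== null) {
        result[match[1]] = match[2].trim();
    }
    return result;
}

var fields = extractFields(
    "Java Tutorial Format:Pdf With Location:Tokyo Javascript",
    ["Format", "Location", "Size"]
);
console.log(fields); // Format maps to "Pdf With", Location to "Tokyo Javascript"
```

Because only listed keys end a value, a colon-separated word with an unlisted key (such as Language:Javascript) would be absorbed into the preceding value.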
streetpc