I have made a search engine that compares a search string against a list of words. However, I want to omit words like "how", "do" and "I". For example, if the user searches for "How do I find my IP?" and I have three other items beginning with "How do I", the search can't return a good relevancy ranking.

Right now it's set up as (a-z0-9)+userinput+(a-z0-9). I want to have a list of words to be omitted so that they are not matched. I am using JavaScript only. Please assist.

A: 

Another updated example, now using \b

<html>
    <head>
    <script type="text/javascript">
        var unwanted = new RegExp(
            "\\b("+ [
                "(how|where|what) do (I|you)",
                "some",
                "other",
                "words",
                "or phrases",
                "that",
                "is",
                "unwanted"
            ].join("|") + ")(?=\\b)",
            "ig" // Ignore case, global
        );

        function dosearch(e){
            var search = e.search.value;
            search = search.replace(unwanted,"");

            alert(search);
            return false;
        }
        </script>
</head>
<body>
    <form method="post" action="" onsubmit="return dosearch(this)">
        search: <input type="text" name="search">
    </form>
</body>
</html>
some
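For trying the pattern outside a browser, here is a minimal sketch of the same idea without the form. The names `unwantedPhrases` and `stripUnwanted` are made up for illustration, and collapsing the leftover whitespace is an extra step that the answer's code does not perform.

```javascript
// Same phrase list as in the answer; the variable name is hypothetical.
var unwantedPhrases = [
    "(how|where|what) do (I|you)",
    "some",
    "other",
    "words",
    "or phrases",
    "that",
    "is",
    "unwanted"
];

// \b on both sides so that only whole words or phrases are matched.
var unwanted = new RegExp("\\b(" + unwantedPhrases.join("|") + ")\\b", "ig");

// Hypothetical helper: strips unwanted phrases and tidies the spacing.
function stripUnwanted(search) {
    return search
        .replace(unwanted, "")       // drop every unwanted word or phrase
        .replace(/\s+/g, " ")        // collapse the gaps left behind
        .replace(/^\s+|\s+$/g, "");  // trim both ends
}

console.log(stripUnwanted("How do I find my IP?")); // "find my IP?"
```

The cleaned string can then be passed to whatever matching the search engine already does.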
This works very well, but for some reason it does not match the first word. Example: in "what do I do if I can not view a web page", "What" is not ignored even if it's not in the list of unwanted words. It seems to skip over the first word in the phrase. Any idea why?
Roy Rideaux
@Roy Rideaux: What? In the example above, "what" isn't in the list of unwanted words and shouldn't be removed. None of the words in "what do I do if I can not view a web page" match, because the regexp looks for the phrases "how do I" or "where do I". Are you sure you added a space before and after the search string, like I do above? That's necessary for the regexp to work; otherwise it gets complicated.
some
I have tried it in FF3.6, Opera 10, Chrome 6 and MSIE 8 and I can't replicate your problem. It works as it should.
some
I changed the regexp anyway, and added "what". The first row matches "how do I", "how do you", "where do I", "where do you", "what do I" and "what do you".
some
Search Google for the "1000 most common words" and store them in a file or whatever. Iterate through that file via an asynchronous call and strike out the matched words; then perform your search.
Russell Dias
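A rough sketch of the stop-word idea from this comment, in plain JavaScript. The short `stopWords` array is only a stand-in for a real "most common words" file, loading that file asynchronously is left out, and the function name `removeStopWords` is hypothetical.

```javascript
// Tiny stand-in for a real list of common words (an assumption for illustration).
var stopWords = ["how", "do", "i", "my", "the", "a", "is", "what", "where", "you"];

// Build a lookup object once so each word check is a single property test.
var stopLookup = {};
for (var i = 0; i < stopWords.length; i++) {
    stopLookup[stopWords[i]] = true;
}

// Hypothetical helper: returns the search terms that are not stop words.
function removeStopWords(search) {
    // Lowercase, then split on anything that is not a letter or digit.
    var words = search.toLowerCase().split(/[^a-z0-9]+/);
    var kept = [];
    for (var j = 0; j < words.length; j++) {
        var w = words[j];
        var isStopWord = Object.prototype.hasOwnProperty.call(stopLookup, w);
        if (w !== "" && !isStopWord) {
            kept.push(w);
        }
    }
    return kept;
}

console.log(removeStopWords("How do I find my IP?")); // ["find", "ip"]
```

Unlike the regexp in the answer, this works word by word, so it cannot remove a phrase such as "or phrases" as a unit.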
@Roy Rideaux: I updated the regexp to use \b for word-boundaries instead of space. Since that works even at the start and end of string, it's no longer necessary to add spaces before and after the search string.
some
You're right, I left out the spaces. I would vote you up, but I don't have enough points yet.
Roy Rideaux
@Roy Rideaux: You can always accept the answer. :) The use of \b (for word boundaries) has some side effects: it matches at any word break, such as "-". So it might be better to use spaces in your case.
some
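To make the side effect concrete, here is a small comparison. The space-delimited pattern is just one possible alternative, not necessarily what the answerer had in mind, and the names `tidy`, `stripWithBoundary` and `stripWithSpaces` are invented for this sketch.

```javascript
// \b version: "is" is removed wherever a word break surrounds it,
// including the break before a hyphen.
var withBoundary = /\b(is)\b/ig;

// Space-delimited version (an assumed alternative): "is" must be preceded
// by the start of the string or whitespace, and followed by whitespace
// or the end of the string.
var withSpaces = /(^|\s)(is)(?=\s|$)/ig;

// Collapse repeated whitespace and trim, so the results are easy to compare.
function tidy(s) {
    return s.replace(/\s+/g, " ").replace(/^ | $/g, "");
}

function stripWithBoundary(s) {
    return tidy(s.replace(withBoundary, ""));
}

function stripWithSpaces(s) {
    // "$1" puts back the whitespace (or nothing) captured before the word.
    return tidy(s.replace(withSpaces, "$1"));
}

var query = "what is an is-a relationship";

console.log(stripWithBoundary(query)); // "what an -a relationship"
console.log(stripWithSpaces(query));   // "what an is-a relationship"
```

The trade-off is that the space-delimited version will not remove a word that is directly followed by punctuation, such as "is?".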