This is an excellent example of how a test-first approach can help you arrive at a solution. It might not be the very best one, but having tests in place allows you to refactor with confidence and see instantly whether you have broken any existing behaviour. You could set up a few tests like:
public function setUp () {
    $this->searchParser = new App_Search_Parser();
}
public function testSingleWordParsesToAllWords () {
    $this->searchParser->parse('Transport');
    // assertEquals() takes the expected value first, then the actual value.
    $this->assertEquals(
        array('Transport'),
        $this->searchParser->getAllWords()
    );
    // A single plain word should leave the NOT and OR lists empty.
    $this->assertEquals(array(), $this->searchParser->getNotWords());
    $this->assertEquals(array(), $this->searchParser->getAnyWords());
}
public function testParseOfCombinedSearchString () {
    $query = 'energy food "olympics 2010" Terrorism ' .
        'OR "government" OR cups NOT transport';
    $this->searchParser->parse($query);
    // Plain words and quoted phrases must all be present.
    $this->assertEquals(
        array('energy', 'food', 'olympics 2010'),
        $this->searchParser->getAllWords()
    );
    // The term after NOT is excluded; case matches the query as typed.
    $this->assertEquals(
        array('transport'),
        $this->searchParser->getNotWords()
    );
    // Terms joined by OR are alternatives.
    $this->assertEquals(
        array('Terrorism', 'government', 'cups'),
        $this->searchParser->getAnyWords()
    );
}
Other good tests would include:
testParseTwoWords
testParseTwoWordsWithOr
testParseSimpleWithNot
testParseInvalid
- Here you have to decide what invalid input looks like and how you interpret it, e.g.:
- 'NOT Transport': Search for anything that doesn't contain Transport, or tell the user that they have to include at least one search term too?
- 'OR energy': Is it OK to begin with a combinator?
- 'food OR NOT energy': Does this mean "search for food or anything that doesn't contain energy", does it mean "search for food and not energy", or is it meaningless (in which case you throw an exception, return false, or similar)?
testParseEmpty
Then write the tests one at a time, and for each one write the simplest solution that passes it. Refactor to make the code right, and run the tests again to confirm they still pass.
Once a test passes and the code is refactored, write the next test and repeat the procedure. Add more tests as you find special cases, and refactor the code so that it passes all of them. If you break a test, back up and rewrite the code (not the test!) so that it passes again.
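As a rough illustration of the "simple solution" step, a first pass that satisfies only testSingleWordParsesToAllWords might look like the sketch below. The class and getter names come from the tests above; the private properties and the whitespace split are assumptions, and OR, NOT and quoted phrases are deliberately not handled yet.

```php
<?php
// Hypothetical first-pass parser: just enough to pass the single-word test.
// The property names and the splitting strategy are assumptions.
class App_Search_Parser
{
    private $allWords = array();
    private $anyWords = array();
    private $notWords = array();

    public function parse($query)
    {
        // Split on runs of whitespace and drop empty pieces.
        $this->allWords = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
        // Operators are not recognised yet, so these lists stay empty.
        $this->anyWords = array();
        $this->notWords = array();
    }

    public function getAllWords() { return $this->allWords; }
    public function getAnyWords() { return $this->anyWords; }
    public function getNotWords() { return $this->notWords; }
}
```

The combined-query test would fail against this version, which is the point: that failure is what drives the next round of changes.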
As for how you can solve this problem, look into preg_match or strtok, or simply loop through the string, collecting tokens as you go.
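One possible way to start, sketched with preg_match_all (a sibling of preg_match that returns every match): split the query into tokens, then sort each term by its neighbouring operator. The function names tokenizeSearchQuery and classifySearchTokens are hypothetical, and so is the rule that a term next to OR is an "any" word and a term right after NOT is a "not" word. The invalid-input cases listed above are not decided here.

```php
<?php
// Hypothetical helper: split a query into tokens. A quoted phrase becomes
// one token, flagged so that a quoted "OR" is not read as an operator.
function tokenizeSearchQuery($query)
{
    preg_match_all('/"([^"]*)"|(\S+)/', $query, $matches, PREG_SET_ORDER);
    $tokens = array();
    foreach ($matches as $m) {
        // Group 2 is non-empty only when the bare-word alternative matched.
        $isWord = isset($m[2]) && $m[2] !== '';
        $text = $isWord ? $m[2] : $m[1];
        if ($text !== '') {
            $tokens[] = array('text' => $text, 'quoted' => !$isWord);
        }
    }
    return $tokens;
}

// Hypothetical helper: sort terms into all/any/not lists (assumed rules).
function classifySearchTokens(array $tokens)
{
    $result = array('all' => array(), 'any' => array(), 'not' => array());
    $count = count($tokens);
    // True when position $i holds the unquoted operator $op.
    $isOp = function ($i, $op) use ($tokens, $count) {
        return $i >= 0 && $i < $count
            && !$tokens[$i]['quoted'] && $tokens[$i]['text'] === $op;
    };
    for ($i = 0; $i < $count; $i++) {
        if ($isOp($i, 'OR') || $isOp($i, 'NOT')) {
            continue; // operators are not search terms themselves
        }
        if ($isOp($i - 1, 'NOT')) {
            $result['not'][] = $tokens[$i]['text'];
        } elseif ($isOp($i - 1, 'OR') || $isOp($i + 1, 'OR')) {
            $result['any'][] = $tokens[$i]['text'];
        } else {
            $result['all'][] = $tokens[$i]['text'];
        }
    }
    return $result;
}
```

A parse() method could call these two helpers and store the three lists for its getters. Keeping tokenizing separate from classifying means each half can be covered by its own small tests.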