I'm trying to write a function in PHP that takes an array of strings (needle) and performs a comparison against another array of strings (haystack). The purpose of this function is to quickly deliver matching strings for an AJAX search, so it needs to be as fast as possible.
Here's some sample code to illustrate the two arrays:
$needle = array('ba','hot','resta');
$haystack = array(
'Southern Hotel',
'Grange Restaurant & Hotel',
'Austral Hotel',
'Barsmith Hotel',
'Errestas'
);
Whilst this is quite easy in itself, the aim of the comparison is to count how many of the needle strings appear in each haystack string.
However, there are three constraints:
- The comparison is case-insensitive.
- The needle must only match characters at the beginning of a word. For example, "hote" will match "Hotel", but "resta" will not match "Errestas".
- We want to count the number of matching needles, not the number of needle appearances. If a place is named "Hotel Hotel Hotel", we need the result to be 1, not 3.
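To make the first two constraints concrete, here is a minimal sketch of a single-needle check. The function name needle_starts_word() is hypothetical, and it assumes that "beginning of the word" means the start of the string or the position right after whitespace:

```php
<?php
// Hypothetical helper (not part of any library): returns true when $needle
// appears at the start of $subject or directly after a whitespace character.
function needle_starts_word(string $needle, string $subject): bool
{
    // preg_quote() escapes regex metacharacters in the needle;
    // the "i" modifier makes the comparison case-insensitive.
    $pattern = '/(?:\A|\s)' . preg_quote($needle, '/') . '/i';

    return preg_match($pattern, $subject) === 1;
}
```

With this sketch, needle_starts_word('hote', 'Hotel') is true, while needle_starts_word('resta', 'Errestas') is false.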
Using the above example, we'd expect the following associative array as a result:
$haystack = array(
'Southern Hotel' => 1,
'Grange Restaurant & Hotel' => 2,
'Austral Hotel' => 1,
'Barsmith Hotel' => 2,
'Errestas' => 0
);
I've been trying to implement a function to do this, using preg_match_all() and a regexp which looks like /(\A|\s)(ba|hot|resta)/. Whilst this ensures we only match the beginning of words, it doesn't take into account strings which contain the same needle twice.
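One possible way around the double counting, sketched here under assumptions rather than offered as a tuned solution, is to test each needle on its own with preg_match(). Since preg_match() stops at the first hit, a needle contributes at most 1 to the count however often it repeats. The function name count_matching_needles() is hypothetical, and the same whitespace-based definition of a word start as in the regexp above is assumed:

```php
<?php
// Hypothetical sketch: for each haystack string, count how many distinct
// needles occur at the start of a word (case-insensitive).
function count_matching_needles(array $needles, array $haystack): array
{
    // Build one pattern per needle up front so the work is not repeated
    // for every haystack entry.
    $patterns = array();
    foreach ($needles as $needle) {
        if ($needle === '') {
            continue; // an empty needle would match everything, so skip it
        }
        $patterns[] = '/(?:\A|\s)' . preg_quote($needle, '/') . '/i';
    }

    $counts = array();
    foreach ($haystack as $subject) {
        $count = 0;
        foreach ($patterns as $pattern) {
            // preg_match() returns 1 on the first match and stops there,
            // so "Hotel Hotel Hotel" still counts "hot" only once.
            if (preg_match($pattern, $subject) === 1) {
                $count++;
            }
        }
        $counts[$subject] = $count;
    }

    return $counts;
}
```

Called with the sample $needle and $haystack arrays, this should give 1, 2, 1, 2 and 0 for the five entries. It runs one regexp per needle per haystack string, so whether it is fast enough for an AJAX search would need measuring. For UTF-8 input the "u" modifier would probably be needed as well.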
Does anyone have a solution for this?