Hey,

so I am trying to match a word in a wall of text and return a few words before and after the match. Everything is working, but I would like to ask whether there is any way to modify it so that it also matches similar words. Let me show you an example:

preg_match_all('/(?:\b(\w+\s+){1,5})?.*(pripravená)(?:(\s+){1,2}\b.{1,10})?/u', $item, $res[$file]);

This code returns a match, but I would like to modify it so that

preg_match_all('/(?:\b(\w+\s+){1,5})?.*(pripravena)(?:(\s+){1,2}\b.{1,10})?/u', $item, $res[$file]);

would also return a match. The text is in Slovak, and I tried a range of Unicode characters and also \p{Sk} (and a few others), but to no avail. Maybe I just put it in the wrong place, I don't know...

Is something like this possible?

Any help is appreciated

A: 

(pripraven[áa]) or (pripravena\p{M}*) or, more likely, some combination of these approaches.

I don't know of any other, more concise, way of specifying "all Latin-1 vowels that are similar to 'a' in my current locale".
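
As a rough illustration of how the two patterns differ: the character class works on precomposed text, while `\p{M}*` only helps when the accent is stored as a separate combining mark (decomposed form). The sample strings below are made up for the demonstration, and the `\u{...}` escape assumes PHP 7 or later.

```php
<?php
// Precomposed form: "á" is the single code point U+00E1.
$precomposed = "je pripravená";
// Decomposed form: "a" followed by the combining acute accent U+0301.
$decomposed = "je pripravena\u{0301}";
// No accent at all.
$plain = "je pripravena";

// A character class accepts either letter in that one position.
$classPattern = '/pripraven[áa]/u';
// \p{M}* accepts combining marks that follow the plain base letter.
$markPattern = '/pripravena\p{M}*/u';

$classOnPrecomposed = preg_match($classPattern, $precomposed); // matches
$classOnPlain       = preg_match($classPattern, $plain);       // matches
$markOnDecomposed   = preg_match($markPattern, $decomposed);   // matches
$markOnPrecomposed  = preg_match($markPattern, $precomposed);  // no match: "á" is one code point
```

So if the text is stored in precomposed form, which is the common case, `\p{M}*` alone will not find the accented spelling.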

RedGrittyBrick
Yes, that would work, but it won't solve the issue with words like [čc]u[čc]oriedka. I would have to map every character that can be used like this. Maybe there is an easier solution, but still, thanks :)
realshadow
@realshadow, of course you would write a function which does the replacement for you, e.g. `preg_map_slovak('čučoriedka')`
splash
A: 

I don't know if there is an "ignore accent" switch. But you could replace your search query with something like:

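$query = 'pripravená';
// The u modifier makes each class match whole UTF-8 characters instead of single bytes.
$query = preg_replace(
  array('=[áàâa]=iu','=[óòôo]=iu','=[úùûu]=iu'),
  array( '[áàâa]'   , '[óòôo]'   , '[úùûu]'   ),
  $query
);
preg_match_all('/(?:\b(\w+\s+){1,5})?.*('.$query.')(?:(\s+){1,2}\b.{1,10})?/u', $item, $res[$file]);

That would convert your 'pripravená' query into 'pripr[áàâa]ven[áàâa]', which matches both the accented and the unaccented spelling.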
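
A minimal sketch of the same idea extended to the Slovak alphabet, using the hypothetical name `preg_map_slovak` from the comment above. The letter groups are an assumption (lowercase only, relying on the `i` modifier for capitals), and the sample sentence is invented.

```php
<?php
/**
 * Hypothetical helper: turn a search word into a regex fragment that
 * ignores Slovak diacritics. The letter groups are an assumption and
 * list lowercase letters only; use the result with the i and u modifiers.
 */
function preg_map_slovak(string $word): string
{
    $groups = ['aáä', 'cč', 'dď', 'eé', 'ií', 'lĺľ', 'nň', 'oóô',
               'rŕ', 'sš', 'tť', 'uú', 'yý', 'zž'];

    $map = [];
    foreach ($groups as $group) {
        // Split on code points so multibyte letters stay intact.
        foreach (preg_split('//u', $group, -1, PREG_SPLIT_NO_EMPTY) as $letter) {
            $map[$letter] = '[' . $group . ']';
        }
    }

    // Escape regex metacharacters first, then expand each letter once.
    // strtr() with an array does not rescan its own replacements.
    return strtr(preg_quote($word, '/'), $map);
}

$pattern = '/' . preg_map_slovak('čučoriedka') . '/iu';
$found = preg_match($pattern, 'Cucoriedka je modrá.');
```

Escaping with `preg_quote()` before expanding keeps any metacharacters in the query literal, and the fragment can be dropped into the larger context-capturing pattern in place of the fixed word.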

sod
A: 

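You could use strtr() to strip out the accents. See the PHP manual page for a good example: http://php.net/manual/en/function.strtr.php

// Array form: the three-argument string form translates byte by byte and breaks UTF-8 text.
$addr = strtr($addr, array('ä' => 'a', 'å' => 'a', 'ö' => 'o'));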

You'd still need to specify all the relevant characters, but it would be easier than using a regex to do it.
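
One way this could look for Slovak text, assuming UTF-8 input: strip the accents from both the text and the query, then match the plain forms. `strip_slovak_accents` is a hypothetical helper and its map is an assumption, as is the sample sentence.

```php
<?php
/**
 * Hypothetical helper: replace Slovak accented letters with their
 * unaccented base letters. The map is an assumption; extend it as needed.
 */
function strip_slovak_accents(string $text): string
{
    // The array form of strtr() replaces whole UTF-8 sequences.
    return strtr($text, [
        'á' => 'a', 'ä' => 'a', 'č' => 'c', 'ď' => 'd', 'é' => 'e',
        'í' => 'i', 'ĺ' => 'l', 'ľ' => 'l', 'ň' => 'n', 'ó' => 'o',
        'ô' => 'o', 'ŕ' => 'r', 'š' => 's', 'ť' => 't', 'ú' => 'u',
        'ý' => 'y', 'ž' => 'z',
        'Á' => 'A', 'Ä' => 'A', 'Č' => 'C', 'Ď' => 'D', 'É' => 'E',
        'Í' => 'I', 'Ĺ' => 'L', 'Ľ' => 'L', 'Ň' => 'N', 'Ó' => 'O',
        'Ô' => 'O', 'Ŕ' => 'R', 'Š' => 'S', 'Ť' => 'T', 'Ú' => 'U',
        'Ý' => 'Y', 'Ž' => 'Z',
    ]);
}

$item  = 'Večera je už pripravená na stole.';
$query = 'pripravena';

// Normalise both sides, then run an ordinary match on the plain text.
$plainItem  = strip_slovak_accents($item);
$plainQuery = strip_slovak_accents($query);
$found = preg_match('/\b' . preg_quote($plainQuery, '/') . '\b/iu', $plainItem, $m);
```

The trade-off is that the captured text comes from the stripped copy, so the surrounding words are returned without their accents, and byte offsets no longer line up with the original string.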

Spudley