How can I wrap the first letter of every word in HTML tags?

For example:

$string = replace_first_word("some simple text",'<big>{L}</big>');
echo $string; //returns: <big>s</big>ome <big>s</big>imple <big>t</big>ext

Edit: I forgot to mention one important point: it needs to work with UTF-8, because I'm planning to support both Russian and English with this.

+3  A: 

Standard warning: manipulating HTML with regexes is a bad idea because it's next to impossible to correctly handle nesting, content inside tags vs outside, etc. So if you need a complete solution, parse the HTML and then manipulate text nodes.

In the specific example that you've given, this should do it:

$output = preg_replace('!\b([a-zA-Z])!', '<big>$1</big>', $input);

It means find a word boundary (\b), which is zero width, and wrap the following letter in a <big> element.
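To make the explanation above concrete, here is a minimal sketch that wraps the one-liner in a function and runs it on the sample string from the question. The function name wrap_first_ascii_letter is made up for illustration. Because the character class is [a-zA-Z] and there is no u modifier, non-ASCII letters such as Cyrillic are left untouched.

```php
<?php
// Hypothetical helper name, not from the original answer.
function wrap_first_ascii_letter(string $input): string
{
    // \b is a zero-width word boundary; ([a-zA-Z]) captures the ASCII
    // letter that follows it, and $1 puts that letter back inside <big>.
    return preg_replace('!\b([a-zA-Z])!', '<big>$1</big>', $input);
}

echo wrap_first_ascii_letter('some simple text'), "\n";
// prints: <big>s</big>ome <big>s</big>imple <big>t</big>ext
```

Note that \b also matches after an apostrophe or a digit-free punctuation mark, so in a word like "don't" the t would be wrapped as well.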

cletus
+6  A: 

Try this:

$string = preg_replace('/(?:^|\b)(\p{L})(\p{L}*)/u', '<big>$1</big>$2', 'some simple words');

Or if you want it in a function:

function replace_first_word($str, $format) {
    return preg_replace('/(?:^|\b)(\p{L})(\p{L}*)/u', str_replace('{L}', '$1', $format).'$2', $str);
}
Gumbo
I would accept this answer, but I really need it to work with Unicode characters. :(
YuriKolovsky
@YuriKolovsky: Oh, I forgot the *u* modifier. It should work now.
Gumbo
@Gumbo: I can't get it to work; it still only selects the first letter of every "Latin" word.
YuriKolovsky
@YuriKolovsky: It seems that it needed some more tweaking. It should work now.
Gumbo
I accepted this as it's the closest to what I asked, even though it does not work... Mine works, but marking my own answer as the best answer sounds wrong.
YuriKolovsky
A: 

I think what you need is:

preg_replace('~(?<=\p{^L}|^)\p{L}~u', '<big>$0</big>', $input);

note that \b does not work properly with utf8.
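A variant of the same idea that avoids \b entirely, as a rough sketch: a negative lookbehind (?<!\p{L}) says "the previous character is not a letter, or there is no previous character", which should be equivalent to the lookbehind above. The function name wrap_first_letter_u and the reuse of the {L} placeholder from the question are assumptions for illustration. The source file and the input are assumed to be UTF-8.

```php
<?php
// Hypothetical helper name; the {L} placeholder follows the question's format.
function wrap_first_letter_u(string $input, string $format = '<big>{L}</big>'): string
{
    $result = preg_replace_callback(
        // \p{L} = any Unicode letter; the lookbehind rejects letters that
        // directly follow another letter, so only word-initial letters match.
        '/(?<!\p{L})\p{L}/u',
        function (array $m) use ($format) {
            // A callback avoids any special meaning of "$" inside $format.
            return str_replace('{L}', $m[0], $format);
        },
        $input
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8);
    // in that case this sketch just returns the input unchanged.
    return $result ?? $input;
}

echo wrap_first_letter_u('тест техт'), "\n";
// prints: <big>т</big>ест <big>т</big>ехт
```

If the output looks garbled in a browser, check that the page is actually served as UTF-8.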

stereofrog
note that \b does not work properly with utf8: yes i noticed :(
YuriKolovsky
did you test my code? is it working for you?
stereofrog
It works for Latin characters, but not with UTF-8. For example, "test text" works fine, while "тест техт" outputs: теÑÑ‚ техт
YuriKolovsky
works for me. make sure your output is utf8 encoded.
stereofrog
A: 

Here is a better and possibly faster version that I came up with myself; it supports UTF-8 multibyte characters.

In my experience, regex functions are slow in PHP, so here is a function based on string manipulation.

function replace_first_word($text, $format = '<big>{L}</big>') {
    // Replace the first letter of every space-separated word.
    // NOTE: substr() counts bytes, so this only handles UTF-8 when
    // mbstring.func_overload maps it to mb_substr() (see the php.ini settings below).
    $words = explode(' ', $text);
    foreach ($words as &$word) {
        // skip empty items produced by consecutive spaces
        if ($word === '') {
            continue;
        }
        // first letter of the word
        $first = substr($word, 0, 1);
        // rest of the word without its first letter
        $rest = substr($word, 1);
        // put the first letter into the format and re-attach the rest
        $word = str_replace('{L}', $first, $format) . $rest;
    }
    unset($word); // break the reference left behind by foreach
    // glue the words back together and return them
    return implode(' ', $words);
}

Also, until PHP 6 comes out, remember to set these two directives in php.ini to better support UTF-8:

mbstring.func_overload = 7
mbstring.internal_encoding = "UTF-8"
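For setups where changing php.ini is not an option, the same approach can be sketched with explicit mb_substr() calls instead of overloading substr(). Keep in mind that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions something along these lines is needed anyway. The function name replace_first_letter_mb is made up for illustration, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical name; assumes ext-mbstring is available and input is UTF-8.
function replace_first_letter_mb(string $text, string $format = '<big>{L}</big>'): string
{
    $words = explode(' ', $text);
    foreach ($words as $i => $word) {
        if ($word === '') {
            continue; // consecutive spaces produce empty items; leave them alone
        }
        // mb_substr() counts characters, not bytes, in the given encoding
        $first = mb_substr($word, 0, 1, 'UTF-8');
        $rest  = mb_substr($word, 1, null, 'UTF-8');
        $words[$i] = str_replace('{L}', $first, $format) . $rest;
    }
    return implode(' ', $words);
}

echo replace_first_letter_mb('тест техт'), "\n";
// prints: <big>т</big>ест <big>т</big>ехт
```

Like the original, this splits on single spaces only, so a word that starts with a quote or other punctuation gets that character wrapped rather than its first letter.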
YuriKolovsky
If you expect a multibyte string then you should use multibyte functions.
Gumbo
Oh, you're right. Just set these in php.ini: mbstring.func_overload = 7 and mbstring.internal_encoding = "UTF-8"
YuriKolovsky
+1  A: 

You could use the CSS property

p { text-transform: capitalize; }

From Sitepoint's CSS reference on text-transform:

capitalize

  • transforms the first character in each word to uppercase; all other characters remain unaffected — they’re not transformed to lowercase, but will appear as written in the document
Gordon
Thanks Gordon, I actually know the guy who wrote the reference :) It's the best one yet. But I don't need to capitalize every letter of an already capitalized sentence: text-transform: uppercase; ;)
YuriKolovsky
@YuriKolovsky would `p { text-transform: capitalize; font-variant: Small-caps;}` be a solution?
Gordon
I'm doing some Q&A maintenance: yes, it would be a solution that works in all browsers except IE5. It would be the most elegant solution here; if I hadn't mentioned HTML, it would be the best answer. I don't know why you didn't get up-voted more.
YuriKolovsky