ansaurus

Question

Keep HTML format after replacing some text (using PHP and JS)

Answer 1

A:

dclowd9901 2010-04-01 21:19:43

That would remove the HTML formatting completely, and the post was specifically about *keeping* HTML formatting.

Matti Virkkunen 2010-04-01 21:23:06

Yeah, just noticed that. Sorry for the mixup.

dclowd9901 2010-04-01 21:25:01

Answer 2

+1 A:

Well, there might be a better way, but off the top of my head (assuming that tags won't appear in the middle of words, HTML is well-formed, etc.)...

Essentially, you'll need three things (sorry if this sounds patronising, not intended that way): 1. A method of sub-string matching that ignores tags. 2. A way of making the replacement preserving the tags. 3. A way of putting it all together.

1 - This is probably the most difficult bit. One method would be to iterate through all of the characters in the source string (strings are basically arrays of characters so you can access the characters as if they are array elements), attempting to match as many characters as possible from the search string, stopping when you've either matched all of the characters or run out of characters to match. Any characters between and including '<' and '>' should be ignored. Some pseudo-code (check this over, it's late and there may be mistakes):

findMatch(startingPos : integer, subject : string, searchString : string){
    //Variables for keeping track of characters matched, positions, etc.
    inTag = false;
    matchFound = false;
    matchedCharacters = 0;
    matchStart = 0;
    matchEnd = 0;

    //Scan the subject (not the search string), up to its last character
    for(i from startingPos to length(subject) - 1){
        //Work out when entering or exiting tags, ignore tag contents
        if(subject[i] == '<'){
            inTag = true;
        }
        else if(subject[i] == '>'){
            inTag = false;
        }
        else if(!inTag){
            //Check if the character matches expected in search string
            if(subject[i] == searchString[matchedCharacters]){
                if(!matchFound){
                    matchFound = true;
                    matchStart = i;
                }
                matchedCharacters++;

                //If all of the characters have been matched, return the start and end positions of the substring
                if(matchedCharacters == length(searchString)){
                    matchEnd = i;
                    return matchStart, matchEnd;
                }
            }
            else if(matchFound){
                //Partial match failed: reset counts and resume just after where it began
                matchFound = false;
                matchedCharacters = 0;
                i = matchStart;
            }
        }
    }
    //If no full matches were found, return error
    return -1;
}
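
In PHP, the pseudo-code above might translate to something like the following rough sketch. The function name `findMatchIgnoringTags` and the array-or-null return value are assumptions made for illustration, not part of the answer. It also assumes PHP 7.1 or later, well-formed tags, single-byte characters, and a `$startingPos` that is not inside a tag. It retries from every candidate start position, so an abandoned partial match cannot hide a later full match.

```php
<?php
// Hypothetical helper (name and return shape are assumptions, not from the answer).
// Returns [start, end] as inclusive byte offsets into $subject, or null if not found.
function findMatchIgnoringTags(string $subject, string $search, int $startingPos = 0): ?array
{
    $subjectLen = strlen($subject);
    $searchLen = strlen($search);
    if ($searchLen === 0) {
        return null;
    }

    $startInTag = false; // tag state at the candidate start position
    for ($start = $startingPos; $start < $subjectLen; $start++) {
        $first = $subject[$start];
        if ($first === '<') {
            $startInTag = true;
            continue;
        }
        if ($first === '>') {
            $startInTag = false;
            continue;
        }
        if ($startInTag || $first !== $search[0]) {
            continue;
        }

        // Try to match the whole search string from here, skipping over tags.
        $inTag = false;
        $matched = 0;
        for ($i = $start; $i < $subjectLen; $i++) {
            $ch = $subject[$i];
            if ($ch === '<') {
                $inTag = true;
            } elseif ($ch === '>') {
                $inTag = false;
            } elseif (!$inTag) {
                if ($ch !== $search[$matched]) {
                    break; // mismatch: give up on this start position
                }
                $matched++;
                if ($matched === $searchLen) {
                    return [$start, $i];
                }
            }
        }
    }
    return null;
}

$html = 'I am <b>Sadi, novice</b> programmer.';
$match = findMatchIgnoringTags($html, 'novice programmer');
// $match is [14, 34]; substr($html, 14, 21) is 'novice</b> programmer'
```

Restarting the inner scan from each start position is quadratic in the worst case, which should be fine for short search strings.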

2 - Split the HTML source code into three strings - the bit you want to work on (between the two positions returned by the matching function) and the part before and after. Split up the bit you want to modify using, for example:

$parts = preg_split("/(<[^>]*>)/",$string, -1, PREG_SPLIT_DELIM_CAPTURE);

Keep a record of where the tags are, concatenate the non-tag segments and perform substring replace on this as normal, then split the modified string up again and reassemble with the tags in place.

3 - This is the easy part, just concatenate the modified part and the other two bits back together.
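
Steps 2 and 3 could be sketched in PHP roughly as below. The function name `replaceSpanKeepingTags` is hypothetical, and so is the reassembly policy: the answer does not say where the tags should land when the replacement has a different length, so this sketch simply puts the whole replacement first and lets the tags that were inside the matched span follow it in their original order. `$start` and `$end` are assumed to be inclusive offsets of a span that begins and ends on matched text, as a tag-ignoring matcher would return.

```php
<?php
// Hypothetical helper: $start and $end are inclusive offsets of the matched span.
function replaceSpanKeepingTags(string $html, int $start, int $end, string $replacement): string
{
    // Step 2: cut the source into the part before, the matched span, and the part after.
    $before = substr($html, 0, $start);
    $middle = substr($html, $start, $end - $start + 1);
    $after  = substr($html, $end + 1);

    // With PREG_SPLIT_DELIM_CAPTURE, even indexes hold text and odd indexes hold tags.
    $parts = preg_split('/(<[^>]*>)/', $middle, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Assumed policy: keep every tag from inside the span, in its original order.
    $tags = '';
    foreach ($parts as $index => $part) {
        if ($index % 2 === 1) {
            $tags .= $part;
        }
    }

    // Step 3: concatenate everything back together.
    return $before . $replacement . $tags . $after;
}

$html  = 'I am <b>Sadi, novice</b> programmer.';
$start = strpos($html, 'novice');
$end   = strpos($html, 'programmer') + strlen('programmer') - 1;
$result = replaceSpanKeepingTags($html, $start, $end, 'learner coder');
// $result is 'I am <b>Sadi, learner coder</b>.'
```

Other policies are possible, for example re-inserting each tag at its old character offset, but they can leave a tag in the middle of a replacement word.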

I may have horribly over complicated this mind, if so just ignore me.

Moonshield 2010-04-01 22:39:24

Sadi 2010-04-03 02:52:41

Answer 3

+3 A:

I would do this:

if (preg_match('/(.*)novice((?:<.*>)?\s(?:<.*>)?programmer.*)/',$inString,$attributes)) {
  $inString = $attributes[1].'learner'.$attributes[2];
}

It should match any of the following:

novice programmer
novice</b> programmer
novice </b>programmer
novice<span> programmer

A text version of what the regex states would be something like: match any set of characters until you reach "novice" and put it into a capturing group; then optionally match something that starts with a '<', has any number of characters after it and ends with '>' (but don't capture it); then match exactly one whitespace character; then optionally match another '<' ... '>' run (again without capturing it), which must be followed by "programmer" and any number of characters. Everything after "novice" goes into the second capture group.

I would do some specific testing though, as I may have missed some stuff. Regex is a programmer's best friend!
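
If every occurrence should be replaced rather than just one, the same idea can be written with `preg_replace`. This is only a minimal sketch: the function name is made up for illustration, and it swaps `<.*>` for `<[^>]*>` so that one tag cannot swallow text up to a later `>` (which assumes no literal `>` inside attribute values).

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function replaceNoviceBeforeProgrammer(string $inString): string
{
    // $1 keeps the optional tags, the single whitespace character and "programmer".
    $pattern = '/novice((?:<[^>]*>)?\s(?:<[^>]*>)?programmer)/';
    return preg_replace($pattern, 'learner$1', $inString);
}

$in = 'I am <b>Sadi, novice</b> programmer. I am simple. I am Sadi, novice programmer.';
$out = replaceNoviceBeforeProgrammer($in);
// $out is 'I am <b>Sadi, learner</b> programmer. I am simple. I am Sadi, learner programmer.'
```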

Kitson 2010-04-02 11:26:36

It is very hard-coded, but it may be a possible solution. Thank you.

Sadi 2010-04-03 02:39:41

One more thing: novice was also replaced, you just cannot see the effect because both words (search and replace) are the same, "novice".

Sadi 2010-04-03 03:24:48

No, it isn't `preg_replace`... it is `preg_match`, it will only trigger if the pattern is matched and the capture groups are moved into $attributes and then reassembled into the desired string. As far as the hard coding, it was to give you what you were looking for, but regular expressions can be adapted to whatever you really need.

Kitson 2010-04-06 15:52:47

"I am Sadi, novice programmer. I am simple. I am Sadi, novice programmer. I am simple" -- It is not working properly with this string, where the phrase occurs twice. I have tried with preg_match_all and preg_match, and it never replaces "programmer"; it keeps it as it is. Any idea, please?

Sadi 2010-04-17 06:43:48

Answer 4

A:

Interesting problem.

I would use the DOM and XPath to find the closest nodes containing that text and then use substring matching to find out which bit of the string is in what node. That will involve character-per-character matching and possible backtracking, though.

Here is the first part, finding the container nodes:

<?php
error_reporting(E_ALL);
header('Content-Type: text/plain; charset=UTF-8');

$doc = new DOMDocument();
$doc->loadHTML(<<<EOD
<p>
    <span>
        <i>
            I am <b>Sadi, novice</b> programmer.
        </i>
    </span>
</p>
<ul>
    <li>
        <div>
            I am <em>Cornholio, novice</em> programmer of television shows.
        </div>
    </li>
</ul>
EOD
);
$xpath = new DOMXPath($doc);
// First, get a list of all nodes containing the text anywhere in their tree.
$nodeList = $xpath->evaluate('//*[contains(string(.), "programmer")]');
$deepestNodes = array();
// Now only keep the deepest nodes, because the XPath query will also return HTML, BODY, ...
foreach ($nodeList as $node) {
    $deepestNodes[] = $node;
    $ancestor = $node;
    while (($ancestor = $ancestor->parentNode) && ($ancestor instanceof DOMElement)) {
        $deepestNodes = array_filter($deepestNodes, function ($existingNode) use ($ancestor) {
            return ($ancestor !== $existingNode);
        });
    }
}
foreach ($deepestNodes as $node) {
    var_dump($node->tagName);
}

I hope that helps you along.
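
As a possible shortcut for this first part, the "deepest nodes only" filter can probably be pushed into the XPath expression itself. The sketch below is an alternative and not the code from the answer: it keeps an element only when its own text contains the word and none of its child elements' text does. The sample markup is a condensed copy of the one above.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML(
    '<p><span><i>I am <b>Sadi, novice</b> programmer.</i></span></p>'
    . '<ul><li><div>I am <em>Cornholio, novice</em> programmer of television shows.</div></li></ul>'
);
$xpath = new DOMXPath($doc);

// Keep an element only if its text contains the word and no child element's text does.
$deepest = $xpath->query('//*[contains(., "programmer") and not(*[contains(., "programmer")])]');

$tagNames = array();
foreach ($deepest as $node) {
    $tagNames[] = $node->tagName;
}
// $tagNames is array('i', 'div')
```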

janmoesen 2010-04-02 12:43:52

"That will involve character-per-character matching and possible backtracking, though." Though it sounds good, it may not be a good solution for a production environment. But I will take a look at your solution. Thank you.

Sadi 2010-04-03 02:48:23

Answer 5

A:

Since you didn't give exact specifics on what you will use this for, I will use your example of "I am sadi, novice programmer".

$before = 'I am <b>sadi, novice</b> programmer';
$after = preg_replace('/I am (<[^>]*>)?(.*), novice(<[^>]*>)? programmer/', 'I am $1$2, learner$3 programmer', $before);

Alternatively, for any text:

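$string = '<b>Hello</b>, world!';
$orig = 'Hello';
$replace = 'Goodbye';
// preg_quote() stops regex metacharacters in $orig from being interpreted
$pattern = '/(<[^>]*>)?' . preg_quote($orig, '/') . '(<[^>]*>)?/';
// No delimiters in the replacement; ${1} and ${2} re-insert the captured tags
$final = '${1}' . $replace . '${2}';
$result = preg_replace($pattern, $final, $string);
//$result should now be '<b>Goodbye</b>, world!'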

Hope that helped. :d

Edit: An example of your example, with the second piece of code:

$string = 'I am sadi, novice programmer.';
$orig = 'novice';
$replace = 'learner';
$pattern = "/(<.*>)?$orig(<.*>)?/";
$final = "$1$replace$2";
$result = htmlspecialchars(preg_replace($pattern,$final,$string));
echo $result;

The only problem is if you were searching for something that was more than a word long.

Edit 2: Finally came up with a way to do it across multiple words. Here's the code:

function htmlreplace($string,$orig,$replace) 
 {
  $orig = explode(' ',$orig);
  $replace = explode(' ',$replace);
  $result = $string;
  while (count($orig)>0)
   {
    $shift = array_shift($orig);
    $rshift = array_shift($replace);

    $pattern = "/$shift\s?(<.*>)?/";
    $replacement = "$rshift$1";
    $result = preg_replace($pattern,$replacement,$result);
   }
  $result .= implode(' ',$replace);
  return $result;
 }

Have fun! :d

Hussain 2010-04-16 00:24:06

Please look at the example. It searches using more than one word, "novice programmer". It could be a whole sentence. Extra white space (e.g. new line, tab) and any tag should be ignored during the search.

Sadi 2010-04-16 02:49:49

Um, I don't think it's taking whitespace into consideration... Another fix is on the way, give me a few minutes.

Hussain 2010-04-16 03:52:17

Not working properly. It works like replace-by-word, and even replace-by-word does not always work. Example: $inString = 'I am Sadi, novice programmer. I am simple. I am Sadi, novice programmer. I am simple programmer'; echo htmlreplace($inString, 'novice programmer', 'lame developer'); Result: I am Sadi, lame developer. I am simple. I am Sadi, novice developer. I am simple developer

Sadi 2010-04-21 03:16:29

Answer 6

+3 A:

ok i think this is what you want. it takes your input search and replace, splits them into arrays of strings delimited by space, generates a regexp that finds the input sentence with any number of whitespace/html tags, and replaces it with the replacement sentence with the same tags replaced between the words.

if the wordcount of the search sentence is higher than that of the replacement, it just uses spaces between any extra words, and if the replacement wordcount is higher than the search, it will add all 'orphaned' tags on the end. it also handles regexp chars in the find and replace.

<?php
function htmlFriendlySearchAndReplace($find, $replace, $subject) {
    $findWords = explode(" ", $find);
    $replaceWords = explode(" ", $replace);

    $findRegexp = "/";
    for ($i = 0; $i < count($findWords); $i++) {
        $findRegexp .= preg_replace("/([\\$\\^\\|\\.\\+\\*\\?\\(\\)\\[\\]\\{\\}\\\\\\-])/", "\\\\$1", $findWords[$i]);
        if ($i < count($findWords) - 1) {
            $findRegexp .= "(\s?(?:<[^>]*>)?\s(?:<[^>]*>)?)";
        }
    }
    $findRegexp .= "/i";

    $replaceRegexp = "";
    for ($i = 0; $i < count($findWords) || $i < count($replaceWords); $i++) {
        if ($i < count($replaceWords)) {
            $replaceRegexp .= str_replace("$", "\\$", $replaceWords[$i]);
        }
        if ($i < count($findWords) - 1) {
            $replaceRegexp .= "$" . ($i + 1);
        } else {
            if ($i < count($replaceWords) - 1) {
                $replaceRegexp .= " ";
            }
        }
    }

    return preg_replace($findRegexp, $replaceRegexp, $subject);
}
?>

here are the results of a few tests :

Original : <b>Novice Programmer</b>
Search : Novice Programmer
Replace : Advanced Programmer
Result : <b>Advanced Programmer</b>

Original : Hi, <b>Novice Programmer</b>
Search : Novice Programmer
Replace : Advanced Programmer
Result : Hi, <b>Advanced Programmer</b>

Original : I am not a <b>Novice</b> Programmer
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b>Advanced</b> Programmer

Original : Novice <b>Programmer</b> in the house
Search : Novice Programmer
Replace : Advanced Programmer
Result : Advanced <b>Programmer</b> in the house

Original : <i>I am not a <b>Novice</b> Programmer</i>
Search : Novice Programmer
Replace : Advanced Programmer
Result : <i>I am not a <b>Advanced</b> Programmer</i>

Original : I am not a <b><i>Novice</i> Programmer</b> any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b><i>Advanced</i> Programmer</b> any more

Original : I am not a <b><i>Novice</i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b><i>Advanced</i></b> Programmer any more

Original : I am not a Novice<b> <i> </i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a Advanced<b> <i> </i></b> Programmer any more

Original : I am not a Novice <b><i> </i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a Advanced <b><i> </i></b> Programmer any more

Original : <i>I am a <b>Novice</b> Programmer</i> too, now
Search : Novice Programmer too
Replace : Advanced Programmer
Result : <i>I am a <b>Advanced</b> Programmer</i> , now

Original : <i>I am a <b>Novice</b> Programmer</i>, now
Search : Novice Programmer
Replace : Advanced Programmer Too
Result : <i>I am a <b>Advanced</b> Programmer Too</i>, now

Original : <i>I make <b>No money</b>, now</i>
Search : No money
Replace : Mucho$1 Dollar$
Result : <i>I make <b>Mucho$1 Dollar$</b>, now</i>

Original : <i>I like regexp, you can do [A-Z]</i>
Search : [A-Z]
Replace : [Z-A]
Result : <i>I like regexp, you can do [Z-A]</i>
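
For comparison, here is a rough variant of the same idea that leans on `preg_quote` and `preg_replace_callback` instead of hand-escaping. It is a sketch under assumptions, not the answer's code: the function name is hypothetical, the search string is assumed to be non-empty, and the gap between two words is assumed to be any run of tags and whitespace that contains at least one whitespace character. Using a callback means `$` in the replacement needs no escaping.

```php
<?php
// Hypothetical variant; the function name is not from the original answer.
function htmlTolerantReplace(string $find, string $replace, string $subject): string
{
    $findWords = preg_split('/\s+/', trim($find));
    $replaceWords = preg_split('/\s+/', trim($replace));

    // Gap between two words: tags and whitespace, with at least one whitespace character.
    $gap = '((?:<[^>]*>)*\s(?:\s|<[^>]*>)*)';
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $findWords);
    $pattern = '/' . implode($gap, $quoted) . '/i';

    return preg_replace_callback($pattern, function (array $m) use ($findWords, $replaceWords) {
        $gapCount = count($findWords) - 1;
        $total = max(count($findWords), count($replaceWords));
        $out = '';
        for ($i = 0; $i < $total; $i++) {
            if ($i < count($replaceWords)) {
                $out .= $replaceWords[$i];
            }
            if ($i < $gapCount) {
                $out .= $m[$i + 1]; // re-insert the captured tags and whitespace
            } elseif ($i < count($replaceWords) - 1) {
                $out .= ' ';        // extra replacement words are joined by spaces
            }
        }
        return $out;
    }, $subject);
}

$in = 'I am Sadi, novice programmer. I am simple. I am Sadi, novice programmer. I am simple';
$out = htmlTolerantReplace('Novice programmer', 'lame developer', $in);
// $out is 'I am Sadi, lame developer. I am simple. I am Sadi, lame developer. I am simple'
```

A tag in the middle of a word (rather than between words) is still not handled by this sketch.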

oedo 2010-04-18 09:46:44

I like the solution. But here is a little bug. $inString = 'I am Sadi, novice programmer. I am simple. I am Sadi, novice programmer. I am simple'; echo htmlFriendlySearchAndReplace('Novice programmer', 'lame developer', $inString); Result is: I am Sadi, lame programmer. I am simple. I am Sadi, novice developer. I am simple

Sadi 2010-04-21 03:03:54

sorry, edited answer to fix. change this line : `$findRegexp .= "(\s?(?:<[^>]*>)?\s(?:<[^>]*>)?)";`

oedo 2010-04-21 07:21:04

Thank you, now it is working very well. The only remaining problem is that it cannot work if it finds a tag in the middle of a word, e.g. Novice. And of course it is quite difficult to solve, as we cannot easily determine the position of the tag. If you can, please post a solution for it.

Sadi 2010-04-21 08:07:42

You may move the tag forward or backward :) Thank you very much for the solution. I have tried similar solution (as your function) but failed because I am bad with regex :(

Sadi 2010-04-21 08:08:56

Urrghh!!!! I can not accept the answer :( The accept button has gone :( May be because of the bounty... But it is the best solution

Sadi 2010-04-21 08:11:19

that's very strange. now the bounty has gone too? only 10 points for that answer then :(

oedo 2010-04-21 11:49:47

you got 10 for my up vote... not the bounty... but your answer work out of the box.... :(

Sadi 2010-04-22 05:16:19

oh well, no worries. glad i could help anyway.

oedo 2010-04-22 10:17:55

Hi oedo, would you please help me here (http://stackoverflow.com/questions/2728288/split-string-into-smaller-part-with-constrain-php-regex-html) with your regex skills :)

Sadi 2010-04-28 14:25:03

At last this answer was accepted by me somehow.... thanks again Oedo :)

Sadi 2010-09-05 09:18:08
