I have large text files (140 KB or larger) full of paragraphs of text, and I need to insert a sentence into each file at random intervals, but only if the file contains more than 200 words.

The sentence I need to insert randomly throughout the larger document is 10 words long.

I have full control over the server running my LAMP site, so I can use PHP or a Linux command-line application, if one exists that would do this for me.

Any ideas of how best to tackle this would be greatly appreciated.

Thanks

Mark

+1  A: 

You could use str_word_count() to get the number of words in the string and, from there, decide whether to insert the sentence. As for inserting it "at random," that could be dangerous. Do you mean you want to insert it in a couple of random places? If so, load the contents of the file as an array of lines with file() and insert your sentence at any index from 0 to count($file).
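A minimal sketch of this idea, assuming the sentence may go on a line of its own. The function name insertAtRandomLines and its parameters are hypothetical, made up for illustration:

```php
<?php
// Hypothetical helper: insert $sentence as a new line at $times random
// positions, but only when the text has more than $minWords words.
function insertAtRandomLines(array $lines, string $sentence, int $times, int $minWords = 200): array
{
    if (str_word_count(implode(' ', $lines)) <= $minWords) {
        return $lines; // too short: leave the lines untouched
    }
    for ($i = 0; $i < $times; $i++) {
        // Any position from 0 (before the first line) to count($lines) (after the last).
        $pos = mt_rand(0, count($lines));
        array_splice($lines, $pos, 0, [$sentence]);
    }
    return $lines;
}

// Usage, assuming 'somefile.txt' exists and is writable:
// $lines = file('somefile.txt', FILE_IGNORE_NEW_LINES);
// $lines = insertAtRandomLines($lines, 'My ten word sentence goes here.', 3);
// file_put_contents('somefile.txt', implode("\n", $lines));
```

Since file() splits on line breaks, this only ever inserts between lines, never in the middle of a paragraph that sits on a single line.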

Jonathan Sampson
"Do you mean to suggest you want to insert it in a couple random areas?" Yes, this is what I was thinking needed to be done. Thanks, Mark
A: 

The following code should do the trick to locate and insert strings into random locations. From there you would just need to re-write the file. This is a very crude way and does not take into account punctuation or anything like that, so some fine-tuning will most likely be necessary.

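// Read the file and split it into an array of words.
$words = str_word_count(file_get_contents('somefile.txt'), 1);
$save = array();

if (count($words) <= 200) {
  // 200 words or fewer: leave the text unchanged.
  $save = $words;
} else {
  foreach ($words as $word) {
    $save[] = $word;
    // 101 of the 1001 possible values fall in 100..200: about a 10% chance.
    $rand = rand(0, 1000);
    if ($rand >= 100 && $rand <= 200) {
      $save[] = 'some string';
    }
  }
}

// Rejoin the words with single spaces.
$save = implode(' ', $save);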

This generates a random number from 0 to 1000 and checks whether it is between 100 and 200 inclusive; if so, it inserts the string. That works out to roughly a 10% chance after each word. You can change the range of the random number or of the check to increase or decrease how many are added. You could also add a counter to make sure there are at least x words between insertions.
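The counter idea could look something like the sketch below. The function name insertWithMinGap and the $minGap and $percent parameters are hypothetical, and mt_rand(1, 100) is swapped in for the range check so the chance reads directly as a percentage:

```php
<?php
// Hypothetical helper: like the loop above, but never inserts $sentence
// until at least $minGap words have passed since the last insertion.
function insertWithMinGap(array $words, string $sentence, int $minGap, int $percent = 10): array
{
    $save = [];
    $sinceLast = 0; // words emitted since the last insertion (or the start)
    foreach ($words as $word) {
        $save[] = $word;
        $sinceLast++;
        // Roughly $percent% chance per word once the gap is satisfied.
        if ($sinceLast >= $minGap && mt_rand(1, 100) <= $percent) {
            $save[] = $sentence;
            $sinceLast = 0;
        }
    }
    return $save;
}
```

With $minGap set to 50, for example, two insertions can never be closer than 50 words apart, though they may be much further apart depending on the random draws.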

Again, this doesn't take punctuation into account: str_word_count() returns only the words, so punctuation, digits, and line breaks are lost when the words are rejoined with single spaces. Some fine-tuning may be necessary to perfect it, but this should be a good starting point.
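One possible direction for that fine-tuning is to split on whitespace instead of using str_word_count(), so punctuation and line breaks survive, and to insert only after a token that ends a sentence. This is a sketch under those assumptions; the function name insertAfterSentences and the $percent parameter are hypothetical, and the 200-word check is left out for brevity:

```php
<?php
// Hypothetical helper: insert $sentence only after tokens that end a sentence,
// keeping the original punctuation and whitespace intact.
function insertAfterSentences(string $text, string $sentence, int $percent = 10): string
{
    // Split on whitespace but keep the whitespace runs as separate pieces.
    $pieces = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($pieces as $piece) {
        $out .= $piece;
        // A piece ending in ., ! or ? is treated as the end of a sentence.
        if (preg_match('/[.!?]$/', $piece) && mt_rand(1, 100) <= $percent) {
            $out .= ' ' . $sentence;
        }
    }
    return $out;
}
```

This still treats abbreviations such as "e.g." as sentence endings, so it is a rough heuristic rather than real sentence detection.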

Steven Surowiec
Thanks for this Steven, I appreciate it and will give it a try. Mark