If you need to compare these lists only once, I'd suggest converting the docs to txt; then you'll be able to compare them using regex. Otherwise you'll need third-party software to access the info in the docs... like here maybe: http://stackoverflow.com/questions/188452/reading-writing-a-ms-word-file-in-php
+1
A:
kgb
2010-09-23 08:56:37
Not a public app, so I would be only too happy to copy-paste into a form.
abel
2010-09-23 08:59:04
+2
A:
Create an array of both (possibly using the file() function, depending on the format of the text, or possibly just an explode() on content), and use array_diff().
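A minimal sketch of this approach, assuming one entry per line. The sample names are made up for illustration; in practice the arrays could come from file() with FILE_IGNORE_NEW_LINES or from explode("\n", $text).

```php
<?php
// Hypothetical sample data standing in for the two lists.
$oldList = ['Acme Traders', 'Beta Exports'];
$newList = ['Acme Traders', 'Gamma Mills', 'Beta Exports'];

// Entries of $newList that do not appear in $oldList (exact string match).
$onlyInNew = array_diff($newList, $oldList);

// array_diff() preserves the original keys, so re-index before printing.
print_r(array_values($onlyInNew));
```

Note that array_diff() only finds exact matches, so a prefix such as "M/s " or a slightly different spelling would have to be normalised first.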
Wrikken
2010-09-23 09:04:41
+1
A:
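// Read both lists, one entry per line, without trailing newlines.
$oldList = file('oldList.txt', FILE_IGNORE_NEW_LINES);
$newList = file('newList.txt', FILE_IGNORE_NEW_LINES);
$list = array_udiff($newList, $oldList, 'compare');
function compare($new, $old) {
    // Strip the leading "M/s" (and the space after it) before comparing.
    similar_text($old, trim(substr($new, 3)), $percent);
    // Treat entries that are at least 80% similar as equal.
    return $percent >= 80 ? 0 : 1;
}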
This is my basic idea: find all texts that are at least 80% similar and remove them from $newList. You should adjust the percentage to suit your needs. The M/s is removed by substr($new, 3).
nikic
2010-09-28 14:22:45
+1
A:
If there are no key fields for uniquely identifying the records, I think you will have to use something like similar_text or levenshtein.
$arOld = file('olddata.txt');
$arNew = file('newdata.txt');
foreach($arNew as $line){
    $line = trim(substr($line, 3));
    foreach($arOld as $old){
        similar_text($line, $old, $percentage);
        if ($percentage < 60){
            echo $line;
        }
    }
}
Joyce Babu
2010-10-04 09:42:30
I ran the script using samples from the original post. The output is posted in the original question.
abel
2010-10-04 11:26:51
On second thought, it is not going to work. It requires a little modification to work. Now it will print lots of lines.
Joyce Babu
2010-10-04 11:27:01
Yes, it does print out a lot of lines. The principle would be to match every word from one text block against all the words of the second text block and then echo those which match. However, company names are multiple words....
abel
2010-10-04 11:35:02
+1
A:
Try this
set_time_limit(500);
$arOld = file('olddata.txt');
$arNew = file('newdata.txt');
foreach($arNew as $line){
    if(substr($line, 0, 3) === 'M/s '){
        $line = trim(substr($line, 3));
        foreach($arOld as $old){
            similar_text($line, $old, $percentage);
            if ($percentage > 80){
                // Skip to the next new line; a plain continue only affects the inner loop.
                continue 2;
            }
        }
        echo $line;
    }
}
Joyce Babu
2010-10-04 11:29:53
Updated the code to check only lines beginning with M/s. Also fixed an error.
Joyce Babu
2010-10-04 11:48:39
... to check if the 'if cond' ever matches, even though there is an M/s at the beginning of many lines
abel
2010-10-04 12:42:37
Oops! 'M/s ' is 4 characters. You need to change substr($line, 0, 3) to substr($line, 0, 4)
Joyce Babu
2010-10-04 12:50:04
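With the 4-character prefix check from the last comment applied, the loop might look like the sketch below. The function name unmatchedNames(), the $threshold parameter, and the sample data are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Returns the names from $new (lines starting with 'M/s ') for which no
// line in $old is more than $threshold percent similar.
// unmatchedNames() and $threshold are hypothetical names for this sketch.
function unmatchedNames(array $new, array $old, float $threshold = 80.0): array
{
    $result = [];
    foreach ($new as $line) {
        // 'M/s ' is 4 characters long, so compare the first 4 characters.
        if (substr($line, 0, 4) !== 'M/s ') {
            continue;
        }
        $name = trim(substr($line, 4));
        foreach ($old as $oldLine) {
            similar_text($name, trim($oldLine), $percentage);
            if ($percentage > $threshold) {
                // A close match exists, so move on to the next new line.
                continue 2;
            }
        }
        $result[] = $name;
    }
    return $result;
}

// Hypothetical sample data standing in for file('olddata.txt') and file('newdata.txt').
$arOld = ["Acme Traders\n", "Beta Exports\n"];
$arNew = ["M/s Acme Traders\n", "M/s Gamma Mills\n", "Some other line\n"];
print_r(unmatchedNames($arNew, $arOld));
```

The `continue 2` matters here: a plain `continue` would only skip to the next old entry, so every name would still be collected. The 80% threshold is taken from the answer above and would likely need tuning on real data.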