Hi there,

I am hoping that someone may be able to help me. I need to create a regex that will remove all duplicates from an input file. I am creating an ftp.exe script to upload files and do not want duplicate commands in it.

Here is a short example of the script. There might be 20 or more copies of the same block in the file.

I have put parentheses around the different parts, thinking they might be used for grouping:

    (mkdir /breeds
    cd /breeds
    mput C:\Inetpub\wwwroot\site.co.za/admin/buckets\application\sites\site_-_org/breeds/*.*
    )
    (mkdir /breeds
    cd /breeds
    mput C:\Inetpub\wwwroot\site.co.za/admin/buckets\application\sites\site_-_org/breeds/*.*
    )
    (cd /
    mput C:\Inetpub\wwwroot\site.co.za/admin/buckets\application\sites\site_-_org/*.*
    )
    (cd /
    mput C:\Inetpub\wwwroot\site.co.za/admin/buckets\application\sites\site_-_org/*.*
    )

How can I use a JavaScript regex with `match` to strip out the duplicate values?

A: 

While identifying duplicates in your text is easy for a human, it is a hard task for a regex, especially since the text can apparently be anything and there is no fixed number of lines that make up a group.

Consider:

    mkdir /breeds
    cd /breeds
    mput C:\Inetpub\wwwroot\site.co.za/admin/buckets\application\sites\site_-_org/breeds/*.*

and

    mkdir /breeds
    cd /breeds
    mput C:\Inetpub\wwwroot\site.co.za/admin/buckets\application\sites\site_-_org/*.*

Is this a duplicate? Are the first two lines of each a duplicate, or does the entire group need to match (and in that case, how is a group determined)?

You are not going to find a single regular expression that will do what you want to do here. You need to find a way to actually parse your input based on a series of rules that you supply.

For example, you could split your input into an array of lines and then traverse that array with nested loops, looking for groups of equivalent lines (but even then you'd need to define some rules, such as the minimum number of lines that form a group).
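One possible rule set, sketched under assumptions: instead of discovering groups by comparing lines, use the parentheses the OP already wraps around each block as the group boundaries. A regex then only *finds* the groups, and a `Set` does the duplicate detection. The helper name `dedupeGroups` is hypothetical, and the sketch assumes no command inside a group contains a `)` character.

```javascript
// Hypothetical helper: removes repeated "( ... )" command groups, keeping the
// first occurrence of each. Assumes every group is wrapped in "(" and ")" as
// in the question and that no command inside a group contains ")".
function dedupeGroups(script) {
  // The regex only locates the groups; duplicates are detected with a Set.
  const groups = script.match(/\([^)]*\)/g) || [];
  const seen = new Set();
  const unique = [];
  for (const group of groups) {
    // Strip the parentheses, trim every line and drop blank lines so that
    // indentation differences don't make identical groups look distinct.
    const commands = group
      .slice(1, -1)
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter((line) => line !== "")
      .join("\n");
    if (!seen.has(commands)) {
      seen.add(commands);
      unique.push(commands);
    }
  }
  // One command per line, duplicates removed, original order preserved.
  return unique.join("\n");
}

// String.raw keeps the Windows backslashes in the paths as-is.
const script = String.raw`(mkdir /breeds
    cd /breeds
    mput C:\Inetpub\wwwroot\site.co.za/admin/buckets\application\sites\site_-_org/breeds/*.*
    )
(mkdir /breeds
    cd /breeds
    mput C:\Inetpub\wwwroot\site.co.za/admin/buckets\application\sites\site_-_org/breeds/*.*
    )
(cd /
    mput C:\Inetpub\wwwroot\site.co.za/admin/buckets\application\sites\site_-_org/*.*
    )`;

console.log(dedupeGroups(script));
```

Note that anything outside the parentheses (for example `open` or `bye` lines) is dropped by this sketch, so those commands would have to be written to the ftp.exe script separately.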

Daniel Vandersluis
Nice breakdown of the question, Daniel. As I mentioned below, it might then be easier to write it temporarily to MySQL, do a GROUP BY, and then delete the temp file? I am using ASP.
Gerald Ferreira
`GROUP BY` what, though?
Daniel Vandersluis
OK, I have done it and it seems to work. I am writing everything between the parentheses to the MySQL database and then importing it again with a GROUP BY statement, which filters out all the duplicates. Not the most elegant or fastest way, but it seems to work.
Gerald Ferreira
A: 

One way to do it would be to combine each "group" into one line (e.g. separate the commands with semicolons), then pipe the result through Unix `sort | uniq` to remove duplicate lines, and then split the lines back up again.
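The same join, `sort | uniq`, split idea can be sketched in plain JavaScript without Unix tools. This assumes the groups have already been extracted as arrays of command lines (for example by using the parentheses as delimiters) and that no command contains the `;` used as the joiner; `sortUniqGroups` is a hypothetical name.

```javascript
// Sketch of the "join, sort | uniq, split" approach in plain JavaScript.
// Assumptions: groups are already arrays of command lines, and no command
// contains ";" (the character used to join a group into one line).
function sortUniqGroups(groups) {
  // 1. Collapse each group into one line: "mkdir /breeds;cd /breeds;...".
  const joined = groups.map((lines) => lines.map((line) => line.trim()).join(";"));
  // 2. Like `sort`: identical lines end up next to each other.
  joined.sort();
  // 3. Like `uniq`: keep a line only if it differs from the one before it.
  const unique = joined.filter((line, i) => i === 0 || line !== joined[i - 1]);
  // 4. Split each surviving line back into its individual commands.
  return unique.map((line) => line.split(";"));
}

// Shortened placeholder paths, for readability only.
const groups = [
  ["mkdir /breeds", "cd /breeds", "mput breeds/*.*"],
  ["mkdir /breeds", "cd /breeds", "mput breeds/*.*"],
  ["cd /", "mput *.*"],
  ["cd /", "mput *.*"],
];

console.log(sortUniqGroups(groups));
```

Like `sort | uniq`, this reorders the groups alphabetically while keeping the command order inside each group. If the original group order matters, a `Set` keyed on the joined line avoids the sort.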

LarsH
@LarsH The OP wants to do this in JavaScript, and it doesn't look like he's using Unix anyway ("ftp.exe").
Daniel Vandersluis
I was thinking of writing it to MySQL and then doing a GROUP BY, but thought that a regex might be easier and faster.
Gerald Ferreira
@Daniel Good points. I guess I should read the question more closely. Still, no JS solution was forthcoming, and sometimes it helps to see a solution that uses means you hadn't thought of.
LarsH