
Hello, I have a .bat file that I need to use to delete part of one file and save the result into another file. I need to delete everything between the text "[aaa bbb]" and "[ccc ddd]". That is, if I have the text:

[aaa bbb]
1
2
3
[ccc ddd]

then the output should be:

[aaa bbb]
[ccc ddd]

Thank you

EDIT: I would like to clarify the question. I need to delete all the characters between marker1 and marker2. Marker1 and marker2 are just words or fragments of text, not necessarily whole lines. For example, I could have:

[aaa bbb] [ccc]
1
2
3
4
5
[www yyy]

If I want to delete the text between [aaa bbb] and [www yyy], the output should be:

[aaa bbb] 
[www yyy]
A: 

I looked at cmd and at PowerShell and couldn't find anything useful. Get yourself ActivePerl?

Arkadiy
A: 

If you trust the "sed-like" VBScript from this answer...

sed.vbs:

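' sed.vbs: a minimal "sed-like" filter that only understands s/pattern/replacement/
Dim pat, patparts, rxp, inp
pat = WScript.Arguments(0)          ' e.g. s/^\d+\s*$/
patparts = Split(pat, "/")          ' "s", pattern, replacement (so no "/" inside the pattern)
Set rxp = New RegExp
rxp.Global = True                   ' replace every match on a line
rxp.Multiline = False
rxp.Pattern = patparts(1)
' Read standard input line by line and echo each line after substitution
Do While Not WScript.StdIn.AtEndOfStream
  inp = WScript.StdIn.ReadLine()
  WScript.Echo rxp.Replace(inp, patparts(2))
Loop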

You can type

cscript /Nologo sed.vbs "s/^\d+\s*$/" < in.txt

(in.txt being your initial text; the quotes stop cmd.exe from treating ^ as its escape character), and every digit-only line comes out as an empty line.

The pattern ^\d+\s*$ targets any line made up of one or more digits followed by zero or more whitespace characters, and nothing else.


That is not the best "pure sed" solution, and it cannot actually delete lines, but it is a native, Vista-compliant solution...


Actually, the following hack, which deliberately interprets the sed "d" command, is able to 'delete' lines:

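' Variant: a replacement of exactly "d" is treated like sed's delete command
Dim pat, patparts, rxp, inp, out
pat = WScript.Arguments(0)          ' e.g. s/^\d+\s*$/d
patparts = Split(pat, "/")
Set rxp = New RegExp
rxp.Global = True
rxp.Multiline = False
rxp.Pattern = patparts(1)
Do While Not WScript.StdIn.AtEndOfStream
  inp = WScript.StdIn.ReadLine()
  out = rxp.Replace(inp, patparts(2))
  ' When the replacement is "d", skip lines that came out as exactly "d"
  ' (caveat: an input line that is literally "d" is dropped as well)
  If Not patparts(2) = "d" Or Not out = "d" Then
    WScript.Echo out
  End If
Loop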

cscript /Nologo sed.vbs "s/^\d+\s*$/d" < in.txt actually produces:

[aaa bbb]
[ccc ddd]


In a .bat file, you could have a sed.bat:

cscript /Nologo sed.vbs %1 < %2

and then execute that .bat like this:

C:\prog\sed>sed.bat "s/^\d+\s*$/d" in.txt
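For comparison, with a real sed (GNU or BSD, which this answer deliberately avoids assuming on a bare Vista install) the d command deletes the matching lines directly. POSIX basic regular expressions have no \d or \s, so this sketch spells them as [0-9] and [[:space:]]; in.txt and out.txt are placeholder names.

```shell
#!/bin/sh
# Sample input from the question
printf '[aaa bbb]\n1\n2\n3\n[ccc ddd]\n' > in.txt

# Delete every line that is only digits plus optional trailing whitespace.
# [0-9][0-9]* is the basic-regex spelling of \d+, [[:space:]]* of \s*.
sed '/^[0-9][0-9]*[[:space:]]*$/d' in.txt > out.txt
cat out.txt
```

From cmd.exe, the same script would go in double quotes instead of single quotes.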
VonC
My reading of the "sed in Vista" title is "Vista-only solution": no external libraries or GNU utilities to import. If this is not what you are after, please edit your question.
VonC
My read is "He is using sed on Vista." =)
PEZ
Since it does not answer the question, I'm making this a "community wiki" and leaving it here for the record. It may give ideas if no sed.exe is available...
VonC
Indeed. I had no idea you could use VBScript like that (well, it has never occurred to me; I always use Cygwin on Windows). Really good to know.
PEZ
Yes, I use sed on Vista.
Seacat
NB: The lines to be deleted don't match a particular pattern. It's the markers that need to be matched and the text between them deleted.
PEZ
A:

Take a look at the section "Delete between marker 1 and marker2" on this sed hints page

Applying it to your example, here is clean.sed:

/^\[aaa bbb\]$/,/^\[ccc ddd\]$/{
 /^\[aaa bbb\]$/!{
   /^\[ccc ddd\]$/!d
 }
}

Run using:

sed -f clean.sed inputfile.txt
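As a quick check outside Windows, this shell sketch writes clean.sed exactly as above and runs it on the question's sample; the printf and heredoc lines only create the test files.

```shell
#!/bin/sh
# Recreate clean.sed from the answer
cat > clean.sed <<'EOF'
/^\[aaa bbb\]$/,/^\[ccc ddd\]$/{
 /^\[aaa bbb\]$/!{
   /^\[ccc ddd\]$/!d
 }
}
EOF

# Sample input from the question
printf '[aaa bbb]\n1\n2\n3\n[ccc ddd]\n' > inputfile.txt

# Inside the marker range, every line that is neither marker is deleted
sed -f clean.sed inputfile.txt > out.txt
cat out.txt
```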

To edit the input file "in place", use the -i option to sed:

sed -i.bak -f clean.sed datafile.txt

A backup copy of the file with the name "datafile.txt.bak" is saved before editing the original.
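A sketch of both ways to update the file: with -i (supported by GNU sed, not by every Windows port) and, for a sed without -i, by writing to a temporary file and replacing the original only if sed succeeded. datafile2.txt and datafile2.tmp are hypothetical names; on Windows cmd the last step would use move /y instead of mv.

```shell
#!/bin/sh
# Recreate clean.sed from the answer and two copies of the sample data
cat > clean.sed <<'EOF'
/^\[aaa bbb\]$/,/^\[ccc ddd\]$/{
 /^\[aaa bbb\]$/!{
   /^\[ccc ddd\]$/!d
 }
}
EOF
printf '[aaa bbb]\n1\n2\n3\n[ccc ddd]\n' > datafile.txt
cp datafile.txt datafile2.txt

# 1) sed with -i: edit in place, keep the original as datafile.txt.bak
sed -i.bak -f clean.sed datafile.txt

# 2) sed without -i: write to a temp file, then replace the original on success
sed -f clean.sed datafile2.txt > datafile2.tmp && mv datafile2.tmp datafile2.txt

cat datafile.txt datafile2.txt
```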

EDIT: Since the assumption that the markers were always on a line of their own was wrong, here's a script that can handle markers in the middle of a line:

/\[aaa bbb\]/,/\[ccc ddd\]/{
  s/\[aaa bbb\].*/[aaa bbb]/
  s/.*\[ccc ddd\]/[ccc ddd]/
  /\[aaa bbb\]$/!{
    /^\[ccc ddd\]/!d
  }
}

For this input:

foo[aaa bbb]1
2
3
4
5[ccc ddd]bar
foo
[aaa bbb]
1
2
3
[ccc ddd]
bar

It produces:

foo[aaa bbb]
[ccc ddd]bar
foo
[aaa bbb]
[ccc ddd]
bar

Note: it can't handle input where both markers appear on the same line.
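To reproduce the example above, this sketch saves the mid-line script under the hypothetical name markers.sed and feeds it the same input.

```shell
#!/bin/sh
# The mid-line marker script from the answer
cat > markers.sed <<'EOF'
/\[aaa bbb\]/,/\[ccc ddd\]/{
  s/\[aaa bbb\].*/[aaa bbb]/
  s/.*\[ccc ddd\]/[ccc ddd]/
  /\[aaa bbb\]$/!{
    /^\[ccc ddd\]/!d
  }
}
EOF

# The sample input from the answer, one printf line per input line
printf 'foo[aaa bbb]1\n2\n3\n4\n5[ccc ddd]bar\nfoo\n' > in.txt
printf '[aaa bbb]\n1\n2\n3\n[ccc ddd]\nbar\n' >> in.txt

# Text after marker1 and before marker2 is trimmed; lines in between are deleted
sed -f markers.sed in.txt > out.txt
cat out.txt
```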

EDIT again: If the input format for marker 1 is such that you can always count on it being on a line of its own, you can simplify the script somewhat:

/^\[aaa bbb\]$/,/\[ccc ddd\]/{
  s/.*\[ccc ddd\]/[ccc ddd]/
  /^\[aaa bbb\]$/!{
    /^\[ccc ddd\]/!d
  }
}

(Anchoring marker 1 at the beginning and end of a line and skipping the trimming of the marker 1 line.)
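A small check of the simplified script (saved under the hypothetical name simple.sed), using made-up input where marker1 has its own line and marker2 sits mid-line:

```shell
#!/bin/sh
cat > simple.sed <<'EOF'
/^\[aaa bbb\]$/,/\[ccc ddd\]/{
  s/.*\[ccc ddd\]/[ccc ddd]/
  /^\[aaa bbb\]$/!{
    /^\[ccc ddd\]/!d
  }
}
EOF

# Hypothetical input: marker1 alone on a line, marker2 in the middle of one
printf 'foo\n[aaa bbb]\n1\n2\n3[ccc ddd]bar\nbaz\n' > in.txt
sed -f simple.sed in.txt > out.txt
cat out.txt
```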

PEZ
How can I use it in the .bat file? I don't know how to use a multi-line command. Thanks
Seacat
That is, I write D:\tmp\sed.exe ...command; can I use a multi-line command here?
Seacat
Putting the sed script in a file of its own should work (sed -f).
PEZ
I updated the answer. Let me know if it works or not (I'm not on Windows at the moment so can't check there.)
PEZ
Good read on the question. +1
VonC
It seems to work! But I have one more question: what if I just want to delete one line after marker1, and marker1 is a whole line?
Seacat
You mean marker1 is always on a line of its own?
PEZ
If I understood you right, yes
Seacat
The script handles that too, but it could be simplified a bit then. Though... your question's example doesn't look like that, so I wonder if you understood me right. =)
PEZ
Revised my answer again. (I added the simpler sed script anyway, in case we understand each other correctly about the marker1 constraints.)
PEZ
My question is an additional one. It is a little different from the original question, but because the requirements keep changing I'm not sure exactly what I need, so I would like to have several solutions just in case. Thanks
Seacat
How can I save the result into the same file? I get the right result, but I have to save it into an additional file and then rename it. Can I do the same with the original file?
Seacat
Use the -i option of sed. (I revised my answer to give some more info on that.)
PEZ
If you feel I've answered your question, please tick it off as the chosen answer.
PEZ
Here is my command: D:\tmp\sed.exe -i.bak -f sedscript.sed D:\tmp\test.txt and it reports an illegal option -i.
Seacat
If I use D:\tmp\sed.exe -f -i.bak sedscript.sed D:\tmp\test.txt it says nothing, but it doesn't save the changes to the file.
Seacat
Maybe your sed doesn't have the -i option. Which sed is it?
PEZ
A:

Note that sed is available for Windows, along with a whole bunch of other GNU utilities. I'm not sure if you're asking whether there's an equivalent, or how to actually do it once you've got the tool.

Rob
kenny
A:
D:\tmp\sed.exe -f sedscript.sed D:\tmp\test.txt >c:\tmp\test2.txt


/^\[Product Feature\]$/,/^\[Dm$/{
  /^\[Product Feature\]$/!{
    /^\[Dm$/!d
  }
}
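The comments below trace the problem to the $ anchor on marker2. This sketch contrasts the script as posted with a variant that drops that anchor; the input, including the line "[Dm something]", is made up since the real file was not shown, and anchored.sed / fixed.sed are hypothetical names.

```shell
#!/bin/sh
printf 'a\n[Product Feature]\nx\ny\n[Dm something]\nz\n' > test.txt

# As posted: /^\[Dm$/ only matches a line that is exactly "[Dm",
# so the range never closes and everything after marker1 is deleted.
cat > anchored.sed <<'EOF'
/^\[Product Feature\]$/,/^\[Dm$/{
  /^\[Product Feature\]$/!{
    /^\[Dm$/!d
  }
}
EOF

# Without the $ anchor, any line starting with "[Dm" closes the range.
cat > fixed.sed <<'EOF'
/^\[Product Feature\]$/,/^\[Dm/{
  /^\[Product Feature\]$/!{
    /^\[Dm/!d
  }
}
EOF

sed -f anchored.sed test.txt > out_anchored.txt
sed -f fixed.sed test.txt > out_fixed.txt
cat out_anchored.txt out_fixed.txt
```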
Seacat
Do you get any error messages?
PEZ
No, no messages at all. It just doesn't delete, or maybe it doesn't save into the output file; I can say more exactly
Seacat
I mean I can not say more exactly
Seacat
Can you show some of your input?
PEZ
If you run the command at the prompt and skip the redirect to an output file (piping it through more if there is a lot of output), you can debug it faster.
PEZ
D:\tmp>testbat.bat

D:\tmp>D:\tmp\sed.exe -f sedscript.sed D:\tmp\test.txt 1>c:\tmp\test2.txt

I don't know where the "1" symbol comes from; I don't have it in my script.
Seacat
This is truly strange. And D:\tmp\test.txt isn't empty? (I must check, sorry).
PEZ
I can't give you my input file, but I can say that the markers I need to delete between are in the middle of the file, not at the beginning or end.
Seacat
Yes, the sed script doesn't assume the markers are at the beginning and end. It just deletes everything between them. At least on my Mac it does. Have you run it at the prompt (outside the bat file), without redirecting to a file, just to see if you get any error messages?
PEZ
Hmm, very strange... it deletes ALL the text after marker1, not just the text between marker1 and marker2.
Seacat
Then it's something wrong with the marker2 regex. What does marker2 look like?
PEZ
It's just text that begins with '[', like [Dm
Seacat
If I delete the brackets from marker1, then it deletes nothing.
Seacat
But the regex only matches a line that is _exactly_ "[Dm", with no whitespace and nothing else after it. If your marker2 can have anything after "[Dm", then remove the "$". Or, if it's always the same marker, match all of it.
PEZ
If you delete the brackets from the marker1 regex then it doesn't match your marker1 and thus the deletion never takes place.
PEZ
OK, I figured out what happens: the marker has to be a whole line. That's wrong for my case; I need to find the text regardless of whether it is a whole line or just part of one.
Seacat
I mean to find.
Seacat
Do you also need to delete all the text before the marker on the same line? Then this approach doesn't work. But sed can probably still do it. I think you need to revise your question and your text example so that it reflects your case.
PEZ
I mean, for example, if the line is "[aaa]", the script should find "[aaa]" and "[aaa" and even "a".
Seacat
OK, I will write a new version of my question. Although I never wrote "between lines", only "between some symbols". It is not the same, IMHO...
Seacat
I agree it's not the same. I made an assumption based on your example. But anyway.
PEZ
Anyway thank you very much.
Seacat
I revised my answer to handle markers that are more "free". Check it out. It can't handle input where the markers can appear on the same line though.
PEZ
A: 

I would like to clarify the question. I need to delete all the characters between marker1 and marker2. Marker1 and marker2 are just words or fragments of text, not necessarily whole lines. For example, I could have:

[aaa bbb] [ccc]
1
2
3
4
5
[www yyy]

If I want to delete the text between [aaa bbb] and [www yyy], the output should be:

[aaa bbb] 
[www yyy]
Seacat
I edited your question for you, putting this answer text there. You should delete this "answer" (since it's not an answer).
PEZ
And, I revised my answer to deal with this clarified question.
PEZ