
I am using vim and have a large text file that contains some HTML thrown in throughout. I am trying to prepare it for the web and need to add <p></p> tags to the lines that are not yet formatted. Here is an example of what I have:

Paragraph text one one line [... more ... ]
Other paragraph text on the next line [... more ... ]  
<h1>html element thrown in on its own line</h1>
More paragraph text [... more ... ]  
<!-- some other element (always own line) -->
There is still more text!

I am looking for a way to search for the lines that don't begin with a < character and, for those lines, add opening and closing <p></p> tags ... so that, afterwards, my file resembles this:

<p>Paragraph text one one line [... more ... ] </p>
<p>Other paragraph text on the next line [... more ... ]   </p>
<h1>html element thrown in on its own line</h1>
<p>More paragraph text [... more ... ]   </p>
<!-- some other element (always own line) -->
<p>There is still more text! </p>

How do I find lines that don't match a starting < character?

+7  A: 
^([^<].*)$

Make sure your options disallow "Dot matching newline" and replace with:

<p>$1</p>

Vim requires you to escape certain characters (such as the grouping parentheses), but I don't actually have vim, so this is my best guess at the whole rule (the % applies it to every line in the file):

:%s:^\([^<].*\)$:<p>\1</p>:g
John Gietzen
What does it mean to _disallow dot matching newline_? Sorry, I am new to vim. I used `%s:^\([^>].*\)$:<p>\1</p>:g` and it added paragraph tags to _every_ line (even those that already have tags). Almost there...
thornomad
The angle bracket in that last expression is pointing the wrong way. `[^>]` should be `[^<]`.
Nefrubyr
@thornomad: Sorry, Nefrubyr is correct. The angle bracket was wrong. I have corrected the line.
John Gietzen
Got it - working perfectly now. Thanks.
thornomad
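For comparison, here is a sketch of the same substitution run non-interactively with sed, assuming a POSIX sed; the function name `wrap_with_sed` is made up for illustration. Like vim, sed applies the pattern one line at a time, and in both tools `.` does not match a newline, so the "dot matches newline" option mentioned above never comes into play.

```shell
#!/bin/sh
# Sketch: the vim command :%s:^\([^<].*\)$:<p>\1</p>: expressed as sed.
# \( \) is a basic-regex group and \1 inserts what it captured.
# The / in </p> is escaped because / is the s command's delimiter here.
# Empty lines do not match [^<] and are left as they are.
wrap_with_sed() {
    sed 's/^\([^<].*\)$/<p>\1<\/p>/' "$@"
}
```

For example, `wrap_with_sed file > file.html` writes the result to a new file and leaves the original untouched.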
+1  A: 

Here's the logic: go through the file and check for < at the start of each line. If it's not there, construct a new string with <p> and </p> around the line and echo it out. There's really no need for complicated regex.

with bash

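#!/bin/bash
# Print lines starting with "<" unchanged; wrap all others in <p></p>.
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
# and the || test also handles a last line with no trailing newline.
while IFS= read -r line || [ -n "$line" ]
do
    case "$line" in
        "<"*) printf '%s\n' "$line" ;;
        *)    printf '<p>%s</p>\n' "$line" ;;
    esac
done <"file"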

with awk

$ awk '!/^</{$0="<p>"$0"</p>"}{print}' file

output

$ awk '!/^</{$0="<p>"$0"</p>"}1' file
<p>Paragraph text one one line [... more ... ]</p>
<p>Other paragraph text on the next line [... more ... ]  </p>
<h1>html element thrown in on its own line</h1>
<p>More paragraph text [... more ... ]  </p>
<!-- some other element (always own line) -->
<p>There is still more text!</p>
ghostdog74
"No need for complicated regex", and you're supplying solution requiring launching external tools?
depesz
vim, to the shell, IS an external tool as well. Whether it's awk, sed, vim, ed, etc., all these tools do things to files! There is really not much difference between them. Even the plain old shell can be used to "edit" files. And NO, my solution is not launched from vim, if that's what you are saying. It is run from the command line.
ghostdog74
OP said he is running vim. So calling shell/bash stuff is external.
depesz
The OP also added a sed tag, so what gives? As for the disadvantages of using vim in his situation: he has a large file. I'm not saying vim cannot handle large files, but if he is going to edit a large file, he's better off using awk/sed rather than vim (in interactive mode). Another disadvantage: doing it in vim is a one-time thing. Putting the edit command in a script is better for reuse next time.
ghostdog74
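Following up on the point about keeping the edit in a script: a minimal sketch of a reusable helper that applies the awk one-liner above to each file named on the command line and rewrites it in place. The function name `wrap_paragraphs` is hypothetical, and `mktemp` is assumed to be available (it is on GNU/Linux and BSD/macOS, though it is not strictly POSIX).

```shell
#!/bin/bash
# Hypothetical helper: wrap every line not starting with "<" in <p></p>,
# editing each named file in place via a temporary file.
wrap_paragraphs() {
    local f tmp
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # "1" is an always-true pattern, so every (possibly modified) line is printed.
        if awk '!/^</{$0="<p>"$0"</p>"}1' "$f" > "$tmp"; then
            # Copy back with cat (not mv) so the original file keeps its permissions.
            cat "$tmp" > "$f"
        fi
        rm -f "$tmp"
    done
}
```

Usage: `wrap_paragraphs file1 file2`. Like the one-liner it is built on, it also turns empty lines into `<p></p>`.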
A: 

This should work:

:%s/^\s*[^<]\+$/<p>&<\/p>/g
Maxim Kim
It has to start with something other than <, but a < can appear later in the line. I would change the \+ into *.
depesz
I wouldn't. My regex does not process empty lines. There wouldn't be <p></p>.
Maxim Kim
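On the empty-line question, one possible approach is to skip blank lines explicitly instead of relying on the shape of the regex. A sketch using awk, assuming the default field separator: `NF` (the number of fields) is 0 for a line that is empty or contains only whitespace, so such lines are printed unchanged. Unlike `[^<]\+$`, this only tests the first character, so a `<` later in the line does not stop the line from being wrapped. The function name `wrap_nonblank` is hypothetical.

```shell
#!/bin/sh
# Sketch: wrap lines that do not start with "<" AND are not blank.
# NF is 0 for empty or whitespace-only lines, so those are left alone.
# The trailing 1 prints every line, modified or not.
wrap_nonblank() {
    awk '!/^</ && NF {$0 = "<p>" $0 "</p>"} 1' "$@"
}
```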
+1  A: 
:%s/^[^<].*/<p>&<\/p>/

alternatively:

:v/^</s#.*#<p>&</p>#

that's all that is needed.

depesz
What about empty lines?
Maxim Kim
What about them?
depesz
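The `:v` form has a direct sed counterpart: a `!` after an address selects the lines that do *not* match it. A sketch, with a hypothetical function name `wrap_like_v`; as with `:v/^</`, empty lines are selected too and become `<p></p>`.

```shell
#!/bin/sh
# Sketch: sed counterpart of :v/^</s#.*#<p>&</p>#
# /^</! selects lines that do NOT start with "<" (like :v);
# s#.*#<p>&</p># wraps the whole line, where & is the matched text.
# Using # as the delimiter means the / in </p> needs no escaping.
wrap_like_v() {
    sed '/^</!s#.*#<p>&</p>#' "$@"
}
```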
A: 

Another way to do it:

:v/^</normal I<p>^O$</p>

^O is entered by actually pressing CTRL+o, not by typing the two characters ^ and O.

Or, if you use the surround.vim plugin:

:v/^</normal yss<p>
kemp