I have a 2GB text file on my Linux box that I'm trying to import into my database.

The problem I'm having is that the script that processes this RDF file is choking on one line:

mismatched tag at line 25462599, column 2, byte 1455502679:
<link r:resource="http://www.epuron.de/"/>
<link r:resource="http://www.oekoworld.com/"/>
</Topic>
=^

I want to replace the </Topic> with </Line>. I can't do a search/replace on all lines, but I do have the line number, so I'm hoping there's some easy way to replace just that one line with the new text.

Any ideas/suggestions?

G-Man

+5  A: 
sed -i '25462599 s|</Topic>|</Line>|' nameoffile.txt
David Zaslavsky
+6  A: 
sed -i yourfile.xml -e '25462599s!</Topic>!</Line>!'
chaos
That doesn't work if the opening tag is also on the same line...
David Zaslavsky
Good catch. Fixed.
chaos
Running it now. Thanks!
GeoffreyF67
+2  A: 

Use "head" to get the first 25462598 lines and "tail" to get the remaining lines (starting at 25462600), then put the corrected line between them. For a 2GB file, though, this will likely take a while.
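
A minimal sketch of the head/tail approach, run on a small throwaway file so it is safe to try. The sample file, its contents and the variable names are made up for illustration; for the real file you would set line=25462599 and point file at the RDF dump.

```shell
#!/bin/sh
# Sketch: replace one line by stitching head + new text + tail together.
# The sample file and all names below are illustrative, not from the question.
tmpdir=$(mktemp -d)
file="$tmpdir/sample.txt"
out="$tmpdir/fixed.txt"
printf '%s\n' '<link a/>' '</Topic>' '<link b/>' > "$file"

line=2              # line to replace (25462599 in the question)
new='</Line>'       # replacement text for that line

{
    head -n "$((line - 1))" "$file"     # everything before the line
    printf '%s\n' "$new"                # the replacement line
    tail -n "+$((line + 1))" "$file"    # everything after the line
} > "$out"

cat "$out"
```

The result goes to a second file, so the original stays untouched until you have checked the output and moved it into place.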

Also, are you sure the problem is just with that line and not somewhere earlier? The error looks like an XML parse error, which might mean the actual problem is someplace else.

The link tags are self-closing, so the element that the extra </Topic> closes must have been opened somewhere else...
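
One way to check is to print the reported line together with a few lines before it, without opening the whole file in an editor. This is only a sketch on a made-up sample file; the names and the amount of context are assumptions, and for the real file line would be 25462599.

```shell
#!/bin/sh
# Sketch: show the reported line plus some preceding context.
# The sample file and variable names are illustrative only.
tmpdir=$(mktemp -d)
file="$tmpdir/sample.txt"
printf '%s\n' '<Topic>' '<link a/>' '<link b/>' '</Topic>' '<Topic>' > "$file"

line=4        # line reported by the parser
context=2     # how many preceding lines to show

start=$((line - context))
[ "$start" -lt 1 ] && start=1

# -n suppresses the default output, p prints the range, and q stops
# reading at the target line so the rest of a large file is not scanned.
shown=$(sed -n "${start},${line}p;${line}q" "$file")
printf '%s\n' "$shown"
```

Raising context lets you look further back for the opening tag that the parser thinks is still open.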
Adam Davis
That made me look and it was actually ExternalPage I needed to replace. Thanks!
GeoffreyF67
+4  A: 

The tool for editing text files in Unix is called ed (as opposed to sed, which, as the name implies, is a stream editor).

ed was originally intended as an interactive editor, but it can also easily be scripted. The way ed works is that commands take an address parameter. The way to address a specific line is just the line number, and the way to change the addressed line(s) is the s command, which takes the same kind of regexp that sed would. So, to change the 42nd line, you would write something like 42s/old/new/.

Here's the entire command:

FILENAME=/path/to/whereever
LINENUMBER=25462599

# The ed commands are left unindented: <<- would strip leading tabs only, not spaces.
ed -- "${FILENAME}" <<HERE
${LINENUMBER}s!</Topic>!</Line>!
w
q
HERE

The advantage of this is that ed is standardized, while the -i flag to sed is an extension that is not available on every system, and where it does exist its syntax differs between GNU and BSD sed.
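
If neither ed nor sed -i is an option, a common fallback is plain sed writing to a temporary file that then replaces the original. The sketch below is my own illustration on a made-up sample file, not part of the command above; note that it needs enough free disk space for a second copy of the file.

```shell
#!/bin/sh
# Sketch: in-place style edit without sed -i, via a temporary file.
# The sample file and names are illustrative only.
tmpdir=$(mktemp -d)
file="$tmpdir/sample.xml"
printf '%s\n' '<link a/>' '</Topic>' '</Topic>' > "$file"

line=2    # 25462599 in the question

# Only the addressed line is changed; mv runs only if sed succeeded.
sed "${line}s|</Topic>|</Line>|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"

cat "$file"
```

Because the substitution is addressed by line number, the second </Topic> in the sample is left alone.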

Jörg W Mittag
A: 

My shell script:

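#!/bin/bash
# Usage: script.sh LINE_NUMBER NEW_CONTENT FILE
# Prints FILE to stdout with line LINE_NUMBER replaced by NEW_CONTENT.
awk -v line="$1" -v new_content="$2" '{
        if (NR == line) {
                print new_content;
        } else {
                print $0;
        }
}' "$3"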

Arguments:

first: the line number you want to change
second: the text you want in place of the original line contents
third: the file name

This script prints its output to stdout, so you need to redirect it to a new file. Example:

./script.sh 5 "New fifth line text!" file.txt > newfile.txt

You can improve it, for example, by checking that all of your arguments have the expected values.
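
A rough sketch of what such checks could look like, written as a shell function so it can be tried directly. The function name replace_line, the messages and the return code are my own choices for illustration, not part of the script above.

```shell
#!/bin/sh
# replace_line LINE TEXT FILE: print FILE with line LINE replaced by TEXT.
# The function name and error messages are illustrative assumptions.
replace_line() {
    if [ "$#" -ne 3 ]; then
        echo "usage: replace_line LINE TEXT FILE" >&2
        return 2
    fi
    case $1 in
        ''|0|*[!0-9]*)
            echo "line number must be a positive integer: $1" >&2
            return 2 ;;
    esac
    if [ ! -r "$3" ]; then
        echo "cannot read file: $3" >&2
        return 2
    fi
    # Note: awk -v interprets backslash escapes in the replacement text.
    awk -v line="$1" -v new_content="$2" \
        'NR == line { print new_content; next } { print }' "$3"
}

# Demo on a throwaway file.
tmpdir=$(mktemp -d)
printf '%s\n' one two three > "$tmpdir/file.txt"
result=$(replace_line 2 "New second line text!" "$tmpdir/file.txt")
printf '%s\n' "$result"
```

Invalid input, such as a non-numeric line number or an unreadable file, is reported on stderr instead of silently printing the file unchanged.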

SourceRebels