ansaurus

Question

Need help! How do I replace text with sed or awk?

Answer 1

A:

Hi, I think it's better to use sed, something like this:

sed "s/\"/'/g" yourfile
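A single quote cannot appear inside a single-quoted shell string, so the sed script has to get its ' some other way. A small sketch of two equivalent quoting styles (the function names are hypothetical):

```shell
# Both functions replace every double quote on stdin with a single quote.

# Style 1: double-quote the sed script. \" is a literal " for the shell,
# and ' needs no escaping inside double quotes.
dq_to_sq_v1() {
    sed "s/\"/'/g"
}

# Style 2: keep the script single-quoted and splice in an escaped quote.
# '\'' closes the string, adds a literal ', and reopens the string.
dq_to_sq_v2() {
    sed 's/"/'\''/g'
}
```

Either one is used as a filter, e.g. `dq_to_sq_v1 < yourfile`.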

eliavs 2010-08-06 08:33:12

Thank you for the help! I tried it, but this replaces all double quotes.

drk 2010-08-06 08:35:55

Answer 2

+2 A:

If you want to replace all occurrences of a single character, you can also use tr, which is simpler than sed or awk:

   cat myfile.txt | tr \" \'

Notice that both quotes are escaped. For characters other than quotes, no escaping is needed:

   cat myfile.txt | tr a A
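tr translates characters one for one and reads standard input, so it needs no cat; it can also perform several single-character swaps in one pass. A minimal sketch with hypothetical function names:

```shell
# Replace every " with ' in the file named by $1, reading it directly
# instead of piping it through cat.
dq_to_sq_file() {
    tr '"' "'" < "$1"
}

# tr maps SET1 to SET2 position by position: here " -> ' and a -> A.
swap_two() {
    tr "\"a" "'A"
}
```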

Edit: after the question was edited, this answer no longer applies: it replaces all double quotes, not only the one inside the name property.

Sergio Acosta 2010-08-06 08:33:29

Answer 3

A:

If you mean just the double quote in 'Rico"s', you can use:

sed "s/Rico\"s/Rico's/"

as in:

pax> echo '{"name": "National Res...rto Rico"s Economy.", "key": "blah"}' \
     | sed "s/Rico\"s/Rico's/"
{"name": "National Res...rto Rico's Economy.", "key": "blah"}
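The substitution above only fixes the literal text Rico"s. A hedged generalization, not part of the original answer: a sed loop that keeps replacing the last stray double quote inside the "name" value until none is left. It assumes one record per line and that the value is directly followed by `, "key": `, as in the sample; `fix_name_quotes` is a hypothetical name, and the sed must support `-E` (GNU and BSD sed both do).

```shell
# Replace every double quote inside the "name" value with a single quote.
# Reads JSON records (one per line) on stdin.
fix_name_quotes() {
    # :a ... ta is a loop: each pass turns the LAST stray quote in the
    # value into ', and t branches back while a substitution succeeded.
    # '"'"' in the replacement ends the shell string, adds ', reopens it.
    sed -E ':a
s/("name": ".*)"([^"]*", "key": )/\1'"'"'\2/
ta'
}
```

Usage: `fix_name_quotes < input.json > output.json`. Lines without a `"name"` value followed by `"key"` pass through unchanged.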

paxdiablo 2010-08-06 08:40:26

Thank you for the help! But I have a JSON file with 4 million JSON records, and maybe 10 thousand of them have data like this.

drk 2010-08-06 08:46:48

@user358347: then you need a parser, not a regular expression engine.

paxdiablo 2010-08-06 09:44:24

Answer 4

A:

Assuming your data is exactly like you showed and the extra double quotes only appear in the name value field:

Update:

I made the script slightly more robust (handling ', ' inside fields).

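BEGIN {
    q = "\""
    # fields are separated by the ", " between key/value pairs
    FS = OFS = q ", " q
}
{
    # $1 holds the "name" pair, e.g. {"name": "Puerto Rico"s Economy.
    if (split($1, arr, ": " q) == 2) {
        gsub(q, "'", arr[2])        # inner double quotes -> single quotes
        $1 = arr[1] ": " q arr[2]   # awk re-joins all fields with OFS
    }
    print
}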

Put this script in a file (say dequote.awk) and run it with:

awk -f dequote.awk input.json > output.json
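With millions of records it may help to measure how many lines are affected before and after running the script. A rough sketch, assuming one JSON record per line as in the samples; `count_suspect_names` is a hypothetical name, and the pattern is only a heuristic: it flags a "name" value whose first inner double quote is followed by anything other than a comma or closing brace, so an escaped \" also counts, and a stray quote that happens to precede a comma is missed.

```shell
# Print how many lines of the file named by $1 look like they contain an
# unescaped double quote inside the "name" value.
count_suspect_names() {
    # grep -c prints the count; it prints 0 (exit status 1) if none match.
    grep -cE '"name": "[^"]*"[^,}]' "$1"
}
```

For example, compare `count_suspect_names input.json` with `count_suspect_names output.json`.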

Update 2:

Okay, so your input is extremely difficult to process. The only other thing I can think of is this:

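{
    # position of the opening quote of the "name" value
    # (the string "name": plus a space is 8 characters long)
    start = match($0, "\"name\": ") + 8
    # position of the quote that closes the value,
    # assuming "key" is the field that follows "name"
    stop = match($0, "\", \"key\": ")
    # leave lines that do not have this layout untouched
    if (start == 8 || stop <= start) {
        print
        next
    }
    pre = substr($0, 1, start)                      # up to the opening quote
    post = substr($0, stop)                         # closing quote onward
    name = substr($0, start + 1, stop - start - 1)  # the value itself
    gsub("\"", "'", name)
    print pre name post
}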

Explanation: I try to chop the line into three parts:

1. everything up to and including the opening double quote of the "name" value;
2. the "name" value itself, without its enclosing double quotes;
3. the closing double quote and the rest of the line.

In part 2 I replace all double quotes with single quotes. Then I glue the three parts back together and print them.
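The same three-part split can also be written with index(), which searches for a literal substring, so the search strings need no regex escaping. This variant is a sketch, not the script above; `dequote_name` is a hypothetical name, and like the original it assumes the "key" field directly follows "name" (it only looks for the closing text after the value starts).

```shell
# Three-part split of each stdin line around the "name" value.
dequote_name() {
    awk -v sq="'" '
    {
        open_tag  = "\"name\": \""      # text just before the value
        close_tag = "\", \"key\": "     # closing quote plus the next key
        s = index($0, open_tag)
        if (s == 0) { print; next }     # no "name" field: leave line alone
        body = s + length(open_tag)     # first character of the value
        r = index(substr($0, body), close_tag)
        if (r == 0) { print; next }     # "key" does not follow "name"
        pre  = substr($0, 1, body - 1)  # part 1, incl. opening quote
        name = substr($0, body, r - 1)  # part 2, the bare value
        post = substr($0, body + r - 1) # part 3, closing quote onward
        gsub(/"/, sq, name)             # inner double quotes -> single quotes
        print pre name post
    }'
}
```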

schot 2010-08-06 08:59:02

Please tell me how to run this script, and how to scan the JSON file.

drk 2010-08-06 09:06:25

I tried it, but it doesn't work; maybe I cut some data from the sample. Here is a complete record:
{"last_modified": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "type": {"key": "/type/author"}, "name": "National Research Council. Committee on the Scientific and Technologic Base of Puerto Rico"s Economy.", "key": "/authors/OL2108538A", "revision": 1}

drk 2010-08-06 10:01:54

The arr[2] part has to change, because the name key may be at the start or at the end.

drk 2010-08-06 10:03:05

@drk See my updated version.

schot 2010-08-06 10:56:58

Thanks, but it still doesn't work well, so I'm going to change my JSON data structure.

drk 2010-08-06 11:43:02

Answer 5

A:

awk '{for(i=1;i<=NF;i++) if($i~/name/) { gsub("\042","\047",$(i+1)) }   }1' file

ghostdog74 2010-08-06 14:01:34

Answer 6

A:

With some other weird error cases added to your input

{ "last_modified": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"},
  "type": {"key": "/type/author"},
  "name": "National Research Council. Committee on the Scientific and Technologic Base of Puerto Rico"s Economy.",
  "key": "/authors/OL2108538A",
  "revision": 1,
  "has \" escaped quote": 1,
  "has \" escaped quotes \"": 1,
  "has multiple " internal " quotes": 1,
}

this Perl program, which corrects unescaped internal double quotes using the heuristic that a string's real closing quote is followed by optional whitespace and then a colon, comma, semicolon, or closing curly brace,

#! /usr/bin/perl -p

s<"(.+?)"(\s*[:,;}])> {
  my($text,$terminator) = ($1,$2);
  $text =~ s/(?<!\\)"/'/g;  # " oh, the irony!
  qq["$text"] . $terminator;
}eg;

produces the following output:

$ ./fixdqs input.json
{ "last_modified": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"},
  "type": {"key": "/type/author"},
  "name": "National Research Council. Committee on the Scientific and Technologic Base of Puerto Rico's Economy.",
  "key": "/authors/OL2108538A",
  "revision": 1,
  "has \" escaped quote": 1,
  "has \" escaped quotes \"": 1,
  "has multiple ' internal ' quotes": 1,
}

Delta from input to output:

$ diff -ub input.json <(./fixdqs input.json)
--- input.json
+++ /dev/fd/63
@@ -1,9 +1,9 @@
 { "last_modified": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"},
   "type": {"key": "/type/author"},
-  "name": "National Research Council. Committee on the Scientific and Technologic Base of Puerto Rico"s Economy.",
+  "name": "National Research Council. Committee on the Scientific and Technologic Base of Puerto Rico's Economy.",
   "key": "/authors/OL2108538A",
   "revision": 1,
   "has \" escaped quote": 1,
   "has \" escaped quotes \"": 1,
-  "has multiple " internal " quotes": 1,
+  "has multiple ' internal ' quotes": 1,
 }

Greg Bacon 2010-08-06 16:18:26
