I have a large number of records of the following type, which I have to modify:

  1. I would like to remove the created_by="29" line without leaving a blank line behind. Note: a wildcard that matches any created_by value would be preferable.

  2. I would like to remove the entire creation_date="..." line, so that the closing /> moves up to follow state="1".

  3. I would like to insert a new static line before the state attribute (e.g. modified_by="30").

XML:

<user id="1"
      org_id="3"
      created_by="29"
      state="1"
      creation_date="2010-06-01"/>

What kind of regular expression should I use to change this?

+2  A: 

A regular expression is the wrong way to approach this problem for a whole host of reasons, many of which are outlined in the answers to this question.

Instead, you will find that you'll have fewer headaches if you use a proper XML parser and use XPath to identify the parts of your XML document that you want to change.
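As a rough illustration of the parser route, here is a minimal Python sketch using the standard library's xml.etree.ElementTree and its limited XPath support. The <users> wrapper element and the rewrite_user helper are assumptions made for the example, not part of the question. Note that a parser will not keep the one-attribute-per-line layout, and that attribute order is only preserved on serialization in Python 3.8 and later.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <users>; the question only shows one record.
SAMPLE = '''<users>
<user id="1"
      org_id="3"
      created_by="29"
      state="1"
      creation_date="2010-06-01"/>
</users>'''


def rewrite_user(elem):
    """Drop created_by and creation_date; add modified_by just before state."""
    new_attrs = {}
    for name, value in elem.attrib.items():
        if name in ("created_by", "creation_date"):
            continue  # attributes to remove
        if name == "state":
            new_attrs["modified_by"] = "30"  # static value from the question
        new_attrs[name] = value
    elem.attrib.clear()
    elem.attrib.update(new_attrs)


root = ET.fromstring(SAMPLE)
# XPath-style query: every <user> element that carries a created_by attribute.
for user in root.findall(".//user[@created_by]"):
    rewrite_user(user)

result = ET.tostring(root, encoding="unicode")
print(result)
```

Whether this is worth it for a one-off edit of sample data is a judgment call; it mainly pays off when the attribute order or spacing is not uniform across records.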

josh3736
@josh3736 - I am just trying to manipulate sample data here using Eclipse. I don't intend to do this programmatically. If it's possible to resolve it with a simple search-and-replace, I will probably stick with that; otherwise I will do it manually.
Samuel
+2  A: 

Assuming the attributes always appear in the same order:

search: (\s+)created_by="[^"]+"(\s+state="[^"]+")\s+creation_date="[^"]+"

replace: $1modified_by="30"$2
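If you want to check the pattern outside of Eclipse first, a small Python sketch like the following should behave the same way on the sample record. It assumes Python's re module, where the $1 and $2 references are written as \g<1> and \g<2>; the variable names are only illustrative.

```python
import re

SAMPLE = '''<user id="1"
      org_id="3"
      created_by="29"
      state="1"
      creation_date="2010-06-01"/>'''

PATTERN = re.compile(
    r'(\s+)created_by="[^"]+"'     # group 1: line break + indent before created_by
    r'(\s+state="[^"]+")'          # group 2: the state attribute with its indent
    r'\s+creation_date="[^"]+"'    # creation_date line, dropped entirely
)

# Python spells the Eclipse-style $1 / $2 references as \g<1> / \g<2>.
REPLACEMENT = r'\g<1>modified_by="30"\g<2>'

updated = PATTERN.sub(REPLACEMENT, SAMPLE)
print(updated)
```

Because the whitespace in front of created_by is captured and reused for modified_by, no blank line is left behind, and the /> ends up directly after state="1".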

If you need to specify the element name, you can add this to the beginning of the regex:

(<user(?:\s+\w+="[^"]+")+?)

...and change the capture-group references in the replacement like this:

$1$2modified_by="30"$3
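A quick way to see the effect of the element-name prefix is to run the combined pattern over two records with different element names. This is only a sketch in Python, again assuming the re module's \g<n> replacement syntax; the <group> record is a made-up element added for contrast.

```python
import re

# The <group> record is hypothetical; it only shows that other elements are skipped.
DOC = '''<user id="1"
      org_id="3"
      created_by="29"
      state="1"
      creation_date="2010-06-01"/>
<group id="7"
      created_by="29"
      state="1"
      creation_date="2010-06-01"/>'''

PATTERN = re.compile(
    r'(<user(?:\s+\w+="[^"]+")+?)'  # group 1: <user plus the attributes before created_by
    r'(\s+)created_by="[^"]+"'      # group 2: line break + indent before created_by
    r'(\s+state="[^"]+")'           # group 3: the state attribute with its indent
    r'\s+creation_date="[^"]+"'     # creation_date line, dropped entirely
)

updated = PATTERN.sub(r'\g<1>\g<2>modified_by="30"\g<3>', DOC)
print(updated)
```

Only the <user> record is rewritten. One thing to keep in mind is that the prefix, as written, expects at least one attribute between <user and created_by.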

Alan Moore