Good morning -
I'm interested in seeing an efficient way of parsing the values of an heirarchical text file (i.e., one that has a Title => Multiple Headings => Multiple Subheadings => Multiple Keys => Multiple Values) into a simple XML document. For the sake of simplicity, the answer would be written using:
- Regex (preferrably in PHP)
- or, PHP code (e.g., if looping were more efficient)
Here's an example of an Inventory file I'm working with. Note that Header = FOODS, Sub-Header = Type (A, B...), Keys = PRODUCT (or CODE, etc.) and Values may have one more more lines.
**FOODS - TYPE A**
___________________________________
**PRODUCT**
1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;
2) La Fe String Cheese
**CODE**
Sell by date going back to February 1, 2009
**MANUFACTURER**
Quesos Mi Pueblito, LLC, Passaic, NJ.
**VOLUME OF UNITS**
11,000 boxes
**DISTRIBUTION**
NJ, NY, DE, MD, CT, VA
___________________________________
**PRODUCT**
1) Peanut Brittle No Sugar Added;
2) Peanut Brittle Small Grind;
3) Homestyle Peanut Brittle Nuggets/Coconut Oil Coating
**CODE**
1) Lots 7109 - 8350 inclusive;
2) Lots 8198 - 8330 inclusive;
3) Lots 7075 - 9012 inclusive;
4) Lots 7100 - 8057 inclusive;
5) Lots 7152 - 8364 inclusive
**MANUFACTURER**
Star Kay White, Inc., Congers, NY.
**VOLUME OF UNITS**
5,749 units
**DISTRIBUTION**
NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN
**FOODS - TYPE B**
___________________________________
**PRODUCT**
Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice;
**CODE**
990-10/2 10/5
**MANUFACTURER**
San Mar Manufacturing Corp., Catano, PR.
**VOLUME OF UNITS**
384
**DISTRIBUTION**
PR
And here's the desired output (please excuse any XML syntactical errors):
<foods>
<food type = "A" >
<product>Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese</product>
<product>La Fe String Cheese</product>
<code>Sell by date going back to February 1, 2009</code>
<manufacturer>Quesos Mi Pueblito, LLC, Passaic, NJ.</manufacturer>
<volume>11,000 boxes</volume>
<distibution>NJ, NY, DE, MD, CT, VA</distribution>
</food>
<food type = "A" >
<product>Peanut Brittle No Sugar Added</product>
<product>Peanut Brittle Small Grind</product>
<product>Homestyle Peanut Brittle Nuggets/Coconut Oil Coating</product>
<code>Lots 7109 - 8350 inclusive</code>
<code>Lots 8198 - 8330 inclusive</code>
<code>Lots 7075 - 9012 inclusive</code>
<code>Lots 7100 - 8057 inclusive</code>
<code>Lots 7152 - 8364 inclusive</code>
<manufacturer>Star Kay White, Inc., Congers, NY.</manufacturer>
<volume>5,749 units</volume>
<distibution>NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN</distribution>
</food>
<food type = "B" >
<product>Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice</product>
<code>990-10/2 10/5</code>
<manufacturer>San Mar Manufacturing Corp., Catano, PR</manufacturer>
<volume>384</volume>
<distibution>PR</distribution>
</food>
</FOODS>
<!-- and so forth -->
So far, my approach (which might be quite inefficient with a huge text file) would be one of the following:
Loops and multiple Select/Case statements, where the file is loaded into a string buffer, and while looping through each line, see if it matches one of the header/subheader/key lines, append the appropriate xml tag to a xml string variable, and then add the child nodes to the xml based on IF statements regarding which key name is most recent (which seems time-consuming and error-prone, esp. if the text changes even slightly) -- OR
Use REGEX (Regular Expressions) to find and replace key fields with appropriate xml tags, clean it up with an xml library, and export the xml file. Problem is, I barely use regular expressions, so I'd need some example-based help.
Any help or advice would be appreciated.
Thanks.