I'm using cURL to fetch a web page and present it to our users. Things worked well until I came across a website that uses a lot of Ajax, with responses formatted like this:
33687|updatePanel|ctl00_SiteContentPlaceHolder_FormView1_upnlOTHER_NATL|
<div id="ctl00_SiteContentPlaceHolder_FormView1_othernationalities">
<h4>
<span class="tooltip_text" onmousemove="widetip=false; tip=''; delayToolTip(event,tip,widetip,0,0);return false"
onmouseout="hideToolTip()">
<span id="ctl00_SiteContentPlaceHolder_FormView1_lblProvideOTHER_NATL">Provide the following information:</span></span>
</h4>
|
266|scriptBlock|ScriptContentNoTags|
document.getElementById('ctl00_SiteContentPlaceHolder_FormView1_dtlOTHER_NATL_ctl00_csvOTHER_NATL').dispose = function() {
Array.remove(Page_Validators, document.getElementById('ctl00_SiteContentPlaceHolder_FormView1_dtlOTHER_NATL_ctl00_csvOTHER_NATL'));
}
So each block of the response has four parts: parts 2 and 3 are just identifiers (the block type and the element ID), part 4 is the real "body", and part 1 is the length of the body. The problem is that we modify the body, so I need to update the length in part 1 to match; otherwise, we get a parsing error when this is inserted into the web page.
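A minimal sketch of the length rule in Python (the question names no language, and format_block is a hypothetical helper). It assumes the length counts the characters of the body only, not the | delimiters around it, so any edit that makes a body longer or shorter must change the prefix by the same amount.

```python
def format_block(block_type: str, block_id: str, body: str) -> str:
    """Serialize one block as length|type|id|body| with a freshly computed length."""
    # Assumption: the length covers the body only; the four '|' delimiters are not counted.
    return f"{len(body)}|{block_type}|{block_id}|{body}|"


if __name__ == "__main__":
    print(format_block("scriptBlock", "ScriptContentNoTags", "alert(1);"))
    # prints: 9|scriptBlock|ScriptContentNoTags|alert(1);|
```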
I'm trying to put together a combination of shell commands (awk, sed, whatever) to:
a) read the saved file;
b) run a regex over it to capture each individual block (using '(\d*?)\|(.*?)\|(.*?)\|(.*?)\|');
c) set the first capturing group to the length of the last capturing group;
d) save all the rewritten matches to a new file or back to the original.
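One way to cover steps a) to d) is a short Python script instead of awk/sed (Python is a substitution here; the question asks for shell tools). It also swaps the regex in step b) for a length-driven parser: it trusts the lengths in the response as originally fetched, before any modification, to find where each body ends, because a body that itself contains | (JavaScript such as a || b is common) would end a lazy (.*?)\| match early. After the bodies are modified, every block is written back with a recomputed length. Assumptions: the saved file is UTF-8; the length is a character count as JavaScript measures it, which Python's len() matches except for characters outside the Basic Multilingual Plane (such as emoji), which JavaScript counts as two; and modify_body is a hypothetical placeholder for your own edits.

```python
import sys
from typing import Callable, List, Tuple

Block = Tuple[str, str, str]  # (type, id, body)


def parse_delta(text: str) -> List[Block]:
    """Split 'length|type|id|body|' blocks, trusting each ORIGINAL length prefix."""
    blocks: List[Block] = []
    end = len(text.rstrip())  # ignore trailing whitespace after the last block
    pos = 0
    while pos < end:
        len_end = text.index("|", pos)
        length = int(text[pos:len_end])
        type_end = text.index("|", len_end + 1)
        id_end = text.index("|", type_end + 1)
        body_start = id_end + 1
        body_end = body_start + length
        # The body may contain '|', so slice by length instead of searching for it.
        if text[body_end:body_end + 1] != "|":
            raise ValueError(f"length {length} at offset {pos} does not fit the data")
        blocks.append((text[len_end + 1:type_end],
                       text[type_end + 1:id_end],
                       text[body_start:body_end]))
        pos = body_end + 1
    return blocks


def serialize_delta(blocks: List[Block]) -> str:
    """Rebuild the response, recomputing each length from its (possibly edited) body."""
    return "".join(f"{len(body)}|{t}|{i}|{body}|" for t, i, body in blocks)


def rewrite_delta(text: str, transform: Callable[[str], str]) -> str:
    """Apply transform to every body and fix up the length prefixes."""
    blocks = parse_delta(text)
    return serialize_delta([(t, i, transform(body)) for t, i, body in blocks])


def modify_body(body: str) -> str:
    # Hypothetical placeholder: replace with whatever edits you make to each body.
    return body.replace("Provide the following information:", "Please provide:")


def main(path: str) -> None:
    # newline="" keeps \r\n pairs intact; they count toward each body's length.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    result = rewrite_delta(text, modify_body)  # parse fully before overwriting
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(result)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Run it as python fixdelta.py response.txt (the script name is arbitrary). It rewrites the file in place, so point it at a copy if you want to keep the original.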
Any input from "the collective" would be GREATLY appreciated.