False AWS.MechanicalTurk.XMLParseError There was an error parsing the XML question or answer data in your request. Please make sure the data is well-formed and validates against the appropriate schema. (1284779956270) Array00

I'm trying to send entire emails to Mechanical Turk using the mtturk.lib.php library. I tried urlencode and htmlentities on the content, but the request still fails with the error above. I'm sure there's a function that will make this code "formatted well enough" to send through.

$thequestion = '<a href="linkgoeshere" target="_blank">click here</a>';

$QuestionXML = '<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
  <Question>
    <QuestionContent>
      <Text>'.$thequestion.'</Text>
    </QuestionContent>
    <AnswerSpecification>
      <FreeTextAnswer/>
    </AnswerSpecification>
  </Question>
</QuestionForm> ';
+1  A: 

HTML is not a form of XML; don't try to parse it as such. Your best bet is to use an HTML5 parser or, failing that, an SGML parser.

Delan Azabani
I'm using the default parser in this form. Basically, I just want to be able to send it to Mechanical Turk. I think I said that wrong: I need to format the data for Mechanical Turk as a certain type of XML. I'll add my current code to the question in an edit. I need to pass HTML through the XML form to Mechanical Turk.
Bob Cavezza
If that is the case, you could parse the original content using an HTML5/SGML parser, as suggested, into a DOM, then serialise it as XML.
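A rough sketch of that parse-then-serialise idea, using PHP's built-in DOM extension rather than an HTML5 or SGML parser (so this swaps in a different, more lenient parser than the one suggested above). The function name `htmlFragmentToXml` is hypothetical, and the sketch assumes the input is a small fragment in an encoding that `loadHTML()` handles by default.

```php
<?php
// Hypothetical helper: parse an HTML fragment leniently,
// then re-serialise its nodes as well-formed XML.
function htmlFragmentToXml(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The wrapper <div> gives the fragment a single, known parent node.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    $xml = '';
    foreach ($wrapper->childNodes as $child) {
        // saveXML() with a node argument serialises only that subtree.
        $xml .= $doc->saveXML($child);
    }
    return $xml;
}

// Unquoted attribute and unclosed <br> are typical of sloppy email HTML.
echo htmlFragmentToXml('<a href="linkgoeshere" target=_blank>click here<br></a>'), "\n";
```

The result is well-formed, but that alone does not mean it validates against the QuestionForm schema, so it would most likely still need escaping or a CDATA wrapper before being sent.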
Delan Azabani
I would only need to parse this one string. Could you recommend a parser? I'm assuming I parse the HTML using the HTML5/SGML parser, and then the rest of the XML should parse as normal?
Bob Cavezza
A popular, actively developed parser for PHP and Python is html5lib, at http://code.google.com/p/html5lib/
Delan Azabani
A: 

HTML code can be embedded inside an XML document in a number of ways:

  1. Escape it all the hard way with htmlspecialchars() and send it.
  2. Wrap it in a <![CDATA[ ... ]]> section.
  3. Convert it to XHTML and specify the right namespace in an xmlns attribute.

Different XML parsers may or may not support the third method, so I'd go with the first or the second.
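A minimal sketch of options 1 and 2 in PHP, assuming the HTML is held in a plain string. The helper names `escapeForXml` and `wrapInCdata` are hypothetical and not part of mtturk.lib.php. Either form is well-formed XML, but whether Mechanical Turk accepts it inside a given element still depends on the QuestionForm schema.

```php
<?php
// Hypothetical helpers for illustration; not part of mtturk.lib.php.

// Option 1: escape the markup so the XML parser sees plain character data.
function escapeForXml(string $html): string
{
    // ENT_XML1 restricts output to the entities XML itself predefines;
    // ENT_QUOTES escapes both double and single quotes.
    return htmlspecialchars($html, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Option 2: wrap the markup in a CDATA section.
function wrapInCdata(string $html): string
{
    // A literal "]]>" would end the section early, so split it
    // across two adjacent CDATA sections.
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $html) . ']]>';
}

$thequestion = '<a href="linkgoeshere" target="_blank">click here</a>';

echo escapeForXml($thequestion), "\n";
echo wrapInCdata($thequestion), "\n";
```

Both options turn the markup into character data, so inside a Text element workers would probably see the tags literally rather than a clickable link. As far as I know, the QuestionForm schema also defines a FormattedContent element meant for CDATA-wrapped XHTML; check the Mechanical Turk documentation before relying on it.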

ZJR
I'm going to try all three. 1) Didn't work; I still get the same error: "There was an error parsing the XML question or answer data in your request. Please make sure the data is well-formed and validates against the appropriate schema". 2) Didn't work either. How would I do #3? I'm not sure I'll be allowed to, given the way I need to submit the data, but I'll make an attempt.
Bob Cavezza