I have a set of XML documents that all share the same schema. (They're SAPI grammars with semantic tags, if that matters.) I can use the documents to match text strings, returning a set of attributes with known values.

My problem is that I'd like to go the other way: take a set of attribute values and generate a string from the grammar that, when submitted to the grammar, would produce the same set of attribute values. A further complication is that different grammars have the tags in a different order (the grammars are for different natural languages), so I can't do a straightforward tree walk.

Does anybody have a good approach to this problem?

EDIT: Here's an example set of grammars:

Grammar 1 (English):

<GRAMMAR LANGID="409">
    <DEFINE>
    <ID NAME="NUMBERS1THROUGH8_ID" VAL="6503" />
    <ID NAME="NUMBERCOMMAND" VAL="-1"/>
    <ID NAME="NUMBER1" VAL="1"/>
    <ID NAME="NUMBER2" VAL="2"/>
    <ID NAME="NUMBER3" VAL="3"/>
    <ID NAME="NUMBER4" VAL="4"/>
    <ID NAME="NUMBER5" VAL="5"/>
    <ID NAME="NUMBER6" VAL="6"/>
    <ID NAME="NUMBER7" VAL="7"/>
    <ID NAME="NUMBER8" VAL="8"/>
    <ID NAME="NUMBER9" VAL="9"/>
    </DEFINE>

<RULE NAME="ChooseSynoynms">
   <L>
      <P>choose</P>
      <P>number</P>
      <P>select</P>
      <P>click</P>
   </L>
</RULE>

<RULE NAME="NumberList">
   <LN PROPNAME="numberCommand" PROPID="NUMBERCOMMAND">
     <PN VAL="NUMBER1">one</PN>
     <PN VAL="NUMBER2">two</PN>
     <PN VAL="NUMBER3">three</PN>
     <PN VAL="NUMBER4">four</PN>
     <PN VAL="NUMBER5">five</PN>
     <PN VAL="NUMBER6">six</PN>
     <PN VAL="NUMBER7">seven</PN>
     <PN VAL="NUMBER8">eight</PN>
     <PN VAL="NUMBER9">nine</PN>
   </LN>
</RULE>

<RULE ID="NUMBERS1THROUGH8_ID" TOPLEVEL="INACTIVE">
    <O %COMMAND_WEIGHT%><RULEREF NAME="ChooseSynoynms"/></O>
    <RULEREF NAME="NumberList" />
    <O>
        <P PROPNAME="ExplicitOK" VAL="1">ok</P>
    </O>
</RULE>
</GRAMMAR>

Grammar 2 (German):

<GRAMMAR LANGID="409">
    <DEFINE>
    <ID NAME="NUMBERS1THROUGH8_ID" VAL="6503" />
    <ID NAME="NUMBERCOMMAND" VAL="-1"/>
    <ID NAME="NUMBER1" VAL="1"/>
    <ID NAME="NUMBER2" VAL="2"/>
    <ID NAME="NUMBER3" VAL="3"/>
    <ID NAME="NUMBER4" VAL="4"/>
    <ID NAME="NUMBER5" VAL="5"/>
    <ID NAME="NUMBER6" VAL="6"/>
    <ID NAME="NUMBER7" VAL="7"/>
    <ID NAME="NUMBER8" VAL="8"/>
    <ID NAME="NUMBER9" VAL="9"/>
    </DEFINE>

<RULE NAME="ChooseSynoynms">
   <L>
      <P>wahlen</P>
      <P>Nummer</P>
      <P>auswahlen</P>
      <P>klicken</P>
   </L>
</RULE>

<RULE NAME="NumberList">
   <LN PROPNAME="numberCommand" PROPID="NUMBERCOMMAND">
     <PN VAL="NUMBER1">eins</PN>
     <PN VAL="NUMBER2">zwei</PN>
     <PN VAL="NUMBER3">drei</PN>
     <PN VAL="NUMBER4">vier</PN>
     <PN VAL="NUMBER5">funf</PN>
     <PN VAL="NUMBER6">sechs</PN>
     <PN VAL="NUMBER7">sieben</PN>
     <PN VAL="NUMBER8">acht</PN>
     <PN VAL="NUMBER9">neun</PN>

   </LN>
</RULE>

<RULE ID="NUMBERS1THROUGH8_ID" TOPLEVEL="INACTIVE">
      <P><O>auf</O></P> <RULEREF NAME="NumberList"/>
      <O>
        <P PROPNAME="ExplicitOK" VAL="1">OK</P>
      </O>
      <P><RULEREF NAME="ChooseSynoynms"/></P>
</RULE>
</GRAMMAR>

What I want to do is specify "NumberCommand = 5" and get "choose five" from the English grammar and "funf klicken" from the German grammar.

A: 

Have you tried using XPath?

http://en.wikipedia.org/wiki/XPath_1.0
http://w3schools.com/XPath/xpath_syntax.asp

It's also a little difficult to parse exactly what you're trying to do from the description. It might help if you pasted some example subset of the XML documents in question.

EDIT:

Here is a potential XPath query to get "NUMBER5" entries (warning, untested):

/GRAMMAR/RULE[@NAME='NumberList']/LN[@PROPNAME='numberCommand']/PN[@VAL='NUMBER5']

Here's some example PHP code to actually make use of it:

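// $xmlstring is assumed to hold the grammar XML as a string.
$xml = new SimpleXMLElement($xmlstring);
$result = $xml->xpath(
    "/GRAMMAR/RULE[@NAME='NumberList']" .
    "/LN[@PROPNAME='numberCommand']/PN[@VAL='NUMBER5']");

// xpath() returns an array of matching elements; print the text of each one.
foreach ($result as $xmlelement) {
    echo (string) $xmlelement, "\n";
}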

However, I can't see how to choose a value from the ChooseSynoynms rule (as it is spelled in the grammars), unless the choice is supposed to be random. In that case I would retrieve all of the alternatives and pick one at random in code.
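
For comparison, here is a rough Python sketch of the same idea: look up the phrase for NUMBER5 and pick a synonym at random. It assumes a trimmed copy of the English grammar (the `GRAMMAR` string and the variable names are illustrative, not part of the original files) and uses the limited XPath subset in Python's `xml.etree.ElementTree`, which only accepts paths relative to the element being searched.

```python
import random
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the English grammar from the question.
GRAMMAR = """
<GRAMMAR LANGID="409">
  <RULE NAME="ChooseSynoynms">
    <L>
      <P>choose</P>
      <P>number</P>
      <P>select</P>
      <P>click</P>
    </L>
  </RULE>
  <RULE NAME="NumberList">
    <LN PROPNAME="numberCommand" PROPID="NUMBERCOMMAND">
      <PN VAL="NUMBER5">five</PN>
      <PN VAL="NUMBER6">six</PN>
    </LN>
  </RULE>
</GRAMMAR>
"""

root = ET.fromstring(GRAMMAR)

# Paths are relative to the root element (GRAMMAR), so it is not named here.
number = root.find(
    "RULE[@NAME='NumberList']/LN[@PROPNAME='numberCommand']/PN[@VAL='NUMBER5']"
)
synonyms = [p.text for p in root.findall("RULE[@NAME='ChooseSynoynms']/L/P")]

# The word order is hard-coded (synonym first), which only suits English.
phrase = f"{random.choice(synonyms)} {number.text}"
print(phrase)
```

Note that the hard-coded word order is exactly the limitation raised in the question: the German grammar puts the synonym after the number, so a lookup like this is not enough on its own.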

Luke Dennis
Sounds like a case for using XSLT, possibly to generate XSLT...
Murph
Except that (as far as I understand XSLT) the trees have to have the same structure, and that isn't the case for the documents in question. (They generate the same attributes, but via different traversals.)
Eric Brown
A: 

What I've decided to do is traverse the grammar rules directly (using the parsed form, not the XML) while carrying a set of the required semantic tags. When I reach a node containing semantic information, I select the alternative that matches the appropriate semantic tag (and remove that tag from the set); otherwise, I make a transition at random. When I reach the end node, I verify that the set is empty; if it isn't, that's an error (I've generated a valid recognition that doesn't have all the required tags).
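
A minimal Python sketch of this kind of traversal, for illustration only. It makes several assumptions that are not in the original: it walks the XML elements as a stand-in for the parsed grammar, the names `Generator`, `walk`, `tags` and the rule name `Top` are hypothetical, the grammar is trimmed, and the `%COMMAND_WEIGHT%` placeholder is left out because it is not well-formed XML. It also adds one rule beyond the description above: an optional element is skipped when it could only produce tags that were not requested.

```python
import random
import xml.etree.ElementTree as ET

# Trimmed, illustrative English grammar; "Top" is a made-up rule name.
ENGLISH = """
<GRAMMAR LANGID="409">
  <DEFINE>
    <ID NAME="NUMBER5" VAL="5"/>
    <ID NAME="NUMBER6" VAL="6"/>
  </DEFINE>
  <RULE NAME="ChooseSynoynms">
    <L><P>choose</P><P>select</P></L>
  </RULE>
  <RULE NAME="NumberList">
    <LN PROPNAME="numberCommand">
      <PN VAL="NUMBER5">five</PN>
      <PN VAL="NUMBER6">six</PN>
    </LN>
  </RULE>
  <RULE NAME="Top">
    <O><RULEREF NAME="ChooseSynoynms"/></O>
    <RULEREF NAME="NumberList"/>
    <O><P PROPNAME="ExplicitOK" VAL="1">ok</P></O>
  </RULE>
</GRAMMAR>
"""


class Generator:
    def __init__(self, xml_text, seed=None):
        root = ET.fromstring(xml_text)
        self.ids = {i.get("NAME"): i.get("VAL") for i in root.iter("ID")}
        self.rules = {r.get("NAME"): r for r in root.iter("RULE")}
        self.rng = random.Random(seed)

    def value(self, node):
        """Resolve VAL through the DEFINE table, else use it literally."""
        raw = node.get("VAL")
        return self.ids.get(raw, raw)

    def tags(self, node):
        """Property names reachable from node, following RULEREFs."""
        found = set()
        for n in node.iter():
            if n.get("PROPNAME"):
                found.add(n.get("PROPNAME"))
            if n.tag == "RULEREF":
                found |= self.tags(self.rules[n.get("NAME")])
        return found

    def walk(self, node, wanted):
        if node.tag == "RULEREF":
            return self.walk(self.rules[node.get("NAME")], wanted)
        if node.tag == "O":
            reachable = self.tags(node)
            if reachable and reachable.isdisjoint(wanted):
                return []  # would only add tags that were not requested
            if not reachable and self.rng.random() < 0.5:
                return []  # untagged optional: random transition
        name = node.get("PROPNAME")
        if node.tag in ("L", "LN"):
            options = list(node)
            if name in wanted:
                # Keep only the alternative carrying the requested value.
                options = [c for c in options if self.value(c) == wanted[name]]
                if not options:
                    raise ValueError(f"no alternative gives {name}={wanted[name]}")
                del wanted[name]
            return self.walk(self.rng.choice(options), wanted)
        if name in wanted and self.value(node) == wanted[name]:
            del wanted[name]
        # Sequence node: own text, then each child followed by its tail text.
        words = (node.text or "").split()
        for child in node:
            words += self.walk(child, wanted)
            words += (child.tail or "").split()
        return words

    def generate(self, rule_name, wanted):
        remaining = dict(wanted)
        words = self.walk(self.rules[rule_name], remaining)
        if remaining:
            raise ValueError(f"tags not produced: {sorted(remaining)}")
        return " ".join(words)


gen = Generator(ENGLISH)
print(gen.generate("Top", {"numberCommand": "5"}))
print(gen.generate("Top", {"numberCommand": "6", "ExplicitOK": "1"}))
```

Because the walk follows each grammar's own element order, the same call against a grammar that places the synonym rule last yields the number first, with no per-language logic. The sketch does not guard against a mandatory element that carries an unrequested tag, nor against recursive rule references.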

Eric Brown