views:

102

answers:

2

I'm trying to create a discriminated union for part of speech tags and other labels returned by a natural language parser.

It's common to use either strings or enums for these in C#/Java, but discriminated unions seem more appropriate in F# because these are distinct, read-only values.

In the language reference, I found that this symbol ``...`` can be used to delimit keywords/reserved words. This works for

type ArgumentType =
| A0 // subject
| A1 // indirect object
| A2 // direct object
| A3 //
| A4 //
| A5 //
| AA //
| ``AM-ADV``

However, the tags contain symbols like $, e.g.

type PosTag =
| CC // Coordinating conjunction
| CD // Cardinal Number
| DT // Determiner
| EX // Existential there
| FW // Foreign Word
| IN // Preposision or subordinating conjunction
| JJ // Adjective
| JJR // Adjective, comparative
| JJS // Adjective, superlative
| LS // List Item Marker
| MD // Modal
| NN // Noun, singular or mass
| NNP // Proper Noun, singular
| NNPS // Proper Noun, plural
| NNS // Noun, plural
| PDT // Predeterminer
| POS // Possessive Ending
| PRP // Personal Pronoun
| PRP$ //$ Possessive Pronoun
| RB // Adverb
| RBR // Adverb, comparative
| RBS // Adverb, superlative
| RP // Particle
| SYM // Symbol
| TO // to
| UH // Interjection
| VB // Verb, base form
| VBD // Verb, past tense
| VBG // Verb, gerund or persent participle
| VBN // Verb, past participle
| VBP // Verb, non-3rd person singular present
| VBZ // Verb, 3rd person singular present
| WDT // Wh-determiner
| WP // Wh-pronoun
| WP$ //$ Possessive wh-pronoun
| WRB // Wh-adverb
| ``#``
| ``$``
| ``''``
| ``(``
| ``)``
| ``,``
| ``.``
| ``:``
| `` //not sure how to escape/delimit this

``...`` isn't working for WP$ or symbols like (

Also, I have the interesting problem that the parser returns `` as a meaningful symbol, so I need to escape it as well.

Is there some other way to do this, or is this just not possible with a discriminated union?

Right now I'm getting errors like

  • Invalid namespace, module, type or union case name
  • Discriminated union cases and exception labels must be uppercase identifiers

I suppose I could somehow override toString for these goofy cases and replace the symbols with some alphanumeric equivalent?

+1  A: 

The spec doesn't seem clear about what characters are allowed to be escaped in double-backticks in what contexts.

I think your best bet is to use standard identifiers for the DU cases, and override ToString as you suggest.

Brian
+1  A: 

From my experience, double-backtick marks identifiers are/seem to be fully supported only in let Bindings or type members. So that means you can put about any sequence of characters inside (excepting the @ character which is reserved for F# codegen).

When you want to use them as identifiers in module, type or DU cases definition, it doesn't play as nice since some characters are not supported.

E.g. ., /, *, +, $, [, ], \ or & generate an "Invalid namespace, module, type or union case name" error.

Stringer Bell