I have a list of sentences generated by http://www.ywing.net/graphicspaper.php, a random computer graphics paper title generator. Some example sentences, sorted, are as follows:


  • Abstract Ambient Occlusion using Texture Mapping
  • Abstract Ambient Texture Mapping
  • Abstract Anisotropic Soft Shadows
  • Abstract Approximation
  • Abstract Approximation of Adaptive Soft Shadows using Culling
  • Abstract Approximation of Ambient Occlusion using Hardware-accelerated Clustering
  • Abstract Approximation of Distributed Surfaces using Estimation
  • Abstract Approximation of Geometry for Texture-mapped Ambient Occlusion
  • Abstract Approximation of Mipmaps for Opacity
  • Abstract Approximation of Occlusion Fields for Subsurface Scattering
  • Abstract Approximation of Soft Shadows using Reflective Texturing
  • Abstract Arbitrary Rendering
  • Abstract Attenuation and Displacement Mapping of Geometry
  • Abstract Attenuation of Ambient Occlusion using View-dependent Texture Mapping
  • Abstract Attenuation of Light Fields for Mipmaps
  • Abstract Attenuation of Non-linear Ambient Occlusion
  • Abstract Attenuation of Pre-computed Mipmaps using Re-meshing

  • ...

I would like to try to reverse-engineer the grammar behind them, and to learn how to do it in some established way, e.g. in Common Lisp or with NLTK. Any ideas?

-- Drake

A: 

This is an interesting problem. However, I was under the impression that it is not easy to recover a generator from its generated output. What you can get is a model that may or may not be a close approximation of the original generator; the approximation gets closer as more generated sentences are processed.

A simple technique would be to build a parse tree and collect a vocabulary at each node of the tree.

Something like this:

  Abstract
  |--------|
           |Ambient , Anisotropic,(Approximation, Attenuation)
                                        |
                                        of
                                        |
                                   xxxx      yyyy
                                     |         |
                                   using       for

xxxx -> list of vocabularies

yyyy -> list of vocabularies
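A minimal sketch of this idea in Python (my own illustration, built on a few titles from the list above, not a full grammar inducer): build a word-level prefix tree, then read each node's children off as one candidate vocabulary.

```python
# Word-level prefix tree: each key is a word, each value is the
# subtree of words observed to follow it.
def build_trie(sentences):
    trie = {}
    for sentence in sentences:
        node = trie
        for word in sentence.split():
            node = node.setdefault(word, {})
    return trie

titles = [
    "Abstract Ambient Occlusion using Texture Mapping",
    "Abstract Anisotropic Soft Shadows",
    "Abstract Approximation of Mipmaps for Opacity",
    "Abstract Approximation of Soft Shadows using Reflective Texturing",
    "Abstract Attenuation of Light Fields for Mipmaps",
]

trie = build_trie(titles)

# The children of a node form one "vocabulary": everything observed
# to follow the prefix leading to that node.
print(sorted(trie["Abstract"]))
# -> ['Ambient', 'Anisotropic', 'Approximation', 'Attenuation']
```

With enough titles, nodes whose child sets coincide (e.g. the words seen after "Approximation of" and after "Attenuation of") become candidates for merging into a single shared nonterminal.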

pyfunc
A: 

There are approaches, based on genetic programming, to learning the grammar of a language from a set of sentences. E.g., "Learning Context-Free Grammars using an Evolutionary Approach".

Wikipedia also lists some other approaches under grammar induction.

dmitry_vk
A: 

You may be interested in Alignment-Based Learning by Menno van Zaanen. It has been years since I read his papers, but the basic idea is to

  1. find a common substring
  2. assign it a grammar rule
  3. rewrite the text to use this rule
  4. check whether rewritten-text+grammar is shorter than original-text.

Run this for all combinations of all common substrings to find the best grammar.

This is a bit like what an optimal compression algorithm would do. The theory behind it is Minimum Description Length.
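The four steps above can be sketched as a greedy MDL-style loop (a toy simplification of my own, not van Zaanen's actual algorithm): repeatedly replace the repeated substring whose substitution saves the most characters, for as long as text plus grammar keeps shrinking.

```python
from itertools import count

def common_substrings(text, min_len=3):
    """Substrings of length >= min_len occurring at least twice (brute force)."""
    found = set()
    for length in range(min_len, len(text) // 2 + 1):
        for i in range(len(text) - length + 1):
            sub = text[i:i + length]
            if text.count(sub) >= 2:
                found.add(sub)
    return found

def compress(text):
    """Steps 1-4: pick a common substring, assign it a rule, rewrite the
    text, and keep going only while text + grammar gets shorter."""
    grammar, ids = {}, count()
    while True:
        best, best_saving = None, 0
        for sub in common_substrings(text):
            n = text.count(sub)
            # n copies shrink to 1-char symbols; the new rule costs len(sub) + 1
            saving = n * len(sub) - n - (len(sub) + 1)
            if saving > best_saving:
                best, best_saving = sub, saving
        if best is None:
            return text, grammar
        symbol = chr(0xE000 + next(ids))  # private-use char as a nonterminal
        grammar[symbol] = best
        text = text.replace(best, symbol)

def expand(text, grammar):
    """Undo the rewriting, to check the grammar regenerates the input."""
    while any(sym in text for sym in grammar):
        for sym, sub in grammar.items():
            text = text.replace(sym, sub)
    return text
```

The brute-force substring search is cubic, so this only works on toy inputs, but it shows the shape of the search: the "grammar" is exactly the dictionary a compressor would build, which is the MDL connection.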

Nathan Sanders