Suppose you have a repository of 10,000 function names, possibly along with their frequency of use, drawn from a corpus of code that may be in C, C#, or C++ (each of which usually has its own prescribed naming conventions).
Some samples:
DoPaint
OnPaint
CloseWindow
DeleteGraphOnClose
FreeConnection
ConnectInternat (a small typo, but it is part of the code)
FreeSoH
Now, given a function name, how can we predict whether it follows the conventions of a human-written name?
Note:
- All candidate names will be valid names.
- Machine-generated names can contain arbitrary characters and should be treated as bad.
- Letter case can get garbled.
Some candidates:
Z090292 - unlikely
onDelete - likely
CloseWindow - likely
iGetIndex - unlikely
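To make the task concrete, here is a minimal sketch of one possible baseline, not a recommended solution: a character-bigram model with add-one smoothing, trained on the sample names above and used to score candidates by average log-probability. Everything in it is an assumption for illustration: the names `TRAINING`, `ALPHABET_SIZE`, `bigrams`, `train`, and `score` are hypothetical, the frequencies are placeholders set to 1, and names are lowercased because letter case may be garbled (so this sketch ignores casing conventions entirely).

```python
import math
from collections import Counter

# Sample names from the question. Frequencies are placeholders (all 1),
# because no actual usage counts are given.
TRAINING = {
    "DoPaint": 1,
    "OnPaint": 1,
    "CloseWindow": 1,
    "DeleteGraphOnClose": 1,
    "FreeConnection": 1,
    "ConnectInternat": 1,
    "FreeSoH": 1,
}

# Assumed alphabet for smoothing: 26 letters + 10 digits + underscore + end marker.
ALPHABET_SIZE = 38


def bigrams(name):
    """Lowercase the name and return its character bigrams, padded with ^ and $."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(names):
    """Count bigrams and their left-hand contexts, weighted by frequency."""
    pair_counts, context_counts = Counter(), Counter()
    for name, freq in names.items():
        for bg in bigrams(name):
            pair_counts[bg] += freq
            context_counts[bg[0]] += freq
    return pair_counts, context_counts


def score(name, model):
    """Average log-probability per bigram, with add-one smoothing."""
    pair_counts, context_counts = model
    bgs = bigrams(name)
    total = 0.0
    for bg in bgs:
        numerator = pair_counts[bg] + 1
        denominator = context_counts[bg[0]] + ALPHABET_SIZE
        total += math.log(numerator / denominator)
    return total / len(bgs)


MODEL = train(TRAINING)

if __name__ == "__main__":
    for candidate in ["Z090292", "onDelete", "CloseWindow", "iGetIndex"]:
        print(f"{candidate}: {score(candidate, MODEL):.3f}")
```

A higher (less negative) score means the candidate looks more like the training names. Turning the score into a likely/unlikely decision would need a threshold chosen on held-out names, and a model trained on seven samples is far too small to trust; the full 10,000 names with real frequencies would replace `TRAINING`.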
Any pointers on techniques and software are welcome.