Is there a list of language codes in YAML or JSON somewhere out there?
Another format is fine, I can convert it if necessary.
It is available in HTML via the link you posted in your question :) Seriously, if that list in Wikipedia is complete, then it is easy to grab it using lxml.html (in Python) or any similar library in your favorite language.
Check out the source of the Wikipedia entry.
It's a very simple format: table cells are separated by ||. That's much easier to parse than HTML.
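As a rough sketch of that idea in Python, splitting one table row on || could look like the following. The sample row is hypothetical (shaped like a wikitext table line, not copied from the article), and parse_wikitext_row is just an illustrative helper name. It does not handle row separators such as |- or header cells.

```python
def parse_wikitext_row(line):
    """Split one wikitext table row like '| a || b || c' into its cells."""
    body = line.strip()
    # A table row starts with a single '|'; drop it before splitting.
    if body.startswith("|"):
        body = body[1:]
    # Cells on the same line are separated by '||'.
    return [cell.strip() for cell in body.split("||")]


# Hypothetical row for illustration only.
sample = "| Arabic || ar || ara"
cells = parse_wikitext_row(sample)
print(cells)
```

The real page source will need some extra filtering (skipping non-row lines, stripping link markup), so treat this as a starting point only.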
I think the United Nations or the ISO actually publish that list in CSV format. That would be the ultimate source.
However, I'm not sure if they publish it for free.
EDIT: Actually, the link is in the Wikipedia article you linked to. The US Library of Congress has been designated the official registration authority by the ISO and they publish the entire, official, up-to-date list as a trivial to parse text file for free.
The format looks like this:
ara||ar|Arabic|arabe
arc|||Official Aramaic (700-300 BCE); Imperial Aramaic (700-300 BCE)|araméen d'empire (700-300 BCE)
arg||an|Aragonese|aragonais
arm|hye|hy|Armenian|arménien
arn|||Mapudungun; Mapuche|mapudungun; mapuche; mapuce
arp|||Arapaho|arapaho
art|||Artificial languages|artificielles, langues
arw|||Arawak|arawak
asm||as|Assamese|assamais
ast|||Asturian; Bable; Leonese; Asturleonese|asturien; bable; léonais; asturoléonais
ath|||Athapascan languages|athapascanes, langues
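That's 5 fields separated by vertical bars: the three-letter bibliographic code, the three-letter terminologic code (when given), the two-letter code (when given), the English name, and the French name.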
So, this is actually in CSV format, if you read that as character-separated values instead of comma-separated values, which most CSV parsers let you do.
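For example, here is a minimal sketch using Python's standard csv module with the delimiter set to a vertical bar, then dumping the result as JSON. The key names in FIELDS are my own labels (an assumption, the file itself has no header row), and SAMPLE is just a few lines from the excerpt above standing in for the downloaded file.

```python
import csv
import io
import json

# Assumed key names for the 5 columns; the file itself has no header.
FIELDS = ["alpha3_b", "alpha3_t", "alpha2", "english", "french"]

# A few lines from the excerpt above, standing in for the real file.
SAMPLE = """\
ara||ar|Arabic|arabe
arm|hye|hy|Armenian|arménien
arp|||Arapaho|arapaho
"""


def parse_language_codes(text_file):
    """Read pipe-separated rows and return one dict per language."""
    # QUOTE_NONE: treat quote characters in names as ordinary text.
    reader = csv.reader(text_file, delimiter="|", quoting=csv.QUOTE_NONE)
    return [dict(zip(FIELDS, row)) for row in reader if row]


languages = parse_language_codes(io.StringIO(SAMPLE))
print(json.dumps(languages, ensure_ascii=False, indent=2))
```

With the real file you would pass an open file object instead of the StringIO. If the download turns out to start with a byte-order mark, opening it with encoding="utf-8-sig" strips it. Missing codes come through as empty strings, which you may want to convert to null before writing the JSON.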