I am currently deciding on suitable programming languages for a project (more than one language will be used).

Questions: Is there a big mistake in my list (other than my belief that Go is a suitable language)? Any tips before I start?

project: an open-source project involving the semantic web (including RDF/OWL and topic maps), web applications and services, and machine learning; a distributed system. A working version is due by late August 2011. Mathematics/logic and speed are important, as is fast network transfer; the data will be up to 64 GB.

requirements: 1) open source languages only

my beliefs (which are subjective but will almost certainly not change): 1) Go is a good programming language now and for the future. This is my belief and my 'bet' at the moment.

Programming Languages in this precedence:

A1) Go (golang.org). Reasons: my belief; it should be fast in the future (execution time is currently about twice that of a Java program); fast-growing community; hopefully it improves over the next 12 months. It would be used as a general-purpose language and for web-related work. (I want to get away from the Java virtual machine, partly because of my beliefs and partly because of my previous experience.)

A2) Java. Reasons: many important programs and libraries are available, e.g. Jena, Pellet, XML Calabash. Go is given strong precedence, but Java is the second choice.

A3) Haskell. Reason: if Haskell is not too slow and a functional approach is a good fit, Haskell will be chosen. I think Go and Haskell together are a better option than Erlang or other functional languages. (This may be another belief.)

A4) R. Reason: for problems where mathematics and statistics count, e.g. machine learning; there are very good packages available for this in R. R is arguably more a collection of packages than a programming language.

B1) C/C++. Reason: if some subsystem written in Go/Java/Haskell/R is too slow, C and C++ will be considered.

B2) Prolog. Reason: I think Haskell is a better option in many situations, but Prolog is supposed to be faster in some cases and is also a good option. Many libraries are available for Prolog.

C1) Python. Reason: I think Go and the other languages mentioned are a better option, but it could be used for some natural language processing.

C2) Perl

C3) PHP. Good for CMS integration, e.g. Drupal.

  • Mathematica: very suitable where R has limits, but it is not open source and many researchers back R.

  • Ruby: I am not familiar with Ruby, so it is not included, although some of my friends swear by it.

Not considered because of the open-source requirement: C#, F#, etc.

some PAPERS and LINKS that could be interesting: http://prs.ism.ac.jp/web/packages/sets/sets.pdf (if you search for the title in quotes you can find the corresponding paper)

http://finzi.psych.upenn.edu/R/library/sets/html/fuzzyinference.html
http://cran.r-project.org/web/views/MachineLearning.html

http://tmra.de/2010/documents/TMRA2010_proceedings.pdf
Three papers there are possibly interesting: Topic Maps Graph Visualization & Suggested GTM (it is used to transform a natural-language question into an answer), A new approach to semantic integration, and Defining Domain-Specific Facets for Topic Maps with TMQL Path Expressions. At least those papers are free to download ;-)

+1  A: 

Most of my books on semantic processing focus on Java, Perl or Python. It is my belief that basing a project on so many different languages means that it is far beyond a single-person project, so I'd question your statement "more than one programming language will be used."

If this is an academic project, I'd be interested in seeing even a draft of your papers and your reading list.

As for R, I know I must be doing something wrong, but I tend to use it (like Matlab) to mock up and test some algorithm that I'd later implement in some other programming language.

Tangurena
Thank you for your answer. I expect this project to involve 2-4 people. Java will definitely be used, but I plan to build a REST/HTTP wrapper around some existing libraries and use Go and/or other languages as well (Go to present results on the web, like query results). I don't have a complete reading list at the moment, but I would be happy to share it. One part: I would like to use Bayesian inference/learning and I hope R has some support for it. (PS: I did include Perl in the list.)
mrsteve
+3  A: 

Haskell has a significant semantic web project in Swish. Unfortunately Swish appears somewhat bitrotted, and it certainly doesn't take advantage of much that is new in the Haskell world since it was written some time ago. However, the code should provide a good starting point, and the author's motivation for Haskell as an implementation language stands up well: http://www.ninebynine.org/RDFNotes/Swish/Intro.html#WhyHaskell

Haskell is a pure functional programming language. This means that it has no language constructs that cause side-effects, and Haskell expressions exhibit referential transparency (their value does not depend on some unknown "state"). This gives rise to greater semantic clarity, reducing the possibility of surprises when dealing with complex information. It also means that expression values are not dependent on order of evaluation, a significant feature when dealing with RDF information that may be incoming from a variety of sources.
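As a small, invented illustration of that last point (the triple representation and names below are my own, not Swish's): a pure merge function's result depends only on its arguments, so it never matters in which order triples arrive from different sources.

    import Data.List (nub)

    -- simplified stand-in for an RDF triple: (subject, predicate, object)
    type Triple = (String, String, String)

    -- A pure merge: the result depends only on the two argument lists,
    -- never on hidden state or on when each source happened to be read.
    mergeTriples :: [Triple] -> [Triple] -> [Triple]
    mergeTriples xs ys = nub (xs ++ ys)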

Haskell's pure functional style makes it amenable to formal semantic analysis; it thus provides an analyzable and testable (executable) way to extend RDF semantics, and to bridge the gap between logical theory and running software.

My own past work on using RDF inference tools has exposed a need for access to full programming language capabilities when developing applications based on RDF inference, for incorporating new datatype inference primitives into the application, but without the attendant complexities and uncertainties of a conventional programming language. The non-imperative Haskell programming style is a good match with the declarative nature of semantic web deduction.

Haskell embodies a very powerful type system for all values that can be expressed (based on the "Hindley-Milner" type system). It is fully type-secure, statically-checked, and polymorphic (meaning that functions, and indeed simple data values, can have many different types, and hence be used safely in conjunction with a range of types of other values). Many conventional languages have strong typing, but require some of the type guarantees to be sacrificed in order to write generic code, typically exposing a possibility of run-time datatype violation errors. A Haskell function can be written to support a wide range of different uses while retaining full type security, checked at compile-time. In practical terms, this leads to more reliable software.
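A tiny, hypothetical example of that kind of polymorphism: one indexing function that works for any key and value types, yet every use of it is fully checked at compile time, with no casts and no run-time type errors.

    import qualified Data.Map as Map

    -- Polymorphic in both key and value type; the Ord constraint on the
    -- key is the only requirement.
    indexBy :: Ord k => (v -> k) -> [v] -> Map.Map k [v]
    indexBy key = foldr insert Map.empty
      where insert v = Map.insertWith (++) (key v) [v]

    -- e.g. grouping hypothetical (subject, predicate, object) triples by subject:
    --   indexBy (\(s, _, _) -> s) triples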

Many Semantic Web systems and proposals make use of Prolog-like logic programming techniques. Haskell's lazy evaluation model makes it very easy to write programs that mimic Prolog's search-and-backtrack style of evaluation, and also to embed these computations in a framework that smoothly handles features that logic programming does not handle so well. (The RDF graph query logic in Swish is an example of this.)
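Here is a toy sketch of that search-and-backtrack style using nothing but the list monad; the facts and names are invented for illustration.

    import Control.Monad (guard)

    type Triple = (String, String, String)

    -- a toy fact base standing in for an RDF graph
    kb :: [Triple]
    kb = [ ("alice", "knows", "bob")
         , ("bob",   "knows", "carol")
         , ("carol", "knows", "dave") ]

    -- Prolog-like query: who is reachable from x in exactly two "knows" steps?
    twoSteps :: String -> [String]
    twoSteps x = do
      (s1, p1, o1) <- kb                 -- choice point, like trying each clause
      guard (s1 == x && p1 == "knows")   -- failure here backtracks to the next fact
      (s2, p2, o2) <- kb
      guard (s2 == o1 && p2 == "knows")
      return o2

    -- take 1 (twoSteps "alice") == ["carol"]; thanks to laziness the rest of
    -- the search space is only explored if more answers are demanded.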

Higher-order functions as first-class values provide an easy way to represent and manipulate relationships between values, and inference processes that operate on such values. Inference processes described in RDF can be "compiled" to higher-order functions in Haskell. Mixing functions (computable values) and conventional data to represent information provides for smoother integration between the RDF language and software that manipulates it.
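As a rough, invented illustration of that idea (this is not Swish's actual API): an inference rule can simply be a function from a graph to the triples it entails, and combining rules is then ordinary function manipulation.

    type Triple = (String, String, String)
    type Graph  = [Triple]

    -- A rule is just a higher-order value: a function from a graph
    -- to the new triples it licenses.
    type Rule = Graph -> [Triple]

    -- transitivity of a "subClassOf"-style property, written directly as a Rule
    subClassTransitivity :: Rule
    subClassTransitivity g =
      [ (a, "subClassOf", c)
      | (a, "subClassOf", b)  <- g
      , (b', "subClassOf", c) <- g
      , b == b' ]

    -- Applying a collection of rules once is itself just a function over rules.
    applyOnce :: [Rule] -> Graph -> Graph
    applyOnce rules g = foldr addNew g (concatMap ($ g) rules)
      where addNew t acc = if t `elem` acc then acc else t : acc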

The same author, who has worked on W3C RDF recommendations, also has another repo of Haskell RDF material, although I'm not sure about the relation: http://www.mail-archive.com/[email protected]/msg07446.html

There's also rdf4h, which is much more current, although perhaps not as rich: http://protempore.net/rdf4h/doc/

In addition, hxt, one of the two leading full-featured XML libraries for Haskell and under continued development and support, has been used for RDF processing as well. See "A Cookbook for the Haskell XML Toolbox with Examples for Processing RDF Documents": http://www.fh-wedel.de/~si/HXmlToolbox/cookbook/doc/thesis.pdf
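For a flavour of what that looks like, here is a small sketch using hxt's standard combinators from Text.XML.HXT.Core; the file name and the exact element/attribute names are assumptions, and namespace handling is glossed over.

    import Text.XML.HXT.Core

    -- list the rdf:about attribute of every rdf:Description element
    -- in an RDF/XML document
    subjects :: FilePath -> IO [String]
    subjects file =
      runX $ readDocument [withValidate no] file
             >>> deep (isElem >>> hasName "rdf:Description")
             >>> getAttrValue "rdf:about"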

I don't know of a great many machine learning libraries in Haskell, but there's a support vector machine implementation as well as a few neural network libraries. Haskell also has a strong collection of web libraries, both for interacting with web services and for hosting them, as well as a very active developer community in that area.

Go, Java and Haskell will all provide performance within an order of magnitude of C/C++, and all three allow much easier parallelism, Haskell especially.
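For instance, a pure per-item computation can often be parallelised with a one-line change using Control.Parallel.Strategies from the standard parallel package; the score function below is just a made-up stand-in (compile with -threaded and run with +RTS -N).

    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- stand-in for any expensive, pure per-document computation
    score :: String -> Int
    score = length . words

    -- evaluate each document's score in parallel
    scoreAll :: [String] -> [Int]
    scoreAll docs = parMap rdeepseq score docs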

sclv
Haskell and what you can do with it (Agda) seems best. Thanks for all the answers.
mrsteve
+3  A: 

Are you wanting to do stuff with semantic data, or are you in it to write semantic data libraries? My sense is the former (although reinventing the wheel is always sooooo tempting), in which case JRuby gives a great set of options because so much quality RDF code is written in Java. You can write your project-specific code in a more-or-less familiar imperative style (because some of us like side effects) and call out to the metric butload of Java libraries to do the heavy lifting right from within your Ruby code (it's as easy as "require 'some_library.jar'").

You might even want to look at neo4j (http://github.com/andreasronge/neo4j) depending on which way you end up going; they have a nice JRuby wrapper around their stuff already.

Bill Dueber
Representing the functional angle, I'd think Clojure would be a far better fit than JRuby for working with Java RDF libraries.
sclv
Thanks for the feedback. I know JRuby and Java would probably be a good option and would give fast results as well; the same goes for Clojure. However, I plan to use existing data libraries, write a Java wrapper around them, and export the data via REST/XML and Google Protocol Buffers, so that it can then be used from any programming language.
mrsteve