I'm a business major, two-thirds of the way through my degree program, with a little PHP experience, having taken one introductory C++ class, and now regretting my choice of business over programming/computer science.

I am interested in learning more advanced programming; specifically C, and eventually progressing to using the CUDA architecture for artificial neural network data analysis (not for AI, vision, or speech processing, but for finding correlations between data-points in large data sets and general data/statistical analysis).

Any advice about how I should start learning C? As well as ANN/Bayesian technology for analyzing data? There are so many books out there, I don't know what to choose.

Since CUDA is fairly new, there doesn't seem to be much learner-friendly (i.e. dumbed-down) material for it. Are there learning resources for CUDA beyond the NVIDIA documentation?

Further, what resources would you recommend to me that talk about GPGPU computing and massively parallel programming that would help me along?

A: 

If you're looking for a friendly introduction to parallel programming, consider Open MPI or POSIX threads (pthreads) on CPUs instead. All you need to get started is a single multi-core processor.

The general consensus is that parallel programming on these new architectures (GPU, Cell, etc.) has a way to go in terms of the maturity of the programming models and APIs. Conversely, Open MPI and pthreads have been around for quite a while, and there are lots of resources for learning them. Once you have gotten comfortable with these, consider trying out the newer technologies.

While there are certainly programming interfaces for many other languages, C is probably the most common modern language in use in high-performance computing (Fortran and Pascal are still kicking around in this area). C++ is also fairly popular; several bioinformatics packages use it. In any case, C is certainly a good starting place, and you can move up to C++ if you want more language features or libraries (probably at some cost in performance).

Dana the Sane
Michael J Quinn's book on Parallel Programming is also a good start
nairdaen
+4  A: 

I don't recommend trying to learn CUDA first since it's a new technology and you don't have much background in programming.

Since you don't have much experience in C (or C++), CUDA will be a pain to learn since it lacks maturity, libs, nice error messages, etc.

CUDA is meant for people who are familiar with C (C++ experience helps too) and have a problem that needs better performance, achieved by recoding or rethinking the solution to a well-known problem.

If you're trying to solve "ANN/Bayesian" problems, I would recommend writing your solution in C++ or C, your choice. Don't bother with threads or multithreading at first. Then, after evaluating the response times of your serial solution, try to make it parallel using OpenMP, Boost threads, or whatever you prefer. After this, if you still need more performance, I would recommend learning CUDA.
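
To make the "serial first, then parallel" order concrete, here is a minimal sketch in C, assuming the hot spot is a simple loop. The function name `dot` is hypothetical and only there for illustration. Compiled without OpenMP support, the pragma is ignored and the code runs serially; compiled with OpenMP enabled (for example with GCC's `-fopenmp` flag), the same loop is split across cores.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two arrays of length n.
   Hypothetical example: the pragma is the only parallel-specific line.
   Without OpenMP the compiler ignores it and the loop runs on one core;
   with OpenMP each thread keeps a private partial sum, and the reduction
   clause adds the partial sums together at the end. */
double dot(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Calling `dot` on {1, 2, 3} and {4, 5, 6} gives 32 in both builds, so you can check the parallel build against the serial one before you start timing anything.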

I think these are valid points because CUDA has some pretty cryptic errors, is hard to debug, has a totally different architecture, etc.

If you're still interested, these are some links to learn CUDA:

Online courses:

Forum (the best source of information):

Tools:

Problems solved in CUDA:

Edison Gustavo Muenz
Thanks for your advice. I know that learning CUDA is a big undertaking for someone without much (or in my case, hardly any) C/C++ experience, and I'll likely start out with some of the other, more novice-friendly, things you recommended.
Kyle Lowry
A: 

If you are interested in data mining, you might also want to look at the open source system called Orange. It is implemented in C++ but it also supports end-user programming in Python or in a visual link-and-node language.

I don't know if it supports neural networks, but I do know people use it for learning data mining techniques. It supports things like clustering and association rules.

(Also, in case you didn't know about it, you might want to track down somebody in your B-school who does operations management. If you're interested in CS and data mining, you might find like-minded people there.)

Gabe Johnson
A: 

Link: gpgpu.org has some interesting discussion.

TokenMacGuy
+3  A: 

Dear Kyle,

You've expressed 3 different goals:

  • Learning to program in C
  • Learning to write code for the CUDA platform
  • Learning to use Bayes' Nets and/or Neural nets for data analysis

Firstly: these things are not easy even for people who already have several degrees in the field. If you only do one, make sure to learn about Bayesian inference. It's by far the most powerful framework available for reasoning about data, and you need to know it. Check out MacKay's book (mentioned at the bottom). You certainly have set yourself a challenging task - I wish you all the best!

Your goals are all fairly different kettles of fish. Learning to program in C is not too difficult. If at all possible, I would take the "Intro to Algorithms & Data Structures" course (usually the first course for CS majors) at your university (it's probably taught in Java). This will be extremely useful for you, and basic coding in C will then simply be a matter of learning syntax.

Learning to write code for the CUDA platform is substantially more challenging. As recommended above, please check out Open MPI first. In general, you will be well served by reading something about computer architecture (Patterson & Hennessy is nice), as well as a book on parallel algorithms. If you've never seen concurrency (i.e. if you haven't heard of a semaphore), it would be useful to look it up (lecture notes from an operating systems course will probably cover it - see MIT OpenCourseWare). Finally, as mentioned, there are few good references available for GPU programming since it's a new field, so your best bet will be to read example source code to learn how it's done.

Finally, Bayesian nets and neural nets. First, please be aware that these are quite different. Bayesian networks are a graphical (nodes & edges) way of representing a joint probability distribution over a (usually large) number of variables. The term "neural network" is somewhat vaguer, but generally refers to using simple processing elements to learn a nonlinear function for classifying data points. A book that gives a really nice introduction to both Bayes' nets and neural nets is David J.C. MacKay's Information Theory, Inference, and Learning Algorithms. The book is available for free online at http://www.inference.phy.cam.ac.uk/mackay/itila/. This book is by far my favorite on the topic. The exposition is extremely clear, and the exercises are illuminating (most have solutions).
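
As a rough illustration of the Bayesian side: the smallest possible network has two nodes, a hidden cause and an observed effect, and inference on it is just Bayes' rule. The sketch below is plain C; the function name and the numbers in the note after it are made up for illustration and are not taken from MacKay's book.

```c
#include <assert.h>

/* Two-node network: Cause -> Observation.
   prior                 = P(cause)
   p_pos_given_cause     = P(positive observation | cause present)
   p_pos_given_no_cause  = P(positive observation | cause absent)
   Returns P(cause | positive observation) using Bayes' rule. */
double posterior_given_positive(double prior,
                                double p_pos_given_cause,
                                double p_pos_given_no_cause)
{
    /* Joint probabilities of (cause, positive) and (no cause, positive). */
    double joint_cause = prior * p_pos_given_cause;
    double joint_no_cause = (1.0 - prior) * p_pos_given_no_cause;

    /* Total probability of seeing a positive observation at all. */
    double evidence = joint_cause + joint_no_cause;
    assert(evidence > 0.0);

    return joint_cause / evidence;
}
```

With a prior of 0.01, a 0.9 chance of a positive observation when the cause is present, and a 0.05 chance when it is absent, this returns about 0.154. A positive observation raises the probability from 1% to roughly 15%, not to 90%, which is the kind of result that makes Bayesian reasoning worth learning first.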

Dan
Thanks for your answer. I'm always up for a challenge. :)
Kyle Lowry
A: 

Thanks for the heads up about CUDA. I thought it seemed rather cryptic, but I will continue to work on learning C and wait to see what happens with CUDA. However, I have downloaded and installed Yellow Dog Linux for CUDA alongside Ubuntu and Windows. It is nice for CUDA on Linux, but be wary of the wireless support. I would still like to learn how to make the N-Body example; it is so awesome. Thanks again. Take care.

-tertl3

William
A: 

The latest CUDA releases (3.1, 3.2) include a full-featured set of functions called cuBLAS that will handle parallelizing matrix operations for you on single-card setups. Parallelizing the backpropagation will be a bit more of a challenge, but I'm working on it.
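
For a sense of what gets handed off to a BLAS library here: the forward pass of one fully connected layer is essentially a matrix-vector product. The sketch below is plain C running on the CPU, not cuBLAS code. The name `matvec` and the row-major layout are assumptions made for illustration (BLAS libraries, cuBLAS included, expect column-major storage).

```c
#include <assert.h>
#include <stddef.h>

/* Computes y = W * x on the CPU.
   w is a rows x cols matrix stored row-major (an assumption of this sketch),
   x has cols elements, and y receives rows elements.
   Every output element is independent of the others, which is why this
   operation maps well onto parallel hardware. */
void matvec(const float *w, const float *x, float *y,
            size_t rows, size_t cols)
{
    assert(w != NULL && x != NULL && y != NULL);
    for (size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (size_t c = 0; c < cols; ++c) {
            acc += w[r * cols + c] * x[c];
        }
        y[r] = acc;
    }
}
```

For a 2 x 3 weight matrix {1, 2, 3, 4, 5, 6} and an input of {1, 1, 1}, the output is {6, 15}. Checking a GPU result against a simple CPU version like this is a cheap way to catch indexing mistakes.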

jwilson75503