What are the relevant skills in the arsenal of a Data Scientist? With new technologies coming in every day, how does one pick and choose the essentials?
A few ideas germane to this discussion:
- Knowing SQL and a relational DB such as MySQL or PostgreSQL was enough until the advent of NoSQL and non-relational databases. MongoDB, CouchDB, etc. are becoming popular for working with web-scale data.
- Knowing a stats tool like R is enough for analysis, but to create applications one may need to add Java, Python, and other such languages to the list.
- Data now comes in the form of text, URLs, and multimedia, to name a few, and there are different paradigms associated with manipulating each.
- What about cluster computing, parallel computing, the cloud, Amazon EC2, Hadoop?
- OLS regression now has artificial neural networks, random forests, and other relatively exotic machine learning/data mining algorithms for company.
Thoughts?