
If you live in Australia you have probably heard about the Mining Boom, and before it there was the famous “DotCom bubble” way back in 199x. Recently, the new buzzword “BigData” in the IT data mining space seems to have become a technology trend for many enterprises and telcos. They want big data implemented to solve their traditional issues ASAP. As we all know, cloud computing has made life so much easier, and now it looks like we’ll be doing clouds forever! At the end of the day, who wants to wait for their servers to be delivered, then rack-n-stack, and then build? It takes forever! Obviously, this is a totally new era in the application space after the cloud boom, one that brings new IT jobs and solves the traditional problem of big data sets. Huh..? New jobs! Good for the IT industry, we’ll never be out of a job.
Most big data products (Hadoop, Splunk, Cloudera) use smarter algorithms to index the data and present it in a human-readable format to IT analysts. Hadoop dominates the big data space. Splunk and Cloudera are two of the top products available today, and Cloudera’s is built on Hadoop. I have some experience with Hadoop deployment. A year ago (before this bubble started) I had the opportunity to deploy a 37-node Hadoop cluster for parallel processing of unstructured data. I know the hurdles and challenges we went through deploying in this domain. I think the Hadoop code has matured over time.
IT engineers who work in the big data domain have the following titles/roles:
1. DATA SCIENTIST  (probably eq. to CCIE?)
2. DATA ANALYST (probably eq. to CCNA)

Data scientist – you may be joking here! Nope! Keep reading…
Here are my thoughts on these newly defined roles for folks who are or will be working in the Big Data domain. Who are these data scientists working on Big Data? Industry-accelerated PhDs? People with multiple master’s degrees? Folks with no university degree at all? Well, the simple answer is “anybody”: you don’t need a PhD to get the title of “Scientist”. Funny, but very true! Personally I love the job title “Data Scientist”. It has certainly made the job titles of the really smart folks working in the IT industry (without any degrees) glamorous! It has given both name and fame to the role. Don’t get me wrong here, but many organizations have started hiring data scientists to solve their structured and unstructured data problems. Mostly, data scientists work on futuristic products. New product development needs data to correlate its inputs against, and that comes from big data correlation.
DJ Patil, who co-coined the term, defines a data scientist as:
“A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.”
Jake Porway, of Data Without Borders and the New York Times, defines it as:
“A data scientist is a rare hybrid, a computer scientist with the programming abilities to build software to scrape, combine, and manage data from a variety of sources and a statistician who knows how to derive insights from the information within. S/he combines the skills to create new prototypes with the creativity and thoroughness to ask and answer the deepest questions about the data and what secrets it holds.”
A data scientist needs strong data skills, a strong knowledge of statistics and the ability to program algorithms.
Who hires them? Anyone who invests in BigData products: big banks, telcos, managed IT service companies, etc.
Data scientists are not cheap either! Obviously, you get what you pay for.

Data Analyst – Not every big data problem needs a data scientist. If you are just starting out in the big data domain, you might have to start your career as a data analyst. Remember your career in a NOC engineer role: the first few months on the job, night shifts, and time spent staring at the screen for SNMP/SYSLOG traps? Well, this role is a somewhat upgraded version of that, but you have the opportunity to become a data scientist, not to mention the opportunity to learn from data scientists as well. Every analyst needs to be able to tell and sell the story behind the insights that come out of big data analysis. A data analyst is not expected to have the programming skills to build algorithms, but needs strong SQL skills in addition to a good understanding of analytics packages. Typically a data analyst is cheaper than a data scientist.


1. Online Platform companies.
2. Content sites
3. Big banks – fraud detection, app logs, data correlation, etc.
4. Parallel processing of data that cannot be processed by traditional databases (SQL, Oracle, Informix, etc.)
5. Share market – and the list goes on! Infinite possibilities.
Not to mention that the original users (or abusers) of Hadoop have been using it for ages! Yes: LinkedIn, Facebook, Google, Yahoo.
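To give a feel for the parallel processing mentioned in item 4, here is a minimal sketch of the map/reduce idea that Hadoop popularised, written in plain Python. This is not Hadoop's API and it runs in a single process; the log lines, the severity names and the function names are all made up for illustration. Each chunk stands in for the slice of data one node would handle.

```python
from collections import Counter
from functools import reduce

# Hypothetical syslog-style lines; a real cluster would read these from many files.
LOG_LINES = [
    "Jan 10 01:02:03 router1 ERROR link down on eth0",
    "Jan 10 01:02:09 router2 WARN high cpu",
    "Jan 10 01:03:11 router1 ERROR link down on eth1",
    "Jan 10 01:04:00 switch7 INFO config saved",
    "garbage line with no severity",
    "Jan 10 01:05:42 router2 ERROR bgp peer lost",
]

# Assumed severity keywords for this example only.
SEVERITIES = {"ERROR", "WARN", "INFO"}


def split_into_chunks(lines, chunk_count):
    """Deal the lines out into chunk_count chunks, as if to separate worker nodes."""
    return [lines[i::chunk_count] for i in range(chunk_count)]


def map_chunk(lines):
    """Map step: count the severities in one chunk, skipping lines without one."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            if word in SEVERITIES:
                counts[word] += 1
                break  # only the first severity keyword on a line is counted
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_severities(lines, chunk_count=3):
    """Run the map step on every chunk, then fold the partial results together."""
    partials = [map_chunk(chunk) for chunk in split_into_chunks(lines, chunk_count)]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(dict(count_severities(LOG_LINES)))
```

The point of the split is that every map_chunk call is independent of the others, so on a real cluster each one could run on a different node, and only the small partial counts need to travel over the network to be merged.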
Please leave your comments. What do you think of this new buzzword? In the next post (when? probably when this bubble is gone), I will cover building a career in the big data domain.

Cheers, Push
2xCCIE (Voice/Security)