Bigdata bubble – New era in application space
If you live in Australia, you have probably heard about the mining boom, and most of us remember the famous dot-com bubble back in the 1990s. These are patterns we’ll see all our lives. Many of you have probably also heard the recent buzzword “bigdata” being used in the IT data mining space. Google, Yahoo and the other big players have been using this kind of technology for years, so why is the bubble reaching boom level now? That is a very good question.
We all own so many devices these days, and end users generate data at such an explosive rate that there seems to be no way to control it or store it. Most traditional databases (MySQL, Oracle and other SQL databases) have limits on what they can and can’t store, because most of them are designed for managed data (i.e. you know the rows and columns in advance). So the data we generate in our daily lives has become a problem for the company you work for or the ISP you connect through. The data is massive, and those shiny smartphones are the major culprits of the explosion. It falls to the ISP or the enterprise to store and report on data that our existing SQL-based databases cannot manage or store.
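To make the “rows and columns in advance” point concrete, here is a minimal sketch in Python. The device records, the field names and the `devices_with` helper are all made up for illustration. It only shows what happens when records of different shapes are forced into a fixed two-column table, compared with keeping the raw lines and interpreting them when a question is asked.

```python
import json
import sqlite3

# Fixed schema: the columns must be decided before any data arrives.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (device TEXT, action TEXT)")

# Hypothetical device records; each one carries different fields.
RAW = [
    '{"device": "phone-1", "action": "login"}',
    '{"device": "phone-2", "gps": [-33.86, 151.21], "battery": 81}',
    '{"device": "tablet-7", "action": "crash", "stack": ["main", "render"]}',
]

dropped_fields = set()
for line in RAW:
    record = json.loads(line)
    # Only the two known columns fit; a missing "action" is stored as NULL.
    db.execute(
        "INSERT INTO events VALUES (?, ?)",
        (record.get("device"), record.get("action")),
    )
    # Everything else in the record has nowhere to go in this table.
    dropped_fields |= record.keys() - {"device", "action"}


def devices_with(field, raw_lines):
    """Schema-on-read: keep raw lines, interpret them only when queried."""
    return [r["device"] for r in map(json.loads, raw_lines) if field in r]


print(sorted(dropped_fields))     # fields the fixed table could not hold
print(devices_with("gps", RAW))   # the raw lines still answer new questions
```

The fixed table quietly loses the GPS, battery and stack-trace fields, while the raw lines can still answer a question nobody planned for when the data was collected.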
How do we solve this issue now? Well, the answer is simple: bigdata/Hadoop. It provides parallel scalability, which is exactly what ISPs and enterprises are looking for these days. At the end of the day, who wants to throw away the logs generated by these devices when they will definitely be useful in future? I think nobody wants to throw away their data.
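The idea behind that parallel scalability is the map/reduce pattern: split the data into chunks, process each chunk independently, then merge the partial results. Below is a rough single-machine sketch in Python, not Hadoop itself. The log format (device id as the first field) and the function names are assumptions for illustration, and a thread pool stands in for what would be separate nodes in a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: count events per device in one chunk of log lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:                   # skip blank lines
            counts[parts[0]] += 1   # assumed format: device id comes first
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_events(lines, workers=4):
    """Split the log into chunks, map them in parallel, then reduce."""
    size = max(1, -(-len(lines) // workers))   # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


# Hypothetical log lines.
LOG = [
    "phone-1 login",
    "phone-2 login",
    "phone-1 page_view",
    "tablet-7 login",
    "phone-1 logout",
]

totals = count_events(LOG, workers=2)
print(totals.most_common())
```

Because each chunk is counted without looking at any other chunk, adding more workers (or, in a real cluster, more machines) lets the same job handle more data. That is the property the paragraph above is describing.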
Hadoop-based bigdata solutions solve this issue. They look like a brand new technology trend for many enterprises and telcos, who need them to solve their bigdata problems. As we all know, cloud computing has made life much easier for IT managers: it only takes a few clicks to order a compute node. It looks like we’ll be doing cloud forever! At the end of the day, who wants to wait for servers to be delivered, then rack-and-stack them, and then build them? It used to take forever to get a server up and running.
Obviously, bigdata is a totally new era in the application space after the cloud boom. It not only solves the issue but also opens up new jobs for IT workers. This is especially good in today’s tight economy, where many people are looking for work, and it is very good news for the IT industry: we’ll have more domains to taste and to work in.
Most big data products come from the likes of Hadoop, Splunk and Cloudera, and use smarter algorithms to index the data sets and present them in a human-readable format to IT analysts. Hadoop dominates the big data space. Splunk and Cloudera both make top products available today; Cloudera’s is built on the Hadoop magic, while Splunk uses its own indexing engine and can integrate with Hadoop.
I was lucky enough to get my hands dirty at an early stage of Hadoop implementation. A year ago, before this bubble started, I had the opportunity to deploy a 37-node Hadoop cluster for parallel processing of unstructured data. I know the hurdles and challenges we went through in deploying in this domain, and I think the Hadoop code has matured over time.
IT engineers who work in the big data domain have the following titles/roles:
1. The Data Scientist (equivalent to Cisco CCIE)
2. The Data Analyst (equivalent to Cisco CCNA)
Data scientist? You must be joking! Nope! Keep reading…
Here are my thoughts on these newly defined roles for folks who are (or will be) working in the bigdata domain. You are probably thinking that the “data scientist” title sounds cool. Well, who are these data scientists? Are they industry-accelerated PhDs, nerds with multiple masters degrees, or folks with no university degree at all? The simple answer is “anybody”: the data scientist role or title doesn’t need a PhD. Funny, but very true! The fact is that many IT workers do not have a university degree and are still working in IT. Personally, I love the job title “Data Scientist”. It has certainly made the job title glamorous for the really smart folks working in the IT industry without any degree! It has given both name and fame to the role. Don’t get me wrong here, but many organisations have started hiring data scientists to solve their structured and unstructured data problems. Mostly, data scientists work on futuristic products. New product development needs data-driven input, and that input comes from big data correlation.
DJ Patil, the co-inventor of this term, defines a data scientist as:
“A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.”
Jake Porway, of Data Without Borders and the New York Times, defines it as:
“A data scientist is a rare hybrid, a computer scientist with the programming abilities to build software to scrape, combine, and manage data from a variety of sources and a statistician who knows how to derive insights from the information within. S/he combines the skills to create new prototypes with the creativity and thoroughness to ask and answer the deepest questions about the data and what secrets it holds.”
A data scientist needs strong data skills, strong knowledge of statistics and the ability to program algorithms.
WHO HIRES DATA SCIENTISTS?
Anyone who invests in bigdata products (big banks, telcos, managed IT service companies, etc.) will need these newly created titles.
Mind you, data scientists are not cheap to hire! You could argue about it, but at the end of the day you get what you pay for.
Now, who are these so-called “data analysts”? Keep reading…
Data Analyst – Not every big data problem needs a data scientist. If you are just starting out in the big data domain, you might have to begin your career as a data analyst. Remember when you first started your career in IT after university and joined a network operations center (NOC) in a NOC engineer role? Those first few months on the job, the night shifts and the time spent staring at the screen for SNMP traps and syslog messages? Well, this role will not be like that. The data analyst role is a somewhat upgraded version of it, and it comes with the opportunity to become a data scientist. That means the quicker you learn, the quicker your hip pocket fills with $$$, not to mention the opportunity to learn from data scientists as well. Every analyst needs to be able to tell and sell a story from the insights that come out of big data analysis. A data analyst is not expected to have the programming skills to build algorithms, but needs strong SQL skills as well as a good understanding of analytics packages. Typically a data analyst is cheaper to hire than a data scientist.
BIGDATA POTENTIAL MARKET:
1. Online Platform companies.
2. Content sites
3. Big banks – fraud detection, app logs, data correlation, etc.
4. Parallel processing of data that cannot be processed by traditional databases (SQL, Oracle, Informix, etc.)
5. Share market – and the list goes on! Infinite possibilities.
Not to mention that the original users (or abusers) of Hadoop have been at it for ages: LinkedIn, Facebook and Yahoo, plus Google, whose MapReduce and GFS papers inspired Hadoop in the first place. Please leave your comments. What do you think of this new buzzword? In the next post (when? probably when this bubble is gone), I will cover building a career in the big data domain.