Big data bubble – a new era in the application space
If you live in Australia, you have probably heard about the mining boom, and before that there was the famous dot-com bubble back in the 1990s. These are patterns we’ll see all our lives. Many of you have probably also heard the recent buzzword “big data”, which is being used in the IT data mining space. Google, Yahoo and other big players have been using this technology for years, so why is this bubble reaching boom level now? That is a very good question. We all have so many devices these days, and end users are generating data at such an explosive rate that there seems to be no way to control or store it. Most databases (SQL, MySQL, Oracle) have limits on what they can and can’t store. Most SQL databases are designed for managing structured data (i.e. you know the rows and columns in advance). So the data we generate in our daily lives has become a storage problem for the company you work for or the ISP you connect through. The data is massive, and those shiny smartphones are the major culprits behind the explosion. It becomes the ISP’s or enterprise’s job to store and run reports on data that cannot be managed or stored in existing SQL-based databases.
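To make the "rows and columns in advance" point concrete, here is a minimal Python sketch. It is only an illustration: the event records and field names are made up, and the in-memory SQLite table simply stands in for any relational database with a fixed schema.

```python
import json
import sqlite3

# Hypothetical device events; every field name here is made up for illustration.
events = [
    {"device": "phone-1", "event": "login", "ts": 1},
    {"device": "tablet-7", "event": "upload", "ts": 2, "bytes": 52341},
    {"device": "phone-1", "event": "crash", "ts": 3, "trace": ["init", "render"]},
]

COLUMNS = ("device", "event", "ts")

conn = sqlite3.connect(":memory:")
# A relational table needs its columns declared before any row arrives.
conn.execute("CREATE TABLE events (device TEXT, event TEXT, ts INTEGER)")

raw_lines = []  # stand-in for an append-only file of unparsed records
for record in events:
    if set(record) == set(COLUMNS):
        # The record matches the declared columns, so it fits the table.
        conn.execute("INSERT INTO events VALUES (:device, :event, :ts)", record)
    else:
        # Extra or nested fields do not fit; keep the record as-is instead.
        raw_lines.append(json.dumps(record))

rows_in_table = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(rows_in_table, "row(s) fit the table;", len(raw_lines), "kept as raw JSON")
```

Only the first record matches the declared columns. The other two carry extra or nested fields, so in this sketch they are kept as raw JSON lines to be interpreted later, which is roughly the kind of data the big data tools are aimed at.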
How do we solve this issue now? The answer is simple: big data platforms such as Hadoop solve it by providing parallel scalability. This is exactly what ISPs and enterprises are looking for these days. At the end of the day, who wants to throw away logs generated by such devices when they will definitely be useful in the future? I think nobody wants to throw away their data.
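For readers who have not seen the idea behind that parallel scalability, here is a rough sketch of the map/reduce pattern that Hadoop is known for. This is plain Python, not the Hadoop API: the log lines are invented, and a thread pool stands in for the nodes of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical log lines in "<device> <event>" form.
LOG_LINES = [
    "phone-1 login",
    "phone-2 login",
    "tablet-7 upload",
    "phone-1 crash",
    "phone-2 upload",
    "tablet-7 upload",
]

def split_into_chunks(lines, n_chunks):
    """Deal the lines out round-robin so each worker gets its own share."""
    return [lines[i::n_chunks] for i in range(n_chunks)]

def map_chunk(lines):
    """Map step: a worker counts the events in its own chunk only."""
    return Counter(line.split()[1] for line in lines)

def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

def count_events(lines, n_workers=3):
    chunks = split_into_chunks(lines, n_workers)
    # Threads stand in for cluster nodes; each one handles a single chunk.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())

totals = count_events(LOG_LINES)
print(dict(totals))
```

The point of the design is that no worker needs to see the whole data set. Adding workers (or, on a real cluster, nodes) spreads the map step wider, while the reduce step only has to merge small partial results.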
A Hadoop-based big data solution solves this issue. It looks like a brand new technology trend for many enterprises and telcos, who need a Hadoop-based solution to their big data problem. As we all know, cloud computing has made life easy for IT managers: it only takes a few clicks to order a compute node. It looks like we’ll be doing clouds forever! At the end of the day, who wants to wait for servers to be delivered, racked and stacked, and then built? It used to take forever to get a server up and running.
Big data is clearly a new era in the application space after the cloud boom. It not only solves the storage problem but also opens up new jobs for IT workers. This is especially good in today’s tight economy, where many people are looking for work, and it is very good news for the IT industry: we’ll have more domains to taste and work in. Most big data products, including those from Splunk and Cloudera, use smart algorithms to index data sets and present them in a human-readable format to IT analysts. Hadoop dominates the big data space. Splunk and Cloudera both make top products available today; Cloudera’s distribution is built on Hadoop, while Splunk uses its own indexing engine. I was lucky enough to get my hands dirty with an early-stage Hadoop implementation. A year ago, before this bubble started, I had the opportunity to deploy a 37-node Hadoop cluster for parallel processing of unstructured data, so I know the hurdles and challenges of deploying in this domain. I think the Hadoop code has matured over time. IT engineers who work in the big data domain have the following titles/roles:
1. The Data Scientist (equivalent to Cisco CCIE)
2. The Data Analyst (equivalent to Cisco CCNA)
Data scientist? You must be joking! Nope! Keep reading…
Here are my thoughts on these newly defined roles for folks who are (or will be) working in the big data domain. You are probably thinking that the ‘data scientist’ title sounds cool. So who are these data scientists? Are they industry-accelerated PhDs, nerds with multiple master’s degrees, or folks with no university degree at all? The simple answer is “anybody”: the data scientist role or title doesn’t require a PhD. Funny, but very true! The fact is that many IT workers do not have a university degree, yet they still work in IT. Personally, I love the job title “Data Scientist”. It has certainly given a glamorous job title to folks who are really smart and working in the IT industry without any degree. It has given the role both name and fame. Don’t get me wrong, but many organizations have started hiring data scientists to solve their structured and unstructured data problems. Mostly, data scientists work on futuristic products. New product development requires inputs that come from correlating big data.
DJ Patil, who helped coin the term, defines the data scientist as:
“A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.”
Jake Porway, of Data Without Borders and the New York Times, defines it as:
“A data scientist is a rare hybrid, a computer scientist with the programming abilities to build software to scrape, combine, and manage data from a variety of sources and a statistician who knows how to derive insights from the information within. S/he combines the skills to create new prototypes with the creativity and thoroughness to ask and answer the deepest questions about the data and what secrets it holds.” In short, a data scientist needs strong data skills, a strong knowledge of statistics and the ability to program algorithms.
WHO HIRES DATA SCIENTISTS?
Anyone who invests in big data products (big banks, telcos, managed IT service companies, etc.) will need people in these newly created roles.
Mind you, data scientists are not cheap to hire! You could argue about it, but at the end of the day you get what you pay for.
Now, who are these so-called ‘data analysts’? Keep on reading…
Data Analyst – Not every big data problem needs a data scientist. If you are just starting out in the big data domain, you might have to begin your career as a data analyst. Remember when you first started your career in IT after university and joined the network operations center (NOC)? In a NOC engineer role, the first few months on the job meant night shifts and time spent staring at the screen for SNMP traps and syslog messages. Well, this role will not be like that. The data analyst role is a slightly upgraded version of that, but you have the opportunity to become a data scientist. That means the quicker you learn, the quicker your hip pocket fills with $$$, not to mention the opportunity to learn from data scientists as well. Every analyst needs to be able to tell and sell a story from the insights that come out of big data analysis. A data analyst is not expected to have the programming skills to build algorithms, but needs strong SQL skills in addition to a good understanding of analytics packages. Typically, a data analyst is cheaper to hire than a data scientist.
BIG DATA POTENTIAL MARKET:
1. Online platform companies.
2. Content sites.
3. Big banks – fraud detection, app logs, data correlation, etc.
4. Parallel processing of data that cannot be processed by traditional databases (SQL, Oracle, Informix et al.)
5. Share market – and the list goes on! Infinite possibilities.
Not to mention that the original users (or abusers) of Hadoop have been using it for ages: yes, LinkedIn, Facebook, Google and Yahoo. Please leave your comments. What do you think of this new buzzword? In the next post (when? Probably when this bubble is gone), I will cover building a career in the big data domain.