Big Data holds the key to better decisions. With good enough data, one can slice through the noise and glean useful information. With intelligent extrapolation of available data, the future becomes far clearer: trends, the shape of things to come, tomorrow, a week from now; all can be modeled, predicted and visualized. Many valuable insights buried under mountains of data, and therefore inaccessible, can be brought to the surface through smart queries, revealing correlations hitherto unimagined.
The science of data analysis, aggregation and visualization is a huge area in itself. Its applications are even bigger and more diverse: health care, financial forecasting, business modeling, design, climate modeling, weather forecasting, custom drug design and more are all open to it.
With such a diverse application portfolio and cost-effective ways of using it, Big Data is bound to explode; it is already growing exponentially. A shortage of expertise and skills in this domain is clearly written on the wall: over the next three years, the USA alone is expected to face a shortfall to the tune of 190,000 data professionals, a huge gap by any standard.
Big Data and Udacity
Against this backdrop, Udacity's recent announcement of courses in the area of data science is not only a welcome initiative but a timely one too. A one-month course titled Introduction to Hadoop and MapReduce (www.udacity.com/course/ud617) is available immediately and is open to anyone who signs up, free of cost. Other courses such as 'Introduction to Data Science', 'Exploratory Data Analysis', 'Data Wrangling with MongoDB' and 'Machine Learning' are in the pipeline and will be offered later. One important and very valuable aspect of these courses is the infusion of industry expertise through collaboration with big names like Cloudera, Facebook and MongoDB.
The one-month course 'Introduction to Hadoop and MapReduce' will lay the foundations of Big Data and teach students the fundamentals of Hadoop and MapReduce.
What is Hadoop?
Hadoop is an open-source, distributed, scalable, fault-tolerant and highly available big data processing platform from the Apache Software Foundation. It has two major components: HDFS, which stands for Hadoop Distributed File System, and on top of it MapReduce, a framework for parallel processing of big data. The recently released Hadoop 2 opens up HDFS to non-MapReduce frameworks as well through the introduction of YARN, essentially a job scheduling and cluster resource management framework.
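To get a feel for the programming model the course teaches, here is a minimal sketch of MapReduce using the classic word-count example. This is plain Python, not Hadoop code: in a real Hadoop job the map and reduce functions run in parallel across the cluster and HDFS supplies the input splits, while here everything runs locally in one process.

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key, as the framework
    # does automatically between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine the values for each key; here, sum the counts.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs big tools", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # "big" appears three times, "data" twice
```

The three stages mirror a real job: only the map and reduce functions are written by the programmer, while the shuffle, parallelism and fault tolerance are the framework's responsibility.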
It will be interesting to observe what role this Udacity initiative, with the massive reach of MOOCs, will play in shrinking the demand-supply gap in an area of such high demand and such fast-moving technology.