The Hadoop ecosystem emerged as a cost-effective way of working with large data sets. It imposes a particular programming model, called MapReduce, which breaks computation into units that can be distributed across a cluster of commodity, server-class hardware, thereby providing cost-effective, horizontal scalability.
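The MapReduce model described above can be sketched with plain Unix pipes. This is only an analogy, not actual Hadoop code: `tr` plays the role of the map phase (emit one key per word), `sort` plays the shuffle phase (group identical keys), and `uniq -c` plays the reduce phase (aggregate each group).

```shell
#!/bin/sh
# Word count as a MapReduce analogy (illustrative sketch, not Hadoop itself).
# map:    split the input into one word per line
# shuffle: sort brings identical keys together
# reduce: uniq -c counts each group of identical keys
printf 'to be or not to be\n' | tr -s ' ' '\n' | sort | uniq -c
```

The same three-stage shape (map, shuffle, reduce) is what Hadoop distributes across the cluster, with each stage running in parallel on a slice of the data.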
Saturday, February 25, 2017
Thursday, February 23, 2017
Hadoop Administration: Accessing HDFS (File system & Shell Commands)
You can access HDFS in many different ways. HDFS provides a native Java application programming interface (API) and a native C-language wrapper for the Java API. In addition, you can use a web browser to browse HDFS files. In this post I'll use the command-line interface (CLI) only.
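As a quick taste of the CLI, here are a few of the most common `hdfs dfs` file system commands. This assumes a running Hadoop cluster with `hdfs` on the PATH; the paths and file names below are illustrative.

```shell
# Common HDFS shell commands (assumes a running cluster; paths are examples)
hdfs dfs -mkdir -p /user/demo            # create a directory in HDFS
hdfs dfs -put localfile.txt /user/demo   # copy a local file into HDFS
hdfs dfs -ls /user/demo                  # list directory contents
hdfs dfs -cat /user/demo/localfile.txt   # print a file's contents
hdfs dfs -rm /user/demo/localfile.txt    # delete a file
```

Most of these mirror familiar Unix commands (`mkdir`, `ls`, `cat`, `rm`), which makes the HDFS shell easy to pick up for anyone comfortable at a terminal.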
Saturday, February 18, 2017
Hadoop Ecosystem - Quick Introduction
This is the data age: data, data everywhere. We cannot precisely measure the total volume of data stored electronically, but it was estimated at 4.4 zettabytes in 2013, with a tenfold growth to 44 zettabytes forecast by 2020. Clearly, we can call this the zettabyte era. A zettabyte is equal to one thousand exabytes, one million petabytes, or one billion terabytes.
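To put the units above in perspective, a quick arithmetic check shows what the 44-zettabyte forecast means in terabytes (the figures are the estimates quoted above, used only for illustration):

```shell
#!/bin/sh
# 1 ZB = 1,000 EB = 1,000,000 PB = 1,000,000,000 TB,
# so 44 ZB expressed in terabytes:
echo "$((44 * 1000000000)) TB"
```

That is 44 billion terabytes, which gives a sense of why horizontally scalable storage like HDFS became necessary.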
Sunday, February 12, 2017
Big Data - The Bigger Picture
I've used the title "The Bigger Picture" instead of "The Big Picture" because even the big picture comes with much more detail. The aim of this post is to provide a broad understanding of the topic without delving into the deeper details.