Wednesday, April 10, 2013

Big Data: A Brief Intro

Definition

"Big data" is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set.


In a 2001 research report, Gartner defined data growth challenges and opportunities as being three-dimensional: increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner continues to use this model for describing big data.
Examples
Examples include web logs, RFID, sensor networks, social networks and social data, Internet text and documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemistry, biology and other complex, often interdisciplinary scientific research, military surveillance, medical records, photography archives, video archives, and large-scale e-commerce.

Big Data impacts
Big data has emerged because we live in a society that makes increasing use of data-intensive technologies. There are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people access the internet. In short, more people are interacting with data and information than ever before. Between 1990 and 2005, more than 1 billion people worldwide entered the middle class; as incomes rise, literacy tends to rise with them, which in turn drives information growth. The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007, and the amount of traffic flowing over the internet is predicted to reach 667 exabytes annually by 2013.
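To put the capacity figures above in perspective, here is a minimal sketch that turns them into compound annual growth rates. It assumes decimal units (1 exabyte = 1000 petabytes), and the function names are only illustrative. The 667 exabyte figure is left out because it measures annual traffic, not capacity.

```python
# Capacity figures quoted in the paragraph above, expressed in petabytes.
# Assumption: decimal units, so 1 exabyte = 1000 petabytes.
CAPACITY_PB = {1986: 281, 1993: 471, 2000: 2_200, 2007: 65_000}


def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate between two measurements."""
    return (end_value / start_value) ** (1 / years) - 1


def growth_by_period(series):
    """Return {(start_year, end_year): rate} for consecutive data points."""
    years = sorted(series)
    return {
        (a, b): annual_growth_rate(series[a], series[b], b - a)
        for a, b in zip(years, years[1:])
    }


if __name__ == "__main__":
    for (a, b), rate in growth_by_period(CAPACITY_PB).items():
        print(f"{a}-{b}: {rate:.1%} per year")
    overall = annual_growth_rate(CAPACITY_PB[1986], CAPACITY_PB[2007], 2007 - 1986)
    print(f"1986-2007 overall: {overall:.1%} per year")
```

On these numbers the growth rate speeds up in every period, from under 10% per year in 1986-1993 to over 60% per year in 2000-2007.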

Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
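The Walmart figures are easier to grasp as a per-second rate. The sketch below does that arithmetic and also derives the size of the Library of Congress book collection that the "167 times" comparison implies. It assumes decimal units (1 petabyte = 1000 terabytes) and treats "more than 1 million" as exactly 1 million, so the rate is a lower bound.

```python
# Figures quoted in the paragraph above.
TRANSACTIONS_PER_HOUR = 1_000_000  # "more than 1 million", so a lower bound
DATABASE_SIZE_PB = 2.5
LIBRARY_MULTIPLE = 167

SECONDS_PER_HOUR = 3600
TB_PER_PB = 1000  # assumption: decimal units


def transactions_per_second(per_hour):
    """Average sustained transaction rate per second."""
    return per_hour / SECONDS_PER_HOUR


def implied_library_size_tb(database_pb, multiple):
    """Size of the book collection implied by the '167 times' comparison."""
    return database_pb * TB_PER_PB / multiple


if __name__ == "__main__":
    tps = transactions_per_second(TRANSACTIONS_PER_HOUR)
    books_tb = implied_library_size_tb(DATABASE_SIZE_PB, LIBRARY_MULTIPLE)
    print(f"At least {tps:.0f} transactions per second")
    print(f"Implied size of the book collection: about {books_tb:.0f} TB")
```

That works out to roughly 280 transactions every second, around the clock, and an implied book collection of about 15 terabytes.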

Facebook handles 40 billion photos from its user base.

Technology demands
Big Data consists of complex data sets in massive volumes (petabytes) and multiple formats (table contents, text, audio, video). Given the speed and amount of data being generated today, it is driving demand for new technology that can analyze information faster, more cheaply, and with better results.

Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.

How to know if my problem is a Big Data Problem?

Without delving into details about the nature of the business challenge and existing sources of data, it is difficult for anyone to determine for sure if the problem is a Big Data problem.

Analyzing the following can help determine whether a challenge is Big Data-related.

Types of data

Three types of data may be present in your enterprise:

a. Large volumes of structured data, e.g. data stored in database tables, Excel spreadsheets, Access databases
b. Unstructured data, e.g. video, audio, Facebook, Twitter, blogs, customer reviews, log files
c. ‘Gray’ data, e.g. web traffic whose exact usage is yet to be determined by business needs that may arise

Big Data Solution classification
There are data conditions called the “Vs” that help define a Big Data problem:
1. Volume, e.g. multiple petabytes of data
2. Velocity, e.g. results need to be analyzed in seconds or less
3. Variety, e.g. structured and unstructured data such as social media posts and video files
4. Variability, e.g. constantly changing data, like a stock market
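As a rough illustration, the four Vs can be written down as a simple checklist. This is only a sketch: the `Workload` fields and both threshold values are hypothetical, chosen to mirror the examples in the list above. Any real assessment should set them from your own business requirements.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a data problem."""
    volume_tb: float                          # total data size in terabytes
    max_latency_seconds: float                # how quickly results are needed
    mixes_structured_and_unstructured: bool   # e.g. tables plus video files
    changes_constantly: bool                  # e.g. stock market feeds


# Hypothetical thresholds, loosely based on the examples in the list above.
VOLUME_THRESHOLD_TB = 1000.0      # 1 petabyte, assuming decimal units
LATENCY_THRESHOLD_SECONDS = 1.0   # "seconds or less"


def matching_vs(workload):
    """Return the names of the Vs that this workload exhibits."""
    checks = {
        "Volume": workload.volume_tb >= VOLUME_THRESHOLD_TB,
        "Velocity": workload.max_latency_seconds <= LATENCY_THRESHOLD_SECONDS,
        "Variety": workload.mixes_structured_and_unstructured,
        "Variability": workload.changes_constantly,
    }
    return [name for name, hit in checks.items() if hit]


if __name__ == "__main__":
    clickstream = Workload(
        volume_tb=3000,
        max_latency_seconds=0.5,
        mixes_structured_and_unstructured=True,
        changes_constantly=False,
    )
    print(matching_vs(clickstream))
```

A workload that matches none of the Vs is probably well served by existing tools. The more Vs it matches, the more likely it is a Big Data problem.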

How do you get started?

Many enterprises fail to implement a Big Data solution because they have not identified clear business cases for the tools. The common trigger for starting Big Data development is a surge in data that existing systems can no longer manage. As these data sets continue to grow, enterprises face the problem of managing, storing, and processing the data at the speed required for a timely business response.

Big Data, by its very nature, contains endless possibilities for business insight and improved operations. But much like venturing into space without a defined mission, the Big Data world demands that businesses clearly define in advance what they intend to achieve. Otherwise, enterprises can spend substantially on fancy tools that may never deliver real business benefit.
