Hadoop Executive Summary

By LaertesCTB

Hadoop is the de facto standard platform for Big Data, and HBase is its database.

Key Advantages of Hadoop

Hadoop is a platform that provides both distributed storage and distributed computational capabilities.

Hadoop is a popular platform for data storage and big data analysis. Large and successful companies use it for mission-critical tasks and powerful analysis. Hadoop offers two important services: it can store any kind of data from any source at very large scale, and its high performance lets it carry out sophisticated processing and analysis of that data easily and quickly.


Hadoop delivers several key advantages:

Extreme performance
Hadoop can be expanded from a few nodes to a very large number of nodes, giving it the processing power to handle extremely challenging workloads.

Store anything and NO information is lost
Hadoop stores data in its native format without forcing a transformation when data arrives, so no information is lost and downstream analyses run with no loss of fidelity. Hadoop allows the data analyst to choose how and when to digest, analyze and transform data.
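This "store now, interpret later" approach is often called schema-on-read. As an illustrative sketch (plain Python, not a Hadoop API), raw records can be kept verbatim at ingest and parsed only when an analysis needs them — so fields the original schema never anticipated still survive:

```python
import json

# Raw events are kept exactly as they arrived -- nothing is dropped
# or coerced at ingest time (schema-on-read).
raw_log = [
    '{"user": "alice", "action": "view", "item": "p1"}',
    '{"user": "bob", "action": "buy", "item": "p2", "price": 9.99}',
]

def analyze_purchases(lines):
    """Apply a schema only at analysis time; unexpected fields survive."""
    records = [json.loads(line) for line in lines]
    return [r for r in records if r.get("action") == "buy"]

purchases = analyze_purchases(raw_log)
```

Because the raw lines are stored untouched, a later analysis can still use the `price` field even though the first event never carried one.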

Use with confidence 
The user community of Hadoop and HBase is global, active and diverse. Companies across many industries participate, including social networking, media, financial services, telecommunications, retail, health care and others (for more information, please read: Who uses HBase and Hadoop).

Proven at scale
You may not have petabytes of data that you need to analyze today; nevertheless, you can deploy Hadoop with confidence, because companies like Facebook, Yahoo! and others run very large Hadoop instances managing enormous amounts of data. When you adopt a platform for data management and analysis, you are making a commitment that you will have to live with for years. The success of the biggest Web companies in the world demonstrates that Hadoop can grow as your business does.

High Availability 
Hadoop 2.x offers High Availability with no single point of failure: in its multiple-master model, multiple redundant NameNodes act in concert to serve the Hadoop Distributed File System (HDFS), enabling Hadoop to scale out to over 4,000 machines per cluster. Hadoop 2.x also introduces MapReduce NextGen, which transforms Hadoop into a full-blown platform as a service.

Big Data Random Access and Flexible Secondary Indexes 
HBase is Hadoop's database, with built-in load balancing, automatic versioning, automatic failover and built-in scalability. It is a strongly consistent database and provides random access to your big data.
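HBase's storage model can be pictured as a sorted, versioned map: rows are kept in row-key order, and every cell retains timestamped versions. The toy Python class below is an illustrative model only (the real HBase client API is Java-based), but it shows what "random access" and "automatic versioning" mean in practice:

```python
import itertools

class ToyVersionedTable:
    """Toy model of an HBase-style table: sorted row keys, and each
    column keeps its values as a list of (timestamp, value) versions."""

    def __init__(self):
        self._rows = {}                    # row key -> {column: [versions]}
        self._clock = itertools.count(1)   # stand-in for wall-clock timestamps

    def put(self, row, column, value):
        """Write a cell; older versions are retained, not overwritten."""
        versions = self._rows.setdefault(row, {}).setdefault(column, [])
        versions.append((next(self._clock), value))

    def get(self, row, column):
        """Random access by row key: return the newest version of a cell."""
        versions = self._rows.get(row, {}).get(column)
        return versions[-1][1] if versions else None

    def scan(self):
        """Yield rows in sorted row-key order, as an HBase scan would."""
        for row in sorted(self._rows):
            yield row

table = ToyVersionedTable()
table.put("user#2", "cf:name", "Bob")
table.put("user#1", "cf:email", "old@example.com")
table.put("user#1", "cf:email", "new@example.com")   # new version, old kept
```

A `get` jumps straight to one row by key instead of scanning files — which is exactly the access pattern batch-oriented HDFS alone does not offer.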

Extremely cost-effective for handling Big Data
Hadoop runs on industry-standard hardware, which means that the cost per terabyte, for both storage and processing, is much lower than on older systems. HBase makes efficient use of disk space by supporting pluggable compression algorithms. Adding or removing storage capacity is simple: you can dedicate new hardware to a cluster incrementally.


Hadoop Product Family - The Complete Tool Set for Big Data Analysis

Hadoop allows for the distributed processing of large data sets across clusters of computers using simple programming models. Its product family provides a complete tool set for analyzing Big Data:

  • Hadoop includes:
    • Hadoop Common: The common utilities that support the other Hadoop modules.
    • Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
    • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
    • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hive, the Hadoop data warehouse, facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets
  • HBase, the Hadoop database: highly fault-tolerant, with built-in scalability, built-in load balancing, automatic failover and versioning
  • Pig, an engine for executing data flows in parallel on Hadoop; in effect an automatic MapReduce program generator
  • Oozie, a scalable, reliable and extensible workflow scheduler system to manage Hadoop jobs
  • Mahout, Hadoop's powerful machine learning library
  • Sqoop, an important tool for transferring data between relational databases and Hadoop
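The "simple programming model" at the heart of Hadoop is MapReduce. The sketch below simulates its three phases — map, shuffle, reduce — for a word count in plain, single-process Python; it illustrates the model only and is not the Hadoop Java API, where the framework itself performs the shuffle across the cluster:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key
    (in Hadoop, the framework does this between the phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values -- here, sum the counts."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data", "big analysis"]
word_counts = reduce_phase(shuffle(map_phase(docs)))
```

Because map and reduce operate on independent key groups, the same program parallelizes naturally across thousands of nodes — which is what the cluster advantages above rely on.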

Use Cases of HBase and Hadoop

Simple numerical summaries – average, minimum, sum – were sufficient for the business problems of the 1980s and 1990s. Large amounts of complex data, though, require new techniques. Recognizing customer preferences requires analysis of purchase history, but also a close examination of browsing behavior and products viewed, comments and reviews logged on a web site, and even complaints and issues raised with customer support staff. Predicting behavior demands that customers be grouped by their preferences, so that behavior of one individual in the group can be used to predict the behavior of others. The algorithms involved include natural language processing, pattern recognition, machine learning and more. These techniques run very well on Hadoop.
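The idea of grouping customers by preference so that one member's behavior predicts another's can be sketched in a few lines (illustrative Python with hypothetical data; at Hadoop scale this kind of job would typically run with Mahout or a custom MapReduce program):

```python
def similarity(a, b):
    """Toy preference measure: how many items two customers share."""
    return len(set(a) & set(b))

def most_similar(target, others, history):
    """Find the customer whose history best matches the target's."""
    return max(others, key=lambda name: similarity(history[target], history[name]))

# Hypothetical purchase/browsing histories.
history = {
    "alice": ["phone", "case", "charger"],
    "bob":   ["phone", "case", "headphones"],
    "carol": ["blender", "kettle"],
}

peer = most_similar("alice", ["bob", "carol"], history)
# Items the most similar peer has that alice hasn't seen become
# candidate recommendations -- the group predicts the individual.
recommendations = set(history[peer]) - set(history["alice"])
```

Real recommendation engines replace this overlap count with richer signals (clicks, reviews, support interactions) and far more sophisticated models, but the grouping principle is the same.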

Use cases of HBase and Hadoop include, but are not limited to:

  • Archive platform - Big Image library, big document library
  • Natural Language processing
  • Recommendation Engine - How can companies predict customer preferences? Click-stream analysis, log analysis at web scale
  • Customer Churn Analysis - How can companies win more customers and avoid losing the ones they have? Sophisticated data mining
  • AD Targeting - How can companies increase campaign efficiency? Marketing automation, business intelligence
  • Point-of-sales Transaction Analysis - How do retailers target promotions guaranteed to make you buy?
  • Predictive Analysis of Network Data - How can organizations use machine-generated data to identify potential trouble?
  • Threat Analysis - How can companies detect threats and fraudulent activity? Crawling, text processing
  • Trade Surveillance - How can a bank spot the rogue trader?
  • Search Quality - What’s in your search?
  • Data Sandbox - What can you do with new data? Big data archiving and sandbox, including of relational/tabular data
  • GIS - 3D maps, spatial applications
  • Real-time Customer Segmentation - Marketing analytics 

When you need random, real-time read/write access to your Big Data, consider Hadoop and HBase.

Please feel free to contact us if you have any queries.
