The Value of Hadoop


Hadoop is the De Facto Standard for Big Data

   

Big data is a collection of data sets whose size is beyond the ability of commonly used software tools to store and process within a tolerable elapsed time. The challenges include:

  • capture, curation, storage, search, sharing, transfer, analysis, and visualization

Hadoop enables you to handle these challenges in an extremely cost-effective way. It is a fundamentally new way of storing and processing large data sets to reveal insight from all types of data. By making all of your data usable, not just what is in databases, it lets you see relationships that were hidden before and reveal answers that have always been just out of reach.
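To make "storing and processing large data sets" concrete, here is a minimal sketch of the classic word-count job written against Hadoop's MapReduce Java API. It is illustrative only: the class name, input path, and output path are assumptions, not something prescribed by this article.

    // Minimal word-count sketch using Hadoop's MapReduce Java API.
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for every token in its input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer: sums the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory must not yet exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a jar, a job like this would typically be submitted with the standard "hadoop jar" command against input already loaded into HDFS.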

Key values of Hadoop and its product family:

  • The Exact Code Used in the Production Systems of Large and Successful Companies  Yahoo contributes all of its changes back to Hadoop and shares the exact code it runs in production, so everyone benefits from the shared, neutral foundation. Facebook developed Hive, Hadoop's data warehouse system, and likewise contributes its changes back to Hadoop and shares the exact code it runs in production.
    You can use Hadoop with confidence.
  • Proven at Scale  Hadoop has been rolled out at many companies at global scale, on tens of thousands of production nodes. For instance, Yahoo, one of the largest Hadoop sites, runs over 40,000 Hadoop servers with more than 100,000 CPUs; it is also one of the largest Hadoop QA sites, operating very large clusters with a substantial QA effort and a huge variety of workloads over the years (see http://www.slideshare.net/ydn/hadoop-yahoo-internet-scale-data-processing).
  • Extremely Cost Effective for Handling Big Data  Hadoop runs on industry-standard hardware, so the cost per terabyte, for both storage and processing, is much lower than on other systems. It makes efficient use of disk space by supporting pluggable compression algorithms (a configuration sketch follows this list). Adding or removing storage capacity is simple: you can dedicate new hardware to a cluster incrementally.
    You don't need to find new hardware to experiment.
  • You Can Work with All Your Data  There is no type or length boundary in Hadoop and no limit on your data. You can store and analyze anything in Hadoop, no matter what size, and try new things quickly at huge scale.
    More important, you can process all of that data in a timely manner.
  • No Need for R&D and No License Fee  The cluster just works. Companies can save on license fees and spend the savings on more and better Hadoop servers.
  • Flexible Support Options  There is no license fee for Hadoop; you can use it for any purpose and at any scale. Hadoop's support ecosystem offers flexible options, from community support to professional support services.
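
To illustrate the pluggable compression mentioned in the cost-effectiveness point above, the following is a small, hedged sketch of how a MapReduce job might enable compression for both intermediate map output and final job output. The property names follow Hadoop 2.x conventions, and the choice of SnappyCodec is purely an assumed example; the codecs actually available depend on your cluster.

    // Hedged sketch: enabling Hadoop's pluggable compression for a MapReduce job.
    // Property names follow Hadoop 2.x conventions; SnappyCodec is an assumed
    // example and requires the native Snappy library on the cluster.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CompressionConfigSketch {
      public static Job configure() throws Exception {
        Configuration conf = new Configuration();

        // Compress intermediate map output to reduce shuffle traffic and disk usage.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
            SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "compressed job");

        // Compress the final job output written to HDFS.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        return job;
      }
    }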
   

Applications of Hadoop include, but are not limited to, the following:

  • Log and/or clickstream analysis of various kinds
  • Marketing analytics
  • Machine learning and/or sophisticated data mining
  • Image processing
  • Processing of XML messages
  • Web crawling and/or text processing
  • General archiving, including relational/tabular data
   

There are three key steps to starting a Hadoop project successfully:

  • Defining the problem domains and your business use cases: Start with an inventory of business challenges and goals, then narrow them down to those expected to provide the highest return with Hadoop.
  • Defining the technical requirements: Determine the characteristics of your data in terms of volume, velocity, and variety, and identify how Hadoop will store and process it.
  • Planning the Big Data project: Construct concrete objectives and goals in measurable business terms, and identify the time to business value and the expected outcome. Plan the project approach, cost by category, measures, project activities, and timing.
   

Please feel free to contact us if you have any queries.
