Hadoop 2.4 Globally Released

By jonrawlinson

Hadoop Is the De Facto Standard for Big Data

What is new in Hadoop 2.x

[Apr 7, 2014] We are pleased to announce that Hadoop 2.4.0 (hadoop-2.4.0), the latest stable release of the Hadoop 2.x series, was released today!

[Feb 20, 2014] We are pleased to announce that Hadoop 2.3.0 (hadoop-2.3.0), a stable release of the Hadoop 2.x series, was released today!

[Oct 15, 2013] We are pleased to announce that Hadoop 2.2.0 (hadoop-2.2.0), the first General Availability (GA) release of the Hadoop 2.x series, was released today!

Significant highlights of Hadoop 2.2 compared with Hadoop 1.x:

  • YARN - A general-purpose resource management system that allows MapReduce and other data processing frameworks and services to run on a shared Hadoop cluster.
  • High Availability for HDFS: The HDFS High Availability feature provides the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows fast failover to a new NameNode if a machine crashes, or a graceful administrator-initiated failover for planned maintenance.
  • HDFS Federation: To scale the name service horizontally, federation uses multiple independent NameNodes/namespaces. The NameNodes are federated; that is, they are independent and do not require coordination with each other. The DataNodes are used as common storage for blocks by all the NameNodes. Each DataNode registers with all the NameNodes in the cluster, sends them periodic heartbeats and block reports, and handles commands from them. Key benefits:
    • Namespace scalability - Large deployments that store many small files benefit from scaling the namespace by adding more NameNodes to the cluster.
    • Performance - Adding more NameNodes to the cluster scales file system read/write throughput.
    • Isolation - With multiple NameNodes, different categories of applications and users can be isolated in different namespaces.
  • HDFS Snapshots: HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system. Some common use cases of snapshots are data backup, protection against user errors and disaster recovery.
  • NFSv3 access to data in HDFS
  • Support for running Hadoop on Windows
  • Binary Compatibility for MapReduce applications built on hadoop-1.x
  • A substantial amount of integration testing with the rest of the projects in the ecosystem
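To make the federation layout above concrete, here is a toy in-memory model, a sketch under stated assumptions rather than how HDFS is implemented. Each hypothetical `NameNode` owns an independent namespace plus its own set of blocks (the Apache federation documentation calls this a block pool), while each hypothetical `DataNode` registers with every NameNode, heartbeats to all of them, and stores blocks for all pools side by side. The class names, the `create_file`/`store` methods, and the block-ID scheme are illustrative inventions, not Hadoop APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Hypothetical toy NameNode: one independent namespace backed by its own block pool."""

    name: str
    namespace: dict[str, list[str]] = field(default_factory=dict)  # path -> block IDs
    heartbeats: dict[str, int] = field(default_factory=dict)  # DataNode name -> heartbeats seen

    def create_file(self, path: str, num_blocks: int) -> list[str]:
        # Block IDs carry the pool (NameNode) name, so blocks from different
        # namespaces never collide on a shared DataNode.
        file_index = len(self.namespace)
        blocks = [f"{self.name}-blk{file_index}-{i}" for i in range(num_blocks)]
        self.namespace[path] = blocks
        return blocks

    def receive_heartbeat(self, datanode: str) -> None:
        self.heartbeats[datanode] = self.heartbeats.get(datanode, 0) + 1


@dataclass
class DataNode:
    """Hypothetical toy DataNode: common block storage shared by every NameNode."""

    name: str
    block_pools: dict[str, set[str]] = field(default_factory=dict)  # pool name -> block IDs

    def register(self, namenodes: list[NameNode]) -> None:
        # A DataNode registers with *all* NameNodes, keeping one block pool per namespace.
        for nn in namenodes:
            self.block_pools.setdefault(nn.name, set())

    def send_heartbeats(self, namenodes: list[NameNode]) -> None:
        # Periodic heartbeats go to every NameNode; the NameNodes never talk to each other.
        for nn in namenodes:
            nn.receive_heartbeat(self.name)

    def store(self, pool: str, block_id: str) -> None:
        self.block_pools[pool].add(block_id)


# Usage: two federated namespaces sharing two DataNodes.
namenodes = [NameNode("nn-users"), NameNode("nn-logs")]
datanodes = [DataNode("dn1"), DataNode("dn2")]
for dn in datanodes:
    dn.register(namenodes)
    dn.send_heartbeats(namenodes)

# The namespaces are independent, so the same path can exist in both.
for nn in namenodes:
    for i, block in enumerate(nn.create_file("/data/part-0", num_blocks=2)):
        datanodes[i % len(datanodes)].store(nn.name, block)
```

The point of the design is that adding a third `NameNode` only requires the DataNodes to register with it; no existing NameNode has to change, which is what lets the namespace scale horizontally.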
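The snapshot semantics described above (read-only, point-in-time, useful for undoing user errors) can be illustrated with a small toy directory. This is a hedged sketch: the `SnapshottableDir` class and its methods are hypothetical, and it copies the file mapping for simplicity, whereas HDFS does not copy data blocks when creating a snapshot and only consumes extra space as files change afterwards. On a real cluster, a directory is first enabled with `hdfs dfsadmin -allowSnapshot <path>` and a snapshot is then taken with `hdfs dfs -createSnapshot <path> [<snapshotName>]`.

```python
from __future__ import annotations

from types import MappingProxyType
from typing import Mapping


class SnapshottableDir:
    """Hypothetical toy directory (file name -> contents) with read-only snapshots."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}
        self._snapshots: dict[str, Mapping[str, bytes]] = {}

    def write(self, name: str, data: bytes) -> None:
        self._files[name] = data

    def read(self, name: str) -> bytes:
        return self._files[name]

    def delete(self, name: str) -> None:
        del self._files[name]

    def listing(self) -> list[str]:
        return sorted(self._files)

    def create_snapshot(self, snap_name: str) -> None:
        # Freeze the current state: copy the name -> contents mapping (bytes are
        # immutable, so a shallow copy suffices) and expose it via a read-only view.
        self._snapshots[snap_name] = MappingProxyType(dict(self._files))

    def snapshot(self, snap_name: str) -> Mapping[str, bytes]:
        return self._snapshots[snap_name]

    def restore(self, snap_name: str, name: str) -> None:
        # Undo a user error by copying one file back out of the snapshot.
        self._files[name] = self._snapshots[snap_name][name]


# Usage: recover from an accidental overwrite followed by a delete.
d = SnapshottableDir()
d.write("report.csv", b"v1")
d.create_snapshot("s0")
d.write("report.csv", b"v2")
d.delete("report.csv")  # accidental deletion
d.restore("s0", "report.csv")
```

After the restore, `report.csv` holds the `v1` contents captured in snapshot `s0`, and any attempt to assign into the snapshot view raises `TypeError`, mirroring the read-only guarantee.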

Users are encouraged to move to 2.2.0 immediately, since this release is significantly more stable than the earlier 2.x alpha and beta releases and is guaranteed to remain compatible in terms of both APIs and protocols.

Note the following important points when upgrading to hadoop-2.2.0:

  • HDFS - The HDFS community decided to push the symlinks feature out to a future 2.3.0 release, so the feature is currently disabled.
  • YARN/MapReduce - Users need to change the ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle.
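In `yarn-site.xml`, the shuffle service is listed in the `yarn.nodemanager.aux-services` property, and its implementation class (`org.apache.hadoop.mapred.ShuffleHandler`) is set through a key that embeds the service name, `yarn.nodemanager.aux-services.<service>.class`. Both therefore need renaming. The sketch below does this with Python's standard library only; the function name `migrate_shuffle_service` and the extra service `other_svc` are hypothetical, and it assumes the usual flat `<configuration>/<property>/<name|value>` layout.

```python
import xml.etree.ElementTree as ET

AUX_SERVICES = "yarn.nodemanager.aux-services"
OLD_NAME = "mapreduce.shuffle"
NEW_NAME = "mapreduce_shuffle"


def migrate_shuffle_service(yarn_site_xml: str) -> str:
    """Rename the ShuffleHandler aux-service in a yarn-site.xml document (sketch)."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        name = prop.find("name")
        value = prop.find("value")
        if name is None or value is None:
            continue
        if name.text == AUX_SERVICES and value.text:
            # aux-services is a comma-separated list; rename only the shuffle entry.
            services = [s.strip() for s in value.text.split(",")]
            value.text = ",".join(NEW_NAME if s == OLD_NAME else s for s in services)
        elif name.text == f"{AUX_SERVICES}.{OLD_NAME}.class":
            # The per-service class key embeds the service name, so rename it too.
            name.text = f"{AUX_SERVICES}.{NEW_NAME}.class"
    return ET.tostring(root, encoding="unicode")


old_config = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>"""

print(migrate_shuffle_service(old_config))
```

ElementTree's default parser discards XML comments, so review the output before replacing a production configuration file.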

Please feel free to contact us if you have any queries.