Facebook has the world's largest Apache Hadoop cluster!
Facebook has many Hadoop clusters; the largest is the one used for data warehousing, which serves 30,000+ simultaneous clients to HDFS.
"The data warehouse Hadoop cluster at Facebook has become the largest known Hadoop storage cluster in the world (since 2010).
Facebook has many Hadoop clusters; the largest among them is the one used for data warehousing. Here are some statistics that describe a few characteristics of Facebook's data warehousing Hadoop cluster:
• 12 TB of compressed data added per day
• 800 TB of compressed data scanned per day
• 25,000 map-reduce jobs per day
• 65 million files in HDFS
• 30,000 simultaneous clients to the HDFS NameNode
A majority of this data arrives via Scribe, as described in the Scribe-HDFS integration. This data is loaded into Hive, which provides a very elegant way to query the data stored in Hadoop. Almost 99.9% of Hadoop jobs at Facebook are generated by a Hive front-end system. We provide many more details about our scale of operations in our SIGMOD paper titled 'Data Warehousing and Analytics Infrastructure at Facebook'."
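To put the daily figures above in more familiar units, here is a back-of-envelope sketch that converts them into average per-second and per-minute rates. It assumes decimal units (1 TB = 10^12 bytes) and load spread evenly over 24 hours. Neither assumption comes from the source, and real traffic is burstier, so peak rates would be higher.

```python
# Back-of-envelope averages for the daily figures quoted above.
# Assumptions (not from the source): decimal units (1 TB = 10**12 bytes)
# and load spread evenly over a 24-hour day, which real traffic is not.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
TB = 10**12

daily = {
    "ingested_bytes": 12 * TB,   # compressed data added per day
    "scanned_bytes": 800 * TB,   # compressed data scanned per day
    "mapreduce_jobs": 25_000,    # map-reduce jobs per day
}


def per_second(per_day: float) -> float:
    """Convert a per-day total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


ingest_mb_s = per_second(daily["ingested_bytes"]) / 10**6
scan_gb_s = per_second(daily["scanned_bytes"]) / 10**9
jobs_per_min = per_second(daily["mapreduce_jobs"]) * 60
scan_to_ingest = daily["scanned_bytes"] / daily["ingested_bytes"]

if __name__ == "__main__":
    print(f"average ingest: {ingest_mb_s:.1f} MB/s")            # 138.9 MB/s
    print(f"average scan:   {scan_gb_s:.2f} GB/s")              # 9.26 GB/s
    print(f"average jobs:   {jobs_per_min:.1f} per minute")     # 17.4 per minute
    print(f"bytes scanned per byte added: {scan_to_ingest:.1f}")  # 66.7
```

The ratio on the last line is the most telling number. Under these assumptions, the cluster reads roughly 67 bytes of compressed data for every byte it ingests, so the workload is dominated by repeated scans rather than by writes.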
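Hive generates those Hadoop jobs by compiling SQL-like queries into map-reduce work. As a rough illustration only (this is not Hive's planner or code), consider a query of the form SELECT page, COUNT(*) FROM page_views GROUP BY page. It can be evaluated in one map-reduce pass: the mapper emits a (key, 1) pair per row, the shuffle groups pairs by key, and the reducer sums each group. The table name "page_views", its columns, and the sample rows are hypothetical.

```python
# A toy, single-process model of how a GROUP BY/COUNT query can run as one
# map-reduce pass. Illustrative only: this is not Hive's planner, and the
# "page_views" rows and "page" column are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (group key, 1) for every input row."""
    for row in rows:
        yield row["page"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: collect all values emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: aggregate each key's values (COUNT(*) becomes a sum of 1s)."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[Row]) -> Dict[str, int]:
    # Roughly: SELECT page, COUNT(*) FROM page_views GROUP BY page
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    page_views = [
        {"page": "home", "user": "a"},
        {"page": "profile", "user": "b"},
        {"page": "home", "user": "c"},
    ]
    print(group_by_count(page_views))  # {'home': 2, 'profile': 1}
```

On a real cluster, many mappers run in parallel over HDFS blocks, and the framework partitions keys across several reducers. The three phases keep the same shape, but each runs on many machines at once.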