The Mozilla Metrics team stepped in to deliver a new storage infrastructure that accommodates much larger volumes of data.

Mozilla creates powerful web technology for everyone, and it uses HBase for product improvement.

Get the latest features, fast performance, and the development tools you need to build for the open web from Mozilla. Mozilla's Socorro uses Apache HBase as its core storage for product improvement.

Mozilla independently develops one of the world's most popular browsers, Firefox, with the goal of protecting user privacy.

"The key goals of the Metrics team for the Socorro 2.0 design are:

• Provide a scalable backend capable of storing 100% of three years' worth of crash reports

• Provide a powerful analytic platform capable of handling complicated queries without requiring new features in the Socorro platform code

• Reuse existing code/components and keep as much Socorro "business logic" (e.g. crash report signature generation) as possible in the hands of the Socorro dev team to enable them to control their schedule without being blocked by Metrics team deliverables

The Metrics team determined that a key-value or document store database was best suited for this infrastructure. Some of the determining factors were:

• A continuous stream of crash reports flowing in

• Crash reports need to be retrieved by ID with low latency based on user interaction with the Socorro website

• If information is requested for a crash report that has not yet been processed, it needs to be processed as a priority, delivering the results within seconds

• Continuous background processing of crash reports

• Scheduled jobs to aggregate and analyze crash reports and populate tables for the Socorro website

After evaluating the merits of several different back-end technologies such as MongoDB, CouchDB, Cassandra, Hadoop, and HBase, the Metrics team opted for HBase as the primary data store. One of the biggest reasons for this choice was that, while several of the other options fit the above criteria equally well, HBase is built on Hadoop and provides a general-purpose computing cluster that can be used for a variety of future projects, such as large-scale log processing. "
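The access pattern the Metrics team describes (a continuous stream of incoming reports, low-latency retrieval by ID, and on-demand priority processing of reports that have not yet been handled) can be sketched with a simple key-value model. This is an illustrative sketch only, not Socorro's actual code: plain Python dicts stand in for the HBase table, and every name here (`CrashStore`, `ingest`, `fetch`) is a hypothetical label for the pattern, not a real Socorro or HBase API.

```python
class CrashStore:
    """Minimal key-value model of the Socorro workload described above.

    Two dicts stand in for raw and processed column families keyed by
    crash-report ID; in production this role is played by HBase.
    """

    def __init__(self):
        self.raw = {}        # crash_id -> raw crash report (continuous stream)
        self.processed = {}  # crash_id -> processed result

    def ingest(self, crash_id, report):
        # Continuous stream of crash reports flowing in.
        self.raw[crash_id] = report

    def process(self, crash_id):
        # Background (or priority) processing of a single report,
        # e.g. deriving a crash signature from the stack trace.
        report = self.raw[crash_id]
        self.processed[crash_id] = {"signature": report["stack"][0], **report}
        return self.processed[crash_id]

    def fetch(self, crash_id):
        # Low-latency retrieval by ID. If the report has not been
        # processed yet, it is processed as a priority, returning the
        # result immediately rather than waiting for the background job.
        if crash_id not in self.processed:
            return self.process(crash_id)
        return self.processed[crash_id]


store = CrashStore()
store.ingest("abc-123", {"product": "Firefox", "stack": ["js::Foo", "main"]})
result = store.fetch("abc-123")  # unprocessed, so this triggers priority processing
```

The key design point this models is that retrieval by a single row key is cheap, so the website path and the priority-processing path can share one store without scanning the full data set.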