
Tuesday, 4 February 2014

Spark is Now Generally Available for Cloudera Enterprise

Cloudera is announcing the general availability of support for Spark, bringing interactive machine learning and stream processing to enterprise data hubs.
Cloudera is pleased to announce the immediate availability of its first release of Apache Spark for Cloudera Enterprise (comprising CDH and Cloudera Manager).
Spark was created at UC Berkeley and contributed to the Apache Software Foundation, and it has quickly gained adoption for machine learning, interactive analytics, and streaming analytics over large datasets. It features a general programming model in which applications are written by composing arbitrary operators, such as mappers, reducers, joins, group-bys, and filters. Spark keeps track of the data that each operator produces, which lets applications reliably keep that data in memory. This makes Spark well suited to low-latency computations and efficient iterative algorithms. Spark applications can run up to 100x faster and require 2x to 10x less code than equivalent MapReduce applications.
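To make the idea of composing operators and keeping intermediate data in memory more concrete, here is a minimal single-machine sketch in plain Python. It is not Spark's API: the `ToyDataset` class, its method names, and the sample records are all hypothetical and exist only for illustration. Each operator returns a new dataset that remembers how to compute itself from its parent, and `cache()` keeps the computed records in memory so that later operators reuse them instead of re-reading the source.

```python
from collections import defaultdict
from functools import reduce


class ToyDataset:
    """A toy, single-machine stand-in for a Spark-style dataset (not Spark's API)."""

    def __init__(self, compute):
        self._compute = compute    # zero-argument function that produces the records
        self._use_cache = False    # set by cache()
        self._cached = None        # filled on the first evaluation after cache()

    @classmethod
    def from_list(cls, records):
        return cls(lambda: list(records))

    def cache(self):
        # Ask for the records to be kept in memory once they have been computed.
        self._use_cache = True
        return self

    def collect(self):
        # Nothing runs until collect() is called; operators only record what to do.
        if not self._use_cache:
            return self._compute()
        if self._cached is None:
            self._cached = self._compute()
        return list(self._cached)

    def map(self, fn):
        return ToyDataset(lambda: [fn(r) for r in self.collect()])

    def filter(self, pred):
        return ToyDataset(lambda: [r for r in self.collect() if pred(r)])

    def reduce_by_key(self, fn):
        def compute():
            groups = defaultdict(list)
            for key, value in self.collect():
                groups[key].append(value)
            return [(k, reduce(fn, vs)) for k, vs in groups.items()]
        return ToyDataset(compute)

    def join(self, other):
        def compute():
            right = defaultdict(list)
            for key, value in other.collect():
                right[key].append(value)
            return [(k, (v, w)) for k, v in self.collect() for w in right.get(k, [])]
        return ToyDataset(compute)


evaluations = []  # one entry per read of the (hypothetical) source


def load():
    evaluations.append(1)
    return [("spark", 3), ("hive", 1), ("spark", 4), ("pig", 0)]


# Filter once, keep the result in memory, then build two pipelines on top of it.
events = ToyDataset(load).filter(lambda kv: kv[1] > 0).cache()
totals = events.reduce_by_key(lambda a, b: a + b)
owners = ToyDataset.from_list([("spark", "analytics"), ("hive", "warehouse")])

joined = sorted(totals.join(owners).collect())
counts = sorted(
    events.map(lambda kv: (kv[0], 1)).reduce_by_key(lambda a, b: a + b).collect()
)
```

In this sketch both `joined` and `counts` are derived from the cached `events` dataset, so the source is read only once. Real Spark additionally distributes the data across a cluster and can recompute lost pieces from the recorded operator chain, which this toy does not attempt.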

How-to: Use Parquet with Impala, Hive, Pig, and MapReduce

Source: Cloudera Blog

The CDH software stack lets you use your tool of choice with the Parquet file format, offering the benefits of ...