UPDATES

WELCOME TO BIGDATATRENDZ      WELCOME TO CAMO      Architectural Patterns for Near Real-Time Data Processing with Apache Hadoop      Working with Apache Spark: Or, How I Learned to Stop Worrying and Love the Shuffle     

Video Bar

Loading...

Saturday, 28 June 2014

How-to: Create an IntelliJ IDEA Project for Apache Hadoop

Thanks to Charles Lamb, Source: Cloudera Link: here
Prefer IntelliJ IDEA over Eclipse? We’ve got you covered: learn how to get ready to contribute to Apache Hadoop via an IntelliJ project.
It’s generally useful to have an IDE at your disposal when you’re developing and debugging code. When I first started working on HDFS, I used Eclipse, but I’ve recently switched to JetBrains’ IntelliJ IDEA (specifically, version 13.1 Community Edition).
My main motivation was the ease of project setup in the face of Maven and Google Protocol Buffers (used in HDFS). The latter is an issue because the code generated by protoc ends up in one of the target subdirectories, which can be a configuration headache. The problem is not that Eclipse can’t handle getting these files into the classpath — it’s that in my personal experience, configuration is cumbersome and it takes more time to set up a new project whenever I make a new clone. Conversely, IntelliJ’s import maven project functionality seems to “just work”.
In this how-to, you’ll learn a few simple steps I use to create a new IntelliJ project for writing and debugging your code for Hadoop. (For Eclipse users, there is a similar post available here.) It assumes you already know how to clone, build, and run in a Hadoop repository using mvngit, and so on. These instructions have been tested with the Hadoop upstream trunk (github.com/apache/hadoop-common.git).

Monday, 23 June 2014

Introducing : Cloudera LIVE

Source: cloudera, Thanks to Cloudera
Cloudera Live is a new way to get started with Apache Hadoop, online. No downloads, no installations, no waiting. Watch tutorial videos and work with real-world examples of the complete Hadoop stack included with CDH, Cloudera’s completely open source Hadoop platform, to:
  • Learn Hue, the Hadoop User Interface developed by Cloudera
  • Query data using popular projects like Apache Hive, Apache Pig, Impala, Apache Solr, and Apache Spark (new!)
  • Develop workflows using Apache Oozie

Introduction to HDFS Erasure Coding in Apache Hadoop

Thanks to blog contributors from Cloudera Erasure coding, a new feature in HDFS, can reduce storage overhead by approximately 50% compar...