How-to: Create an IntelliJ IDEA Project for Apache Hadoop

Thanks to Charles Lamb, Source: Cloudera Link: here
Prefer IntelliJ IDEA over Eclipse? We’ve got you covered: learn how to get ready to contribute to Apache Hadoop via an IntelliJ project.
It’s generally useful to have an IDE at your disposal when you’re developing and debugging code. When I first started working on HDFS, I used Eclipse, but I’ve recently switched to JetBrains’ IntelliJ IDEA (specifically, version 13.1 Community Edition).
My main motivation was the ease of project setup in the face of Maven and Google Protocol Buffers (used in HDFS). The latter is an issue because the code generated by protoc ends up in one of the target subdirectories, which can be a configuration headache. The problem is not that Eclipse can’t handle getting these files into the classpath — it’s that in my personal experience, configuration is cumbersome and it takes more time to set up a new project whenever I make a new clone. Conversely, IntelliJ’s import maven project functionality seems to “just work”.
In this how-to, you’ll learn a few simple steps I use to create a new IntelliJ project for writing and debugging your code for Hadoop. (For Eclipse users, there is a similar post available here.) It assumes you already know how to clone, build, and run in a Hadoop repository using mvngit, and so on. These instructions have been tested with the Hadoop upstream trunk (

  1. Make a clone of a Hadoop repository (or use an existing sandbox). There should be a hadoop-common directory at the top level when you’re finished.
  2. Do a clean and a full build using the mvn command in the CLI. I use mvn clean install, but you should do whatever suits you.
  3. In .../bin/ file, set idea.max.intellisense.filesize=10000.
  4. Start IntelliJ.
  5. Select File > Import Project…
  6. A “Select File or Directory” to Import wizard screen will pop up. Select the hadoop-common directory from your repository clone. Click OK.
  7. An “Import Project” wizard screen will appear. Check “Import project from external model” and select “Maven”. Click Next.
  8. On the next screen, you don’t need to select any options. Click Next
  9. On the next wizard screen, you do not need to select any profiles. Click Next.
  10. On the next wizard screen, org.apache.hadoop.hadoop-main:N.M.P-SNAPSHOT will already be selected. Click Next.
  11. On the next wizard screen, ensure that IntelliJ is pointing at a JDK 7 SDK (download JDK 7 here) for the JDK home path. Click Next.
  12. On the next wizard screen, give your project a project name. I typically set the “Project file location” to a sibling of the hadoop-common directory, but that’s optional. Click Finish.
  13. IntelliJ will then tell you: “New projects can either be opened in a new window or replace the project in the existing window. How would you like to open the project?” I typically select “New Window” because it lets me different projects in different panels. Select one or the other.
  14. IntelliJ then imports the project.
  15. You can check that the project builds OK using Build > Rebuild Project. In the “Messages” panel I typically use the little icon on the left side of that window to Hide Warnings.
You’re ready for action. It’s easy to set up multiple projects that refer to different clones: just repeat the same steps above. Click on Window and you can switch between them.
A typical workflow for me is to edit either in my favorite editor or in IntelliJ. If the former, IntelliJ is very good about updating the source that it shows you. After editing, I’ll do Build > Rebuild Project, work out any compilation errors, and then either use the Debug or Run buttons. I like the fact that IntelliJ doesn’t make me mess around with Run/Debug configurations.
Some of my favorite commands and keystrokes (generally configurable) are:
  • c-sh-N (Navigate > File), which lets me easily search for a file and open it
  • c-F2 and c-F5 (stop the currently running process, and debug failed unit tests)
  • In the Run menu, F7 (step into), F8 (step over), Shift-F8 (step out), and F9 (resume program) while in the debugger
  • In the View > Tool Windows menu: Alt-1 (Project) and Alt-7 (Structure)
If you ever run into an error that says you need to add webapps/hdfs to your classpath (has happened to me when running some HDFS unit tests), taking the following steps should fix it (credit goes to this post from Stack Overflow):
  1. Select File > Project Structure…
  2. Click on Modules under “Project Settings.”
  3. Select the hadoop-hdfs project.
  4. Select the Dependencies tab.
  5. Click the + sign on right side and select “Jars or directories.”
  6. From your clone hierarchy, select the .../hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target directory. Click OK.
  7. Check all of the / directories (classes, jar directory, source archive directory). Click OK.
  8. Click OK (again).
Congratulations, you are now ready to contribute to Hadoop via an IntelliJ project!

Happy Hadooping !!


Popular posts from this blog

Cloudera Data Hub: Where Agility Meets Control

Architectural Patterns for Near Real-Time Data Processing with Apache Hadoop

How-to: Use Parquet with Impala, Hive, Pig, and MapReduce

Big Data Trendz