UPDATES


Thursday, 31 March 2016

Apache Sentry is Now a Top-Level Project

Source: Cloudera
The following post was originally published by the Sentry community at apache.org. We republish it here for your convenience.
We are very excited to announce that Apache Sentry has graduated out of the Incubator and is now an Apache Top-Level Project! Sentry, which provides centralized, fine-grained access control on metadata and data stored in Apache Hadoop clusters, was introduced as an Apache Incubator project back in August 2013. In the past two and a half years, the development community has grown significantly, attracting contributors from many organizations. Upon graduation, there were more than 50 contributors, 31 of whom had become committers.

What’s Sentry?

While Hadoop has strong security at the filesystem level, it has historically lacked the granular support needed to adequately secure access to data by users and BI applications. This gap forces users to make a choice: either leave data unprotected or lock users out entirely. Most of the time, the preferred choice is the latter, severely inhibiting access to data in Hadoop. Sentry provides the ability to enforce fine-grained, role-based access control to data and/or privileges on data for authenticated users. For example, Sentry's SQL permissions allow access control at the server, database, table, view, and even column scope, at different privilege levels including SELECT and INSERT, for Apache Hive and Apache Impala (incubating). With role-based authorization, precise levels of access can be granted to the right users and applications.
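As a concrete illustration, the statements below sketch what Sentry-backed role-based grants look like in HiveQL, issued through beeline. The connection URL, role, group, and table names are placeholders, not taken from the original post:

# Connect to HiveServer2 (URL is hypothetical) and issue Sentry grants
beeline -u "jdbc:hive2://hiveserver2.example.com:10000/default" <<'EOF'
-- Create a role and map it to an existing user group
CREATE ROLE analyst;
GRANT ROLE analyst TO GROUP analysts;
-- Fine-grained privileges: read-only on one table, insert on another
GRANT SELECT ON TABLE sales.orders TO ROLE analyst;
GRANT INSERT ON TABLE sales.staging TO ROLE analyst;
EOF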

Getting started with NextGen MapReduce (single node) in easy steps

Without a detailed explanation of what-is-what (that is due for another blog entry), here are simple steps to get started with MRv2 (next-generation MapReduce) on a single node. Find more details about MRv2 here. So, here are the steps.

1) Download the Hadoop 2.x release here.

2) Extract it to a folder (let's call it $HADOOP_HOME).
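For example, assuming the tarball landed in the Downloads folder and the install path from the next step (adjust paths to taste):

# Unpack the release into the installations folder (paths are illustrative)
mkdir -p /home/bigdata/Installations
tar -xzf ~/Downloads/hadoop-2.7.2.tar.gz -C /home/bigdata/Installations/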

3) Add the following to .bashrc in the home folder.

export HADOOP_HOME=/home/bigdata/Installations/hadoop-2.7.2
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
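After editing .bashrc, reload it and confirm the binaries resolve; a quick sanity check:

# Pick up the new environment in the current shell
source ~/.bashrc
# Should print the Hadoop version (2.7.2 here) and build details
hadoop version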
4) Create the namenode and datanode folders under $HADOOP_HOME.
mkdir -p $HADOOP_HOME/yarn/yarn_data/hdfs/namenode
mkdir -p $HADOOP_HOME/yarn/yarn_data/hdfs/datanode
5) Create the following configuration files in the $HADOOP_HOME/etc/hadoop folder.
etc/hadoop/yarn-site.xml

<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
   <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
</configuration>
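The remaining configuration files for a working single-node setup follow the same pattern. Here is a minimal sketch, assuming localhost defaults and the folders created in step 4; the property values are typical for Hadoop 2.x, not taken from the original post:

# etc/hadoop/core-site.xml -- where clients find the NameNode
cat > $HADOOP_CONF_DIR/core-site.xml <<'EOF'
<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>
EOF

# etc/hadoop/hdfs-site.xml -- single replica plus the step-4 folders
# (unquoted EOF so $HADOOP_HOME expands when the file is written)
cat > $HADOOP_CONF_DIR/hdfs-site.xml <<EOF
<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>file://$HADOOP_HOME/yarn/yarn_data/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>file://$HADOOP_HOME/yarn/yarn_data/hdfs/datanode</value>
   </property>
</configuration>
EOF

# etc/hadoop/mapred-site.xml -- run MapReduce jobs on YARN
cat > $HADOOP_CONF_DIR/mapred-site.xml <<'EOF'
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>
EOF

# Format the NameNode once, then bring up HDFS and YARN
hdfs namenode -format
start-dfs.sh
start-yarn.sh
# jps should list NameNode, DataNode, SecondaryNameNode,
# ResourceManager, and NodeManager
jps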

Introduction to HDFS Erasure Coding in Apache Hadoop

Thanks to blog contributors from Cloudera. Erasure coding, a new feature in HDFS, can reduce storage overhead by approximately 50% compar...