Showing posts from January, 2015

Using Apache Sqoop for Load Testing

Reference: Cloudera

Our thanks to Montrial Harrell, Enterprise Architect for the State of Indiana, for the guest post below.

Recently, the State of Indiana has begun to focus on how enterprise data management can help our state’s government operate more efficiently and improve the lives of our residents. With that goal in mind, I began this journey just like everyone else I know: with an interest in learning more about Apache Hadoop. I started learning Hadoop on a virtual server, where I installed CDH and worked through a few online tutorials. Then I learned a little more by reading blogs and documentation, and by trial and error. Eventually, I decided to experiment with a classic Hadoop use case: extract, load, and transform (ELT). In most cases, ELT lets you offload some resource-intensive data transforms to Hadoop’s MPP-like functionality, thereby cutting resource usage on the current ETL server at a relatively low cost. This functionality is in part delivered via…

New in CDH 5.3: Transparent Encryption in HDFS

Thanks to Cloudera for the article and for the updated version of CDH 5.3.
Support for transparent, end-to-end encryption in HDFS is now available and production-ready (and shipping inside CDH 5.3 and later). Here’s how it works. Apache Hadoop 2.6 adds support for transparent encryption to HDFS. Once configured, data read from and written to specified HDFS directories will be transparently encrypted and decrypted, without requiring any changes to user application code. This encryption is also end-to-end, meaning that data can only be encrypted and decrypted by the client. HDFS itself never handles unencrypted data or data encryption keys. All these characteristics improve security, and HDFS encryption can be an important part of an organization-wide data protection story. Cloudera’s HDFS and Cloudera Navigator Key Trustee (formerly Gazzang zTrustee) engineering teams did this work under HDFS-6134 in collaboration with engineers at Intel as an extension of earlier Project Rhino work. In this post, we’…

Big Data Trendz