Hortonworks is excited to announce that our first hands-on, performance-based certification exam is now available! The HDP Certified Developer (HDPCD) exam is designed for Hadoop developers working with frameworks like Pig, Hive, Sqoop, and Flume. This new approach to Hadoop certification gives individuals an opportunity to prove their Hadoop skills in a way the industry recognizes as meaningful and relevant to on-the-job performance.
Instead of multiple-choice questions, the exam consists of tasks executed on a live, three-node Hortonworks Data Platform cluster. The exam's tasks fall into three main categories.
A detailed list of objectives for the HDPCD exam is available on the Hortonworks website.
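To make the hands-on format concrete, here is a minimal sketch of what a data-transformation task on such a cluster might look like, expressed in PySpark. This is a hypothetical illustration, not an actual exam item: the file paths, column names, and aggregation are all made up, and real exam tasks use the frameworks listed above.

```python
# Hypothetical sketch of a hands-on data-transformation task (not an
# actual exam item). Assumes a Spark-enabled HDP cluster; all paths and
# column names are made up for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hdpcd-style-task").getOrCreate()

# Load a delimited file from HDFS into a DataFrame.
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///user/exam/orders.csv"))

# Transform: total order value per customer, highest first.
totals = (orders
          .groupBy("customer_id")
          .agg(F.sum("amount").alias("total_amount"))
          .orderBy(F.desc("total_amount")))

# Store the result back to HDFS.
totals.write.mode("overwrite").csv("hdfs:///user/exam/output/totals")
```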
Cloudera's new Data Hub cloud service, powered by Cloudera Data Platform, enables users to seamlessly migrate on-premises data management and analytics workloads to the cloud, as well as implement new cloud workloads in pursuit of a cloud-first data management strategy. On August 22nd, Cloudera demonstrated its Data Hub service during a webinar highlighting key business benefits, use cases, and product capabilities. Below is a brief overview of the topics covered and some of the most frequently asked questions from attendees.

What is Cloudera Data Hub?

Cloudera Data Hub is a powerful cloud service on Cloudera Data Platform (CDP) that makes it easier, safer, and faster to build modern, mission-critical, data-driven applications with enterprise security, governance, scale, and control. The cloud-native service is powered by a suite of integrated open source technologies that delivers the widest range of analytical workloads, such as data marts and data engineering. The three distinguishing…
Thanks to Ted Malaska and Cloudera. Evaluating which streaming architectural pattern is the best match for your use case is a precondition for a successful production deployment. The Apache Hadoop ecosystem has become a preferred platform for enterprises seeking to process and understand large-scale data in real time. Technologies like Apache Kafka, Apache Flume, Apache Spark, Apache Storm, and Apache Samza are increasingly pushing the envelope on what is possible. It is often tempting to bucket large-scale streaming use cases together, but in reality they tend to break down into a few different architectural patterns, with different components of the ecosystem better suited to different problems. In this post, I will outline the four major streaming patterns that we have encountered with customers running enterprise data hubs in production, and explain how to implement those patterns architecturally on Hadoop.
Streaming Patterns

The four basic streaming patterns (often used in tandem) are…
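The excerpt truncates before enumerating the patterns, but the ingest side of most of them starts the same way: events are pushed onto a durable log such as Kafka, from which downstream systems (Spark, Storm, Samza) consume. The following is a minimal sketch using the kafka-python client; the broker address and topic name are assumptions for illustration.

```python
# Minimal sketch of the ingest step shared by most streaming patterns:
# push events onto a Kafka topic. Broker address and topic name are
# hypothetical; install the client with `pip install kafka-python`.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Each event lands on the "events" topic, from which a downstream
# consumer can process it in near real time.
for i in range(10):
    producer.send("events", {"event_id": i, "type": "click"})

producer.flush()  # block until all buffered records are delivered
producer.close()
```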
Source: Cloudera Blog

The CDH software stack lets you use your tool of choice with the Parquet file format, offering the benefits of columnar storage at each phase of data processing. An open source project co-founded by Twitter and Cloudera, Parquet was designed from the ground up as a state-of-the-art, general-purpose, columnar file format for the Apache Hadoop ecosystem. In particular, Parquet has several features that make it highly suited to use with Cloudera Impala for data warehouse-style operations:

Columnar storage layout: A query can examine and perform calculations on all values for a column while reading only a small fraction of the data from a data file or table.

Flexible compression options: The data can be compressed with any of several codecs. Different data files can be compressed differently. The compression is transparent to applications that read the data files.

Innovative encoding schemes: Sequences of identical, similar, or related data values can be represented i…
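As a small illustration of the first two features (columnar reads and per-file codec choice), here is a sketch using the pyarrow library rather than Impala itself; the data, column names, and file path are made up.

```python
# Sketch of Parquet's columnar layout and compression options using the
# pyarrow library. Data, column names, and file path are illustrative.
import pyarrow as pa
import pyarrow.parquet as pq

# A small table with two columns.
table = pa.table({
    "user_id": [1, 2, 3, 4],
    "country": ["US", "DE", "US", "IN"],
})

# Flexible compression: choose a codec per file (snappy, gzip, zstd, ...).
pq.write_table(table, "users.parquet", compression="snappy")

# Columnar storage layout: a reader can pull just the columns a query
# needs, scanning only a fraction of the file's bytes.
countries_only = pq.read_table("users.parquet", columns=["country"])
print(countries_only)
```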