
Showing posts from September, 2017

3X FASTER INTERACTIVE QUERY WITH APACHE HIVE LLAP

Thanks to Carter Shanklin & Nita Dembla from Hortonworks for this valuable post. One of the most exciting new features of HDP 2.6 from Hortonworks was the general availability of Apache Hive with LLAP. If you missed DataWorks Summit, you’ll want to look at some of the great LLAP experiences our users shared. These include Geisinger, who found that Hive LLAP outperforms their traditional EDW for most of their queries, and Comcast, who found that Hive LLAP is faster than Presto for 75% of benchmark queries. These results come from performance and stability improvements Hortonworks made to Hive LLAP, which make interactive query in HDP 2.6 3x faster. This blog dives into the reasons HDP 2.6 is so much faster. We’ll also look at the massive step forward Hive has made in SQL compliance with HDP 2.6, which enables Hive to run all 99 TPC-DS queries with only trivial modifications to the original source queries.

STARTING OFF: 3X PERFORMANCE GAINS IN HDP 2.6 WITH HIVE LLAP

Let’s start out with a s…

YINCEPTION: A YARN BASED CONTAINER CLOUD AND HOW WE CERTIFY HADOOP ON HADOOP

Thanks to the Hortonworks team for this valuable post. In this post, we deep dive into something that we are extremely excited about: running a container cloud on YARN! We have been using this next-generation infrastructure for more than a year to run all of the Hortonworks internal CI/CD infrastructure. With it, we can now run Hadoop on Hadoop to certify our releases! Let’s dive right in!

CERTIFYING HORTONWORKS PLATFORMS

The introductory post on Engineering @ Hortonworks gave readers an overview of the scale of the challenges we face in delivering an enterprise-ready data platform. Essentially, for every new release of a platform, we provision Hadoop clusters on demand with specific configurations (authentication on/off, encryption on/off, DB combinations, and OS environments), run a bunch of tests to validate changes, and shut the clusters down. And we do this over and over, day in and day out, throughout the year.

DATA SCIENCE FOR THE MODERN DATA ARCHITECTURE

Thanks to Vinay Shukla (who leads Data Science Product Management at Hortonworks) for this post. Our customers increasingly leverage Data Science and Machine Learning to solve complex predictive analytics problems, such as churn prediction, predictive maintenance, image classification, and entity matching. While everyone wants to predict the future, truly leveraging Data Science for Predictive Analytics remains the domain of a select few. To expand the reach of Data Science, the Modern Data Architecture (MDA) needs to address the following 4 requirements:

1. Enable apps to consume predictions and become smarter
2. Bring predictive analytics to the IoT edge
3. Become easier, more accurate, and faster to deploy and manage
4. Fully support the data science life cycle

The diagram below represents where Data Science fits in the MDA.

DATA SMART APPLICATIONS

End users consume data, analytics, and the results of Data Science analytics via data-centric applications (or apps). A vast majority of these…