
Use Cases


From time to time I come across interesting scenarios and use cases of Big Data that have a direct positive impact (weather forecasting) or negative impact (user surveillance) on our lives. Here are some that I found too interesting not to share.

If you are involved in any interesting Big Data scenarios, please let me know and I will add them to this page. You can get my email from the `About Me` section on this blog.


Managing sewage like traffic thanks to data

How India’s favorite TV show uses data to change the world

Process a Million Songs with Apache Pig

Health Care

Neural Network for Breast Cancer Data Built on Google App Engine

Processing Rat Brain Neuronal Signals Using A Hadoop Computing Cluster – Part I

Big Data in Genomics and Cancer Treatment

Big data is the next big thing in health IT

Big data and DNA: What business can learn from junk gene

UC Irvine Medical Center: Improving Quality of Care with Apache Hadoop

Lessons from Anime and Big Data (Ghost in the Shell)

6 Big Data Analytics Use Cases for Healthcare IT

IT Infrastructure

Hadoop for Archiving Email

The Data Lifecycle, Part One: Avroizing the Enron Emails

Fraud Detection & Crime
Using Hadoop for Fraud Detection and Prevention

Big data thwarts fraud

Hadoop : Your Partner in Crime

Ad Platform
Why Europe’s Largest Ad Targeting Platform Uses Hadoop

Big data is stealth travel site's secret weapon


Big Data in Education

Network Security
Introducing Packetpig

Enterprise and Security

Adopting Apache Hadoop in the Federal Government

10 days big data changes everything

Biodiversity Indexing: Migration from MySQL to Hadoop

