Showing posts from August, 2016

Resolving Lock Contention in Apache Solr: A Performance-Analysis Detective Story

This case study is an instructive example of how performance analysis is a multi-faceted process that often leads one in surprising directions. Apache Solr Near Real Time (NRT) Search allows Solr users to search documents indexed just seconds ago. It’s a critical feature in many real-time analytics applications. As Solr indexes more and more documents in near real time, end-user expectations for performance get higher and higher. However, the Cloudera Search team recently found that Solr NRT indexing throughput often hit a bottleneck even when plenty of CPU, disk, and network resources were available. Latency was average, in the hundreds of milliseconds range. Considering that Solr NRT indexing is mainly a machine-to-machine operation, without a human waiting for indexing to complete, that latency range was actually fairly good. Furthermore, some customers reported other issues under heavy Solr NRT indexing workloads, such as connection resets, that could be “cascading” performa…

How-to: Ingest Email into Apache Hadoop in Real Time for Analysis

Source: Cloudera blog. Apache Hadoop is a proven platform for long-term storage and archiving of structured and unstructured data. Related ecosystem tools, such as Apache Flume and Apache Sqoop, allow users to easily ingest structured and semi-structured data without requiring the creation of custom code. Unstructured data, however, is a more challenging subset of data that typically lends itself to batch-ingestion methods. Although such methods are suitable for many use cases, with the advent of technologies like Apache Spark, Apache Kafka, and Apache Impala (Incubating), Hadoop is also increasingly a real-time platform. In particular, compliance-related use cases centered on electronic forms of communication, such as archiving, supervision, and e-discovery, are extremely important in financial services and related industries where being “out of compliance” can result in hefty fines. For example, financial institutions are under regulatory pressure to archive all forms of e-communicat…