UPDATES

WELCOME TO BIGDATATRENDZ      WELCOME TO CAMO      Architectural Patterns for Near Real-Time Data Processing with Apache Hadoop      Working with Apache Spark: Or, How I Learned to Stop Worrying and Love the Shuffle     

Tuesday, 4 October 2016

RDBMS vs NoSQL Data Flow Architecture

Here is a view for difference between RDBMS and a NoSQL Database. There are a lot of differences, but the data flow is as shown below in those systems.

 
In a traditional RDBMS, the data is first written to the database, then to the memory. When the memory reaches a certain threshold, it's written to the Logs. The Log files are used for recovering in case of server crash. In RDBMS before returning a success on an insert/update to the client, the data has to be validated against the predefined schema, indexes created and other things which makes it a bit slow compared to the NoSQL approach discussed below.

In case of a NoSQL database like HBase, the data is first written to the Log (WAL), then to the memory. When the memory reaches a certain threshold, it's written to the Database. Before returning a success for a put call, the data has to be just written to the Log file, there is no need for the data to be written to the Database and validated against the schema.

Log files (first step in NoSQL) are just appended at the end and is much faster than writing to the Database (first step in RDBMS). The NoSQL data flow discussed above gives a higher threshold/rate during data inserts/updates in case of NoSQL Databases when compared to RDBMS.


Happy Hadooping......!!

Introduction to HDFS Erasure Coding in Apache Hadoop

Thanks to blog contributors from Cloudera Erasure coding, a new feature in HDFS, can reduce storage overhead by approximately 50% compar...