We are now generating massive volumes of data at an accelerating rate. To meet business needs, respond to changing market dynamics, and improve decision-making, organizations must run sophisticated analyses of data drawn from disparate sources. The challenge is how to capture, store, and model these massive pools of data effectively in relational databases.
Let’s consider a shopping website. It must maintain product information, transaction data, and product reviews. That data can easily be stored in a relational database management system (RDBMS). But as the volume of reviews grows, the tables must be altered to accommodate the increase. Because these changes must happen in near real time, data modeling becomes very challenging given the time and resources the changes require. Schema changes in an RDBMS may also affect the performance of the production database. There are many similar scenarios in which the nature and volume of the stored information force changes to the RDBMS schema. These challenges can be addressed using toolsets from the Hadoop ecosystem.
Hadoop is a Java-based framework designed to tackle the complexities of big data analytics, helping users store large volumes of data and make them available for analysis. Hadoop works by distributing data and computation across a cluster of machines, processing the pieces in parallel, and combining the results.
Analyzing data in a production database means querying it, which can degrade overall database performance. When heavy analysis must be carried out on production data, users therefore prefer to migrate the data from the RDBMS to the Hadoop ecosystem, where it can be processed without affecting the transactional workload. Note that Hadoop cannot replace an RDBMS entirely, as Hadoop is suitable only for online analytical processing (OLAP), not online transaction processing (OLTP).
Tools to migrate data from RDBMS to Hadoop HDFS
For data migration, one of the best tools available in the Hadoop ecosystem is Apache Sqoop. Sqoop acts as an intermediate layer between the RDBMS and Hadoop. It imports data from relational databases such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), and exports data from HDFS back to relational databases.
In cases where computing resources are limited, Sqoop is not a viable option because its resource consumption can be high. A better option in such cases is Apache Spark SQL.
Apache Spark SQL is the Apache Spark module for working with structured data. It processes data quickly in a distributed manner and is designed to execute interactive queries and stream processing efficiently, offering speed, ease of use, and a unified processing engine.
Migrating Data from an RDBMS to an HDFS-Equivalent System Using Spark
Let’s consider a possible scenario where the project stack does not include the Hadoop framework, but the user wants to migrate data from an RDBMS to an HDFS-equivalent system, for example, Amazon S3. In this scenario, Apache Spark SQL can be used.
Apache Spark SQL offers two RDBMS connectors for such a migration: the lower-level JdbcRDD and the DataFrame-based JDBC data source.
To connect Spark to an RDBMS, the database’s JDBC type 4 driver JAR must be added to Spark’s classpath (for example, placed in the jars directory or passed via the --jars option). The following code can be used to check JDBC connectivity:
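A minimal PySpark sketch of such a connectivity check; the database name (`company`), host, and credentials are placeholders, while the MySQL source and `employee` table follow the text:

```python
# Minimal JDBC connectivity check (PySpark). The URL, user, and password
# are placeholders -- substitute your own deployment's values.
JDBC_URL = "jdbc:mysql://localhost:3306/company"
CONNECTION_PROPS = {
    "user": "dbuser",
    "password": "dbpass",
    "driver": "com.mysql.cj.jdbc.Driver",  # MySQL Connector/J (type 4)
}

def read_employee_table(url=JDBC_URL, props=CONNECTION_PROPS):
    """Load the full employee table; a successful load confirms connectivity."""
    # Imported inside the function so the sketch parses without Spark installed.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("JdbcConnectivityCheck").getOrCreate()
    return spark.read.jdbc(url=url, table="employee", properties=props)

# Usage (requires Spark and a reachable MySQL server):
#   read_employee_table().show()
```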
The above code accesses the MySQL database and reads all of the data from the employee table.
The following code can be used to fetch the data in parallel:
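A PySpark sketch of a partitioned read: Spark splits the bounded range of a numeric column across partitions and issues one JDBC query per partition. The `emp_id` column, the bounds, and the connection details are assumptions for illustration:

```python
JDBC_URL = "jdbc:mysql://localhost:3306/company"  # placeholder connection details
CONNECTION_PROPS = {"user": "dbuser", "password": "dbpass",
                    "driver": "com.mysql.cj.jdbc.Driver"}

def stride(lower, upper, num_partitions):
    """Approximate per-partition range width Spark derives from the bounds."""
    return (upper - lower) // num_partitions

def read_employee_parallel(lower=1, upper=100001, num_partitions=4):
    """Fetch the employee table with one JDBC query per partition.

    Spark splits [lower, upper) on the numeric partition column, so the
    partitions each scan a disjoint range of emp_id values. Rows outside
    the bounds are still read -- the bounds only set the stride.
    """
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("ParallelJdbcRead").getOrCreate()
    return spark.read.jdbc(
        url=JDBC_URL,
        table="employee",
        column="emp_id",            # assumed numeric key column
        lowerBound=lower,
        upperBound=upper,
        numPartitions=num_partitions,
        properties=CONNECTION_PROPS,
    )
```

With the defaults above, each of the four partitions scans a range of roughly 25,000 `emp_id` values.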
To write data back to the database, the following code can be used:
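A PySpark sketch of the export path, writing a DataFrame back over JDBC; the target table name, write mode, and connection details are illustrative:

```python
JDBC_URL = "jdbc:mysql://localhost:3306/company"  # placeholder connection details
CONNECTION_PROPS = {"user": "dbuser", "password": "dbpass",
                    "driver": "com.mysql.cj.jdbc.Driver"}

def write_dataframe(df, table="employee_backup", mode="append"):
    """Export a DataFrame to the RDBMS over JDBC.

    mode="append" adds rows to an existing table;
    mode="overwrite" drops and recreates the table first.
    """
    df.write.jdbc(url=JDBC_URL, table=table, mode=mode,
                  properties=CONNECTION_PROPS)
```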
Using the above code snippets, you can move data between an RDBMS and an HDFS-equivalent system.
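To land the imported data in an HDFS-equivalent store such as Amazon S3, the DataFrame can be written out directly; a sketch assuming an `s3a://` bucket path and Parquet output:

```python
def save_to_object_store(df, path="s3a://my-bucket/employee/"):
    """Write the extracted table to an HDFS-equivalent store as Parquet.

    The bucket path is a placeholder; the s3a:// scheme requires the
    hadoop-aws connector and AWS credentials to be configured.
    """
    df.write.mode("overwrite").parquet(path)
```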
Sqoop is known to consume a lot of processing power during data migration; where that capacity is not available, Spark SQL is a good alternative. There are many considerations when evaluating tools and methods to migrate, manage, and analyze massive amounts of data. If you are considering a migration from an RDBMS to HDFS, contact our Big Data Consulting experts to get started.