Hadoop – the data processing engine based on MapReduce – is being superceded by new processing engines: Apache Tez, Apache Storm, Apache Spark and others. YARN makes any data processing future possible. But Hadoop the platform – thanks to YARN …
You might be wondering what bearing a history lesson may have on a technology project such as Apache Drill. In order to truly appreciate Apache Drill, it is important to understand the history of the projects in this space, as well …
Apache Sqoop provides a framework to move data between HDFS and relational databases in a parallel fashion using Hadoop’s MR framework. As Hadoop becomes more popular in enterprises, there is a growing need to move data from non-relational sources like mainframe …
The set of data storage and processing technologies that define the Apache Hadoop ecosystem are expansive and ever-improving, covering a very diverse set of customer use cases used in mission-critical enterprise applications. At Cloudera, we’re constantly pushing the boundaries of …
Hadoop is a popular open-source implementation of MapReduce framework designed to analyze large data sets. It has two parts; Hadoop Distributed File System (HDFS) and MapReduce. HDFS is the file system used by Hadoop to store its data. It has …
Big data techniques are becoming mainstream in an increasing number of businesses, but how do people get self-service, interactive access to their big data? And how do they do this without having to train their SQL-literate employees to be advanced …
The HDFS docs have some information, and logically it makes sense to separate the network of the Hadoop nodes from a “management” network. However, in our experience, multi-homed networks can be tricky to configure and support. The pain stems from …