How-to: Deploy Apache Hadoop Clusters Like a Boss
The HDFS docs have some information on multi-homed networks, and it makes logical sense to separate the network of the Hadoop nodes from a "management" network. However, in our experience, multi-homed networks can be tricky to configure and support. The pain stems from Hadoop integrating with a large ecosystem of components that all have their own network and port-binding settings.
The traditional recommendation for worker nodes was to set swappiness (vm.swappiness) to 0. However, the behavior of a value of 0 changed in newer kernels, and we now recommend setting it to 1. (This post has more details.)
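A minimal sketch of applying the swappiness recommendation, assuming a Linux host where kernel settings are persisted in /etc/sysctl.conf. The helper name set_sysctl_conf and the file path are assumptions for illustration and do not come from the original post. The helper only edits a file; the commented commands at the bottom change the running kernel and need root.

```shell
#!/bin/sh
# Hypothetical helper: idempotently set "KEY = VALUE" in a sysctl-style file.
# Usage: set_sysctl_conf FILE KEY VALUE
set_sysctl_conf() {
    file=$1
    key=$2
    value=$3
    touch "$file"
    tmp=$(mktemp)
    # Keep every line whose key (text before "=", whitespace removed)
    # differs from the one being set; commented-out lines are kept too.
    awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' "$file" > "$tmp"
    # Append the new setting, then overwrite the file in place.
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example usage on a worker node (as root); left commented out on purpose:
# set_sysctl_conf /etc/sysctl.conf vm.swappiness 1   # persist across reboots
# sysctl -w vm.swappiness=1                          # apply to the running kernel
# cat /proc/sys/vm/swappiness                        # verify the current value
```

Running the helper twice leaves a single vm.swappiness line in the file, so it can be called safely from a provisioning script.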
Cloudera Manager Like A Boss
We highly recommend using Cloudera Manager to manage your Hadoop cluster. Cloudera Manager offers many valuable features that make life much easier. The Cloudera Manager documentation is pretty clear on this, but to stamp out any ambiguity, below are the high-level steps for a production-ready Hadoop deployment with Cloudera Manager.
Set up an external database and pre-create the schemas needed for your deployment.
create database amon DEFAULT CHARACTER SET utf8;
grant all on amon.* TO 'amon'@'%' IDENTIFIED BY 'amon_password';
create database rman DEFAULT CHARACTER SET utf8;
grant all on rman.* TO 'rman'@'%' IDENTIFIED BY 'rman_password';
create database metastore DEFAULT CHARACTER SET utf8;
grant all on metastore.* TO 'metastore'@'%' IDENTIFIED BY 'metastore_password';
create database nav DEFAULT CHARACTER SET utf8;
grant all on nav.* TO 'nav'@'%' IDENTIFIED BY 'nav_password';
create database sentry DEFAULT CHARACTER SET utf8;
grant all on sentry.* TO 'sentry'@'%' IDENTIFIED BY 'sentry_password';
(Please change the passwords in the examples above!)
Install the cloudera-manager-server and cloudera-manager-daemons packages per documentation.
yum install cloudera-manager-server cloudera-manager-daemons
Run the scm_prepare_database.sh script with the arguments for your database type.
/usr/share/cmf/schema/scm_prepare_database.sh mysql -h cm-db-host.cloudera.com -utemp -ptemp --scm-host cm-db-host.cloudera.com scm scm scm
Start the Cloudera Manager Service and follow the wizard from that point forward.
This is the simplest way to install Cloudera Manager and will get you started with a production-ready deployment in under 20 minutes.
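The schema statements in the first step all follow one pattern per service database, so they can be generated rather than typed by hand. The following is a sketch only: the function name gen_schema_sql is hypothetical, and the "<name>_password" placeholders mirror the examples in this post and must be replaced with real passwords before use. It assumes a MySQL version that still accepts GRANT ... IDENTIFIED BY, as the statements in this post do.

```shell
#!/bin/sh
# Hypothetical helper: print a CREATE DATABASE / GRANT pair for each name given.
# The "<name>_password" values are placeholders, not real credentials.
gen_schema_sql() {
    for db in "$@"; do
        printf 'create database %s DEFAULT CHARACTER SET utf8;\n' "$db"
        # %% prints a literal % (the MySQL "any host" wildcard).
        printf "grant all on %s.* TO '%s'@'%%' IDENTIFIED BY '%s_password';\n" "$db" "$db" "$db"
    done
}

# Print the statements for the five databases used in this post.
gen_schema_sql amon rman metastore nav sentry

# To apply them, the output could be piped to the mysql client, for example:
# gen_schema_sql amon rman metastore nav sentry | mysql -u root -p
```

Printing to standard output first makes it easy to review the statements, and swap in real passwords, before anything touches the database.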
Many customers purchase new hardware in regular cycles; adding new generations of computing resources makes sense as data volumes and workloads increase. For such environments, which contain heterogeneous disk, memory, or CPU configurations, Cloudera Manager provides Role Groups, which let the administrator specify memory, YARN container, and cgroup settings per node or per group of nodes.
Previously, we published some recommendations on selecting new hardware for Apache Hadoop deployments. That post covered some important ideas regarding cluster planning and deployment such as workload profiling and general recommendations for CPU, disk, and memory allocations.