Hadoop Cluster creation

Hardware Requirements

  • A minimum of 3 machines is required to install a production cluster.
  • Machines can be physical or virtual.
  • Hard disk size: 100 GB or more
  • RAM per node:
    • Name node: 16 GB or more
    • Data node: 4 GB or more
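The minimums above can be turned into a quick pre-install check. This is a minimal Python sketch, not part of the Syncfusion tooling: the `Node` type, its fields, and `check_hardware` are hypothetical names, and the thresholds are the ones listed above (standby name nodes are treated as "name" nodes).

```python
from dataclasses import dataclass
from typing import List

# Minimums taken from the hardware requirements above.
MIN_MACHINES = 3
MIN_DISK_GB = 100
MIN_RAM_GB = {"name": 16, "data": 4}


@dataclass
class Node:
    """Hypothetical description of one cluster machine."""
    host: str
    role: str      # "name" (active or standby name node) or "data"
    disk_gb: int
    ram_gb: int


def check_hardware(nodes: List[Node]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if len(nodes) < MIN_MACHINES:
        problems.append(f"need at least {MIN_MACHINES} machines, got {len(nodes)}")
    for node in nodes:
        if node.disk_gb < MIN_DISK_GB:
            problems.append(f"{node.host}: disk {node.disk_gb} GB < {MIN_DISK_GB} GB")
        min_ram = MIN_RAM_GB[node.role]
        if node.ram_gb < min_ram:
            problems.append(
                f"{node.host}: RAM {node.ram_gb} GB < {min_ram} GB for a {node.role} node")
    return problems
```

For example, a cluster of `Node("nn1", "name", 200, 16)`, `Node("dn1", "data", 120, 8)` and `Node("dn2", "data", 80, 4)` reports only `dn2: disk 80 GB < 100 GB`.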

Software Requirements

Windows

  • .NET Framework 4.5 or later
  • Windows 7, Windows Server 2008 R2 or later (64-bit versions only)

Linux

  • Ubuntu 14.04 LTS, Ubuntu 16.04 LTS (64-bit versions only)

On each of your hosts:

  • tar, p7zip

  • apt-get

JDK Requirements

  • Open source JDK 1.7
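To confirm that a host has JDK 1.7, you can inspect the first line of `java -version`, which the JDK prints to standard error. The parser below is a minimal Python sketch; the function names are hypothetical, and it assumes the output contains a quoted version such as `"1.7.0_95"`, so check it against your JDK.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_java_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as 'java version "1.7.0_95"'."""
    match = re.search(r'version "(\d+)\.(\d+)', output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_java_version() -> Optional[Tuple[int, int]]:
    """Run `java -version` (it writes to stderr) and parse the result."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr)
```

A result of `(1, 7)` matches the JDK 1.7 requirement; `None` means the version string was not recognized.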

Download the Syncfusion Big Data Cluster Manager and Big Data Agent from here.

Getting Started

Cluster Creation – Manual mode

Step 1: Install the Syncfusion Big Data Agent on every machine that will act as a cluster node.

Step 2: Install the Syncfusion Big Data Cluster Manager on any one of the machines where the Big Data Agent is installed. It can also be installed on a separate machine in the same network as the cluster nodes.

Step 3: The Cluster Manager installer offers to run a start-up dashboard. You can also run the dashboard from the shortcut that the installer should have placed on the desktop.

Step 4: Launch the dashboard, and then launch the Cluster Manager.

Cluster creation dialog

Step 5: Log in to the Cluster Manager. The default user name and password are both admin.

Cluster credentials dialog

The password can be changed by using the Change Password option available in the Cluster Manager.

ChangePassword dialog

Step 6: Click Create and select the Manual Mode option.

Cluster mode dialog

Step 7: Provide a user-defined name for the cluster, the replication value, and the IP address or host name of each of the following nodes:

  • Active name node
  • Standby name node
  • One or more data nodes
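A few constraints on the layout entered in this step are worth checking before clicking Next: the active and standby name nodes should be different machines, at least one data node is needed, and HDFS stores each replica of a block on a different data node, so a replication value larger than the number of data nodes leaves blocks under-replicated. The Python sketch below checks these rules; `ClusterSpec` and `validate_spec` are hypothetical and do not reflect the Cluster Manager's internal model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterSpec:
    """Hypothetical representation of the values entered in Step 7."""
    name: str
    replication: int
    active_name_node: str
    standby_name_node: str
    data_nodes: List[str] = field(default_factory=list)


def validate_spec(spec: ClusterSpec) -> List[str]:
    """Return a list of layout problems; an empty list means it looks valid."""
    errors = []
    if not spec.name.strip():
        errors.append("cluster name is empty")
    if spec.active_name_node.lower() == spec.standby_name_node.lower():
        errors.append("active and standby name nodes must be different hosts")
    if not spec.data_nodes:
        errors.append("at least one data node is required")
    if len({h.lower() for h in spec.data_nodes}) != len(spec.data_nodes):
        errors.append("data node list contains duplicates")
    if spec.replication < 1:
        errors.append("replication must be at least 1")
    elif spec.replication > len(spec.data_nodes):
        # Each replica needs its own data node, so extra replicas cannot be placed.
        errors.append(
            f"replication {spec.replication} exceeds {len(spec.data_nodes)} data node(s)")
    return errors
```

With the 3-machine minimum (two name nodes and one data node), a replication value of 1 passes while 3 is flagged.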

Cluster details dialog

NOTE

The Import option lets you load the details of multiple data nodes from a CSV file at once. The CSV file should list each node's IP address or host name in a single column.

Step 8: Click Next. The Cluster Manager automatically performs the required validations, including DNS and reverse DNS validation.
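The import file described in the note is simply one IP address or host name per line. The Python sketch below reads such a file, skipping blank lines and case-insensitive duplicates; `read_node_csv` is a hypothetical name, and the sketch assumes the file has no header row, which the note does not specify.

```python
import csv
import io
from typing import Iterable, List


def read_node_csv(lines: Iterable[str]) -> List[str]:
    """Return hosts/IPs from the first CSV column, in order, without duplicates."""
    nodes: List[str] = []
    seen = set()
    for row in csv.reader(lines):
        if not row or not row[0].strip():
            continue  # skip blank lines
        host = row[0].strip()
        if host.lower() not in seen:
            seen.add(host.lower())
            nodes.append(host)
    return nodes


sample = "192.168.1.11\n192.168.1.12\n\ndatanode3.example.local\n192.168.1.11\n"
print(read_node_csv(io.StringIO(sample)))
```

For a file on disk, pass the handle from `open("datanodes.csv", newline="")` (a hypothetical file name) instead of the `StringIO` sample.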
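DNS and reverse DNS validation means that a node's name and address resolve consistently in both directions. The Python sketch below illustrates the idea with the standard `socket.gethostbyname` and `socket.gethostbyaddr` functions; it is not the Cluster Manager's actual check, and `check_dns` is a hypothetical name. The resolvers are parameters so the logic can be tried without a DNS server.

```python
import ipaddress
import socket
from typing import Callable, List, Optional, Tuple

Forward = Callable[[str], str]
Reverse = Callable[[str], Tuple[str, List[str], List[str]]]


def _is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def check_dns(host: str,
              forward: Forward = socket.gethostbyname,
              reverse: Reverse = socket.gethostbyaddr) -> Optional[str]:
    """Return None if forward and reverse lookups agree, else an error message."""
    try:
        if _is_ip(host):
            # IP entered: reverse-resolve it; the name must resolve back to it.
            name = reverse(host)[0]
            if forward(name) != host:
                return f"{host}: {name} does not resolve back to {host}"
        else:
            # Host name entered: resolve it; the IP must map back to the name.
            ip = forward(host)
            name, aliases, _ = reverse(ip)
            names = {n.lower() for n in [name, *aliases]}
            names |= {n.split(".")[0] for n in names}  # accept short names too
            if host.lower() not in names:
                return f"{host}: {ip} reverse-resolves to {name}"
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return f"{host}: lookup failed ({exc})"
    return None
```

Called with only a host name, for example `check_dns("datanode1")` (a hypothetical node), it uses the machine's configured resolver.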

Cluster validation dialog

By default, the default properties are applied to all cluster nodes. If you need to edit the Hadoop configuration, use the advanced configuration option.

Cluster Success dialog

Cluster configuration dialog

XML configuration dialog
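Hadoop reads its settings from XML files such as `core-site.xml` and `hdfs-site.xml`, where each setting is a `<property>` element holding a `<name>` and a `<value>`. The replication value entered in Step 7 presumably corresponds to the standard HDFS property `dfs.replication`. The Python sketch below writes and reads that layout with the standard library only; it illustrates the file format and is not how the Cluster Manager edits the files.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def to_hadoop_xml(properties: Dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_hadoop_xml(text: str) -> Dict[str, str]:
    """Parse a Hadoop-style XML configuration back into a dict."""
    root = ET.fromstring(text)
    return {
        prop.findtext("name"): prop.findtext("value", default="")
        for prop in root.iter("property")
    }
```

`to_hadoop_xml({"dfs.replication": "3"})` produces a single `<property>` entry inside `<configuration>`, the same shape found in `hdfs-site.xml`.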

Step 9: Once validation succeeds, click Next. A configuration pop-up appears with two options for the Hadoop configuration:

Default: All default properties will be set for all cluster nodes.
Recommended: The Cluster Manager automatically sets configuration properties based on the hardware specification of the nodes, such as RAM capacity.
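The exact rules behind the Recommended option are not documented here. As a purely hypothetical illustration of sizing from RAM, the Python sketch below reserves part of a node's memory for the operating system and Hadoop daemons and offers the rest to YARN through the standard property `yarn.nodemanager.resource.memory-mb`. The 25% reservation and the 2 GB floor are arbitrary assumptions, not Syncfusion's values, and `recommend_yarn_memory` is a hypothetical name.

```python
from typing import Dict

# HYPOTHETICAL sizing rule (not Syncfusion's): reserve 25% of RAM, but at
# least 2 GB, for the OS and Hadoop daemons; offer the rest to YARN.
RESERVED_FRACTION = 0.25
MIN_RESERVED_MB = 2048


def recommend_yarn_memory(ram_gb: int) -> Dict[str, str]:
    """Suggest YARN NodeManager memory for a node with `ram_gb` GB of RAM."""
    total_mb = ram_gb * 1024
    reserved_mb = max(MIN_RESERVED_MB, int(total_mb * RESERVED_FRACTION))
    available_mb = total_mb - reserved_mb
    if available_mb <= 0:
        raise ValueError(f"{ram_gb} GB of RAM leaves no memory for YARN containers")
    # Standard Hadoop property: memory a NodeManager may hand out to containers.
    return {"yarn.nodemanager.resource.memory-mb": str(available_mb)}
```

Under this rule a 16 GB node would offer 12288 MB to YARN and a 4 GB node 2048 MB; whatever values the Cluster Manager proposes, they can still be edited before creation.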

Hardware specification dialog

You can also modify the recommended properties.

Cluster properties dialog

Once the properties are verified, start cluster creation by clicking Create.

Cluster configuration dialog

Package transferring dialog

NOTE

Cluster formation includes shipping the SDK package, configuring the Hadoop XML files, starting the Hadoop services, and uploading the getting-started samples and Oozie libraries into HDFS. This process takes about 15-20 minutes, depending on network capacity and the number of nodes.

Step 10: Once everything is done, the Cluster Manager will show a running cluster.

Cluster running dialog

Cluster Creation – Automatic mode

In manual mode, you have to install the Big Data Agent on each cluster node yourself. In automatic mode, the Agent is installed on each node remotely and automatically.

We use PowerShell to install the Agent remotely, so PowerShell remoting must be enabled, and the size limit for data received over remoting raised, on each cluster node. This is a one-time setup. Run the following commands in a PowerShell window opened with Run as administrator.

NOTE

We do not currently support automatic installation of the Agent on Linux nodes. This support will be provided in an upcoming release.

PowerShell commands

Enable PowerShell remoting:
Enable-PSRemoting -SkipNetworkProfileCheck -Force

Allow large files to be received over PowerShell remoting:
Register-PSSessionConfiguration -Name DataNoLimits -Force

Set-PSSessionConfiguration -Name DataNoLimits -MaximumReceivedDataSizePerCommandMB 500 -MaximumReceivedObjectSizeMB 500 -Force

Trust all remote hosts for WinRM client connections:
Set-Item WSMan:\localhost\Client\TrustedHosts *

Step 1: On the Cluster Manager home page, click Create Cluster and choose Automatic mode.

Automatic Cluster dialog

Step 2: Provide a user-defined name for the cluster, the replication value, and the IP address or host name, user name, and password of each machine. Then click Next and continue as in regular cluster creation.

Automatic Cluster details dialog

NOTE

The Agent is installed automatically on all cluster nodes, after which cluster creation proceeds as usual.

Pseudo node Cluster

You can use the Cluster Manager to create a pseudo node (single-node) cluster, which is useful for development. A standard production installation requires a minimum of 3 machines, but for development you can create a pseudo node cluster on a single machine.

Step 1: Install the Big Data Agent on a machine, and install the Cluster Manager on the same machine or on a separate machine in the same network.

Step 2: From the Cluster Manager home page, click Create Cluster and choose Local Development Cluster.

Cluster mode dialog

Step 3: Provide the cluster name, the host name or IP address of the machine where the Agent is installed, and the port on which the Agent service is running (60008 by default). Then click Done to create the pseudo node cluster.
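Before clicking Done, it can help to confirm from the Cluster Manager machine that the Agent port is reachable. The Python sketch below opens a TCP connection with the standard `socket.create_connection`; it only shows that something is listening on the port (60008 by default, as stated above), not that it is the Big Data Agent. `agent_reachable` is a hypothetical name.

```python
import socket

DEFAULT_AGENT_PORT = 60008  # default Agent port stated in Step 3


def agent_reachable(host: str, port: int = DEFAULT_AGENT_PORT,
                    timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolvable
        return False
```

For example, `agent_reachable("localhost")` run on the Agent machine itself should return `True` once the Agent service is running.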

Pseudonode cluster details dialog

Step 4: The Cluster Manager transfers the packages and configuration, and then brings up a running Hadoop cluster.

Pseudonode cluster running dialog