Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud

Following our Cloudera Data Platform (CDP) overview, we now cover how to deploy CDP Private Cloud on your on-premises infrastructure. The deployment is fully automated with the Ansible playbooks published by Cloudera and is reproducible on your local host using Vagrant.

CDP is an enterprise data cloud. It provides a powerful Big Data platform, built-in security with automated compliance and data protection governance, and policy-based, metadata-driven analytics for end users.

Deploying a CDP Private Cloud cluster is not an easy task, so we present a way to get a local cluster up and running in a few simple steps. We will deploy a basic two-node cluster with one master and one worker, running the following services: HDFS, YARN, and ZooKeeper.

Prerequisites

You can use the on-premises infrastructure of your choice to deploy CDP Private Cloud. In this tutorial we use Vagrant and VirtualBox to quickly start two virtual machines that act as the nodes of the cluster.

VirtualBox

VirtualBox is a cross-platform virtualization application. Download the latest version of VirtualBox.

Vagrant

Vagrant is a tool for building and managing virtual machine environments. Download the latest version of Vagrant.

Once Vagrant is installed, you need to install a plugin that automatically installs the host's VirtualBox Guest Additions on the guest system. Open a terminal and type the following command:

vagrant plugin install vagrant-vbguest

Docker

Cloudera Deploy runs from within a Docker container, from which it starts the cluster deployment. Follow the official Docker Getting Started instructions to install Docker on your machine.

Bootstrap your nodes

A Vagrantfile is used to configure and provision virtual machines on a per-project basis. Make sure you have an SSH key pair on your host before proceeding; if none exists, the quickstart script (next section) will generate one. Create a new file called Vagrantfile in your working directory and paste the following code:

box = "centos/7"

Vagrant.configure("2") do |config|
  config.vm.synced_folder ".", "/vagrant", disabled: true
  config.ssh.insert_key = false
  config.vm.box_check_update = false
  ssh_pub_key = File.readlines("#{Dir.home}/.ssh/id_rsa.pub").first.strip
  config.vm.provision "Add ssh_pub_key", type: "shell" do |s|
    s.inline = <<-SHELL
      echo #{ssh_pub_key} >> /home/vagrant/.ssh/authorized_keys
      sudo mkdir -p /root/.ssh/
      echo #{ssh_pub_key} | sudo tee -a /root/.ssh/authorized_keys
      sudo touch /home/vagrant/.ssh/config
      sudo chmod 600 /home/vagrant/.ssh/config
      sudo chown vagrant /home/vagrant/.ssh/config
    SHELL
  end
  config.vm.define :master01 do |node|
    node.vm.box = box
    node.vm.network :private_network, ip: "10.10.10.11"
    node.vm.network :forwarded_port, guest: 22, host: 24011, auto_correct: true
    node.vm.network :forwarded_port, guest: 8080, host: 8080, auto_correct: true
    node.vm.provider "virtualbox" do |d|
      d.memory = 8192
    end
    node.vm.hostname = "master01.nikita.local"
  end
  config.vm.define :worker01 do |node|
    node.vm.box = box
    node.vm.network :private_network, ip: "10.10.10.16"
    node.vm.network :forwarded_port, guest: 22, host: 24015, auto_correct: true
    node.vm.provider "virtualbox" do |d|
      d.customize ["modifyvm", :id, "--memory", 2048]
      d.customize ["modifyvm", :id, "--cpus", 2]
      d.customize ["modifyvm", :id, "--ioapic", "on"]
    end
    node.vm.hostname = "worker01.nikita.local"
  end
end
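Before booting the machines, it can help to check that the SSH public key the Vagrantfile reads actually exists. A minimal sketch; the `ensure_ssh_key` helper is our own, not part of Vagrant or Cloudera Deploy:

```shell
#!/bin/sh
# Hypothetical helper: create an RSA key pair if ~/.ssh/id_rsa.pub is missing,
# since the Vagrantfile above reads that file at provisioning time.
ensure_ssh_key() {
  mkdir -p "$HOME/.ssh"
  [ -f "$HOME/.ssh/id_rsa.pub" ] || ssh-keygen -q -t rsa -b 4096 -N "" -f "$HOME/.ssh/id_rsa"
}

# Typical usage before booting the VMs:
#   ensure_ssh_key
#   vagrant validate   # check the Vagrantfile syntax
#   vagrant status     # list the machines it defines and their state
```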

The master01 node has the FQDN master01.nikita.local and the IP 10.10.10.11. The worker01 node has the FQDN worker01.nikita.local and the IP 10.10.10.16.

Now run the following command:

vagrant up

It creates and boots the two virtual machines that form our small cluster.

Edit your local /etc/hosts file by adding the following lines:

10.10.10.11 master01.nikita.local
10.10.10.16 worker01.nikita.local

Now connect to master01 using SSH:

ssh vagrant@master01.nikita.local

Add or edit the following lines in its /etc/hosts file:

10.10.10.11 master01.nikita.local
10.10.10.16 worker01.nikita.local

Repeat the operation by connecting to worker01.
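Editing /etc/hosts by hand on every node gets tedious as a cluster grows. The sketch below shows how the same entries could be appended idempotently; the `add_hosts_entries` helper is our own, not part of Cloudera Deploy:

```shell
#!/bin/sh
# Sketch: append the cluster host entries to a hosts file, skipping
# any entry that is already present so the operation is idempotent.
add_hosts_entries() {
  file="$1"
  for entry in "10.10.10.11 master01.nikita.local" "10.10.10.16 worker01.nikita.local"; do
    grep -qF "$entry" "$file" || echo "$entry" >> "$file"
  done
}

# On each VM, run it as root against the real file:
#   add_hosts_entries /etc/hosts
```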

Download the quickstart script

The quickstart.sh script sets up a Docker container with the software dependencies needed for the deployment. Download it to your host with the following command:

curl https://raw.githubusercontent.com/cloudera-labs/cloudera-deploy/main/quickstart.sh -o quickstart.sh

Run the quickstart script

The script will prepare and run the Ansible Runner in a Docker container.

chmod +x quickstart.sh
./quickstart.sh

You should see an orange cldr {build}-{version} #> prompt. You are now inside the container.

Create an inventory file

Navigate to the cloudera-deploy folder:

cd /opt/cloudera-deploy

Create a new file called inventory_static.ini that contains your hosts:

[cloudera_manager]
master01.nikita.local

[cluster_master_nodes]
master01.nikita.local host_template=Master1

[cluster_worker_nodes]
worker01.nikita.local

[cluster_worker_nodes:vars]
host_template=Workers

[cluster:children]
cluster_master_nodes
cluster_worker_nodes

[db_server]
master01.nikita.local

[deployment:children]
cluster
db_server

[deployment:vars]
ansible_user=vagrant
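Before launching the full playbook, you can sanity-check the inventory from inside the container. These commands use standard Ansible tooling and assume the inventory file sits in the current directory:

```shell
# Show how Ansible groups the hosts defined in the inventory
ansible-inventory -i inventory_static.ini --graph

# Check SSH connectivity to every host in the deployment group
ansible -i inventory_static.ini deployment -m ping
```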

Configure the cluster

Set use_download_mirror to no in the definition file located at examples/sandbox/definition.yml to avoid triggering behaviors dependent on public cloud services.
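If you prefer not to edit the file by hand, the change can be scripted. A sketch, assuming the key appears in the YAML as `use_download_mirror: <value>`; the `set_no_mirror` helper is our own:

```shell
#!/bin/sh
# Sketch: force use_download_mirror to "no" in a definition file.
# Assumes the key is written as "use_download_mirror: <value>" at any indentation.
set_no_mirror() {
  sed -i 's/^\([[:space:]]*use_download_mirror:\).*/\1 no/' "$1"
}

# Usage inside the container:
#   set_no_mirror examples/sandbox/definition.yml
```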

Run the main playbook

ansible-playbook /opt/cloudera-deploy/main.yml -e "definition_path=examples/sandbox" -e "profile=/opt/cloudera-deploy/profile.yml" -i /opt/cloudera-deploy/inventory_static.ini -t default_cluster

The command creates a CDP Private Cloud Base cluster on your local infrastructure. More specifically, it deploys a cluster running HDFS, YARN, and ZooKeeper.
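Once the play completes, you can run a quick smoke test against HDFS and YARN from the master node. A sketch, assuming the service client binaries are on the master's PATH:

```shell
# Report HDFS capacity and live datanodes (expects one live worker)
ssh vagrant@master01.nikita.local "sudo -u hdfs hdfs dfsadmin -report"

# List YARN nodes registered with the ResourceManager
ssh vagrant@master01.nikita.local "sudo -u yarn yarn node -list"
```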

Conclusion

Cloudera Data Platform can be deployed in a variety of ways, making it a versatile option when considering a data platform. In this article, we described how to deploy a CDP Private Cloud cluster using Cloudera's official deployment script. This allows the user to test the platform locally and make relevant business decisions. From there, you can add services to your cluster as well as configure CDP Private Cloud's built-in components.

Troubleshooting

Should you encounter any problems with SSH between the host and the two virtual machines, you can force the installation of the VirtualBox Guest Additions on master01 and worker01 by adding the following line to their individual configurations in the Vagrantfile:

node.vbguest.installer_options = { allow_kernel_upgrade: true }

SSH_AUTH_SOCK

The quickstart.sh script may terminate abruptly if it detects that the SSH_AUTH_SOCK path is not properly defined or is empty. If you encounter this error, first run the following command:

echo $SSH_AUTH_SOCK

This returns the path to the Unix socket used by ssh-agent. Add it as the SSH_AUTH_SOCK variable at the top of the quickstart script so that SSH works properly, for example:

SSH_AUTH_SOCK="/run/user/1000/keyring/ssh"

In this example, the socket path is /run/user/1000/keyring/ssh.
