Apache HBase: RegionServers co-location

RegionServers are the processes that manage the storage and retrieval of data in Apache HBase, the non-relational, column-oriented database of the Apache Hadoop ecosystem. It is through their daemons that all CRUD queries (Create, Read, Update, Delete) are served. Together with the Master, they guarantee data persistence and performance optimization. In production environments, a single RegionServer is deployed on each compute node, so that both the workload and the storage scale by adding nodes.

Today, many companies still choose to run their infrastructure on-premises, that is, in their own data centers rather than in the cloud. Their servers are sized to take advantage of the full memory space of each machine. Yet those whose workloads are mostly based on Apache HBase do not always use all of these resources, because the amount of RAM a RegionServer can consume is limited by its JVM heap size. If each RegionServer could take advantage of all the RAM available to it, its performance would potentially improve significantly. One could then consider removing nodes from the cluster, which reduces the licensing costs that depend directly on the number of nodes and makes better use of the remaining resources. Hence the question: on an on-premises infrastructure, can each RegionServer be allowed to use more memory, so that the cluster absorbs the workload left by the removal of one or more workers?

RegionServers co-location

In Apache Ambari, the size of the memory space allocated to each RegionServer's JVM can be set from 3 to 64 GB, but the documentation recommends not exceeding 30 GB. Beyond this threshold, the Garbage Collector, whose mission is to reclaim memory that is no longer used, becomes overloaded: this results in pauses in the middle of a workload, making the RegionServer unavailable for several seconds. A possible solution is to multiply the number of RegionServers per machine, keeping each JVM at 30 GB while still taking advantage of the available RAM.
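For reference, the heap cap that Ambari applies ultimately lands in hbase-env.sh. A minimal sketch of such a setting, assuming a 30 GB heap and G1 garbage collection (the GC flags are illustrative, not the configuration discussed here):

# Excerpt from hbase-env.sh: cap the RegionServer heap at 30 GB and
# bound GC pauses; the G1 flags below are illustrative assumptions.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -Xms30g -Xmx30g \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=100"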

A project on GitHub describes the procedure to start multiple RegionServers on the same host, on an HDP deployment managed by Ambari. It is a script mainly composed of a for loop that repeats the classic RegionServer startup steps as many times as desired. For each new instance, the initial configuration file is copied and the port numbers are incremented; the log and PID directories of each RegionServer are created automatically.
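The following sketch illustrates the idea; the paths, the port arithmetic and the instance count are assumptions for illustration, not the actual script from the repository:

#!/bin/bash
# Illustrative sketch: start N extra RegionServers on this host, each with
# its own configuration directory, shifted ports, log and PID directories.
N=2
BASE_CONF=/usr/hdp/current/hbase-regionserver/conf
for i in $(seq 1 "$N"); do
  CONF_DIR="/etc/hbase/conf-rs$i"
  cp -r "$BASE_CONF" "$CONF_DIR"
  # Shift the RPC (16020) and info (16030) ports so instances do not collide;
  # this assumes both ports are declared explicitly in hbase-site.xml.
  sed -i "s/16020/$((16020 + i))/g; s/16030/$((16030 + i))/g" "$CONF_DIR/hbase-site.xml"
  export HBASE_LOG_DIR="/var/log/hbase/rs$i"
  export HBASE_PID_DIR="/var/run/hbase/rs$i"
  mkdir -p "$HBASE_LOG_DIR" "$HBASE_PID_DIR"
  /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh \
    --config "$CONF_DIR" start regionserver
done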

This script must be placed in the HBase configuration directory of each worker on which we want to multiply RegionServers, and is then invoked by Ambari. A few manual steps are needed to put it in place:

#!/bin/bash

# Keep the original daemon script under a new name: it will be used to
# start each individual RegionServer instance.
mv /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh /usr/hdp/current/hbase-regionserver/bin/hbase-daemon-per-instance.sh

# Put the wrapper script in its place, so that Ambari, which calls
# hbase-daemon.sh, transparently drives all the instances.
cp ./hbase-daemon.sh /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh

# Make the wrapper readable, writable and executable.
chmod +rwx /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh

Although these new instances are not visible in Ambari, they do appear in the HBase Master UI. Once these steps are done, you just need to start each RegionServer through the Ambari interface. Indeed, when an operation is performed on a RegionServer through Ambari, it is propagated to every RegionServer instance on the same host. For example, if the RegionServer of host worker01 is restarted via Ambari, all the additional RegionServer instances on worker01 are restarted as well.
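A quick way to check from the host itself that several instances are indeed running; the port range assumes the incremented defaults of the sketch above:

jps | grep HRegionServer             # one JVM per RegionServer instance
ss -ltnp | grep -E ':160(2[0-9])'    # each instance listens on its own RPC port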

HBase Master UI

Test environment and methodology

To measure the performance impact of these new RegionServer instances, the YCSB test suite is a relevant choice. YCSB (Yahoo! Cloud Serving Benchmark) is an open source benchmarking suite often used to compare the relative performance of NoSQL database management systems, such as HBase. The test suite repeats the following steps in each iteration (a sketch of the corresponding commands follows the list):

  • Creation (or re-creation) of a table of 50 regions
  • Insertion of new records (~150 GB)
  • YCSB workload A, “Update heavy”: this workload consists of 50/50 reads and writes. Updates are made without reading the record first. A typical application would be a session store recording recent actions on a website.
  • YCSB workload F, “Read-Modify-Write”: for this workload, each record is read, modified, then written back. A typical application would be a database where a user's records are read and modified by the user.

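As an illustration, one iteration could be driven as follows; the table name, column family, record and operation counts, thread count and the hbase10 binding are assumptions, not the exact parameters of these tests:

# Pre-split the table into 50 regions.
hbase shell <<'EOF'
create 'usertable', 'family', {NUMREGIONS => 50, SPLITALGO => 'HexStringSplit'}
EOF

# Load phase (~150 GB with YCSB's default 1 KB records), then the two
# measured workloads.
bin/ycsb load hbase10 -P workloads/workloada -p table=usertable \
  -p columnfamily=family -p recordcount=150000000 -threads 64
bin/ycsb run hbase10 -P workloads/workloada -p table=usertable \
  -p columnfamily=family -p operationcount=10000000 -threads 64
bin/ycsb run hbase10 -P workloads/workloadf -p table=usertable \
  -p columnfamily=family -p operationcount=10000000 -threads 64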
For each workload, the execution time (in seconds) and the operation rate (in operations/second) were recorded. These tests were performed on an HDP 2.6.5 cluster, on 4 bare-metal workers with 180 GB of RAM and 4 disks of 3 TB each, with an inter-machine bandwidth of 10 Gb/s. All RegionServers of these 4 workers formed a “RegionServer group” and were therefore used exclusively by the test suite and by no other workload. In each situation, the test suite was run at least three times.
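Such an isolation can be set up with the rsgroup commands of the HBase shell; a minimal sketch, assuming the rsgroup feature is enabled and using hypothetical group, server and table names:

hbase shell <<'EOF'
add_rsgroup 'ycsb'
move_servers_rsgroup 'ycsb', ['worker01:16020', 'worker02:16020', 'worker03:16020', 'worker04:16020']
move_tables_rsgroup 'ycsb', ['usertable']
EOF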

Adding new instances

This first step consists of removing a node from the cluster and then adding RegionServers on the remaining hosts to measure performance. The following table shows the results: ‘w1' stands for ‘Worker 1' and the numbers represent the number of RegionServers on each host. The notation “w̶1̶” indicates that the node Worker 1 was removed from the HBase cluster for the given test. The results shown are averages over all test runs.

Table of performance obtained after removing a host

We notice that adding one or more RegionServers on each host does not compensate for the loss of a machine. Moreover, an imbalance in the number of RegionServers from one host to another causes performance to drop. This is because the Master sees each RegionServer independently, as if each had its own machine. The regions (partitions of table data) are thus distributed evenly across all RegionServers, so a host with more RegionServers than another has to serve more regions: the workload is then unevenly distributed between the machines. Since additional RegionServers do not compensate for the loss of a machine, there must be a bottleneck at some stage of the data flow.
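The per-instance region counts behind this behavior can be observed directly from the HBase shell, where each co-located instance is listed as a separate server:

hbase shell <<'EOF'
status 'simple'
EOF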

Configuration optimization

This study consists of modifying properties in Ambari to get better performance from the new RegionServers. The properties in question are the following (the sketch after the list shows how they can be set):

  • DataNodes:
    • Maximum number of threads (dfs.datanode.max.transfer.threads)
    • RAM allocated to DataNodes (dtnode_heapsize)
  • RegionServers:
    • Number of handlers (hbase.regionserver.handler.count)
    • RAM allocated to RegionServers (hbase_regionserver_heapsize)
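These properties can be changed in the Ambari UI, or scripted with the configs.sh helper shipped with Ambari; a sketch, assuming a cluster named mycluster and the Ambari server on localhost (both hypothetical):

/var/lib/ambari-server/resources/scripts/configs.sh set localhost mycluster \
  hdfs-site "dfs.datanode.max.transfer.threads" "20480"
/var/lib/ambari-server/resources/scripts/configs.sh set localhost mycluster \
  hbase-env "hbase_regionserver_heapsize" "30720"
# The affected services must then be restarted for the change to apply.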

The following table shows the performance obtained initially, then with 2 RegionServers per Worker, then with the most efficient configuration.

Table of the performance obtained with optimization

As can be seen, the gains from adding a RegionServer on each worker are at most 10%: considering a realistic margin of error of 3%, they are negligible, which explains the results of the first study. The gains with 3 or more RegionServers per worker are identical to the previous case, so they bring no additional advantage.

Among the properties tested, only two of them produced the most efficient configuration and increased the gains of 2 RegionServers per worker. By adding 10 GB of RAM to the RegionServers (hbase_regionserver_heapsize=30,720) and about 4,000 threads for DataNode data transfers (dfs.datanode.max.transfer.threads=20,480), we get an 11% gain on workload A and 21% on workload F, the two workloads that include reads. Again, the gains with 3 RegionServers after optimization show no further change. While a 21% gain cannot be called significant, it is still interesting.

Finally, other properties were tested but were not able to provide any concrete benefit:

  • io.file.buffer.size: the size of the I/O buffer used by the DataNode.
  • dfs.datanode.handler.count: the number of handlers available per DataNode to respond to requests.
  • dfs.datanode.balance.bandwidthPerSec: the maximum bandwidth each DataNode may use for block balancing.
  • dfs.datanode.readahead.bytes: the number of bytes to read ahead of the current read position.
  • dfs.client.read.shortcircuit.streams.cache.size: the size of the client-side cache used for short-circuit reads.

Metric analysis

The analysis of various activity metrics allowed us to formulate hypotheses explaining the observed gains. The metrics in question, collected on each machine, are: the evolution of the CPU load (user, system), of the RAM usage (used, free, swap), of the I/O activity (number of transfers, read/write throughput, request rate) and of the load distribution (run queue, load average, number of tasks waiting on disk).
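These are the metrics exposed by the standard sysstat tools; for instance (sampling intervals are illustrative):

sar -u 5        # CPU load: user / system
sar -r 5        # memory: used / free
sar -q 5        # run queue and load average
iostat -dx 5    # per-disk transfers, throughput and request queue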

Hypotheses:

  • Increasing the number of DataNode threads increases the read/write requests sent to the disks, generating tasks waiting for available I/O. The default number of threads was therefore insufficient, once a RegionServer was added, to reach the performance ceiling of the DataNodes.
  • With the additional RAM on the RegionServers, we can assume that more reads are served from the BlockCache, which by default occupies a fixed fraction of the heap (hfile.block.cache.size, 40% unless overridden) and therefore grows with it. This reduces the number of read requests going to disk, and thus the tasks waiting on disk. This would explain why the gains concern the read workloads and remain negligible for write workloads, with or without optimization.

We can therefore say that the bottleneck is most certainly the disks. Insertion performance increases slightly thanks to the extra pressure a second RegionServer puts on the disks, but it scales no further precisely because it depends on the disks alone. Read performance also benefits from the pressure of the additional threads, but above all from the enlargement of the various read caches. It is also interesting to note that the number of requests does not fully double when a second RegionServer is added: without a bottleneck, performance would have doubled.

Adding disks

So far the machines have been running with four 3 TB disks, but how would performance change with more? Eight disks were therefore added to each machine, for a total of twelve. The test results are shown in the following table:

Table of performance obtained with 12 disks

First of all, we notice a 22% performance improvement on workload A compared to the initial case, while the results are more or less identical for the other two types of tests. On the other hand, on the same workload, the gain is only 1% with 2 RegionServers per host, and 10% with optimization. The increase in the number of disks therefore had a significant impact on the performance of a single RegionServer, but adding an extra instance, even with optimization, remains of little interest.

Conclusion

As we have seen, in our environment the write and read performance is far from doubled by adding a second RegionServer on each worker. The DataNode then acts as a software bottleneck and yields a gain of 21% at best. RegionServers are optimized to perform at their best on their own, working as a trio with the YARN NodeManagers and HDFS DataNodes. This environment is designed to have only one RegionServer per worker, as evidenced by the performance increase with a single RegionServer when disks are added.

On the other hand, this setup may benefit a cluster with an excess of regions: the regions are then fairly distributed between the RegionServer instances without causing any performance loss. For some use cases, such as workload F, the removal of a physical node can also be considered to save licensing and operational costs without significant performance degradation.
