Dive into tdp-lib, the SDK responsible for TDP cluster management

All deployments are automated, and Ansible plays a central role. As the codebase grew more complex, we needed a new system to overcome Ansible's limitations and let us take on new challenges.

TDP is a 100% open source big data platform based on the Hadoop ecosystem. Alloy offers support and professional services for TDP.

Ansible limitations

Scheduling in Ansible is not easy. Triggering one task at the end of another (with handlers) or selecting which tasks to run as needed (with tags) does not scale.

In TDP, the main way to control the distribution is through variables. Defining and versioning variables in Ansible is an easy way to complicate your life: there are currently 22 different locations where variables can be defined, and as a project grows in complexity it becomes difficult to keep track of where each variable is defined or redefined. It can also lead to defaults being defined in places beyond our control. We decided to add a 23rd way, one that makes it easy to version variables and add custom behavior, and we had to develop the tools needed to version these variables properly.


The answer to these two requirements is tdp-lib. This SDK lets compatible collections define a DAG (directed acyclic graph) that contains all the relationships between components and services and specifies the order in which tasks are executed. It also allows variables to be defined per service and per component in YAML files.

Based on these two features, tdp-lib can restart the minimal number of components of the TDP stack to apply configuration changes.

NOTE: tdp-lib does not replace Ansible; it uses Ansible internally to deploy TDP.

TDP roadmap

The TDP ecosystem is moving forward and developing a set of tools to help manage cluster deployment.

  • tdp-lib: project whose role is to implement core features such as variable versioning and deployment with Ansible. Status: main features implemented
  • tdp-server: REST API built on tdp-lib to deploy the managed cluster. Status: main features implemented
  • tdp-ui: web interface to provide users with a top-notch user experience. Status: in active development
  • tdp-cli: CLI client to interact with tdp-server. Status: not started yet

TDP definitions


A terminology has been established to describe precisely what TDP can and cannot do. The basic concept of TDP is an operation. There are two types of operations: service operations and component operations. A service operation consists of two parts: the service name and the action name (service_action). Service operations are often meta-operations (they translate to noops) meant to schedule the component operations of the service. A component operation consists of three parts: the service name (e.g. ZooKeeper), the component name (e.g. server) and the action name (e.g. install). Example: zookeeper_server_install.
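The naming convention above can be illustrated with a small parser. This is a minimal sketch, not tdp-lib's own code: the `Operation` type and the `parse_operation` helper are hypothetical, and the sketch assumes that neither the service name nor the action name contains an underscore.

```python
from typing import NamedTuple, Optional


class Operation(NamedTuple):
    """Parts of an operation name (hypothetical type, for illustration only)."""
    service: str
    component: Optional[str]  # None for a service operation
    action: str


def parse_operation(name: str) -> Operation:
    """Split an operation name into service, component and action.

    Assumption: the first token is the service and the last token is the
    action; anything in between is treated as the component name.
    """
    parts = name.split("_")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a valid operation name: {name!r}")
    service, action = parts[0], parts[-1]
    component = "_".join(parts[1:-1]) or None
    return Operation(service, component, action)


if __name__ == "__main__":
    print(parse_operation("zookeeper_server_install"))  # component operation
    print(parse_operation("zookeeper_install"))         # service operation
```

A service operation is recognised here simply by the absence of a component part; a real implementation would more likely check names against the services and components known to the collection.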

Read more about operations

DAG definition

DAGs are defined using YAML files located inside the tdp_lib_dag folder of a collection (defined later). Each file is a list of dicts containing the keys name, noop (optional), and depends_on. It is possible to define all the DAG nodes in a single file, but by convention we split them into one file per service.

Current state:

├── exporter.yml
├── hadoop.yml
├── hbase.yml
├── hdfs.yml
├── hive.yml
├── knox.yml
├── ranger.yml
├── spark3.yml
├── spark.yml
├── yarn.yml
└── zookeeper.yml

Example of a service definition:

- name: zookeeper_server_install
  depends_on: []

- name: zookeeper_install
  noop: yes
  depends_on:
    - zookeeper_server_install
    - zookeeper_client_install
    - zookeeper_kerberos_install

Here is an example of a DAG containing only the ZooKeeper nodes:

ZooKeeper DAG
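To show how a list of nodes with depends_on keys turns into an execution order, here is a minimal sketch using Python's standard graphlib module (Python 3.9+). It is not how tdp-lib itself is implemented. The nodes are written as Python dicts instead of being loaded from YAML, the `execution_order` helper is hypothetical, and the client and kerberos nodes are assumed to have no dependencies.

```python
from graphlib import TopologicalSorter

# Nodes as they would look once the YAML files are parsed.
# Assumption: the client and kerberos nodes have no dependencies.
nodes = [
    {"name": "zookeeper_server_install", "depends_on": []},
    {"name": "zookeeper_client_install", "depends_on": []},
    {"name": "zookeeper_kerberos_install", "depends_on": []},
    {
        "name": "zookeeper_install",
        "noop": True,
        "depends_on": [
            "zookeeper_server_install",
            "zookeeper_client_install",
            "zookeeper_kerberos_install",
        ],
    },
]


def execution_order(nodes):
    """Return node names so that every node comes after its dependencies."""
    graph = {node["name"]: set(node.get("depends_on", [])) for node in nodes}
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in execution_order(nodes):
        print(name)
```

The noop node zookeeper_install always comes last because it depends on the three component operations; it runs no playbook of its own and only serves as a scheduling point.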

TDP Vars

As said before, using variables in Ansible is difficult. In the early TDP versions we used default variables in the collections' roles, which had to be overridden with Ansible's group variables. With this method we could not manage all the values from outside a collection, which made it hard to version them and to give control to external tools. This is where TDP Vars comes in. We decided to create a dedicated place where variables are defined and managed. This location does not exist when the collections are cloned from GitHub; creating it is a step you must perform during installation. When not using the library, the most common way to do it is to copy the variables from {collection_path}/tdp_vars_defaults into your inventory (inventory/tdp_vars). When using the library, initialization functions perform this step for you.

In the tdp_vars folder, the folder names don't matter; only the variable files do. Variable loading in Ansible is performed in two steps. First, at Ansible initialization time, the variables are loaded by an inventory plugin: tosit.tdp.inventory. This plugin loads each YAML file in the inventory/tdp_vars tree and puts its content in the group vars of all, as a key with a prefix. Importantly, variables are not resolved at this stage: if you look at their content you will see the Jinja templates. For example, the file inventory/tdp_vars/hadoop/hadoop.yml is loaded as "TDP_AUTO_hadoop": {...values}. Then comes the runtime, where playbooks need to use the resolve plugin, which merges and then resolves the variables. The resolve plugin takes one argument, the service + component name, which is used to build the variable inheritance.

Example: if you define the variable files hdfs and hdfs_namenode, the plugin loads hdfs first and then overrides it with the values from hdfs_namenode. Note that this only applies when the argument to the resolve plugin is hdfs_namenode. With the argument hdfs_datanode, the hdfs variables are overridden with the values from hdfs_datanode.
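The inheritance described in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of the resolve plugin: `deep_merge` and `merged_vars` are hypothetical helpers, the keys TDP_AUTO_hdfs and TDP_AUTO_hdfs_namenode are inferred from the TDP_AUTO_hadoop example, the variable names and values are made up, nested dicts are assumed to be merged recursively, and the Jinja resolution step is left out.

```python
from copy import deepcopy

PREFIX = "TDP_AUTO_"  # prefix taken from the "TDP_AUTO_hadoop" example


def deep_merge(base, override):
    """Return a copy of base where values from override win (recursive on dicts)."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def merged_vars(all_group_vars, target):
    """Merge service-level variables with those of the target component.

    Assumption: the service name is the first token of the target,
    e.g. "hdfs" for "hdfs_namenode".
    """
    service = target.split("_")[0]
    merged = deepcopy(all_group_vars.get(PREFIX + service, {}))
    if target != service:
        merged = deep_merge(merged, all_group_vars.get(PREFIX + target, {}))
    return merged


# Made-up variables standing in for what the inventory plugin would load.
all_group_vars = {
    "TDP_AUTO_hdfs": {"heap": "1g", "site": {"replication": 3}},
    "TDP_AUTO_hdfs_namenode": {"heap": "4g"},
    "TDP_AUTO_hdfs_datanode": {"site": {"replication": 2}},
}

if __name__ == "__main__":
    print(merged_vars(all_group_vars, "hdfs_namenode"))
    print(merged_vars(all_group_vars, "hdfs_datanode"))
```

The same service-level dict is the starting point in both calls; only the component file applied on top of it changes with the argument.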

The precedence rules defined in this article are still valid. TDP vars can be considered to sit at the level of role defaults, and can therefore be overridden by group vars or host vars.

NOTE: An inventory plugin is used instead of a vars plugin because you cannot use a vars plugin from a collection in Ansible 2.9.


A collection is a unit that contains three folders: tdp_lib_dag, tdp_vars_defaults and playbooks. The playbooks folder contains all the operations available in the collection; they do not need to be part of the DAG. Operations outside the DAG are called special actions.

Multiple collections can be used simultaneously by tdp-lib by providing a list of collections. There are specific rules and behaviors defined as follows:

  • You can define dependencies in the DAG on nodes from another collection, but the other collection then becomes mandatory
  • The tdp collection is considered the main collection and therefore cannot depend on another collection
  • The order in which the collections are provided determines how operations and variables are overridden. If you provide the collections in the order [collection_a, collection_b] and both define the playbook zookeeper_init, the library uses the playbook from collection_b. For variables, the library merges (only at initialization) the variables from collection_a and collection_b, with collection_b taking precedence. (This is useful for adding auth_to_local rules globally, for example.)
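The ordering rule in the last bullet can be sketched as follows. This is a rough illustration under stated assumptions, not tdp-lib's implementation: collections are represented as plain dicts, `resolve_playbooks` and `merge_vars` are hypothetical helpers, the variable names and values are made up, and nested dicts are assumed to be merged recursively.

```python
from copy import deepcopy


def resolve_playbooks(collections):
    """Map each playbook name to the collection providing it; later collections win."""
    chosen = {}
    for collection in collections:
        for playbook in collection["playbooks"]:
            chosen[playbook] = collection["name"]
    return chosen


def merge_vars(collections):
    """Merge the default variables of all collections; later collections take precedence."""
    def merge(base, override):
        result = deepcopy(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge(result[key], value)
            else:
                result[key] = deepcopy(value)
        return result

    merged = {}
    for collection in collections:
        merged = merge(merged, collection.get("vars", {}))
    return merged


# Made-up collections standing in for [collection_a, collection_b].
collection_a = {
    "name": "collection_a",
    "playbooks": ["zookeeper_init", "zookeeper_server_install"],
    "vars": {"log_level": "info", "limits": {"nofile": 1024}},
}
collection_b = {
    "name": "collection_b",
    "playbooks": ["zookeeper_init"],
    "vars": {"limits": {"nofile": 4096}},
}

if __name__ == "__main__":
    ordered = [collection_a, collection_b]
    print(resolve_playbooks(ordered))
    print(merge_vars(ordered))
```

Reversing the list would make collection_a win on both counts, which is why the order in which collections are passed matters.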

tdp-lib is an important step towards managing a TDP cluster. It is not meant to be used directly by users, but it provides the core functions needed to manage a TDP cluster.

The TDP ecosystem is moving fast, and new features are being implemented all the time. Don't hesitate to participate: contributions are welcome!
