
Dive into tdp-lib, the SDK responsible for TDP cluster management
All TDP deployments are automated and Ansible plays a central role in them. With the growing complexity of the codebase, a new system was needed to overcome Ansible's limitations and allow us to take on new challenges.
TDP is a 100% open source big data platform based on the Hadoop ecosystem. Alloy offers support and professional services for TDP.
Ansible limitations
Scheduling in Ansible is not easy. Triggering one task at the end of another (with handlers) or selecting the tasks to run as needed (with tags) does not scale.
In TDP, the main way to control the distribution is through variables. Defining and versioning variables in Ansible is an easy way to complicate your life: there are currently 22 different places where variables can be defined, and as the project grows in complexity, it becomes difficult to keep track of where each variable is defined or redefined. In addition, it can sometimes lead to values being defined outside of our control. We decided to add a 23rd way, which is easy to version and lets us add custom behavior, and we had to develop the necessary tools to properly version these variables.
TDP Lib
The answer to these two requirements is tdp-lib. This SDK allows compatible collections to define a DAG (directed acyclic graph) that contains all the relationships between components and services, with the detailed execution order of the tasks. It also allows variables to be defined per service and per component in YAML files.
Based on these two features, tdp-lib can restart the minimum number of components of the TDP stack needed to apply configuration changes.
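To make the idea concrete, here is a minimal Python sketch, not tdp-lib's actual code. It assumes the DAG is a mapping from each operation to the operations it depends on, and treats the operations affected by a change as the changed ones plus everything downstream of them, run in dependency order. The function name `downstream_in_order` and the operation names in the example are hypothetical.

```python
from graphlib import TopologicalSorter


def downstream_in_order(dag, changed):
    """Return the changed operations plus every operation that depends on
    them (directly or transitively), ordered so dependencies come first.

    `dag` maps each operation name to the list of operations it depends on.
    """
    # Invert the edges: for each operation, which operations depend on it?
    dependents = {op: set() for op in dag}
    for op, deps in dag.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(op)

    # Walk the inverted edges to collect everything reachable from a change.
    selected, stack = set(), list(changed)
    while stack:
        op = stack.pop()
        if op not in selected:
            selected.add(op)
            stack.extend(dependents.get(op, ()))

    # Sort the whole DAG once, then keep only the selected operations.
    order = TopologicalSorter(dag).static_order()
    return [op for op in order if op in selected]


if __name__ == "__main__":
    # Hypothetical operations, for illustration only.
    dag = {
        "zookeeper_server_config": [],
        "zookeeper_server_start": ["zookeeper_server_config"],
        "hdfs_namenode_config": [],
        "hdfs_namenode_start": ["hdfs_namenode_config", "zookeeper_server_start"],
    }
    print(downstream_in_order(dag, {"hdfs_namenode_config"}))
    # ['hdfs_namenode_config', 'hdfs_namenode_start']
```

Operations that are not downstream of a change are left untouched, which is what keeps the set of restarts small in this model.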
NOTE: tdp-lib does not replace Ansible; it uses Ansible internally to deploy TDP.
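As an illustration of this split of responsibilities, the following hypothetical Python sketch shows how a library can delegate the actual work to Ansible by calling the standard `ansible-playbook` command. It assumes one playbook per operation, stored under a `playbooks` folder of the collection; the function names and the default inventory path are made up for the example, and tdp-lib's own execution code may differ.

```python
import subprocess
from pathlib import Path


def playbook_command(collection_path, operation, inventory="inventory"):
    """Build the ansible-playbook command line for one operation.

    Assumes one playbook per operation, stored as
    <collection_path>/playbooks/<operation>.yml.
    """
    playbook = Path(collection_path) / "playbooks" / f"{operation}.yml"
    return ["ansible-playbook", "-i", str(inventory), str(playbook)]


def run_operation(collection_path, operation, inventory="inventory"):
    """Run the playbook of `operation` and return ansible-playbook's exit code.

    Requires Ansible to be installed and `ansible-playbook` to be on the PATH.
    """
    command = playbook_command(collection_path, operation, inventory)
    return subprocess.run(command, check=False).returncode
```

Keeping the command construction separate from its execution makes it easy to inspect or log what would be run before Ansible is actually called.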
TDP roadmap
The TDP ecosystem is moving forward and developing a set of tools to help manage cluster deployment.
- tdp-lib: project whose role is to implement the core features, such as variable versioning and deployment with Ansible. Status: main features implemented.
- tdp-server: REST API built on tdp-lib to deploy the managed cluster. Status: main features implemented.
- tdp-ui: web interface providing users with a top-notch user experience. Status: in active development.
- tdp-cli: CLI client to interact with tdp-server. Status: not started yet.
TDP definitions
Operations
A terminology has been defined to state precisely what TDP can and cannot do. The basic concept of TDP is the operation. There are two types of operations: service operations and component operations. A service operation consists of two parts, the service name and the action name (service_action). Service operations are often meta-operations (implemented as noops), meant to schedule the component operations of the service. Component operations consist of three parts: the service name (e.g. zookeeper), the component name (e.g. server) and the action name (e.g. install). Example: zookeeper_server_install.
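The naming convention can be illustrated with a small Python parser. This sketch relies on a simplifying assumption that is not stated above: the service name and the action contain no underscore, so the first part is the service, the last part is the action and anything in between is the component. `OperationName` and `parse_operation` are hypothetical names.

```python
from typing import NamedTuple, Optional


class OperationName(NamedTuple):
    service: str
    component: Optional[str]  # None for service operations
    action: str


def parse_operation(name):
    """Split '<service>_<action>' or '<service>_<component>_<action>'.

    Assumes the service name and the action contain no underscore; any
    middle parts are joined back into the component name.
    """
    parts = name.split("_")
    if len(parts) < 2:
        raise ValueError(f"not an operation name: {name!r}")
    service, *middle, action = parts
    component = "_".join(middle) if middle else None
    return OperationName(service, component, action)
```

For example, `parse_operation("zookeeper_server_install")` gives a component operation, while `parse_operation("zookeeper_install")` gives a service operation whose `component` is `None`.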
DAG definition
DAGs are defined using YAML files located in the tdp_lib_dag folder of a collection (collections are defined later). Each file is a list of dicts containing the keys name, noop (optional), and depends_on. It is possible to use a single file to define all the nodes of the DAG, but by convention we split them into one file per service.
Current state:
tdp_lib_dag/
├── exporter.yml
├── hadoop.yml
├── hbase.yml
├── hdfs.yml
├── hive.yml
├── knox.yml
├── ranger.yml
├── spark3.yml
├── spark.yml
├── yarn.yml
└── zookeeper.yml
Example of a service definition:
---
- name: zookeeper_server_install
  depends_on: []
- name: zookeeper_install
  noop: yes
  depends_on:
    - zookeeper_server_install
    - zookeeper_client_install
    - zookeeper_kerberos_install
[Figure: example of a DAG containing only the ZooKeeper nodes]
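The following Python sketch shows how such files can be turned into an execution order. It assumes the YAML files have already been parsed (for example with PyYAML's `yaml.safe_load`) into lists of dicts, uses the standard library's `graphlib` to sort the nodes topologically, and skips noop nodes since they only group other operations. `build_dag` and `execution_plan` are hypothetical helpers, not tdp-lib functions, and treating a duplicated name as an error is also an assumption of this sketch.

```python
from graphlib import TopologicalSorter


def build_dag(file_contents):
    """Merge the node lists of several parsed DAG files into {name: node}."""
    nodes = {}
    for entries in file_contents:
        for entry in entries:
            name = entry["name"]
            if name in nodes:
                raise ValueError(f"operation defined twice: {name}")
            nodes[name] = entry
    # Every dependency must be defined in one of the files.
    for name, node in nodes.items():
        for dep in node.get("depends_on", []):
            if dep not in nodes:
                raise ValueError(f"{name} depends on unknown operation {dep}")
    return nodes


def execution_plan(nodes):
    """Return the operations in dependency order, without the noop nodes.

    graphlib raises CycleError if the graph is not acyclic.
    """
    graph = {name: node.get("depends_on", []) for name, node in nodes.items()}
    order = TopologicalSorter(graph).static_order()
    return [name for name in order if not nodes[name].get("noop", False)]
```

With the ZooKeeper definitions above, zookeeper_install itself is dropped from the plan, but any operation depending on it is still ordered after the three install operations it groups.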
TDP Vars
As said before, using variables in Ansible is difficult. In the early TDP versions, we used default variables (defaults) in the collections' roles, which had to be overridden with Ansible group variables. With this method, we could not manage all the values outside of a collection, which made it difficult to version them and to give control to external tools. This is where TDP Vars comes in: we created a dedicated place to define and manage variables. This location does not exist when the collections are cloned from GitHub; creating it is a step you must perform during installation. Without the library, the most common way to do it is to copy the variables from {collection_path}/tdp_vars_defaults into your inventory (inventory/tdp_vars). The library provides initialization functions that perform this step.
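Without the library, that copy can be scripted. The Python sketch below is a hypothetical equivalent of the initialization step, not tdp-lib's initialization function: it copies every .yml file from {collection_path}/tdp_vars_defaults into inventory/tdp_vars, keeps the directory layout, and (as an assumption of this sketch) never overwrites a file that already exists, so that values already edited in the inventory are preserved.

```python
import shutil
from pathlib import Path


def init_tdp_vars(collection_path, inventory_path="inventory"):
    """Copy <collection_path>/tdp_vars_defaults into <inventory_path>/tdp_vars.

    Existing files are left untouched. Returns the list of files created.
    """
    src_root = Path(collection_path) / "tdp_vars_defaults"
    dst_root = Path(inventory_path) / "tdp_vars"
    created = []
    for src in sorted(src_root.rglob("*.yml")):
        dst = dst_root / src.relative_to(src_root)
        if dst.exists():
            continue  # keep values already customized in the inventory
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        created.append(dst)
    return created
```

Running it a second time is harmless: only files missing from the inventory are created.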
In the tdp_vars folder, the folder names do not matter, only the variable files do. Variables are loaded into Ansible in two steps. First, at Ansible initialization time, the variables are loaded by an inventory plugin: tosit.tdp.inventory. This plugin loads each YAML file in the inventory/tdp_vars tree and adds it to the all group under a key made of a prefix and the file name. Important: variables are not resolved at this stage; if you look at their content, you will see the Jinja templates. For example, the file inventory/tdp_vars/hadoop/hadoop.yml is loaded as "TDP_AUTO_hadoop": {...values}. Then, at runtime, playbooks must use the resolve plugin, which merges and then resolves the variables. The resolve plugin takes as argument a service and component name, which is used to build the variable inheritance.
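The first step can be mimicked in a few lines of Python to show what ends up in the all group. This hypothetical sketch is not the tosit.tdp.inventory plugin: it takes files that are already parsed, keyed by their path relative to inventory/tdp_vars, drops the directory part, and stores each content under TDP_AUTO_ plus the file name without its extension, leaving Jinja templates untouched. Rejecting two files with the same name in different folders is an assumption of the sketch.

```python
from pathlib import PurePosixPath

PREFIX = "TDP_AUTO_"


def tdp_vars_to_group_vars(files):
    """Build the variables attached to the 'all' group.

    `files` maps a path relative to inventory/tdp_vars (e.g. 'hadoop/hadoop.yml')
    to its parsed YAML content. Values are copied as-is, so Jinja templates
    such as '{{ ... }}' stay unresolved at this stage.
    """
    group_vars = {}
    for rel_path, content in files.items():
        # Only the file name matters; the folder it sits in is ignored.
        key = PREFIX + PurePosixPath(rel_path).stem
        if key in group_vars:
            raise ValueError(f"two tdp_vars files would both define {key}")
        group_vars[key] = content
    return group_vars
```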
Example: with the variable files hdfs and hdfs_namenode defined, hdfs is loaded first and then overridden with the values from hdfs_namenode. Note that this only applies when the argument of the resolve plugin is hdfs_namenode. With the argument hdfs_datanode, the hdfs variables are overridden with the values from hdfs_datanode instead.
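The inheritance rule can be sketched in Python as follows. This covers only the merge half of what the resolve plugin does, under stated assumptions: the service is taken to be the part of the argument before the first underscore, nested dictionaries are merged recursively while any other value (lists included) is replaced, and Jinja templates are left unresolved. `merge_tdp_vars` and `deep_merge` are hypothetical names.

```python
from copy import deepcopy

PREFIX = "TDP_AUTO_"


def deep_merge(base, override):
    """Return a new dict: nested dicts are merged, other values from override win."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def merge_tdp_vars(group_vars, node_name):
    """Merge the service variables, then the service_component variables."""
    service = node_name.split("_", 1)[0]
    layers = [service] if node_name == service else [service, node_name]
    result = {}
    for layer in layers:
        result = deep_merge(result, group_vars.get(PREFIX + layer, {}))
    return result
```

With this rule, hdfs_namenode and hdfs_datanode both start from the same hdfs values and only differ by their own overrides.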
The precedence rules defined in this article are still valid: tdp vars can be considered to be at the role defaults level, and can therefore be overridden by group vars or host vars.
NOTE: An inventory plugin is used instead of a vars plugin because vars plugins cannot be used from a collection in Ansible 2.9.
Collections
A collection is a unit that contains three folders: tdp_lib_dag, tdp_vars_defaults and playbooks. The playbooks folder contains all the features of the collection; they do not all need to be part of the DAG. Operations outside the DAG are called special actions.
tdp-lib can use multiple collections simultaneously when it is given a list of collections. The following rules and behaviors apply:
- The DAG of a collection can depend on operations from another collection, but that other collection then becomes mandatory.
- The tdp collection is considered the main collection and therefore cannot depend on another collection.
- The order in which the collections are provided is used to override operations and variables. If you provide the collections in the order [collection_a, collection_b] and both define the playbook zookeeper_init, the library uses the playbook from collection_b. For variables, the library merges (only on initialization) the variables from collection_a and collection_b, with collection_b taking precedence. This is useful, for example, to add auth_to_local rules globally.
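The override rule for operations, and the notion of special actions, can be sketched in Python. In this hypothetical sketch, each collection is reduced to its name and the names of its playbooks; the last collection that provides a playbook wins, and any playbook that is not a DAG node is reported as a special action. Variable merging is not covered, and the function names are made up for the example.

```python
def select_playbooks(collections):
    """Map each operation to the collection whose playbook will be used.

    `collections` is an ordered list of (collection_name, playbook_names);
    a collection later in the list overrides the earlier ones.
    """
    chosen = {}
    for collection_name, playbook_names in collections:
        for playbook in playbook_names:
            chosen[playbook] = collection_name
    return chosen


def special_actions(chosen, dag_operations):
    """Playbooks that are not nodes of the DAG are special actions."""
    return sorted(set(chosen) - set(dag_operations))


if __name__ == "__main__":
    chosen = select_playbooks([
        ("collection_a", ["zookeeper_init", "zookeeper_server_install"]),
        ("collection_b", ["zookeeper_init"]),
    ])
    print(chosen["zookeeper_init"])  # collection_b
```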
tdp-lib is an important step towards managing a TDP cluster. It is not meant to be used directly by users, but it provides the core functions needed to manage a TDP cluster.
The TDP ecosystem is moving fast and many features are being implemented every day. Don't hesitate to participate: contributions are welcome!