Internship for Big Data Infrastructure

Job description

Big Data and distributed computing are at the heart of Adaltas. We support our partners in the commissioning, maintenance and optimization of some of the largest clusters in France. More recently, we have also started providing support for day-to-day operations.

As a strong advocate of and active contributor to open source, we are at the forefront of the TDP (TOSIT Data Platform) data platform initiative.

During this internship, you will contribute to the development of TDP, its industrialization, and the integration of new open source components and features. You will work alongside the Alliage expert team responsible for TDP vendor support.

You will also work with the Kubernetes ecosystem and automate installations of the Onyxia data lab, which we want to make available to our customers as well as to students as part of our teaching modules (DevOps, big data, etc.).

Your skills will help expand Alliage's open source support offering. Supported open source components include TDP, Onyxia, ScyllaDB, … For those who want to do some web development beyond big data, we already have a fully functional intranet (ticket management, time tracking, advanced search, mentions and related articles, …), but more features are still to be built.

You will learn to build GitOps release pipelines and write articles.

You will work in a team, with senior consultants as mentors.

Company presentation

Adaltas is a consulting agency led by a team of open source experts with a focus on data management. We deploy and operate the storage and computing infrastructure in collaboration with our customers.

We work with Cloudera and Databricks and are also open source contributors. We invite you to browse our website and our many technical publications to learn more about the company.

Skills required and to be acquired

Automating the deployment of the Onyxia data lab requires knowledge of Kubernetes and cloud-native technologies. You must be comfortable with the Kubernetes ecosystem, the Hadoop ecosystem and the distributed computing model. You will learn how the fundamental components (HDFS, YARN, object storage, Kerberos, OAuth, etc.) work together to support big data use cases.

Good knowledge of Linux and the command line is required.

During the internship you will learn:

  • The Kubernetes and Hadoop ecosystems, in order to contribute to the TDP project
  • Securing clusters with Kerberos and SSL/TLS certificates
  • High availability (HA) of services
  • Resource and workload distribution
  • Monitoring of services and hosted applications
  • Fault-tolerant Hadoop clusters able to recover lost data after an infrastructure failure
  • Infrastructure as Code (IaC) with DevOps tools such as Ansible and Drifter
  • The architecture and operation of a data lakehouse
  • Code collaboration with Git, GitLab and GitHub

Responsibilities

  • Familiarize yourself with the architecture and configuration methods of the TDP deployment
  • Deploy and test secure and highly available TDP clusters
  • Contribute to the TDP knowledge base with troubleshooting guides, FAQs, and articles
  • Actively contribute ideas and code to make iterative improvements to the TDP ecosystem
  • Research and analyze the differences between the main Hadoop distributions
  • Update Adaltas Cloud using Nikita
  • Contribute to the development of a tool to collect customer logs and metrics on TDP and ScyllaDB
  • Actively contribute ideas to develop our support solution

More information

  • Location: Boulogne-Billancourt, France
  • Language: French or English
  • Start date: March 2023
  • Duration: 6 months

Much of the digital world runs on open source software, and the Big Data industry is booming. This internship is an opportunity to gain valuable experience in both domains. TDP is now the only true open source Hadoop distribution. This is the right time to join us. As part of the TDP team, you will have the opportunity to learn one of the most important models of big data processing and to take part in the development and future roadmap of TDP. We believe this is an exciting opportunity and that, after completing your internship, you will be ready for a successful career in Big Data.

Equipment available

A laptop with the following features:

  • 32 GB of RAM
  • 1 TB SSD
  • 8c/16t CPU

A cluster consisting of:

  • 3x 28c/56t Intel Xeon Scalable Gold 6132
  • 3x 192 GB RAM DDR4 ECC 2666 MHz
  • 3x 14 Intel S4500 480 GB SATA SSDs (6 Gbps)

A Kubernetes cluster and a Hadoop cluster.

Benefits

  • Salary: €1,200/month
  • Restaurant tickets
  • Transport card
  • Participation in an international conference

Conferences we have attended in the past include KubeCon, organized by the CNCF; the Open Source Summit, organized by the Linux Foundation; and FOSDEM.

For any request for further information and to submit your application, please contact David Worms:
