Running Kafka in Kubernetes with Strimzi

Kubernetes is not the first platform that comes to mind for running an Apache Kafka cluster. In fact, Kafka's heavy reliance on storage can be a pain point given the way Kubernetes handles persistent storage. Kafka brokers are unique and stateful; how can we implement this in Kubernetes?

Let's go over the basics of Strimzi, a Kafka operator for Kubernetes curated by Red Hat, and see what problems it solves.

A particular focus will be placed on how to plug additional Kafka tools into a Strimzi installation.

We will also compare Strimzi with other Kafka operators by giving their pros and cons.

Strimzi




Strimzi is a Kubernetes Operator which aims to reduce the costs of deploying Apache Kafka clusters on cloud-based infrastructures.

As an operator, Strimzi extends the Kubernetes API by providing custom resources to natively manage Kafka resources, including:

  • Kafka clusters
  • Kafka topics
  • Kafka users
  • Kafka MirrorMaker2 instances
  • Kafka Connect instances

The project is currently in the “Sandbox” stage at Cloud Native Computing Foundation.

Note: The CNCF website defines a “sandbox” project as “experimental projects that have not yet been extensively tested in production at the bleeding edge of technology.”

With Strimzi, deploying a three-broker, TLS-encrypted cluster is as simple as applying the following YAML file:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.2.3
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.2"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
        - id: 1
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

A topic looks like this:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 1
  replicas: 1
  config:
    retention.ms: 7200000
    segment.bytes: 1073741824

Both of these examples are from the examples directory of the Strimzi operator repository. This directory contains many more examples that cover all of Strimzi's features.
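A user resource follows the same pattern as the cluster and the topic. As a rough sketch (the name `my-user` is a placeholder, and the target listener would need `authentication: type: tls` enabled, which the cluster example above does not configure), a KafkaUser might look like this:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: my-user                       # hypothetical user name
  labels:
    strimzi.io/cluster: my-cluster    # must match the name of the Kafka resource
spec:
  authentication:
    type: tls                         # the User Operator issues a client certificate
```

The User Operator then stores the generated credentials in a Secret named after the user.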

Security

An interesting aspect of Strimzi is its out-of-the-box security. By default, communication between brokers is encrypted using TLS, while communication with ZooKeeper is both authenticated and encrypted using mTLS.

The Apache ZooKeeper clusters that support the Kafka instances are not exposed outside the Kubernetes cluster, providing additional security.

These configurations cannot be overridden, though it is possible to access ZooKeeper by using a workaround project by Jakub Scholz.

StrimziPodSets

Kubernetes comes with its own solution for managing distributed stateful applications: StatefulSets.

The official documentation says:

(StatefulSets) manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.

While StatefulSets have the advantage of being Kubernetes native resources, they have some limitations.

Here are some examples:

  • Scaling up and down is linear. If you have a StatefulSet with 3 pods (pod-1, pod-2, pod-3), scaling up will create pod-4 and scaling down can only remove pod-4. This can be a problem when you want to remove a particular pod from your deployment. Applied to Kafka, you can end up in a situation where a bad topic makes a broker unstable; with StatefulSets you cannot remove this particular broker and scale out a fresh one.
  • All pods share the same specs (CPU, memory, number of PVCs, etc.)
  • Critical node failure requires manual intervention

These limitations were addressed by the Strimzi team by developing their own resource: StrimziPodSets, a feature introduced in Strimzi 0.29.0.

The benefits of using StrimziPodSets include:

  • Scaling up and down is more flexible
  • Per broker configuration
  • Opening the gate for broker specialization when ZooKeeper-less Kafka is GA (KIP-500, more on this topic later in the article)

A disadvantage of using StrimziPodSets is that the Strimzi Operator instance becomes critical.

If you want to hear more about StrimziPodSets, take a look at the video "StrimziPodSets – What are they and why should you care?" by Jakub Scholz.

Deploying Strimzi

Strimzi's Quickstart documentation is complete and functional.

We will focus the rest of the article on useful topics that the Strimzi documentation does not cover.

Kafka UI on top of Strimzi

Strimzi provides a lot of comfort to users when it comes to managing Kafka resources in Kubernetes. We wanted to bring something to the table by showing how to deploy a Kafka interface on top of a Strimzi cluster as a native Kubernetes resource.

There are several open source Kafka UI projects on GitHub.

Let's go for Kafka UI, which has the cleanest user interface (IMO) among the competition.

The project provides official Docker images, as described in its documentation. We will leverage this image and deploy a Kafka UI instance as a Kubernetes Deployment.

The following YAML is an example of a Kafka UI instance configured on top of a SCRAM-SHA-512 authenticated Strimzi Kafka cluster. Users of the UI are authenticated against an OpenLDAP server via LDAPS.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-kafka-ui
  namespace: kafka
spec:
  selector:
    matchLabels:
      app: cluster-kafka-ui
  template:
    metadata:
      labels:
        app: cluster-kafka-ui
    spec:
      containers:
        - image: provectuslabs/kafka-ui:v0.4.0
          name: kafka-ui
          ports:
            - containerPort: 8080
          env:
            - name: KAFKA_CLUSTERS_0_NAME
              value: "cluster"
            - name: KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS
              value: "cluster-kafka-bootstrap:9092"
            - name: KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL
              value: SASL_PLAINTEXT
            - name: KAFKA_CLUSTERS_0_PROPERTIES_SASL_MECHANISM
              value: SCRAM-SHA-512
            - name: KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG
              value: 'org.apache.kafka.common.security.scram.ScramLoginModule required username="admin" password="XSnBiq6pkFNp";'
            
            - name: AUTH_TYPE
              value: LDAP
            - name: SPRING_LDAP_URLS
              value: ldaps://myldapinstance.company:636
            - name: SPRING_LDAP_DN_PATTERN
              value: uid={0},ou=People,dc=company
            - name: SPRING_LDAP_ADMINUSER
              value: uid=admin,ou=Apps,dc=company
            - name: SPRING_LDAP_ADMINPASSWORD
              value: Adm1nP@ssw0rd!
            
            - name: JAVA_OPTS
              value: "-Djdk.tls.client.cipherSuites=TLS_RSA_WITH_AES_128_GCM_SHA256 -Djavax.net.ssl.trustStore=/etc/kafka-ui/ssl/truststore.jks"
          volumeMounts:
            - name: truststore
              mountPath: /etc/kafka-ui/ssl
              readOnly: true
      volumes:
        - name: truststore
          secret:
            secretName: myldap-truststore

Note: By using a PLAINTEXT internal listener on port 9092, we don't need to provide a KAFKA_CLUSTERS_0_PROPERTIES_SSL_TRUSTSTORE_LOCATION configuration.

With this configuration, users must authenticate via LDAP to the Kafka UI. Once logged in, the underlying user used for interactions with the Kafka cluster is the admin user defined in KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG. Role-based access control was recently introduced following a GitHub issue on the project.
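For the SASL settings above to work, the Kafka cluster needs a listener with SCRAM-SHA-512 authentication and a matching user. The fragment below is a sketch under assumptions: the Kafka resource is named `cluster` (inferred from the `cluster-kafka-bootstrap` address) and the KafkaUser lives in the `kafka` namespace.

```yaml
# Fragment of the Kafka resource: enable SCRAM-SHA-512 on the plain listener
spec:
  kafka:
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
        authentication:
          type: scram-sha-512
---
# User whose password is generated by the User Operator and stored in a Secret
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: admin
  namespace: kafka                 # assumed namespace
  labels:
    strimzi.io/cluster: cluster    # assumed cluster name
spec:
  authentication:
    type: scram-sha-512
```

The generated password ends up in a Secret named after the user, under the `password` key. One possible way to avoid hardcoding it in the Deployment is to reference that Secret and rely on the Kubernetes `$(VAR)` expansion in `env` values; `KAFKA_ADMIN_PASSWORD` is an arbitrary variable name chosen for this sketch:

```yaml
# Fragment of the kafka-ui container's env list
env:
  - name: KAFKA_ADMIN_PASSWORD     # arbitrary helper variable, must be declared first
    valueFrom:
      secretKeyRef:
        name: admin                # Secret created for the KafkaUser above
        key: password
  - name: KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG
    value: 'org.apache.kafka.common.security.scram.ScramLoginModule required username="admin" password="$(KAFKA_ADMIN_PASSWORD)";'
```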

Schema Registry with Strimzi

We had a functional need to deploy a Schema Registry instance for our Kafka clusters running in Kubernetes.

While Strimzi goes the extra mile by handling additional tools like Kafka Connect or MirrorMaker instances, it is not yet capable of deploying a Schema Registry.

To mitigate this problem, the Rubin Observatory Science Quality and Reliability Engineering team worked on the strimzi-registry-operator.

The configurations we used are those shown in the example section of the README.

The only problem we encountered was that the operator is not yet capable of deploying a Schema Registry backed by a SCRAM-SHA-512 secured cluster.

What about ZooKeeper-less Kafka?

After many years of work on KIP-500, the Apache Kafka team finally announced that running Kafka in KRaft mode (ZooKeeper-less) is production ready. The announcement was made as part of the Kafka 3.3 release.

The Strimzi team started working on KRaft mode in Strimzi 0.29.0. As stated in the Strimzi documentation, the feature is still experimental, both at the Kafka and Strimzi level.
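In the Strimzi releases of that period, KRaft mode was switched on through a feature gate on the Cluster Operator. A hedged sketch of the relevant fragment, assuming the standard `strimzi-cluster-operator` Deployment and a release where these gates still exist:

```yaml
# Fragment of the strimzi-cluster-operator Deployment (container spec)
env:
  - name: STRIMZI_FEATURE_GATES
    # KRaft relies on StrimziPodSets, so both gates are enabled here
    value: "+UseStrimziPodSets,+UseKRaft"
```

Gate names and their default states change between releases, so the documentation of the version in use remains the reference.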

Strimzi's main contributor, Jakub Scholz, commented the following on the related issue:

To call it production ready for new clusters I think is a bit strange. This means we would have to maintain two parallel code paths with guaranteed upgrades etc. for possibly a long time. So, TBH, I was hoping we'd have a lot more progress by this point and be more prepared for ZooKeeper removal. But as my personal opinion – I would probably be very hesitant to call something at this stage production ready anyway.

After these comments, we can guess that ZooKeeper-less Kafka will not be the default configuration in Strimzi in the next release (0.34.0 at the time of writing) but it will definitely happen at some point.

What about storage?

Storage is often a pain point with bare metal Kubernetes clusters and Kafka is no exception.

The community consensus for providing storage on Kubernetes is Ceph with Rook, though there are other solutions (Longhorn or OpenEBS on the open source side, Portworx or Linstor as proprietary solutions).

Comparing storage engines for bare metal Kubernetes clusters is too big a topic to include in this article, but please check out our previous article “Ceph object storage in a Kubernetes cluster with Rook” for more about Rook.

We had the opportunity to compare the performance of a three-broker Kafka installation using Strimzi and Rook Ceph against a three-broker Kafka cluster running on the same machines with direct disk access.

Here are the specs and benchmark results:

Specifications

Kubernetes environment:

  • Kafka version 3.2.0 on Kubernetes through Strimzi
  • 3 brokers (one pod per node)
  • 6 RBDs per broker (provided by Rook Ceph Storage Class)
  • Xms java standard (2g)
  • Xmx java standard (29g)
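As an illustration of how the RBD volumes could be requested, the storage section of the Kafka resource can name a storage class per volume. This is a sketch, not the manifest used for the benchmark: the class name `rook-ceph-block` and the volume size are assumptions, and only two of the six volumes are shown.

```yaml
# Fragment of spec.kafka in the Kafka resource
storage:
  type: jbod
  volumes:
    - id: 0
      type: persistent-claim
      size: 100Gi              # assumed size
      class: rook-ceph-block   # assumed Rook Ceph StorageClass name
      deleteClaim: false
    - id: 1
      type: persistent-claim
      size: 100Gi
      class: rook-ceph-block
      deleteClaim: false
    # ids 2 to 5 would follow the same pattern
```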

Bare metal environment:

  • Kafka version 3.2.0 as a JVM process, using the Apache release
  • 3 brokers (one JVM per node)
  • 6 disks per broker (JBOD with ext4 formatting)
  • Xms java standard (2g)
  • Xmx java standard (29g)

Note: The benchmarks were run on the same machines (HP Gen 7 with 192 GB RAM and 6 x 2 TB disks) running RHEL 7.9. Kubernetes was not running while Kafka ran as a JVM process, and vice versa.

kafka-producer-perf-test \
--topic my-topic-benchmark \
--record-size 1000 \
--throughput -1 \
--producer.config /mnt/kafka.properties \
--num-records 50000000

Note: The topic my-topic-benchmark has 100 partitions and 1 replica.
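With Strimzi, the benchmark topic could be declared as a KafkaTopic resource. A minimal sketch, assuming the cluster is named `my-cluster` as in the earlier example:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic-benchmark
  labels:
    strimzi.io/cluster: my-cluster   # assumed cluster name
spec:
  partitions: 100
  replicas: 1
```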

Results

We ran the previous benchmark 10 times on each configuration and averaged the results:

Metric          | JBOD bare metal | Ceph RBD | Performance difference
Records/sec     | 75223           | 65207    | -13.3%
Average latency | 1.45            | 1.28     | +11.1%

The results are interesting: while write throughput was better on JBOD, latency was lower with Ceph.

Strimzi alternatives

There are two main alternatives to Strimzi when it comes to running Kafka on Kubernetes: Koperator and the Confluent operator.

We didn't thoroughly test Koperator so it would be unfair to compare it to Strimzi in this article.

As for the Confluent operator, it provides many features that we don't have with Strimzi. Here are some that we found interesting:

  • Schema Registry integration
  • ksqlDB integration
  • Support for LDAP authentication
  • Out-of-the-box UI (Confluent Control Center) for both administrators and developers
  • Alerting
  • Tiered storage

All of these come at the cost (literally) of purchasing a commercial license from Confluent. Note that the operator and Control Center can be tested during a 30-day trial period.
