Apache Kafka on Kubernetes with Strimzi — Part 1: Creating and Deploying a Strimzi Kafka Cluster

Introduction

When you Google “deploying Kafka on Kubernetes”, you may encounter so many different deployment approaches and various Kubernetes yaml files on the internet. After trying some of them and putting time on creating a fully working cluster and fixing problems, you realize that using Kafka in Kubernetes is hard! But don’t you worry, Strimzi is here to help!

What is Strimzi?

Strimzi provides container images and Operators for running Kafka on Kubernetes. These Operators are fundamental to the running of Strimzi and are built with specialist operational knowledge to effectively manage a Kafka cluster.
Strimzi Operators can simplify many Kafka related processes including: deploying, running and managing Kafka cluster and components, configuring and securing access to Kafka, creating and managing topics and users, etc.

  • Kafka. A cluster of Kafka broker nodes
  • ZooKeeper. Storing configuration data and cluster coordination
  • Kafka Connect. An integration toolkit for streaming data between Kafka brokers and external systems using Connector (source and sink) plugins. (Also supports Source2Image)
  • Kafka MirrorMaker. Replicating data between two Kafka clusters, within or across data centers.
  • Kafka Bridge. Providing a RESTful interface for integrating HTTP-based clients with a Kafka cluster without the need for client applications to understand the Kafka protocol
  • Kafka Exporter. Extracting data for analysis as Prometheus metrics like offsets, consumer groups, consumer lag, topics and…

Strimzi Operators

As we have already mentioned, with Strimzi Operators, you no longer need to configure and manage complex Kafka related tasks and these Operators will do the hard work for you!

  • Cluster Operator. Deploys and manages Apache Kafka clusters, Kafka Connect, Kafka MirrorMaker, Kafka Bridge, Kafka Exporter, and the Entity Operator
  • Entity Operator. Comprises the Topic Operator and User Operator
  • Topic Operator. Manages Kafka topics.
  • User Operator. Manages Kafka users.
Operators within the Strimzi architecture (source)

Deploying Kafka using Strimzi

For this tutorial, we are going to use Minikube. In order to create our Kafka cluster, we need to deploy yaml files in a specific order:

  1. Deploying the Cluster Operator to manage our Kafka cluster
  2. Deploying the Kafka cluster with ZooKeeper using the Cluster Operator. Topic and User Operators can be deployed in this step with the same deploy file or you can deploy them later.
  3. Now you can deploy other components as you like (Optional):
  • Topic and User Operators
  • Kafka Connect
  • Kafka Bridge
  • Kafka MirrorMaker
  • Kafka Exporter and monitoring metrics

Deploying the Cluster Operator

First and foremost, let’s deploy the Cluster Operator. The Cluster Operator is responsible for deploying and managing Apache Kafka clusters within a Kubernetes cluster.

$ kubectl create ns kafka
$ sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml
$ sed -i '' 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml

Single Namespace

If you just want to deploy your Kafka Cluster in the same namespace as the Cluster Operator, no need to do anything else. Just deploy all yaml files from the cluster-operator folder. If you still want your Cluster Operator to watch only a single namespace but want your Kafka Cluster to reside in a different namespace, follow the approach presented in the Multiple Namespaces section.

Multiple Namespaces

Sometimes, we want to have multiple Kafka Clusters on different namespaces. So we need to tell the Operator the location of our Kafka Clusters (Kafka resources). Open 060-Deployment-strimzi-cluster-operator.yaml file and locate the STRIMZI_NAMESPACE environment variable. Change valueFrom to value and add your namespaces like below:

# ...
env:
- name: STRIMZI_NAMESPACE
value: kafka-cluster-1,kafka-cluster-2,kafka-cluster-3
# ...
$ kubectl apply -f install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml -n watched-namespace$ kubectl apply -f install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml -n watched-namespace$ kubectl apply -f install/cluster-operator/032-RoleBinding-strimzi-cluster-operator-topic-operator-delegation.yaml -n watched-namespace

All Namespaces

For watching all namespaces, just like above, Open 060-Deployment-strimzi-cluster-operator.yaml file and locate the STRIMZI_NAMESPACE environment variable. Change valueFrom to value and use “*” as the value:

# ...
env:
- name: STRIMZI_NAMESPACE
value: "*"
# ...
$ kubectl create clusterrolebinding strimzi-cluster-operator-namespaced --clusterrole=strimzi-cluster-operator-namespaced --serviceaccount my-cluster-operator-ns:strimzi-cluster-operator

$ kubectl create clusterrolebinding strimzi-cluster-operator-entity-operator-delegation --clusterrole=strimzi-entity-operator --serviceaccount my-cluster-operator-ns:strimzi-cluster-operator

$ kubectl create clusterrolebinding strimzi-cluster-operator-topic-operator-delegation --clusterrole=strimzi-topic-operator --serviceaccount my-cluster-operator-ns:strimzi-cluster-operator
$ kubectl apply -f install/cluster-operator -n kafka
$ kubectl get deployments -n kafkaNAME                         READY   UP-TO-DATE   AVAILABLE   AGE
strimzi-cluster-operator 1/1 1 1 40m

Deploying the Kafka Cluster

There are two approaches to deploy a Kafka Cluster:

  • Ephemeral Cluster. An ephemeral cluster is a Cluster with temporary storage which is suitable for development and testing. This deployment uses volumes for storing broker information (for ZooKeeper) and topics or partitions (for Kafka). So all the data will be removed once the Pod goes down.
  • Persistent Cluster. Uses PersistentVolumes to store ZooKeeper and Kafka data. The PersistentVolume is claimed using a PersistentVolumeClaim to make it independent of the actual type of the PersistentVolume. Also, the PersistentVolumeClaim can use a StorageClass to trigger automatic volume provisioning.
Simple Kafka deployment YAML for Strimzi
  1. Name of your Kafka cluster.
  2. The Kafka version. In case of upgrading, checkout the Upgrading Procedure from the Strimzi documentation
  3. Configuring the number of the Kafka broker nodes
  4. Setting container resource constraints.
    A request is the amount of the resource that the system will guarantee. Kubernetes decides on which node to put the Pod based on the request values.
    A limit is the maximum amount of resources that the container is allowed to use. If the request is not set, it defaults to limit. And when the limit is not set, it defaults to zero (unbounded).
  5. Specifies the minimum (-Xms) and maximum (-Xmx) heap allocation for the JVM
  6. Listeners configure how clients connect to a Kafka cluster. Multiple listeners can be configured by specifying unique name and port for each listener.
    Two types of listeners are currently supported which are Internal Listener (for accessing the cluster within the Kubernetes) and External Listener (for accessing the cluster from outside of the Kubernetes). TLS encryption can also be enabled for listeners.
    Internal listeners are specified using an internal type. And for external types, these values can be used:
    route. to use OpenShift routes and the default HAProxy router
    loadbalancer. to use loadbalancer services
    nodeport. to use ports on Kubernetes nodes (external access)
    ingress. to use Kubernetes Ingress and the NGINX Ingress Controller for Kubernetes.
  7. You can specify and configure all of the options in the “Broker Configs” section of the Apache Kafka documentation apart from those managed directly by Strimzi. Visit the Strimzi documentation for the forbidden configs.
  8. Storage is configured as ephemeral, persistent-claim or jbod.
    Ephemeral. As we have discussed previously, an Ephemeral cluster uses an emptyDir volume. The data stored in this volume will be lost after the Pod goes down. So this type of storage is only suitable for development and testing. When using clusters with multiple ZooKeeper nodes and replication factor higher than one, when a Pod restarts, it can recover data from other nodes.
    Persistent Claim. Uses Persistent Volume Claims to provision persistent volumes for storing data. a StorageClass can also be set to use for dynamic volume provisioning. Also, we can use a selector for selecting a specific persistent volume to use. It contains key:value pairs representing labels for selecting such a volume. Check out the documentation for more details. In Minikube, a default StorageClass with the name “standard” has been configured automatically and we can use it like above.
    JBOD. By using jbod type, we can specify multiple disks or volumes (can be either ephemeral or persistent) for our Kafka cluster.
  9. Loggers and log levels can be specified easily with this config.
  10. ZooKeeper configurations can be customized easily. Most of the configurations are similar to the cluster configs. Some options (like Security, Listeners, etc.) cannot be customized since the Strimzi itself is managing them.
  11. The Entity Operator is responsible for managing Kafka-related entities in a running Kafka cluster. It supports several sub-properties:
    tlsSidecar. Contains the configuration of the TLS sidecar container, which is used to communicate with ZooKeeper.
    topicOperator. contains the configuration of the Topic Operator. When this option is missing, the Entity Operator is deployed without the Topic Operator. If an empty object ({}) is used, all properties use their default values.
    userOperator. contains the configuration of the User Operator. When this option is missing, the Entity Operator is deployed without the User Operator. If an empty object ({}) is used, all properties use their default values.
    template. contains the configuration of the Entity Operator pod, such as labels, annotations, affinity, and tolerations
$ kubectl apply -f kafka-deployment.yaml -n kafka
$ kubectl get deployments -n kafka

Managing Topics

Now that our Kafka cluster is up and running, we need to add our desired topics. By using the Topic Operator, we can easily manage the topics in our Kafka cluster through Kubernetes resources. These resources are called KafkaTopic.

Creating a Topic in Strimzi Kafka Cluster

Testing our Kafka Cluster

We have created our Kafka Cluster inside Kubernetes. Now, it’s time to test it.

  • kafka-console-producer for producing messages
  • kafka-console-consumer for consuming messages.
  1. Get the DNS name of the Kafka Cluster service in order to connect to it:
$ kubectl get svc -n kafka
Kafka Cluster service name (address)
$ kubectl run kafka-producer -ti --image=strimzi/kafka:0.20.0-rc1-kafka-2.6.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list my-cluster-kafka-bootstrap.kafka:9092 --topic my-topic
$ kubectl run kafka-consumer -ti --image=strimzi/kafka:0.20.0-rc1-kafka-2.6.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap.kafka:9092 --topic my-topic --from-beginning
  1. Download the latest binary release of Kafka from here.
  2. Get the IP of the minikube by running minikube ip command (mine is 192.168.99.105).
  3. Run kubectl get svc -n kafka and find the exposed port of the kafka-external-bootstrap (highlighted with blue in the picture above)
  4. The kafka-console-producer and kafka-console-consumer are in the /bin directory of the downloaded package (for Windows, navigate to /bin/windows)
  5. Like above, fire up your console producer and consumer with the below commands (Windows commands are the same) and test your cluster from outside of the Kubernetes:
$ kafka-console-producer.sh --broker-list 192.168.99.105:30825 --topic my-topic$ kafka-console-consumer.sh --bootstrap-server 192.168.99.105:30825 --topic my-topic --from-beginning

Conclusion

Strimzi provides a nice abstraction over the complicated configs of a Kafka Cluster on Kubernetes. It takes a small amount of time to deploy a fully working cluster. Also, it has an extensive amount of configurations so you can deploy your cluster in any way you like.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Sina Nourian

Sina Nourian

CTO at Daneshgar Technology Co. Ltd | Interested in Distributed Systems, Cloud Computing, Serverless computing, Data Stream Processing and Microservices