Apache Kafka on Kubernetes with Strimzi — Part 1: Creating and Deploying a Strimzi Kafka Cluster


What is Strimzi?

  • Kafka. A cluster of Kafka broker nodes
  • ZooKeeper. Storing configuration data and cluster coordination
  • Kafka Connect. An integration toolkit for streaming data between Kafka brokers and external systems using Connector (source and sink) plugins. (Also supports Source2Image)
  • Kafka MirrorMaker. Replicating data between two Kafka clusters, within or across data centers.
  • Kafka Bridge. Providing a RESTful interface for integrating HTTP-based clients with a Kafka cluster without the need for client applications to understand the Kafka protocol
  • Kafka Exporter. Extracting data for analysis as Prometheus metrics like offsets, consumer groups, consumer lag, topics and…

Strimzi Operators

  • Cluster Operator. Deploys and manages Apache Kafka clusters, Kafka Connect, Kafka MirrorMaker, Kafka Bridge, Kafka Exporter, and the Entity Operator
  • Entity Operator. Comprises the Topic Operator and User Operator
  • Topic Operator. Manages Kafka topics.
  • User Operator. Manages Kafka users.
Operators within the Strimzi architecture (source)

Deploying Kafka using Strimzi

  1. Deploying the Cluster Operator to manage our Kafka cluster
  2. Deploying the Kafka cluster with ZooKeeper using the Cluster Operator. Topic and User Operators can be deployed in this step with the same deploy file or you can deploy them later.
  3. Now you can deploy other components as you like (Optional):
  • Topic and User Operators
  • Kafka Connect
  • Kafka Bridge
  • Kafka MirrorMaker
  • Kafka Exporter and monitoring metrics

Deploying the Cluster Operator

$ kubectl create ns kafka
$ sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml
$ sed -i '' 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml

Single Namespace

Multiple Namespaces

# ...
value: kafka-cluster-1,kafka-cluster-2,kafka-cluster-3
# ...
$ kubectl apply -f install/cluster-operator/020-RoleBinding-strimzi-cluster-operator.yaml -n watched-namespace$ kubectl apply -f install/cluster-operator/031-RoleBinding-strimzi-cluster-operator-entity-operator-delegation.yaml -n watched-namespace$ kubectl apply -f install/cluster-operator/032-RoleBinding-strimzi-cluster-operator-topic-operator-delegation.yaml -n watched-namespace

All Namespaces

# ...
value: "*"
# ...
$ kubectl create clusterrolebinding strimzi-cluster-operator-namespaced --clusterrole=strimzi-cluster-operator-namespaced --serviceaccount my-cluster-operator-ns:strimzi-cluster-operator

$ kubectl create clusterrolebinding strimzi-cluster-operator-entity-operator-delegation --clusterrole=strimzi-entity-operator --serviceaccount my-cluster-operator-ns:strimzi-cluster-operator

$ kubectl create clusterrolebinding strimzi-cluster-operator-topic-operator-delegation --clusterrole=strimzi-topic-operator --serviceaccount my-cluster-operator-ns:strimzi-cluster-operator
$ kubectl apply -f install/cluster-operator -n kafka
$ kubectl get deployments -n kafkaNAME                         READY   UP-TO-DATE   AVAILABLE   AGE
strimzi-cluster-operator 1/1 1 1 40m

Deploying the Kafka Cluster

  • Ephemeral Cluster. An ephemeral cluster is a Cluster with temporary storage which is suitable for development and testing. This deployment uses volumes for storing broker information (for ZooKeeper) and topics or partitions (for Kafka). So all the data will be removed once the Pod goes down.
  • Persistent Cluster. Uses PersistentVolumes to store ZooKeeper and Kafka data. The PersistentVolume is claimed using a PersistentVolumeClaim to make it independent of the actual type of the PersistentVolume. Also, the PersistentVolumeClaim can use a StorageClass to trigger automatic volume provisioning.
Simple Kafka deployment YAML for Strimzi
  1. Name of your Kafka cluster.
  2. The Kafka version. In case of upgrading, checkout the Upgrading Procedure from the Strimzi documentation
  3. Configuring the number of the Kafka broker nodes
  4. Setting container resource constraints.
    A request is the amount of the resource that the system will guarantee. Kubernetes decides on which node to put the Pod based on the request values.
    A limit is the maximum amount of resources that the container is allowed to use. If the request is not set, it defaults to limit. And when the limit is not set, it defaults to zero (unbounded).
  5. Specifies the minimum (-Xms) and maximum (-Xmx) heap allocation for the JVM
  6. Listeners configure how clients connect to a Kafka cluster. Multiple listeners can be configured by specifying unique name and port for each listener.
    Two types of listeners are currently supported which are Internal Listener (for accessing the cluster within the Kubernetes) and External Listener (for accessing the cluster from outside of the Kubernetes). TLS encryption can also be enabled for listeners.
    Internal listeners are specified using an internal type. And for external types, these values can be used:
    route. to use OpenShift routes and the default HAProxy router
    loadbalancer. to use loadbalancer services
    nodeport. to use ports on Kubernetes nodes (external access)
    ingress. to use Kubernetes Ingress and the NGINX Ingress Controller for Kubernetes.
  7. You can specify and configure all of the options in the “Broker Configs” section of the Apache Kafka documentation apart from those managed directly by Strimzi. Visit the Strimzi documentation for the forbidden configs.
  8. Storage is configured as ephemeral, persistent-claim or jbod.
    Ephemeral. As we have discussed previously, an Ephemeral cluster uses an emptyDir volume. The data stored in this volume will be lost after the Pod goes down. So this type of storage is only suitable for development and testing. When using clusters with multiple ZooKeeper nodes and replication factor higher than one, when a Pod restarts, it can recover data from other nodes.
    Persistent Claim. Uses Persistent Volume Claims to provision persistent volumes for storing data. a StorageClass can also be set to use for dynamic volume provisioning. Also, we can use a selector for selecting a specific persistent volume to use. It contains key:value pairs representing labels for selecting such a volume. Check out the documentation for more details. In Minikube, a default StorageClass with the name “standard” has been configured automatically and we can use it like above.
    JBOD. By using jbod type, we can specify multiple disks or volumes (can be either ephemeral or persistent) for our Kafka cluster.
  9. Loggers and log levels can be specified easily with this config.
  10. ZooKeeper configurations can be customized easily. Most of the configurations are similar to the cluster configs. Some options (like Security, Listeners, etc.) cannot be customized since the Strimzi itself is managing them.
  11. The Entity Operator is responsible for managing Kafka-related entities in a running Kafka cluster. It supports several sub-properties:
    tlsSidecar. Contains the configuration of the TLS sidecar container, which is used to communicate with ZooKeeper.
    topicOperator. contains the configuration of the Topic Operator. When this option is missing, the Entity Operator is deployed without the Topic Operator. If an empty object ({}) is used, all properties use their default values.
    userOperator. contains the configuration of the User Operator. When this option is missing, the Entity Operator is deployed without the User Operator. If an empty object ({}) is used, all properties use their default values.
    template. contains the configuration of the Entity Operator pod, such as labels, annotations, affinity, and tolerations
$ kubectl apply -f kafka-deployment.yaml -n kafka
$ kubectl get deployments -n kafka

Managing Topics

Creating a Topic in Strimzi Kafka Cluster

Testing our Kafka Cluster

  • kafka-console-producer for producing messages
  • kafka-console-consumer for consuming messages.
  1. Get the DNS name of the Kafka Cluster service in order to connect to it:
$ kubectl get svc -n kafka
Kafka Cluster service name (address)
$ kubectl run kafka-producer -ti --image=strimzi/kafka:0.20.0-rc1-kafka-2.6.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list my-cluster-kafka-bootstrap.kafka:9092 --topic my-topic
$ kubectl run kafka-consumer -ti --image=strimzi/kafka:0.20.0-rc1-kafka-2.6.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap.kafka:9092 --topic my-topic --from-beginning
  1. Download the latest binary release of Kafka from here.
  2. Get the IP of the minikube by running minikube ip command (mine is
  3. Run kubectl get svc -n kafka and find the exposed port of the kafka-external-bootstrap (highlighted with blue in the picture above)
  4. The kafka-console-producer and kafka-console-consumer are in the /bin directory of the downloaded package (for Windows, navigate to /bin/windows)
  5. Like above, fire up your console producer and consumer with the below commands (Windows commands are the same) and test your cluster from outside of the Kubernetes:
$ kafka-console-producer.sh --broker-list --topic my-topic$ kafka-console-consumer.sh --bootstrap-server --topic my-topic --from-beginning





CTO at Daneshgar Technology Co. Ltd | Interested in Distributed Systems, Cloud Computing, Serverless computing, Data Stream Processing and Microservices

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

10 Top Visual Studio Code Extensions in 2020

The Incredible Journey of A Middleware Administrator To DevOps Engineer

I am 14 years old.

Where Can I Get a Raspberry Pi and Additional Hardware?

Yes, but how do QR codes actually work?

Python scripting and IT Automation

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Sina Nourian

Sina Nourian

CTO at Daneshgar Technology Co. Ltd | Interested in Distributed Systems, Cloud Computing, Serverless computing, Data Stream Processing and Microservices

More from Medium

Using Search Template — ElasticSearch

How to Connect Elastic Sink Connector with Kafka

How to deploy Scalar DB Server on Kubernetes

Event Streaming Platform

event streaming platform