Install and Configure Apache Kafka on Ubuntu
Apache Kafka, often known simply as Kafka, is a popular open-source platform for stream management and processing. Kafka is structured around the concept of an event. External agents, independently and asynchronously, send and receive event notifications to and from Kafka. Kafka accepts a continuous stream of events from multiple clients, stores them, and potentially forwards them to a second set of clients for further processing. It is flexible, robust, reliable, self-contained, and offers low latency along with high throughput. LinkedIn originally developed Kafka, but the Apache Software Foundation offers the current open-source iteration.
Before You Begin
- If you have not already done so, create a Linode account and Compute Instance. See our Getting Started with Linode and Creating a Compute Instance guides. 
- Follow our Setting Up and Securing a Compute Instance guide to update your system. You may also wish to set the timezone, configure your hostname, create a limited user account, and harden SSH access. 
Note: Commands in this guide that require elevated privileges are prefixed with sudo. If you’re not familiar with the sudo command, see the Linux Users and Groups guide.
A Summary of the Apache Kafka Installation Process
A complete Kafka installation consists of the high-level steps listed below. Each step is described in a separate section. These instructions are designed for Ubuntu 24.04 but are generally valid for any Debian-based Linux distribution.
- Install Java
- Download and Install Apache Kafka
- Run Kafka
- Create a Kafka Topic
- Write and Read Kafka Events
- Process Data with Kafka Streams
- Create System Files for Zookeeper and Kafka
Install Java
You must install Java before you can use Apache Kafka. This guide explains how to install OpenJDK, an open-source version of Java.
- Update your Ubuntu packages.
    sudo apt update
- Install OpenJDK with apt.
    sudo apt install openjdk-21-jdk
- Confirm you installed the expected version of Java.
    java -version
  Java returns some basic information about the installation. The information can vary based on the version you have installed.
    openjdk 21.0.3 2024-04-16
    OpenJDK Runtime Environment (build 21.0.3+9-Ubuntu-1ubuntu1)
    OpenJDK 64-Bit Server VM (build 21.0.3+9-Ubuntu-1ubuntu1, mixed mode, sharing)
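If you script the installation, the version check can be automated. The sketch below pulls the major version number out of the first line of the Java version banner and compares it with a minimum. The function names and the minimum of 11 are assumptions made for illustration, not values taken from this guide; check the Kafka documentation for the Java versions your release supports.

```shell
# java_major: print the major version from the first line of a Java version banner.
# Handles both `openjdk 21.0.3 2024-04-16` and `openjdk version "21.0.3" ...`.
# Assumption: legacy "1.8"-style banners are out of scope and would report 1.
java_major() {
  printf '%s\n' "$1" | sed -n '1s/^[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical minimum, for illustration only.
MIN_JAVA_MAJOR=11

# check_java: succeed when the banner passed in reports at least MIN_JAVA_MAJOR.
check_java() {
  major=$(java_major "$1")
  [ -n "$major" ] && [ "$major" -ge "$MIN_JAVA_MAJOR" ]
}
```

Because java -version writes its banner to standard error, a call would look like check_java "$(java -version 2>&1)" && echo "Java is recent enough".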
Download and Install Apache Kafka
Tar archives for Apache Kafka can be downloaded directly from the Apache Site and installed with the process outlined in this section. The name of the Kafka download varies based on the release version. Substitute the name of your own file wherever you see kafka_2.13-3.7.0.tgz.
- Navigate to the Apache Kafka Downloads page and choose the Kafka release you want. We recommend choosing the latest version. This guide uses Apache Kafka 3.7.0. The release link takes you to a landing page where you can use either HTTP or FTP to download the tar file.
- If you downloaded the software onto a different computer than the host, transfer the Apache Kafka files to the host via scp, ftp, or another file transfer method. Replace user and 192.0.2.0 with your user name and host IP address:
    scp /localpath/kafka_2.13-3.7.0.tgz user@192.0.2.0:~/
  Note: If the transfer is blocked, verify your firewall is not blocking the connection. Run sudo ufw allow 22/tcp on the host to let scp transfers through ufw.
- Optional: You can confirm you downloaded the file correctly with a SHA512 checksum. You can find the checksum file on the Apache Kafka Downloads page. Each release includes a link to a corresponding sha512 file. Download this file and transfer it to your Kafka host using scp. Place the checksum file in the same directory as your tar file. Execute the following command to generate a checksum for the tar file:
    gpg --print-md SHA512 kafka_2.13-3.7.0.tgz
  Compare the output from this command against the contents of the sha512 file. The two checksums should match. This step confirms only that the file was not corrupted or truncated during download. It does not confirm the authenticity of the file. The checksum output has the following format:
    kafka_2.13-3.7.0.tgz: F3DD1FD8 8766D915 0D3D395B 285BFA75 F5B89A83 58223814 90C8428E 6E568889 054DDB5F ADA1EB63 613A6441 989151BC 7C7D6CDE 16A871C6 674B909C 4EDD4E28
- For extra security, confirm the file is signed. Download the .asc file and the signing keys associated with the release. You can find these files on the Apache Kafka Downloads page. The link to the KEYS file is located at the top of the page. Each release includes a link to its asc file. Download these files and transfer them to your Kafka host using scp. Place these files in the same directory as your tar file.
  - Import the keys from the KEYS file. This installs the entire key set.
      gpg --import KEYS
  - Use gpg to verify the signature.
      gpg --verify kafka_2.13-3.7.0.tgz.asc kafka_2.13-3.7.0.tgz
  - The output lists the RSA key that was used and the person who signed the release. The date and signer differ from release to release, so treat the following output as an example of the format only.
      gpg: Signature made Wed Dec 16 14:03:36 2020 UTC
      gpg: using RSA key DFB5ABA9CD50A02B5C2A511662A9813636302260
      gpg: issuer "bbejeck@apache.org"
      gpg: Good signature from "Bill Bejeck (CODE SIGNING KEY) <bbejeck@apache.org>" [unknown]
    Note: gpg might warn you that the “key is not certified with a trusted signature”. Unfortunately, there is no easy way to confirm the authenticity of the signer, and for most deployments this is not necessary. For full verification in high-security deployments, follow the steps for Validating Authenticity of a Key on the Apache Kafka Authentication page.
 
- Extract the files with the tar utility. After the extraction process is complete, either delete the archive or store it in a secure place elsewhere on your system.
    tar -zxvf kafka_2.13-3.7.0.tgz
- Optional: Create a new centralized directory for Kafka and move the extracted files to this new Kafka home directory.
    sudo mkdir /home/kafka
    sudo mv kafka_2.13-3.7.0 /home/kafka
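The optional checksum comparison from earlier in this section is easy to get wrong by eye, so it may help to script it. The following is a minimal sketch, assuming the published checksum file uses the gpg --print-md layout shown in this guide (a file name, a colon, then groups of uppercase hex digits) and that sha512sum from GNU coreutils is available, as it is on Ubuntu. The function names normalize_digest and verify_sha512 are hypothetical.

```shell
# normalize_digest: read a gpg --print-md style checksum on stdin and print it
# as one lowercase hex string (no file name, no spaces, no line breaks).
normalize_digest() {
  sed 's/^[^:]*://' | tr -d ' \t\n\r' | tr 'A-F' 'a-f'
}

# verify_sha512 ARCHIVE CHECKSUM_FILE
# Succeeds only when the archive's SHA512 digest matches the published one.
verify_sha512() {
  archive=$1
  sumfile=$2
  expected=$(normalize_digest < "$sumfile")
  actual=$(sha512sum "$archive" | cut -d ' ' -f 1)
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

A call such as verify_sha512 kafka_2.13-3.7.0.tgz kafka_2.13-3.7.0.tgz.sha512 && echo "checksum OK" would then replace the manual comparison. Like the manual step, this checks integrity only and does not replace the signature check.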
Run Kafka
Kafka can be launched directly from the command line. You must launch the Zookeeper module before running Kafka.
- Review the settings contained in the kafka_2.13-3.7.0/config/server.properties file within your Kafka directory. For now, the default settings are fine, but we recommend setting the delete.topic.enable attribute to true at the end of the file. This allows you to delete any topics you create during testing.
  File: /home/kafka/kafka_2.13-3.7.0/config/server.properties
    delete.topic.enable = true
 
- Change to the Kafka home directory and start Zookeeper.
    cd /home/kafka/kafka_2.13-3.7.0/
    bin/zookeeper-server-start.sh config/zookeeper.properties
  Note: Leave all settings in zookeeper.properties at the defaults for most deployments.
- Open a new console session and launch Kafka.
    cd /home/kafka/kafka_2.13-3.7.0/
    bin/kafka-server-start.sh config/server.properties
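If you prefer not to edit server.properties by hand, the delete.topic.enable change described above can be applied with a small helper. This is a sketch only: set_property is a hypothetical function, not part of Kafka, and it assumes the value contains no "|" or "&" characters and that the file ends with a newline.

```shell
# set_property FILE KEY VALUE
# Replace an existing uncommented "KEY=..." line in a Java properties file,
# or append "KEY=VALUE" when no such line exists.
set_property() {
  file=$1
  key=$2
  value=$3
  # Escape dots so that "delete.topic.enable" is matched literally.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
    sed "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key}=${value}|" "$file" > "$file.tmp" &&
      mv "$file.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, set_property /home/kafka/kafka_2.13-3.7.0/config/server.properties delete.topic.enable true can be run repeatedly without creating duplicate entries. Restart Kafka afterwards so the broker picks up the change.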
Create a Kafka Topic
Before you can send any events to Kafka, you must create a topic to contain the events. An explanation of topics can be found in Linode’s Introduction to Kafka.
- Open a new console session. 
- Change the directory to your Kafka directory and create a new topic named test-events:
    cd /home/kafka/kafka_2.13-3.7.0/
    bin/kafka-topics.sh --create --topic test-events --bootstrap-server localhost:9092
  Kafka confirms the topic has been created:
    Created topic test-events.
- Generate a list of all the topics on the cluster with the --list option:
    bin/kafka-topics.sh --list --bootstrap-server localhost:9092
  You should see test-events listed in the output:
    test-events
- Use the --describe flag to display all information about the new topic:
    bin/kafka-topics.sh --describe --topic test-events --bootstrap-server localhost:9092
  Kafka returns a summary of the topic, including the number of partitions and the replication factor:
    Topic: test-events TopicId: URC3EPiqTUW2fBkJuW5AYQ PartitionCount: 1 ReplicationFactor: 1 Configs:
    Topic: test-events Partition: 0 Leader: 0 Replicas: 0 Isr: 0
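When a script needs the partition count or replication factor, it can read them from the summary line of the describe output. The helper below is a rough sketch that assumes the output keeps the "Name: value" layout shown above; topic_field is a hypothetical name, not a Kafka tool.

```shell
# topic_field NAME
# Read `kafka-topics.sh --describe` output on stdin and print the first
# numeric value that follows "NAME:", for example PartitionCount.
topic_field() {
  sed -n "s/.*$1:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}
```

A typical call pipes the describe command into it: bin/kafka-topics.sh --describe --topic test-events --bootstrap-server localhost:9092 | topic_field PartitionCount.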
Write and Read Kafka Events
Kafka’s command-line interface allows you to quickly test out the new topic. Use the console producer script to write some events into the topic. Then, run the console consumer script to read the events you wrote.
- Open a new console session for the producer and change the directory to the Kafka root directory.
    cd /home/kafka/kafka_2.13-3.7.0/
- Configure a producer and specify a topic for its events. You are not creating any events yet, only a client with the ability to send events. Kafka returns a prompt > indicating the producer is ready.
    bin/kafka-console-producer.sh --topic test-events --bootstrap-server localhost:9092
- Send a few events to Kafka, one per line. Started as shown in the previous step, the console producer sends each whole line as the event value and assigns a NULL key, so the key1: and key2: prefixes below are part of the value. To have the producer split each line into a key and a value at the :, start it with the additional options --property parse.key=true --property key.separator=: instead.
    key1: This is event 1
    key2: This is event 2
    key1: This is event 3
- Open a new console session to run the consumer and change the directory to the root Kafka directory.
    cd /home/kafka/kafka_2.13-3.7.0/
- Create the consumer, specifying the test-events topic it should read from. The --from-beginning flag indicates it should read all events starting from the beginning of the topic.
    bin/kafka-console-consumer.sh --topic test-events --from-beginning --bootstrap-server localhost:9092
  Note: The console consumer provides options to format the incoming events. Run the script without any arguments to view the full list.
    bin/kafka-console-consumer.sh
- The consumer immediately polls Kafka for any outstanding events in the topic and displays them onscreen. You should be able to see all the events you sent earlier.
    key1: This is event 1
    key2: This is event 2
    key1: This is event 3
- Return to the producer console (the producer should still be running) and generate another new event.
    key2: This is event 4
- The event immediately appears in the consumer console.
    key2: This is event 4
- Stop the producer or consumer anytime you like with Ctrl-C.
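The console producer also reads events from standard input, so the hand-typed test above can be scripted. The sketch below only generates the test lines; generate_events is a hypothetical helper, and the "key:value" layout assumes the producer is started with key parsing enabled, which the steps above do not do.

```shell
# generate_events COUNT
# Print COUNT lines of the form "keyN:This is event I", alternating between
# key1 and key2 to mirror the events typed by hand in this section.
generate_events() {
  i=1
  while [ "$i" -le "$1" ]; do
    printf 'key%d:This is event %d\n' "$(( (i - 1) % 2 + 1 ))" "$i"
    i=$(( i + 1 ))
  done
}
```

To send the lines as keyed events, pipe them into the producer with key parsing turned on: generate_events 3 | bin/kafka-console-producer.sh --topic test-events --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=: and add --property print.key=true to the consumer command to display the keys alongside the values.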
Process Data with Kafka Streams
Kafka Streams is a library for performing real-time transformations and analysis on a stream. A Kafka Streams application typically acts as both a consumer and a producer. It polls a topic for new events, processes the data, and transmits its output as events to a second topic. Other applications are consumers of this second topic. Kafka Streams is explained in Linode’s Introduction to Apache Kafka.
You can use the WordCountDemo Java application included with Kafka Streams to run a quick demo. WordCountDemo consumes events from the streams-plaintext-input topic. It parses and processes the lines, and stores the words and counts in a table. The updated word counts are converted to a stream of events and sent to the streams-wordcount-output topic. The core of the application is shown below.
- File: WordCountDemo.java
    // Serializers/deserializers (serde) for String and Long types
    final Serde<String> stringSerde = Serdes.String();
    final Serde<Long> longSerde = Serdes.Long();

    // Construct a `KStream` from the input topic "streams-plaintext-input", where message values
    // represent lines of text (for the sake of this example, we ignore whatever may be stored
    // in the message keys).
    KStream<String, String> textLines = builder.stream(
        "streams-plaintext-input",
        Consumed.with(stringSerde, stringSerde)
    );

    KTable<String, Long> wordCounts = textLines
        // Lowercase each text line and split it into words on runs of non-word characters.
        .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))

        // Group the text words as message keys
        .groupBy((key, value) -> value)

        // Count the occurrences of each word (message key).
        .count();

    // Store the running counts as a changelog stream to the output topic.
    wordCounts.toStream().to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
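To get a feel for what this topology emits before running it, the same logic can be imitated locally with standard tools. The sketch below is not Kafka Streams and involves no broker. It is an approximation written for this guide that lowercases each line, splits it on non-word characters, and prints the running count of every word as it is seen, which is the shape of the changelog stream written to the output topic. The function name word_count_changelog is hypothetical.

```shell
# word_count_changelog: read text lines on stdin and print "word count" each
# time a word is seen, mimicking a changelog of running word counts.
word_count_changelog() {
  tr '[:upper:]' '[:lower:]' |
    awk '{
      # Split on runs of non-word characters, like the Java regex \W+.
      n = split($0, words, /[^a-z0-9_]+/)
      for (i = 1; i <= n; i++) {
        if (words[i] == "") continue   # skip empty fields at line edges
        counts[words[i]]++
        print words[i], counts[words[i]]
      }
    }'
}
```

Feeding it the two test sentences used later in this section, for example with printf 'This is not the end\nThe end of the line\n' | word_count_changelog, prints one update per word, with repeated words such as "the" and "end" appearing again each time their count increases.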
- Create a topic on the Kafka cluster to store the sample word count data.
    cd /home/kafka/kafka_2.13-3.7.0/
    bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --topic streams-plaintext-input
  Kafka confirms it has created the topic:
    Created topic streams-plaintext-input.
- Create a second topic to store the output of the Kafka Streams application. Set the cleanup policy to compact entries, so only the updated word counts are stored.
    bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --topic streams-wordcount-output --config cleanup.policy=compact
  Kafka again confirms it has created the topic:
    Created topic streams-wordcount-output.
- Run the WordCountDemo application.
    bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo
- Launch a producer to send test data to the WordCountDemo stream as streams-plaintext-input events.
    cd /home/kafka/kafka_2.13-3.7.0/
    bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic streams-plaintext-input
- Create a consumer to listen to the streams-wordcount-output stream. This stream contains the updated results of the WordCountDemo application. Set the formatting properties as follows to create more legible output.
    cd /home/kafka/kafka_2.13-3.7.0/
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
        --topic streams-wordcount-output \
        --from-beginning \
        --formatter kafka.tools.DefaultMessageFormatter \
        --property print.key=true \
        --property print.value=true \
        --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
        --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
- Enter some test data at the producer prompt.
    This is not the end
- Verify the word counts are displayed in the consumer window.
    this 1
    is 1
    not 1
    the 1
    end 1
- Use the producer to write more test input.
    The end of the line
- Review the new output from the consumer. Notice how the word counts have been updated.
    the 2
    end 2
    of 1
    the 3
    line 1
- When you are finished with the demo, use Ctrl-C to stop the producer, the consumer, and the WordCountDemo application.
Create System Files for Zookeeper and Kafka
Until now, you have been starting Zookeeper and Kafka from the command line inside the Kafka directory. This works, but it is more convenient to create unit files for them inside /etc/systemd/system/ and manage them with systemctl, which can also start them automatically at boot.
- Create a system file for Zookeeper called /etc/systemd/system/zookeeper.service.
    sudo nano /etc/systemd/system/zookeeper.service
- Edit the file and add the following information. Use the location of your Kafka directory in the path names.
  File: /etc/systemd/system/zookeeper.service
    [Unit]
    Description=Apache Zookeeper Server
    Requires=network.target remote-fs.target
    After=network.target remote-fs.target

    [Service]
    Type=simple
    ExecStart=/home/kafka/kafka_2.13-3.7.0/bin/zookeeper-server-start.sh /home/kafka/kafka_2.13-3.7.0/config/zookeeper.properties
    ExecStop=/home/kafka/kafka_2.13-3.7.0/bin/zookeeper-server-stop.sh
    Restart=on-abnormal

    [Install]
    WantedBy=multi-user.target
 
- Create a second file for the Kafka server called /etc/systemd/system/kafka.service.
    sudo nano /etc/systemd/system/kafka.service
- Edit the file and add the following information. Verify the full path to your Java installation and enter it as the JAVA_HOME path.
  File: /etc/systemd/system/kafka.service
    [Unit]
    Description=Apache Kafka Server
    Requires=zookeeper.service
    After=zookeeper.service

    [Service]
    Type=simple
    Environment="JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64"
    ExecStart=/home/kafka/kafka_2.13-3.7.0/bin/kafka-server-start.sh /home/kafka/kafka_2.13-3.7.0/config/server.properties
    ExecStop=/home/kafka/kafka_2.13-3.7.0/bin/kafka-server-stop.sh
    Restart=on-abnormal

    [Install]
    WantedBy=multi-user.target
 
- Reload the systemd daemon and start both applications.
    sudo systemctl daemon-reload
    sudo systemctl enable --now zookeeper
    sudo systemctl enable --now kafka
- Confirm both Kafka and Zookeeper are running as expected. Verify the status of both processes with systemctl status.
    sudo systemctl status kafka zookeeper
  The entries should both show as active.
    kafka.service - Apache Kafka Server
       Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor preset: enabled)
       Active: active (running) since Thu 2021-01-21 15:13:45 UTC; 4s ago
    ...
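The JAVA_HOME value in kafka.service has to match the Java installation on your host. One way to find it, sketched below, is to resolve the java command to its real location and strip the trailing /bin/java. The function name java_home_from_binary is hypothetical, and the approach assumes a conventional JDK layout in which the binary lives at JAVA_HOME/bin/java.

```shell
# java_home_from_binary PATH_TO_JAVA
# Print the JAVA_HOME directory for a fully resolved java binary path by
# removing the trailing "/bin/java".
java_home_from_binary() {
  printf '%s\n' "${1%/bin/java}"
}

# Usage (requires Java on the PATH; readlink -f follows the
# /usr/bin/java symlinks that Ubuntu sets up):
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```

Compare the printed directory with the Environment line in the unit file and adjust the file if they differ.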
Shut Down the Kafka Environment
When you are finished with Kafka, we recommend you gracefully shut down all components and delete all unnecessary logs.
- Shut down any Kafka consumers and producers and any Kafka Streams applications with Ctrl-C.
- Shut down Kafka and then Zookeeper with systemctl stop commands. If you did not register Kafka and Zookeeper with systemd, shut each one down with Ctrl-C in its own console, stopping Kafka first.
    sudo systemctl stop kafka
    sudo systemctl stop zookeeper
- Clean up any test data with the following command:
    sudo rm -rf /tmp/kafka-logs /tmp/zookeeper