Sporadic message duplication during leader transition with idempotent producer #4953
Comments
Hi, I've found an explanation for why the producer epoch is being bumped. It turns out that it is a consequence of a broker's librdkafka/src/rdkafka_request.c Lines 3904 to 3950 in 876cf4a
Below I'm posting an excerpt of captured packets. It is important to note that the same Produce requests (partition 12)
Description
It is possible to get duplicated messages in a single topic-partition, even though the producer is configured for exactly-once delivery. This happens sporadically during a partition leader change, during which, for some reason, the producer epoch is also incremented. It is not trivial to reproduce, but it is doable. (I can provide repro steps, but I don't think they would help much, since the attached pcap file contains enough detail to analyze the issue.)
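To illustrate why an epoch bump matters here, below is a toy Python model of broker-side idempotence state. It is only a sketch under simplifying assumptions: the class `PartitionLog`, its fields, and the rule "a new epoch restarts the sequence at 0" are an illustration of the general idea, not the actual broker or librdkafka code, and it does not claim to be the mechanism behind this particular capture.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of per-partition idempotence state (hypothetical, simplified)."""
    records: list = field(default_factory=list)
    # producer_id -> (epoch, last appended sequence number)
    state: dict = field(default_factory=dict)

    def append(self, producer_id, epoch, sequence, value):
        """Return True if appended, False if dropped as a duplicate."""
        known = self.state.get(producer_id)
        if known is not None:
            known_epoch, last_seq = known
            if epoch < known_epoch:
                raise ValueError("fenced: stale producer epoch")
            # Duplicates are only recognisable within the same epoch.
            if epoch == known_epoch and sequence <= last_seq:
                return False
        self.records.append(value)
        self.state[producer_id] = (epoch, sequence)
        return True


log = PartitionLog()
log.append(producer_id=1, epoch=0, sequence=0, value=0x0A90)

# A retry with the same epoch and sequence is dropped.
retry_same_epoch = log.append(producer_id=1, epoch=0, sequence=0, value=0x0A90)

# The same payload resent under a bumped epoch (sequence restarted at 0)
# no longer matches the stored state and is appended a second time.
retry_new_epoch = log.append(producer_id=1, epoch=1, sequence=0, value=0x0A90)
```

In this model the log ends up holding the value twice, which is the kind of outcome described in this report. Whether the real broker saw the batch under both epochs is exactly what the pcap needs to confirm.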
The essential configuration bits for the producer are:
Details
Below is the excerpt from the captured packets demonstrating the problem.
The producer app was writing a sequence of strictly incrementing numbers into my-topic-0188. Because of the topic-partition leadership change, the values 0x0a90 and 0x0a91 were written twice. This breaks exactly-once delivery guarantees and can cause data inconsistencies in applications relying on message deduplication. It is unclear why these values were written twice, as there is only one successful confirmation of writing them.
Also interesting: the response in frame 1566024 carries a valid offset together with a non-zero error code.
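For anyone trying to confirm the duplication on the consumer side, a minimal Python sketch of the check is shown below. It assumes the consumed payloads have already been decoded into integers. The helper `find_repeats` and the sample `consumed` list are hypothetical; the real arrival order in the partition may differ.

```python
def find_repeats(values):
    """Return the values that arrive again after already being seen."""
    seen = set()
    repeats = []
    for value in values:
        if value in seen:
            repeats.append(value)
        else:
            seen.add(value)
    return repeats


# Hypothetical consumed sequence: strictly incrementing, except that
# 0x0a90 and 0x0a91 show up a second time.
consumed = [0x0A8F, 0x0A90, 0x0A91, 0x0A90, 0x0A91, 0x0A92]
repeats = find_repeats(consumed)
print([hex(v) for v in repeats])  # ['0xa90', '0xa91']
```

Because the producer writes strictly incrementing numbers, any value reported by this check is a duplicate write rather than a legitimate repeat.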
Confluent.Kafka(Librdkafka) version
Produce requests
1560411, 15:37:11.568610, 192.168.65.3:65494->3.70.194.188:40001 - Kafka Produce v10 Request
1564645, 15:37:12.298735, 3.70.194.188:40001->192.168.65.3:65494 - Kafka Produce v10 Response
1561308, 15:37:11.655040, 192.168.65.3:65494->3.70.194.188:40001 - Kafka Produce v10 Request
1566024, 15:37:12.623878, 3.70.194.188:40001->192.168.65.3:65494 - Kafka Produce v10 Response [Not Leader For Partition]
1566591, 15:37:12.769799, 192.168.65.3:58732->3.70.194.188:40003 - Kafka Produce v10 Request
1567506, 15:37:12.925713, 3.70.194.188:40003->192.168.65.3:58732 - Kafka Produce v10 Response [Not Leader For Partition]
1568303, 15:37:13.040103, 192.168.65.3:58732->3.70.194.188:40003 - Kafka Produce v10 Request
1568678, 15:37:13.113566, 3.70.194.188:40003->192.168.65.3:58732 - Kafka Produce v10 Response [Not Leader For Partition]
1569730, 15:37:13.280079, 192.168.65.3:58732->3.70.194.188:40003 - Kafka Produce v10 Request
1570055, 15:37:13.330732, 3.70.194.188:40003->192.168.65.3:58732 - Kafka Produce v10 Response [Not Leader For Partition]
1570954, 15:37:13.507840, 192.168.65.3:58732->3.70.194.188:40003 - Kafka Produce v10 Request
1571293, 15:37:13.567244, 3.70.194.188:40003->192.168.65.3:58732 - Kafka Produce v10 Response [Not Leader For Partition]
Metadata requests
1571548, 15:37:13.608869, 192.168.65.3:58732->3.70.194.188:40003 - Kafka Metadata v12 Request
1571750, 15:37:13.642273, 3.70.194.188:40003->192.168.65.3:58732 - Kafka Metadata v12 Response
Final Produce request (with incremented producer epoch)
1572355, 15:37:13.710099, 192.168.65.3:58732->3.70.194.188:40003 - Kafka Produce v10 Request
1591784, 15:37:15.822886, 3.70.194.188:40003->192.168.65.3:58732 - Kafka Produce v10 Response
OffsetForLeaderEpoch requests
1572938, 15:37:13.770089, 192.168.65.3:58626->3.70.194.188:40003 - Kafka OffsetForLeaderEpoch v2 Request
1573309, 15:37:13.802351, 3.70.194.188:40003->192.168.65.3:58626 - Kafka OffsetForLeaderEpoch v2 Response
1583043, 15:37:14.817771, 192.168.65.3:58626->3.70.194.188:40003 - Kafka OffsetForLeaderEpoch v2 Request
1583379, 15:37:14.848090, 3.70.194.188:40003->192.168.65.3:58626 - Kafka OffsetForLeaderEpoch v2 Response
Fetch requests
1586668, 15:37:15.212955, 3.70.194.188:40003->192.168.65.3:58626 - Kafka Fetch v16 Response
1592076, 15:37:15.844683, 3.70.194.188:40003->192.168.65.3:58626 - Kafka Fetch v16 Response
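The packet summaries above follow a regular layout (frame number, timestamp, endpoints, API name, version, direction, optional error in brackets), so they can be filtered with a small script. The Python sketch below is one way to do that. The regular expression and the helper `parse_line` are my own assumptions about this text layout, not part of any Kafka or Wireshark tooling.

```python
import re
from datetime import datetime

# Assumed layout of one summary line, as it appears in the excerpt above.
LINE_RE = re.compile(
    r"(?P<frame>\d+), (?P<time>[\d:.]+), "
    r"(?P<src>\S+)->(?P<dst>\S+) - Kafka "
    r"(?P<api>\w+) v(?P<version>\d+) (?P<kind>Request|Response)"
    r"(?: \[(?P<error>[^\]]+)\])?"
)


def parse_line(line):
    """Parse one summary line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["frame"] = int(record["frame"])
    record["version"] = int(record["version"])
    record["time"] = datetime.strptime(record["time"], "%H:%M:%S.%f")
    return record


sample = """\
1564645, 15:37:12.298735, 3.70.194.188:40001->192.168.65.3:65494 - Kafka Produce v10 Response
1566024, 15:37:12.623878, 3.70.194.188:40001->192.168.65.3:65494 - Kafka Produce v10 Response [Not Leader For Partition]
1567506, 15:37:12.925713, 3.70.194.188:40003->192.168.65.3:58732 - Kafka Produce v10 Response [Not Leader For Partition]
"""

records = [r for r in map(parse_line, sample.splitlines()) if r is not None]
# Frames whose response carried an error annotation.
error_frames = [r["frame"] for r in records if r["error"]]
```

With the three sample lines taken from the excerpt, `error_frames` contains the two responses marked Not Leader For Partition, including frame 1566024.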