Tags: linkedin/kafka

2.4.1.82

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Allow more than one connecting node when AT_LEAST_THREE is used for the least loaded node algorithm and the connecting/connected node count is less than 3. (#533)

3.0.1.882

[LI-HOTFIX] Ensure ReplicationFactor is at least (minIsr + RedundancyFactor + 1) in topic creation (#532)

This PR is for "[LIKAFKA-65517] Improve the Kafka broker controlled shutdown".

If a topic's replication factor (RF) is smaller than (minIsr + controlledShutdownSafetyCheckRedundancyFactor + 1), controlled shutdown gets stuck with the error NOT_ENOUGH_REPLICAS. For example, a topic with RF=3, minISR=2, and controlledShutdownSafetyCheckRedundancyFactor=1 needs RF of at least 4, so the broker would be stuck in controlled shutdown.

This PR automatically raises RF to at least (minIsr + controlledShutdownSafetyCheckRedundancyFactor + 1) during topic creation if replica assignments are not provided. If replica assignments are provided, topic creation fails with an error when the replica set size is smaller than (minIsr + controlledShutdownSafetyCheckRedundancyFactor + 1). It also changes the log level from "info" to "warn" when controlled shutdown cannot proceed due to NOT_ENOUGH_REPLICAS.

Added unit tests and fixed unit tests broken by this change.
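The rule described in this commit message can be sketched as follows. This is only an illustration in Python, not the broker code: the function names are hypothetical, and the replica assignment is simplified to a single partition's replica list.

```python
from typing import Optional, Sequence


def min_required_rf(min_isr: int, redundancy_factor: int) -> int:
    """Smallest RF that lets controlled shutdown proceed, per the commit message."""
    return min_isr + redundancy_factor + 1


def resolve_replication_factor(
    requested_rf: int,
    min_isr: int,
    redundancy_factor: int,
    assignment: Optional[Sequence[int]] = None,
) -> int:
    """Hypothetical helper: bump RF automatically, or reject an explicit assignment."""
    required = min_required_rf(min_isr, redundancy_factor)
    if assignment is None:
        # No explicit replica assignment: raise RF to the minimum if needed.
        return max(requested_rf, required)
    if len(assignment) < required:
        # Explicit assignment that is too small: fail topic creation.
        raise ValueError(
            f"replica set size {len(assignment)} is smaller than required {required}"
        )
    return len(assignment)
```

With RF=3, minISR=2, and a redundancy factor of 1, the sketch raises the replication factor to 4 when no assignment is given, and rejects an explicit three-replica assignment.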

3.0.1.80

[LI-HOTFIX] Ensure ReplicationFactor is at least (minIsr + RedundancyFactor + 1) in topic creation (#532)

This PR is for "[LIKAFKA-65517] Improve the Kafka broker controlled shutdown".

If a topic's replication factor (RF) is smaller than (minIsr + controlledShutdownSafetyCheckRedundancyFactor + 1), controlled shutdown gets stuck with the error NOT_ENOUGH_REPLICAS. For example, a topic with RF=3, minISR=2, and controlledShutdownSafetyCheckRedundancyFactor=1 needs RF of at least 4, so the broker would be stuck in controlled shutdown.

This PR automatically raises RF to at least (minIsr + controlledShutdownSafetyCheckRedundancyFactor + 1) during topic creation if replica assignments are not provided. If replica assignments are provided, topic creation fails with an error when the replica set size is smaller than (minIsr + controlledShutdownSafetyCheckRedundancyFactor + 1). It also changes the log level from "info" to "warn" when controlled shutdown cannot proceed due to NOT_ENOUGH_REPLICAS.

Added unit tests and fixed unit tests broken by this change.

2.4.1.81

[LI-HOTFIX] throw ConfigException when resolved addresses is empty (#527)

* throw RetriableKafkaClientConstructionException when resolved addresses is empty

* adds RetriableKafkaClientConstructionException for Retriable exception

* misc

---------

Co-authored-by: Qi Liu <qliu2+LinkedIn@linkedin.com>
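As a rough illustration of the idea in this commit, the sketch below raises a dedicated retriable error when address resolution returns nothing. It is written in Python rather than the client's own language, every name in it is hypothetical, and the retry loop is an assumption about how a caller might use such an exception, not something the commit message states.

```python
from typing import Callable, List, Sequence


class RetriableClientConstructionError(Exception):
    """Hypothetical stand-in for RetriableKafkaClientConstructionException."""


def require_resolved(addresses: Sequence[str], bootstrap: str) -> List[str]:
    """Fail with a retriable error instead of continuing with no addresses."""
    if not addresses:
        raise RetriableClientConstructionError(
            f"no resolvable addresses for bootstrap servers: {bootstrap}"
        )
    return list(addresses)


def construct_with_retry(
    resolve: Callable[[str], Sequence[str]], bootstrap: str, attempts: int = 3
) -> List[str]:
    """Assumed caller behavior: retry resolution a bounded number of times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return require_resolved(resolve(bootstrap), bootstrap)
        except RetriableClientConstructionError as err:
            last_error = err  # remember the failure and try again
    raise last_error
```

A distinct exception type lets callers tell a transient resolution failure apart from a genuinely invalid configuration.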

3.0.1.79

Catch exceptions for LogDirFailureHandler thread during handleLogDirF… (#530)

LITICKET=ACTIONITEM-5825

-Description

We recently found that Kafka did not shut itself down when a bad disk was detected. Our current assumption is that Kafka detected the issue but the LogDirFailureHandler thread was not working correctly. Unfortunately, we do not have enough logs or a heap dump to identify exactly what happened, so our best bet is to catch exceptions in that thread to prevent this from happening again. We also add a new metric that reports when such an exception is thrown, which usually indicates a critical issue.

A more aggressive strategy would be to exit the program whenever an offline log dir is detected, which might make sense for the LinkedIn setup. I don't see a huge risk in doing so, but we may still want to start out a bit conservative. So I decided to add metrics that indicate when a server detects failed disks, so that at least our monitoring systems can detect the failure and potentially take further action.
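The conservative approach described above, catching exceptions in the handler and counting them in a metric, might look roughly like this. It is a Python sketch under stated assumptions: the class, its counters, and the `handle_failure` callback are all hypothetical and do not mirror the broker's actual classes or metric names.

```python
from collections import deque
from typing import Callable, Deque


class LogDirFailureHandlerSketch:
    """Hypothetical handler that survives exceptions and counts them."""

    def __init__(self, handle_failure: Callable[[str], None]) -> None:
        self._handle_failure = handle_failure
        self._offline_dirs: Deque[str] = deque()
        self.failed_dir_count = 0          # metric: failed disks detected
        self.handler_exception_count = 0   # metric: exceptions in the handler

    def report_offline(self, log_dir: str) -> None:
        self._offline_dirs.append(log_dir)

    def process_next(self) -> bool:
        """Handle one offline dir; return False if nothing was queued."""
        if not self._offline_dirs:
            return False
        log_dir = self._offline_dirs.popleft()
        self.failed_dir_count += 1
        try:
            self._handle_failure(log_dir)
        except Exception:
            # Do not let the handler die: record the failure so that
            # monitoring can alert on it, then keep serving later events.
            self.handler_exception_count += 1
        return True
```

Because the exception is swallowed after being counted, the handler keeps processing later failures, and the counters give monitoring something to alert on.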

3.0.1.78

Fixed logging order issue in GroupCoordinator (#526)

Currently the GroupCoordinator always logs "Preparing to rebalance group XXX in state PreparingRebalance" because it sets the state to PreparingRebalance right before logging. As a result, the group's previous state is lost at the moment of logging. We need to log first and then change the state.
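The ordering fix can be shown with a small sketch. This is illustrative Python, not the GroupCoordinator code: the `Group` class, the `GroupState` enum, and the `log` callback are hypothetical stand-ins.

```python
from enum import Enum
from typing import Callable


class GroupState(Enum):
    STABLE = "Stable"
    PREPARING_REBALANCE = "PreparingRebalance"


class Group:
    def __init__(self, group_id: str, state: GroupState) -> None:
        self.group_id = group_id
        self.state = state


def prepare_rebalance(group: Group, log: Callable[[str], None]) -> None:
    # Log first, while group.state still holds the previous state...
    log(f"Preparing to rebalance group {group.group_id} in state {group.state.value}")
    # ...and only then transition to PreparingRebalance.
    group.state = GroupState.PREPARING_REBALANCE
```

Swapping the two statements in `prepare_rebalance` reproduces the bug: the message would always report PreparingRebalance.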

2.4.1.80

[LI-HOTFIX] Reduce exception log spam in producer IO thread (#525)

* Reduce exception log spam in producer IO thread

* Address comments

* 1. fix unit test
2. fix typo

---------

Co-authored-by: Qi Liu <qliu2+LinkedIn@linkedin.com>

3.0.1.77

Change log level for client quota metric to debug (#524)

2.4.1.79

ACTIONITEM-3738: Improve the log message for the "There are no nodes in the Kafka cluster" error. (#523)

* ACTIONITEM-3738: Improve the log message for the "There are no nodes in the Kafka cluster" error.

* Improve error msg.

* More improvements.

---------

Co-authored-by: yuhayang@linkedin.com <alias@linkedin.com>

2.4.1.78

checking that newNodes size is not 0 before passing it into random number generator (#522)

* checking that newNodes size is not 0 before passing it into random number generator
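A guard of this kind can be sketched as below. The example is in Python for illustration only; `pick_random_node` is a hypothetical helper, and returning `None` for an empty list is an assumption, since the commit message only says that the size is checked before the random number generator is called.

```python
import random
from typing import Optional, Sequence


def pick_random_node(
    new_nodes: Sequence[str], rng: Optional[random.Random] = None
) -> Optional[str]:
    """Pick a random node, or return None when there is nothing to pick from."""
    if len(new_nodes) == 0:
        # A bound of 0 is invalid for the random index (randrange(0) raises
        # ValueError), so bail out before calling the generator.
        return None
    if rng is None:
        rng = random.Random()
    return new_nodes[rng.randrange(len(new_nodes))]
```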