Conversation

Hugal31
Contributor

@Hugal31 Hugal31 commented Jun 26, 2025

Set up a configurable 10s poll interval to:

  • Update the endpoint list (i.e. resolve 0.0.0.0 & [::] endpoints).
  • If the list changed, send an update to the neighbors.
  • If needed, join new multicast interfaces and restart the scouting routine to scout & listen on the new interfaces.

This allows Zenoh to react to a network configuration change (i.e. connections and disconnections).
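
As a rough illustration of the first two steps of the poll described above, here is a minimal sketch using only the standard library. It is not the code from this PR: `expand_endpoint`, `poll_once`, and `LocatorChange` are hypothetical names, and the list of interface addresses is passed in by the caller rather than read from the OS. It assumes a wildcard endpoint expands to every interface address of the same IP family.

```rust
use std::collections::BTreeSet;
use std::net::{IpAddr, SocketAddr};

/// Expand a listening endpoint into concrete locators (hypothetical helper).
/// A wildcard address (0.0.0.0 or [::]) is replaced by every interface
/// address of the same family; any other address is returned unchanged.
fn expand_endpoint(endpoint: SocketAddr, interface_addrs: &[IpAddr]) -> BTreeSet<SocketAddr> {
    if !endpoint.ip().is_unspecified() {
        return BTreeSet::from([endpoint]);
    }
    interface_addrs
        .iter()
        .filter(|ip| ip.is_ipv4() == endpoint.is_ipv4())
        .map(|ip| SocketAddr::new(*ip, endpoint.port()))
        .collect()
}

/// Locators that appeared or disappeared since the previous poll tick.
#[derive(Debug, PartialEq, Eq)]
struct LocatorChange {
    added: Vec<SocketAddr>,
    removed: Vec<SocketAddr>,
}

/// One poll tick: recompute the locator set and report the difference.
/// Returns `None` when nothing changed, so the caller can skip notifying
/// its neighbors. `previous` is updated in place when a change is found.
fn poll_once(
    previous: &mut BTreeSet<SocketAddr>,
    endpoints: &[SocketAddr],
    interface_addrs: &[IpAddr],
) -> Option<LocatorChange> {
    let current: BTreeSet<SocketAddr> = endpoints
        .iter()
        .flat_map(|endpoint| expand_endpoint(*endpoint, interface_addrs))
        .collect();
    if current == *previous {
        return None;
    }
    let added: Vec<SocketAddr> = current.difference(previous).copied().collect();
    let removed: Vec<SocketAddr> = previous.difference(&current).copied().collect();
    *previous = current;
    Some(LocatorChange { added, removed })
}
```

A caller would invoke `poll_once` on every tick with a fresh snapshot of the interface addresses and only send an update when it returns `Some`. Keeping the comparison on sorted sets makes the "did anything change" check independent of the order in which the OS lists interfaces.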

I tried using a NETLINK socket in #1824 to be more efficient and reactive, but it only worked on Linux and I didn't know how to make it work with IPv6.

Replaces #1824
Closes #1823

Hugo Laloge added 16 commits April 7, 2025 09:41
On Unix systems, use netlink to detect added or deleted IPv4 addresses to:
  * Renew the interface list.
  * Update the node locators.
  * Restart the scouting to use the new addresses/discard the old addresses.

This is still quite hacky, with the following shortcomings:
* We do not handle IPv6 addresses.
* We perform the locator update and the scouting reset even if the
  new/old addresses are not used as per the configuration.
* The code overall is not the best quality.

Remove the scouting reset upon interface change.
This is in preparation for an improvement of the scouting update upon interface change.

When updating the locators after a network change, send the new
locators to the routers and the peers if in linkstate mode.
…TLINK

NETLINK is only available on Linux. The poll interval may be less
reactive and efficient, but it is available everywhere. Moreover, this
fixes the case where a new locator was not detected if an interface IP
was added before the interface was UP and RUNNING.

PR missing one of the required labels: {'new feature', 'dependencies', 'internal', 'bug', 'documentation', 'breaking-change', 'enhancement'}

@Hugal31 Hugal31 force-pushed the feature/endpoint-auto-update branch from ffbdb8e to 3d8d2b6 Compare June 26, 2025 09:06

codecov bot commented Jun 26, 2025

Codecov Report

❌ Patch coverage is 55.29412% with 190 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.82%. Comparing base (05e20cf) to head (22010b3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| zenoh/src/net/runtime/scouting.rs | 57.60% | 131 Missing ⚠️ |
| zenoh/src/net/routing/hat/p2p_peer/gossip.rs | 0.00% | 13 Missing ⚠️ |
| zenoh/src/net/runtime/orchestrator.rs | 70.58% | 10 Missing ⚠️ |
| zenoh/src/net/protocol/network.rs | 0.00% | 8 Missing ⚠️ |
| zenoh/src/net/routing/hat/router/mod.rs | 0.00% | 8 Missing ⚠️ |
| zenoh/src/net/runtime/mod.rs | 78.57% | 6 Missing ⚠️ |
| zenoh/src/net/routing/hat/linkstate_peer/mod.rs | 0.00% | 5 Missing ⚠️ |
| zenoh/src/net/routing/hat/p2p_peer/mod.rs | 0.00% | 5 Missing ⚠️ |
| commons/zenoh-util/src/net/mod.rs | 75.00% | 3 Missing ⚠️ |
| zenoh/src/net/routing/hat/mod.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2020      +/-   ##
==========================================
- Coverage   70.92%   70.82%   -0.11%     
==========================================
  Files         370      371       +1     
  Lines       62781    63007     +226     
==========================================
+ Hits        44527    44622      +95     
- Misses      18254    18385     +131     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@diogomatsubara diogomatsubara added the enhancement Existing things could work better label Jul 1, 2025
@@ -384,6 +384,9 @@ validated_struct::validator! {
/// if connection timeout exceed, exit from application
pub exit_on_failure: Option<ModeDependentValue<bool>>,
pub retry: Option<connection_retry::ConnectionRetryModeDependentConf>,
/// Interval in milliseconds to check if the listening endpoints changed (e.g. when listening on 0.0.0.0).
/// Also update the multicast scouting listening interfaces. Use -1 to disable.
pub endpoint_poll_interval_ms: Option<i64>,
Contributor

Is there a specific reason to use i64 for a value that is only ever positive? Could Option::None be used to disable polling instead?

Contributor Author

I copied the idea from timeout_ms, which has the same semantics if I'm not mistaken (-1 = infinite / disabled).

After checking, I disable the poll if the interval is <= 0, not just < 0, whereas timeout_ms treats 0 as an actual timeout of 0.
I don't mind changing it.
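
To make the semantics discussed in this thread concrete, here is a small sketch of how the `Option<i64>` setting could be mapped to an optional `Duration`. The function name `effective_poll_interval` and the constant are hypothetical, not taken from the PR. It assumes that an unset value falls back to the 10s default mentioned in the PR description, and it follows the author's statement that any value <= 0 disables the poll.

```rust
use std::time::Duration;

/// Assumed default, based on the "10s poll interval" in the PR description.
const DEFAULT_ENDPOINT_POLL_INTERVAL_MS: i64 = 10_000;

/// Map the configured value to the interval actually used (hypothetical helper).
/// `None` in the config means "use the default"; any value <= 0 disables
/// polling, which is reported to the caller as `None`.
fn effective_poll_interval(configured_ms: Option<i64>) -> Option<Duration> {
    let ms = configured_ms.unwrap_or(DEFAULT_ENDPOINT_POLL_INTERVAL_MS);
    u64::try_from(ms) // rejects negative values without a lossy cast
        .ok()
        .filter(|ms| *ms > 0) // 0 also means "disabled"
        .map(Duration::from_millis)
}
```

With this mapping, `Some(-1)` and `Some(0)` both disable the poll. Under the reviewer's suggestion the field would instead be unsigned and `None` would mean disabled, at the cost of needing another way to express "use the default".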

@milyin
Contributor

milyin commented Aug 4, 2025

@Hugal31 could you please push an update to restart the CI? For some reason I don't see a way to restart it from the GitHub interface. It's not clear right now whether the CI failure is caused by the PR itself or is just sporadic.

@Hugal31
Contributor Author

Hugal31 commented Sep 1, 2025

I think the macOS tests fail only because they are flaky; they fail on main right now.

Successfully merging this pull request may close these issues.

Handle the addition or suppression of network interfaces/addresses
4 participants