Sergio Ammirata cadb6c4aa4 fix(eap): recover and reintegrate a bonded EAP-SRP sender leg after a flap
A bonded caller-sender leg that lost and regained connectivity on the same
source tuple -- an interface flap, or an upstream outage that resumes without
a reconnect -- stayed wedged and never rejoined the bond without an operator
restart.

While the leg is silent, the far-end authenticator times out and purges its
session. The leg keeps its miface-bound UDP socket (a flap does not close it)
and stays in EAP SUCCESS, so it keeps streaming data that the authenticator
now drops. The authenticator in turn waits for an EAPOL START that the caller
never sends: EAP is authenticator-driven, and the caller believes it is still
authenticated.
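The wedge can be modeled in a few lines of C. This is a minimal sketch with two assumptions. First, every type and function in it (struct authenticator, authenticator_tick, caller_sends_start, is_wedged) is hypothetical and not part of librist. Second, it reduces each end to the one fact that matters here. The authenticator purges its session after session_timeout of silence, and a caller that believes it is in SUCCESS never sends a START of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the wedge; not librist's real types. */
enum caller_eap_state { CALLER_EAP_PENDING, CALLER_EAP_SUCCESS };

struct authenticator {
    bool has_session;            /* false once the session has been purged */
    uint64_t last_rx_ms;         /* last time anything arrived from the leg */
    uint64_t session_timeout_ms;
};

struct caller {
    enum caller_eap_state eap_state;
};

/* Authenticator: purge the session after session_timeout of silence. */
static void authenticator_tick(struct authenticator *a, uint64_t now_ms)
{
    assert(a != NULL);
    if (a->has_session && now_ms - a->last_rx_ms > a->session_timeout_ms)
        a->has_session = false;
}

/* Authenticator: data is accepted only while a session exists; otherwise it
 * is dropped and the authenticator keeps waiting for an EAPOL START. */
static bool authenticator_accepts_data(struct authenticator *a, uint64_t now_ms)
{
    assert(a != NULL);
    a->last_rx_ms = now_ms;
    return a->has_session;
}

/* Caller: a caller that believes it is authenticated never sends START. */
static bool caller_sends_start(const struct caller *c)
{
    assert(c != NULL);
    return c->eap_state != CALLER_EAP_SUCCESS;
}

/* The wedge: no session on one end and no START coming from the other. */
static bool is_wedged(const struct authenticator *a, const struct caller *c)
{
    return !a->has_session && !caller_sends_start(c);
}
```

Once a flap outlasts the timeout while the caller is still in SUCCESS, is_wedged() stays true indefinitely, because neither side takes the next step. Forcing the caller out of SUCCESS is the only local move that breaks the cycle.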

- Extend try_caller_socket_rebind to sender-mode callers: a sender leg silent
  past session_timeout resets its EAP context (eap_reset_authenticatee) and
  re-drives the SRP handshake on the existing socket. It does not rebind --
  the miface-bound socket is still valid and rebinding would move the source
  tuple the far end expects.
- Fold the leg back into the weighted bond once it re-authenticates. Recovery
  de-authenticates the leg, which removes it from the sender balancing
  rotation while it is down. It also rewinds eap_authentication_state so that
  the "EAP Authentication succeeded" transition fires again on re-auth and
  restores the connection-level authenticated flag via rist_peer_authenticate.
  Without this, the leg re-authenticates but stays out of balancing and
  carries only NACK retransmits instead of its weighted share.
- Only SRP sender legs need this; plaintext/PSK senders have no such deadlock
  and recover via normal reconnect, so they are left untouched.
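The recovery and reintegration steps above can be sketched as follows, assuming a simplified leg structure. Every name in the code (struct leg, maybe_recover_sender_leg, reset_eap_and_restart, on_handshake_success, and the enum values) is hypothetical. The real names from this change (eap_reset_authenticatee, eap_authentication_state, rist_peer_authenticate) appear only in comments and are not called. The sketch models the success handling as edge-triggered. Under that model, recovery must both clear the authenticated flag and rewind the EAP state, or the flag is never restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified leg; names are illustrative, not librist's. */
enum leg_auth_mode { AUTH_PLAINTEXT, AUTH_PSK, AUTH_SRP };
enum leg_eap_state { LEG_EAP_NONE, LEG_EAP_PENDING, LEG_EAP_SUCCESS };

struct leg {
    bool is_sender;
    enum leg_auth_mode mode;
    enum leg_eap_state eap_state;  /* plays the role of eap_authentication_state */
    bool authenticated;            /* connection-level flag used by balancing */
    uint64_t last_activity_ms;     /* start of the current silence window */
    int socket_fd;                 /* miface-bound socket; never rebound here */
    unsigned handshakes_started;
};

/* Stand-in for resetting the EAP context (the commit uses
 * eap_reset_authenticatee) and re-driving the SRP handshake. */
static void reset_eap_and_restart(struct leg *l)
{
    l->eap_state = LEG_EAP_PENDING;  /* rewind so success is a new transition */
    l->handshakes_started++;
}

/* Returns true if recovery ran for this leg at now_ms. */
static bool maybe_recover_sender_leg(struct leg *l, uint64_t now_ms,
                                     uint64_t session_timeout_ms)
{
    assert(l != NULL);
    /* Only SRP sender legs need this; plaintext/PSK use normal reconnect. */
    if (!l->is_sender || l->mode != AUTH_SRP)
        return false;
    if (now_ms - l->last_activity_ms <= session_timeout_ms)
        return false;
    l->authenticated = false;      /* leave the balancing rotation while down */
    reset_eap_and_restart(l);      /* same socket_fd: no rebind, same tuple */
    l->last_activity_ms = now_ms;  /* at most one retry per session_timeout */
    return true;
}

/* Handshake completion, modeled as edge-triggered: only a change into
 * SUCCESS restores the flag (the commit does this via rist_peer_authenticate). */
static void on_handshake_success(struct leg *l)
{
    assert(l != NULL);
    if (l->eap_state != LEG_EAP_SUCCESS) {
        l->eap_state = LEG_EAP_SUCCESS;
        l->authenticated = true;
    }
}
```

In this model, skipping the rewind reproduces the partial fix described above. If a leg is de-authenticated but left in LEG_EAP_SUCCESS, on_handshake_success() sees no transition, and the leg stays out of balancing.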

Reproduced and verified with a bonded advanced-profile ristsender over two
netns/veth legs to an SRP listener. One leg is silenced with 100% packet loss
in both directions (tc netem) while its interface stays up, so the socket
persists on the same source tuple. Before the fix, the returning leg never
re-authenticates and the listener floods "handshake is still pending"
messages. With the re-auth change alone, the leg authenticates but the sender
still balances over only the surviving leg. With the full fix, the returning
leg re-authenticates and resumes carrying its full weighted share (verified
on a restored zero-loss link, matching a plaintext bond). Added as
test/rist/test_bonded_leg_flap_netns.sh (meson "netns" suite), which asserts
both re-authentication and reintegration. It requires Linux with root, netns
and tc, and otherwise skips cleanly (exit 77). An in-process loopback test
cannot reproduce the wedge because a loopback leg has no miface binding and
self-heals with a new-port handshake.
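The reintegration outcome can be illustrated with a weighted scheduler that considers only authenticated legs. This sketch uses smooth weighted round-robin as a stand-in, which is not necessarily the scheduler librist's sender uses. struct wleg and pick_leg are hypothetical. With weights 2 and 1, a de-authenticated second leg receives nothing. Once it is marked authenticated again, it returns to a one-third share.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical leg entry for a weighted scheduler; not librist's type. */
struct wleg {
    int weight;          /* configured bond weight */
    int current;         /* smooth weighted round-robin accumulator */
    bool authenticated;  /* unauthenticated legs are skipped entirely */
};

/* Pick the leg for the next packet using smooth weighted round-robin over
 * authenticated legs only. Returns the leg index, or -1 if none is usable. */
static int pick_leg(struct wleg *legs, size_t n)
{
    assert(legs != NULL);
    int total = 0;
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!legs[i].authenticated)
            continue;
        legs[i].current += legs[i].weight;
        total += legs[i].weight;
        if (best < 0 || legs[i].current > legs[best].current)
            best = (int)i;
    }
    if (best >= 0)
        legs[best].current -= total;
    return best;
}
```

Skipping unauthenticated legs is what lets the bond route around a leg while it recovers. Restoring the flag is what returns the leg to its weighted share instead of leaving it with only retransmits.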
2026-07-03 13:10:42 -04:00