A bonded caller-sender leg that lost and regained connectivity on the same
source tuple -- an interface flap, or an upstream outage that resumes without
a reconnect -- stayed wedged and never rejoined the bond without an operator
restart.
While the leg is silent the far-end authenticator times out and purges its
session. The leg keeps its miface-bound UDP socket (a flap does not close it)
and stays in EAP SUCCESS, so it keeps streaming data that the authenticator
now drops. The authenticator in turn waits for an EAPOL START that the caller
never sends: EAP is authenticator-driven, and the caller believes it is still
authenticated.
- Extend try_caller_socket_rebind to sender-mode callers: a sender leg silent
past session_timeout resets its EAP context (eap_reset_authenticatee) and
re-drives the SRP handshake on the existing socket. It does not rebind --
the miface-bound socket is still valid and rebinding would move the source
tuple the far end expects.
- Fold the leg back into the weighted bond once it re-authenticates. Recovery
de-authenticates the leg (so it leaves the sender balancing rotation while
down) and rewinds eap_authentication_state, so the "EAP Authentication
succeeded" transition fires again on re-auth and restores the connection-
level authenticated flag via rist_peer_authenticate. Without this the leg
re-authenticates but is left out of balancing and carries only NACK
retransmits instead of its share.
- Only SRP sender legs need this; plaintext/PSK senders have no such deadlock
and recover via normal reconnect, so they are left untouched.
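The recovery rules above can be summarised as a small decision function. This is a sketch under stated assumptions, not the librist code: leg_state, leg_check_silence(), leg_on_eap_state() and the enum values are hypothetical stand-ins for the state that try_caller_socket_rebind, eap_reset_authenticatee and rist_peer_authenticate act on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified leg state; these names are illustrative only. */
enum eap_state { EAP_STATE_INITIAL, EAP_STATE_PENDING, EAP_STATE_SUCCESS };
enum leg_action { LEG_NONE, LEG_REAUTH_SAME_SOCKET, LEG_REBIND };

struct leg_state {
    bool sender_mode;        /* caller-sender vs caller-receiver */
    bool uses_srp;           /* EAP-SRP vs plaintext/PSK */
    bool authenticated;      /* connection-level flag: in the balancing rotation */
    enum eap_state eap;      /* last EAP state the leg reported */
    uint64_t last_rx_ms;     /* last time anything arrived from the far end */
};

/* Periodic check for a silent caller leg. A real implementation would also
 * apply a retry backoff so a still-silent leg is not reset on every tick. */
static enum leg_action leg_check_silence(struct leg_state *leg, uint64_t now_ms,
                                         uint64_t session_timeout_ms)
{
    if (now_ms - leg->last_rx_ms <= session_timeout_ms)
        return LEG_NONE;
    if (!leg->sender_mode)
        return LEG_REBIND;             /* receiver callers: existing rebind path */
    if (!leg->uses_srp)
        return LEG_NONE;               /* plaintext/PSK senders: normal reconnect */
    /* SRP sender leg: keep the miface-bound socket and its source tuple. */
    leg->authenticated = false;        /* leave the balancing rotation while down */
    leg->eap = EAP_STATE_INITIAL;      /* rewind so the "succeeded" edge fires again */
    return LEG_REAUTH_SAME_SOCKET;     /* caller then re-drives the SRP handshake */
}

/* EAP state update: only a transition into SUCCESS restores the bond slot. */
static void leg_on_eap_state(struct leg_state *leg, enum eap_state next)
{
    if (next == EAP_STATE_SUCCESS && leg->eap != EAP_STATE_SUCCESS)
        leg->authenticated = true;
    leg->eap = next;
}
```

Without the rewind, leg->eap would still read SUCCESS when the re-authentication completes, the transition would never be seen, and the leg would stay out of the rotation.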
Reproduced and verified with a bonded advanced-profile ristsender over two
netns/veth legs to an SRP listener: one leg is silenced with 100% packet loss
both ways (tc netem) while its interface stays up, so the socket persists on
the same source tuple. Before the fix the returning leg never re-authenticates
and the listener floods its log with "handshake is still pending"; with re-auth alone it
authenticates but the sender balances over only the surviving leg; after the
full fix the returning leg re-authenticates and resumes carrying its full
weighted share (verified on a restored zero-loss link, matching a plaintext
bond). Added as test/rist/test_bonded_leg_flap_netns.sh (meson "netns" suite),
which asserts both re-authentication and reintegration; it needs Linux root +
netns/tc and cleanly skips (exit 77) otherwise. An in-process loopback test
cannot reproduce the wedge because a loopback leg has no miface binding and
self-heals with a new-port handshake.
Reported by the Fable 5 security audit.
parse_url_udp_options() extracted the URL scheme into a fixed 16-byte
prefix[] buffer using a copy length derived from the offset of the first
'/'. Two inputs corrupted memory:
- A URL beginning with '/' set prefix_len to 0, so the strncpy length
prefix_len-1 underflowed the uint32_t to 0xFFFFFFFF. strncpy copied
the short source and then NUL-padded the destination toward 4 GB,
writing far past the 16-byte buffer.
- A scheme longer than 15 characters left the strncpy length clamped to
15 but still wrote the terminator at prefix[prefix_len], an
out-of-bounds one-byte write at an attacker-influenced index.
Both are reachable through the public rist_parse_udp_address() /
rist_parse_udp_address2() entry points, which the ristsender, ristreceiver
and udp2udp tools call directly on user-supplied -i/-o UDP URLs.
Clamp both the copy length and the terminator index to the buffer size,
preserving the existing behavior of dropping the trailing ':' from the
scheme. Add test/rist/unit/test_url_udp_prefix_parse, which exercises the
normal udp/rtp schemes plus the two malformed inputs and fails under
AddressSanitizer against the old code.
Also replace the unbounded sprintf that builds the Simple-profile RTCP
peer address with snprintf, so a long input URL cannot overflow the
peer-config address buffer.
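A minimal sketch of the clamped scheme copy, assuming the prefix is everything before the first '/' minus the trailing ':'. copy_url_prefix() is a hypothetical helper, not the actual parse_url_udp_options() code, and returning an empty prefix when the URL has no '/' is an assumption. It uses memcpy plus an explicit terminator rather than strncpy, which also sidesteps strncpy's NUL-padding of the whole requested length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PREFIX_BUF_SIZE 16   /* size of the fixed prefix[] buffer described above */

/* Copy the URL scheme (text before the first '/', without its trailing ':')
 * into prefix, writing at most prefix_size bytes including the terminator. */
static void copy_url_prefix(const char *url, char *prefix, size_t prefix_size)
{
    assert(prefix_size > 0);
    const char *slash = strchr(url, '/');
    size_t prefix_len = slash ? (size_t)(slash - url) : 0;
    /* prefix_len == 0 (URL starts with '/', or has no '/') must not underflow. */
    size_t copy_len = prefix_len > 0 ? prefix_len - 1 : 0;
    if (copy_len > prefix_size - 1)
        copy_len = prefix_size - 1;  /* clamps the copy and the terminator index */
    memcpy(prefix, url, copy_len);
    prefix[copy_len] = '\0';
}
```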
An EAP-SRP caller (e.g. a caller-mode receiver behind NAT or on a single
UDP port) did not recover when its listener restarted. SRP callers were
excluded from the silent caller socket rebind, so once the listener dropped
all session state the caller stayed authenticated on a dead socket and never
re-ran the handshake; recovery required an operator restart.
- Include SRP callers in try_caller_socket_rebind and reset their EAP context
(eap_reset_authenticatee) so they re-run the SRP handshake on the fresh
socket.
- Bound the supplicant EAPOL START retransmit: an authenticatee stuck in
UNAUTH against a silent authenticator retransmits START on a timer, capped
by EAP_AUTH_TIMEOUT_RETRY_MAX. Inbound EAP re-arms the timer and
authentication success disarms it, so a live handshake is never disrupted.
This covers a restarted listener that never saw the first START.
- Reset rebind_attempts/last_rebind_time on successful (re)authentication so a
later outage retries promptly instead of after an ever-growing backoff.
- Log a warning if the immediate START send fails (the periodic path still
retries).
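The bounded START retransmit and the backoff reset can be sketched as a small state machine. The struct, the function names, the 1 s interval and the cap of 5 are assumptions for illustration; START_RETRY_MAX only stands in for EAP_AUTH_TIMEOUT_RETRY_MAX, whose real value is not given here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define START_RETRY_INTERVAL_MS 1000u   /* hypothetical retransmit period */
#define START_RETRY_MAX 5u              /* stand-in for EAP_AUTH_TIMEOUT_RETRY_MAX */

struct supplicant {
    bool unauth;              /* authenticatee still in UNAUTH */
    bool timer_armed;
    uint64_t next_start_ms;
    unsigned start_retries;
    unsigned rebind_attempts;
    uint64_t last_rebind_ms;
};

/* Periodic path: returns true when an EAPOL START should be (re)sent now. */
static bool supplicant_tick(struct supplicant *s, uint64_t now_ms)
{
    if (!s->unauth || !s->timer_armed || now_ms < s->next_start_ms)
        return false;
    if (s->start_retries >= START_RETRY_MAX) {
        s->timer_armed = false;       /* cap reached: stop retransmitting */
        return false;
    }
    s->start_retries++;
    s->next_start_ms = now_ms + START_RETRY_INTERVAL_MS;
    return true;
}

/* Inbound EAP re-arms (pushes back) the timer, so a live exchange is left alone. */
static void supplicant_on_inbound_eap(struct supplicant *s, uint64_t now_ms)
{
    s->next_start_ms = now_ms + START_RETRY_INTERVAL_MS;
}

/* Success disarms the timer and clears the rebind backoff state. */
static void supplicant_on_auth_success(struct supplicant *s)
{
    s->unauth = false;
    s->timer_armed = false;
    s->start_retries = 0;
    s->rebind_attempts = 0;
    s->last_rebind_ms = 0;
}
```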
test/rist/test_caller_socket_rebind (plaintext, psk, srp) kills the sender,
waits past session_timeout, restarts it on the same port, and asserts the
caller rebinds to a new local port, the backoff resets to 0, and real payload
flows end to end after recovery.
recovery-depth N sizes the Advanced retransmission ring to UINT16_SIZE << N
packets. RIST_RECOVERY_DEPTH_MAX is 16, i.e. 2^32 packets - the full 32-bit
sequence space. The ring is two parallel arrays (one rist_buffer* and one
uint32_t per slot), so 2^32 slots are only addressable when size_t is 64 bits.
On a 32-bit size_t the slot count overflowed, rist_recovery_depth_apply()
correctly refused it (-2), and rist_recovery_depth_set() returned failure for
the documented maximum. The recovery_depth unit test, which expects set(200)
to clamp to the maximum and succeed, therefore failed on 32-bit targets; its
own DEPTH_PKTS(MAX) expectation also overflowed to 0.
Add rist_recovery_depth_platform_max() in rist-private.h: the largest exponent
whose ring (UINT16_SIZE << depth) fits SIZE_MAX / (sizeof(ptr) + sizeof(u32)).
On LP64 this is RIST_RECOVERY_DEPTH_MAX; on a 32-bit size_t it resolves lower
(depth 12). rist_recovery_depth_to_packets() now clamps to it - covering both
the public setter and the ?recovery-depth= / config apply path - so the result
is always representable. The setter logs when a request is capped. The unit
test uses the same shared helper for its expectation, so it tracks whatever
the platform supports.
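The platform limit is plain integer arithmetic: with 4-byte pointers a slot costs 8 bytes, SIZE_MAX / 8 is just under 2^29, and the largest power 2^(16+depth) not above that is 2^28, i.e. depth 12. The sketch below reproduces that calculation; recovery_depth_platform_max() and recovery_depth_to_packets() are hypothetical stand-ins for the rist-private.h helpers, and sizeof(void *) stands in for sizeof(rist_buffer *).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UINT16_SIZE 65536u          /* 2^16 slots at depth 0, as in the text */
#define RECOVERY_DEPTH_MAX 16u      /* mirrors RIST_RECOVERY_DEPTH_MAX (2^32 packets) */

/* Largest depth whose ring (UINT16_SIZE << depth slots, one pointer plus one
 * uint32_t per slot) fits in size_t, i.e. at most SIZE_MAX / per_slot slots. */
static unsigned recovery_depth_platform_max(void)
{
    const size_t per_slot = sizeof(void *) + sizeof(uint32_t);
    const uint64_t max_slots = SIZE_MAX / per_slot;
    unsigned depth = 0;
    while (depth < RECOVERY_DEPTH_MAX &&
           ((uint64_t)UINT16_SIZE << (depth + 1)) <= max_slots)
        depth++;
    return depth;
}

/* Clamp a requested depth to what this platform can represent. */
static uint64_t recovery_depth_to_packets(unsigned depth)
{
    unsigned max = recovery_depth_platform_max();
    if (depth > max)
        depth = max;
    return (uint64_t)UINT16_SIZE << depth;
}
```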
apply()'s representability/OOM guard is kept as a safety net. No change on
64-bit, where the maximum is unchanged.
Fixes #222.
receiver_output() resets the flow when more than 100 consecutive packets
arrive past twice the recovery window (delay_rtc > 2 * recovery_buffer_ticks).
The reset logged "Too many old packets, resetting buffer", set
receiver_queue_has_items = false and returned, but it never cleared
f->too_late_ctr. The counter therefore stayed latched above 100.
The data-output thread calls receiver_output() whenever the queue is
non-empty, with no has_items gate, so on the next cycle the first still-old
packet immediately tripped the > 100 guard again: log, return, repeat. The
flow only recovers when a non-retry packet hits the re-anchor path in
receiver_enqueue(), but a packet that arrives while has_items is false and is
a retry is dropped before it can re-anchor. On a degraded, heavily
retransmitting link (e.g. bonded) retries dominate, so the flow stays in the
latched state and the message repeats every output cycle - the flood the
report describes.
Clear f->too_late_ctr when the reset fires. The stale backlog then drains
through the existing drop path (goto next) instead of returning without
progress, and a fresh run of 100 too-late packets is required before the next
reset is logged. No behavioural change on a healthy flow, where the counter
is already reset on every released packet.
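A reduced model of the latch and the fix, assuming one call per stale packet and a counter that is incremented before the > 100 check (the exact ordering inside receiver_output() may differ). flow_sketch and output_one() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOO_LATE_RESET_THRESHOLD 100

struct flow_sketch {
    unsigned too_late_ctr;
    bool queue_has_items;
    unsigned resets;           /* number of "resetting buffer" events */
};

enum out_result { OUT_DELIVER, OUT_DROP, OUT_RESET };

static enum out_result output_one(struct flow_sketch *f, uint64_t delay_rtc,
                                  uint64_t recovery_buffer_ticks)
{
    if (delay_rtc > 2 * recovery_buffer_ticks) {
        if (++f->too_late_ctr > TOO_LATE_RESET_THRESHOLD) {
            f->resets++;               /* "Too many old packets, resetting buffer" */
            f->queue_has_items = false;
            f->too_late_ctr = 0;       /* the fix: otherwise the guard stays latched */
            return OUT_RESET;
        }
        return OUT_DROP;               /* stale packet drains via the drop path */
    }
    f->too_late_ctr = 0;               /* a released packet clears the counter */
    return OUT_DELIVER;
}
```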
Fixes #223.
udpsocket_set_optimal_buffer_size()/_send_size() requested a fixed 1 MB
SO_RCVBUF/SO_SNDBUF and, on failure, fell back to ~200 KB. They never used
the headroom a large net.core.{r,w}mem_max provides. That headroom is what
lets a listener absorb a burst of simultaneous connections: a single
protocol thread drains one UDP socket, so when many peers establish at once
the inbound control/handshake traffic can momentarily exceed 1 MB and the
kernel silently drops the overflow, stalling the peers whose handshake
packets were lost.
Measured on a 16-core x86 listener (net.core.rmem_max = 16 MB) with 200
peers dialing in within <1 s: at the 1 MB buffer the kernel dropped 924
datagrams at the socket (Udp: RcvbufErrors) and only ~158/200 peers
completed authentication; the protocol thread sat at 9% of one core the
whole time, so it was never the bottleneck. Rebuilt with the buffer sized
to the OS maximum, the same storm dropped 0 datagrams and all peers
connected (256/256 at the 256-flow cap).
Request UDPSOCKET_SOCK_BUFSIZE_MAX (8 MB) and let the kernel clamp to
*mem_max, keeping whatever we obtain as long as it is at least the historical
1 MB floor; otherwise fall back to the ~200 KB last resort and warning
exactly as before. No API/ABI change, no new configuration, and hosts with a
small *mem_max are unaffected (they still settle at the same fallback). Also
correct the send-buffer error message to reference wmem_max instead of
rmem_max.
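A POSIX sketch of the sizing strategy, assuming Linux semantics: the kernel silently clamps SO_RCVBUF/SO_SNDBUF requests to net.core.{r,w}mem_max, and getsockopt() reports double the accepted value to cover bookkeeping overhead, so this sketch compares the reported figure against the floor. The macro names and set_optimal_bufsize() are illustrative, not the udpsocket.c code.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>

#define SOCK_BUFSIZE_MAX   (8 * 1024 * 1024)  /* plays the role of UDPSOCKET_SOCK_BUFSIZE_MAX */
#define SOCK_BUFSIZE_FLOOR (1024 * 1024)      /* historical 1 MB floor */
#define SOCK_BUFSIZE_LAST  (200 * 1024)       /* ~200 KB last resort */

static int get_bufsize(int fd, int opt)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, SOL_SOCKET, opt, &val, &len) != 0)
        return -1;
    return val;
}

/* opt is SO_RCVBUF or SO_SNDBUF; returns the size the kernel reports, or -1. */
static int set_optimal_bufsize(int fd, int opt)
{
    int want = SOCK_BUFSIZE_MAX;
    /* Ask for the maximum and let the kernel clamp it to *mem_max. */
    if (setsockopt(fd, SOL_SOCKET, opt, &want, sizeof want) == 0) {
        int got = get_bufsize(fd, opt);
        if (got >= SOCK_BUFSIZE_FLOOR)
            return got;
    }
    /* Small *mem_max: fall back and warn, naming the matching sysctl. */
    want = SOCK_BUFSIZE_LAST;
    if (setsockopt(fd, SOL_SOCKET, opt, &want, sizeof want) != 0)
        return -1;
    fprintf(stderr, "warning: socket buffer below 1 MB, consider raising net.core.%s\n",
            opt == SO_RCVBUF ? "rmem_max" : "wmem_max");
    return get_bufsize(fd, opt);
}
```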
rist_stats_callback_set() updated only the common stats interval, but the
sender loop reads the sender context's own stats_report_time, which stayed 0.
The callback then fired every loop iteration and each report covered a single
pass, so packets_sent/packets_received showed tiny deltas (0-5). Set the
sender interval here too, matching rist_sender_stats_callback_set().
Also harden the sender loop: read the interval fresh each pass (so a callback
registered after the loop starts takes effect) and skip reporting when it is
0, so a stats-disabled sender no longer builds a stats object every iteration.
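The hardened loop reduces to two rules: read the interval on every pass, and do nothing while it is 0. A sketch with hypothetical names (sender_sketch, sender_loop_pass()); real code shared between the API thread and the sender loop would also need proper synchronisation for the interval field.

```c
#include <assert.h>
#include <stdint.h>

struct sender_sketch {
    uint64_t stats_report_time_ms;   /* 0 = stats disabled */
    uint64_t last_report_ms;
    unsigned reports;
};

/* One pass of the sender loop, reduced to the stats decision. */
static void sender_loop_pass(struct sender_sketch *s, uint64_t now_ms)
{
    uint64_t interval = s->stats_report_time_ms;   /* read fresh on every pass */
    if (interval == 0)
        return;                                    /* disabled: build no stats object */
    if (now_ms - s->last_report_ms >= interval) {
        s->reports++;                              /* stands in for building a report */
        s->last_report_ms = now_ms;
    }
}
```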
This affected risttunnel's sender stats; the bundled Prometheus path (ristsender)
uses the legacy setter and was unaffected.
Thanks to laur fb for the precise diagnosis and the patch suggestion.
The listener cname re-association calls eap_is_authenticated(), whose
declaration lives under HAVE_SRP_SUPPORT, so the build broke when SRP
support is disabled. Guard the call. Without SRP the re-association
refuses: the cname is not a per-peer secret, so a peer must not be
migrated to a new source tuple without an authenticated session.
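The guarded check could look like the following sketch. may_reassociate_by_cname() and peer_sketch are hypothetical, and the eap_authenticated field stands in for a call to eap_is_authenticated(); the point is only that the non-SRP branch refuses instead of failing to compile.

```c
#include <assert.h>
#include <stdbool.h>

struct peer_sketch { bool eap_authenticated; };

/* May a peer be migrated to a new source tuple on the strength of its cname? */
static bool may_reassociate_by_cname(const struct peer_sketch *p)
{
#ifdef HAVE_SRP_SUPPORT
    return p->eap_authenticated;   /* stands in for eap_is_authenticated() */
#else
    (void)p;
    return false;                  /* no SRP: the cname alone is not a secret */
#endif
}
```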
Thanks to Gyan Doshi for the report.
Adding --blind-send to the help text pushed the single usage string
literal past the 4095-character limit C99 requires a compiler to
support, so a -Wpedantic -std=c99 -Werror build failed with
-Woverlength-strings. Split the help text into two literals printed
back to back; the output is unchanged.
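Adjacent string literals are concatenated during translation, so the length limit applies to the joined result; the split has to produce two separate objects that are printed one after the other. A sketch with placeholder help text (the option lines are invented, not the tool's real usage string):

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder option text; the real usage string is much longer. */
static const char usage_part1[] =
    "Usage: example [OPTIONS]\n"
    "  -h, --help       show this help\n";
static const char usage_part2[] =
    "  -v, --verbose    raise the log level\n";

/* Two separate literals printed back to back give the same output as one. */
static void print_usage(FILE *out)
{
    fputs(usage_part1, out);
    fputs(usage_part2, out);
}
```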
Thanks to Florian Ernst for the report.
When an Advanced flow upgrades from Main to Advanced wire framing
mid-stream, the two framings carry source_time in different timestamp
domains, so the switch resets the flow timing baseline. That reset
re-derived time_offset but left the clock-drift sample buffer full of
stale Main-domain samples. The next median recalculation then jumped
the offset by several seconds and released the whole receiver buffer at
once, overflowing the data-out fifo. Because test_send_receive treats
any ERROR-level log as fatal at 0% loss, the single "Rist data out fifo
queue overflow" line failed the advanced+unicast+client tests, most
reproducibly on slower build hosts.
Clear the drift samples on the baseline reset, matching the existing
clock-wrap reset path.
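A simplified model of why the stale samples matter: if the offset correction takes a median over a small window, samples left over from the Main-domain timestamps dominate that median right after the switch. The window size, the median-based recalculation as modelled here, the struct and the function names are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DRIFT_SAMPLES 8   /* hypothetical window size */

struct timing_sketch {
    int64_t drift[DRIFT_SAMPLES];   /* oldest first */
    size_t drift_count;
    int baseline_valid;             /* 0: next enqueue re-derives time_offset */
};

static void timing_add_sample(struct timing_sketch *t, int64_t sample)
{
    if (t->drift_count < DRIFT_SAMPLES) {
        t->drift[t->drift_count++] = sample;
    } else {                        /* full: slide the window by one */
        memmove(t->drift, t->drift + 1, (DRIFT_SAMPLES - 1) * sizeof t->drift[0]);
        t->drift[DRIFT_SAMPLES - 1] = sample;
    }
}

static int64_t timing_median(const struct timing_sketch *t)
{
    int64_t sorted[DRIFT_SAMPLES];
    size_t n = t->drift_count;
    for (size_t i = 0; i < n; i++) {     /* insertion sort into a copy */
        int64_t v = t->drift[i];
        size_t j = i;
        while (j > 0 && sorted[j - 1] > v) {
            sorted[j] = sorted[j - 1];
            j--;
        }
        sorted[j] = v;
    }
    return n ? sorted[n / 2] : 0;
}

/* Framing switch: drop the baseline AND the samples from the old domain. */
static void timing_reset_baseline(struct timing_sketch *t)
{
    t->baseline_valid = 0;
    t->drift_count = 0;
}
```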
Thanks to Florian Ernst for the detailed report and build logs.
Three fixes had landed without a changelog entry: the risttunnel SRP
credential presentation on the sender/client legs, the rist_oob_dequeue
use-after-free guard, and the rist_receiver_data_block_free2() NULL
guard. Every commit since the 0.2.18 release is now covered.
An Advanced-profile sender indexes its retransmit buffer by the 32-bit
advanced sequence (seq_index, keyed by seq & (queue_max - 1)). Under
TR-06-3 an Advanced device starts every peer in Main and only frames
Advanced once that peer advertises I=1, so a peer that stays in Main
requests retransmits with a 16-bit RTP sequence and nack_seq_msb = 0.
The advanced and RTP sequence counters are independent, so the Main
peer's NACK never resolved against the 32-bit index: the lookup found
the wrong (or no) buffer slot, the seq-number validation rejected it,
and nothing was retransmitted (recovered packet count stayed 0).
Advanced<->Advanced sessions were unaffected.
Keep a parallel 16-bit RTP retransmit index on Advanced senders,
populated alongside seq_index as packets are sent, and resolve,
validate, and frame a downgraded-Main peer's retransmits in the RTP
domain. The new path is gated by rist_retx_use_rtp_domain(), which is
true only for an Advanced context serving a peer that has not negotiated
up to Advanced, so the Advanced<->Advanced lookup, validation, and wire
framing are byte-for-byte unchanged. Main/Simple senders are untouched:
their primary index is already the RTP domain and no second index is
allocated.
Add a header-only unit test for the domain predicate.
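The domain choice can be expressed as a predicate plus a slot lookup. This sketch uses hypothetical names: retx_use_rtp_domain() only mirrors the described behaviour of rist_retx_use_rtp_domain(), and remote_supports_advanced is the I=1 capability flag mentioned in the interop change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum profile_id { PROFILE_SIMPLE, PROFILE_MAIN, PROFILE_ADVANCED };

struct retx_peer {
    bool remote_supports_advanced;   /* peer advertised I=1 */
};

/* True only for an Advanced context serving a peer still on Main framing. */
static bool retx_use_rtp_domain(enum profile_id ctx, const struct retx_peer *p)
{
    return ctx == PROFILE_ADVANCED && !p->remote_supports_advanced;
}

/* Pick the retransmit slot for a NACKed sequence number. queue_max is the
 * primary ring size (32-bit seq_index on Advanced, RTP domain on Main/Simple). */
static uint32_t retx_slot(enum profile_id ctx, const struct retx_peer *p,
                          uint32_t nack_seq, uint32_t queue_max)
{
    if (retx_use_rtp_domain(ctx, p))
        return nack_seq & UINT16_MAX;       /* parallel 16-bit RTP index */
    return nack_seq & (queue_max - 1);      /* primary index, unchanged */
}
```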
risttunnel only enabled EAP-SRP on the listener via the -F verifier file.
The sender leg (and the receiver leg in two-port client mode) parsed
username/password from the URL into the peer config but never called
rist_enable_eap_srp_2, so a caller pointed at an SRP-protected listener
failed with "EAP authentication requested but credentials have not been
configured" and the listener waited forever for authentication.
Mirror ristsender/ristreceiver/rist2rist: when the URL carries both an SRP
username and password, present them on the peer; the receiver leg falls back
to the verifier file when no credentials are supplied.
rist_oob_write() stashes the destination rist_peer* into the oob queue
from a caller thread; the protocol thread later sends it in
rist_oob_dequeue(). A peer addressed this way (e.g. the child peer the
risttunnel --single-port reverse path captures via the auth/connect
callback) can be torn down by a NAT rebind / session timeout in the
protocol thread between enqueue and dequeue. The non-listening branch
then dereferenced a freed peer (p->listening, rist_send_common_rtcp) ->
use-after-free / intermittent SIGSEGV.
Take peerlist_lock once at the top of the dequeue, verify the stashed
peer is still on the live PEERS list, and drop the packet if it is gone.
Hold the lock across the whole send (the listener branch already did
this for its child walk) so a concurrent add/remove cannot free the
peer or a sibling underneath us. Both rist_oob_dequeue callers release
peerlist_lock before calling it, so there is no recursive-lock risk.
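The dequeue-side check amounts to a membership test under the peer-list lock, with the send kept inside the critical section. A sketch with hypothetical types (peer, peer_ctx) and a stand-in for the actual send:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct peer { struct peer *next; int id; };

struct peer_ctx {
    pthread_mutex_t peerlist_lock;
    struct peer *peers;           /* live PEERS list */
};

/* Caller must hold peerlist_lock. */
static bool peer_is_live(const struct peer_ctx *c, const struct peer *p)
{
    for (const struct peer *it = c->peers; it != NULL; it = it->next)
        if (it == p)
            return true;
    return false;
}

/* Send one queued OOB packet to its stashed peer, or drop it if the peer is
 * gone. Writing *sent_to stands in for the send. Returns true if "sent". */
static bool oob_dequeue_one(struct peer_ctx *c, const struct peer *stashed, int *sent_to)
{
    bool sent = false;
    pthread_mutex_lock(&c->peerlist_lock);
    if (peer_is_live(c, stashed)) {
        *sent_to = stashed->id;   /* still under the lock: no concurrent free */
        sent = true;
    }
    pthread_mutex_unlock(&c->peerlist_lock);
    return sent;
}
```

Matching on the pointer value alone cannot tell a freed peer from a new one allocated at the same address; a per-peer generation counter would be one way to close that gap if it ever matters.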
The public free2() wrapper dereferenced (*block)->ref unconditionally.
In the receiver data_fd/tun delivery path the block can legitimately be
NULL by the time the fifo-overflow branch frees it: free_data_block()
nulls *block after a successful free, and merged FEC pairs never
allocate a block at all. With fifo_queue_size advancing on every packet,
the overflow branch eventually calls free2() on the now-NULL block,
reading offset 0x40 (->ref) of a NULL pointer -> intermittent SIGSEGV
(observed as "segfault at 40" on listeners running risttunnel in both
two-port and --single-port modes under abrupt link changes).
Make free2() a free(NULL)-style no-op, mirroring free_data_block()'s
own NULL guard. This protects every caller, not just the data_fd path.
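A free(NULL)-style release for a pointer-to-pointer handle, in the spirit of the change. data_block and data_block_free() are hypothetical simplifications; the real block carries a ref field that is omitted here.

```c
#include <assert.h>
#include <stdlib.h>

struct data_block { void *payload; };   /* hypothetical, simplified */

/* Safe on a NULL handle and on *block == NULL, like free(NULL). */
static void data_block_free(struct data_block **block)
{
    if (block == NULL || *block == NULL)
        return;
    free((*block)->payload);
    free(*block);
    *block = NULL;              /* a second call is now a harmless no-op */
}
```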
risttunnel -1/--single-port carries a bidirectional IP tunnel over a
single UDP port instead of two. One RIST connection runs the forward
direction over the RTP data_fd path (ARQ-protected) and the reverse
direction over the OOB channel on the same socket (best-effort, no ARQ).
The caller passes only -o (sender role); the listener passes only -b
(receiver role). librist learns the OOB return peer only from received
OOB and the caller sends only forward RTP, so the listener captures the
connecting peer via the auth/connect callback (sp_conn_cb) and hands it
explicitly to rist_oob_write() from a TUN->OOB reader thread. The caller
injects inbound OOB into its TUN through an OOB receive callback.
Intended for low-bandwidth, NAT-friendly control tunnels (e.g. remote
support over a single UDP/443) where the heavy direction is forward and
the reverse is light/interactive.
rist_oob_read() was a stub that logged an error and returned 0 (success
per the public API) without ever writing *oob_block, so an application
using the documented polling API received a garbage output pointer.
Add an out-of-band receive fifo: when oob is enabled without a callback
(rist_oob_callback_set with a NULL callback), the protocol thread queues
incoming OOB packets in rist_recv_oob_data() and rist_oob_read() drains
them, handing back a borrowed block that stays valid until the next call.
The fifo is freed with the rest of the oob state on teardown.
rist_oob_read() now returns -1 when oob is disabled or a callback is
installed, the number of available packets (>=1) when one is returned,
and 0 when the queue is empty.
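A sketch of the polling contract with a fixed-size ring as the fifo. All names and sizes are assumptions, and the locking a real protocol-thread/application-thread fifo needs is omitted. The count returned here includes the packet being handed back, which is one reading of "the number of available packets (>=1)". The sketch also sets the output pointer to NULL on every non-success path so a caller never sees garbage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OOB_FIFO_SIZE 8u   /* hypothetical capacity, power of two */

struct oob_pkt { size_t len; unsigned char data[64]; };

struct oob_state {
    bool enabled;                   /* oob on */
    bool has_callback;              /* callback mode: polling not allowed */
    struct oob_pkt fifo[OOB_FIFO_SIZE];
    unsigned head, tail;            /* free-running read/write counters */
    struct oob_pkt borrowed;        /* returned block, valid until the next read */
};

/* Protocol-thread side: queue an incoming OOB packet; false if full (dropped). */
static bool oob_push(struct oob_state *s, const struct oob_pkt *pkt)
{
    if (s->tail - s->head == OOB_FIFO_SIZE)
        return false;
    s->fifo[s->tail++ % OOB_FIFO_SIZE] = *pkt;
    return true;
}

/* -1: oob disabled or callback installed; 0: queue empty;
 * >= 1: a packet was returned and this many were queued before the call. */
static int oob_read(struct oob_state *s, const struct oob_pkt **out)
{
    *out = NULL;
    if (!s->enabled || s->has_callback)
        return -1;
    unsigned avail = s->tail - s->head;
    if (avail == 0)
        return 0;
    s->borrowed = s->fifo[s->head++ % OOB_FIFO_SIZE];
    *out = &s->borrowed;
    return (int)avail;
}
```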
Adds test_oob_read exercising the write->read round-trip over the Main
and Advanced profiles.
risttunnel declared log_ptr uninitialised, so a non-NULL garbage stack
value could be passed to rist_logging_set(), which only allocates a new
settings struct when *log_ptr is NULL and otherwise writes the level/cb/
stream through the pointer as-is. That write to an invalid address caused
an intermittent SIGSEGV at startup (a zeroed stack slot happened to work,
masking the bug).
Point log_ptr at the static logging_settings global, matching
ristsender/ristreceiver/udp2udp/rist2rist. As a side effect the -v
loglevel now also applies to risttunnel's own messages, which previously
used the unconfigured global.
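The failure mode comes from an "allocate only if *ptr is NULL" contract: any non-NULL value, including stack garbage, is written through. A hypothetical model of that contract (logging_set() and log_settings are illustrative, not the librist API):

```c
#include <assert.h>
#include <stdlib.h>

struct log_settings { int level; };

static struct log_settings logging_settings;   /* static global, zero-initialised */

/* Allocates only when *ptr is NULL; otherwise writes through *ptr as-is,
 * so an uninitialised (garbage) pointer would be written to. */
static int logging_set(struct log_settings **ptr, int level)
{
    if (*ptr == NULL) {
        *ptr = calloc(1, sizeof **ptr);
        if (*ptr == NULL)
            return -1;
    }
    (*ptr)->level = level;
    return 0;
}
```

Pointing the handle at the global, as in struct log_settings *log_ptr = &logging_settings;, makes the write land in valid storage and lets later readers of the global see the configured level.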
The issue #188 NAT/socket-rebind and multipath cname integration
tests included <pthread.h> and initialised their tracker mutex with
PTHREAD_MUTEX_INITIALIZER. MSVC has no system pthread.h, and the
Windows pthread-shim maps pthread_mutex_t to a CRITICAL_SECTION,
which cannot be initialised statically, so the whole librist VS
solution failed to build (only these tests; the library and tools
were fine).
Follow the convention already used by test_send_receive.c and
test_reflector.c: include "pthread-shim.h" (directly, or via
rist-private.h), initialise the tracker mutex at runtime with
pthread_mutex_init(), and declare the feeder threads with the
PTHREAD_START_FUNC() macro so they match the shim's __stdcall/LPVOID
signature. POSIX builds are unchanged; the suite now also compiles
under MSVC.
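The portable pattern is runtime initialisation, which a CRITICAL_SECTION-backed shim can map onto its own init call. A POSIX-only sketch with a hypothetical tracker struct; it does not show the shim's PTHREAD_START_FUNC() macro.

```c
#include <assert.h>
#include <pthread.h>

struct tracker {
    pthread_mutex_t lock;   /* initialised at runtime, not with PTHREAD_MUTEX_INITIALIZER */
    int events;
};

static int tracker_init(struct tracker *t)
{
    t->events = 0;
    return pthread_mutex_init(&t->lock, NULL);
}

static void tracker_record(struct tracker *t)
{
    pthread_mutex_lock(&t->lock);
    t->events++;
    pthread_mutex_unlock(&t->lock);
}

static void tracker_destroy(struct tracker *t)
{
    pthread_mutex_destroy(&t->lock);
}
```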
Drop the pre-release marker from the 0.2.18 heading and add the
canonical ABI 15:0:11 / API 4.12.0 summary lines, matching the
format of prior release sections.
Bump API to 4.12.0, ABI to 15:0:11 (soversion 4 unchanged,
binary-compatible with 0.2.15/0.2.16/0.2.17 and rc2). New interfaces
since rc2: rist_recovery_depth_set() with the
RIST_RECOVERY_DEPTH_MIN/DEFAULT/MAX macros, and the additive stats
fields (RIST_STATS_VERSION 3: profile / seq_bits / advanced_active).
Changes since rc2:
- Advanced-profile correctness fixes (>64k sequence gaps, ring-index
masking in receiver_mark_missing, expected-next-seq wrap mask,
payload-only byte accounting, sequence-based duplicate detection,
32-bit-safe merge-mode pairing).
- Default profile is now Advanced with Main interoperability (TR-06-3
Section 9); risttunnel follows RIST_DEFAULT_PROFILE.
- Tunable Advanced recovery-ring depth via ?recovery-depth=.
- Advanced-profile profile/framing visibility in the stats API and the
Prometheus exporter.
Add 0.2.18 pre-release entries for the changes that landed since rc2:
- Advanced-profile correctness fixes: >64k sequence gaps, ring-index
masking in receiver_mark_missing, expected-next-seq wrap mask,
payload-only byte accounting, sequence-based duplicate detection,
and 32-bit-safe merge-mode pairing
- Advanced-profile profile/framing visibility in the stats API
(RIST_STATS_VERSION 2 -> 3) and the Prometheus exporter, plus the
receiver counters that were previously JSON-only
- risttunnel follows RIST_DEFAULT_PROFILE
Also normalize two em dashes in the section to ASCII ' - '.
receiver_mark_missing() indexed receiver_queue[] with the raw last_seq_found
and current_seq, whereas every other queue access masks with
(receiver_queue_max - 1). For 16-bit flows the sequence range equals
receiver_queue_max, so a raw sequence was always a valid index. A 32-bit
(Advanced) sequence can exceed receiver_queue_max once the stream passes the
ring size: the access then reads off the end of the array and dereferences a
stale or NULL slot's packet_time, crashing the receiver. Mask both indices
like the rest of the queue accesses.
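Masking keeps every sequence-derived index inside the ring, which is what the fix applies to both indices. A sketch with a hypothetical missing[] array standing in for receiver_queue[], assuming receiver_queue_max is a power of two and current_seq is ahead of last_seq_found in 32-bit space:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ring slot for a sequence number; queue_max must be a power of two. */
static size_t ring_index(uint32_t seq, size_t queue_max)
{
    return (size_t)seq & (queue_max - 1);
}

/* Flag every slot strictly between last_seq_found and current_seq as missing,
 * masking each access like the rest of the queue code. */
static void mark_missing(unsigned char *missing, size_t queue_max,
                         uint32_t last_seq_found, uint32_t current_seq)
{
    for (uint32_t s = last_seq_found + 1; s != current_seq; s++)
        missing[ring_index(s, queue_max)] = 1;
}
```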
Several receiver counters were present in the stats JSON but never surfaced
as Prometheus metrics. Export retries, dropped_late, dropped_full,
duplicates, and the nack-depth buckets (recovered after two, three, four, and
more nacks).
The stats output gave no way to tell which RIST profile a flow used or
whether Advanced framing was active on the wire. Add `profile` and
`advanced_active` to the sender peer stats, and `profile`, `seq_bits`, and
`advanced_active` to the receiver flow stats, with matching fields in the
JSON payloads. Bump RIST_STATS_VERSION to 3 and the JSON schema version to 4.
The Prometheus exporter emits new `rist_client_flow_info` and
`rist_sender_peer_info` series that carry these as labels, so a scrape can
identify an Advanced flow (and 16-bit vs 32-bit framing) without changing the
label set of the existing series. On the sender, advanced_active reflects
whether Advanced framing is in use toward the peer. On the receiver flow it
is true only when the context is Advanced and the flow is on 32-bit framing,
so it reads false for Main and Simple contexts, and also for an Advanced
context that stays on Main framing because the peer has not advertised
Advanced support (TR-06-3 Section 9).
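The receiver-side fields reduce to a two-input predicate. Hypothetical names: profile_sketch mirrors the three RIST profiles and short_seq is the flow's 16-bit framing flag.

```c
#include <assert.h>
#include <stdbool.h>

enum profile_sketch { P_SIMPLE, P_MAIN, P_ADVANCED };

/* Receiver flow: true only for an Advanced context on 32-bit framing. */
static bool flow_advanced_active(enum profile_sketch ctx, bool short_seq)
{
    return ctx == P_ADVANCED && !short_seq;
}

/* seq_bits follows the framing the flow is actually on. */
static unsigned flow_seq_bits(bool short_seq)
{
    return short_seq ? 16u : 32u;
}
```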
The merge-mode pairing test computed the next sequence number with
`& UINT16_MAX`, truncating it to 16 bits, which mis-pairs packets near a
16-bit boundary on a 32-bit Advanced flow. Use rist_seq_next(), which wraps
correctly for both short_seq (16-bit) and 32-bit flows.
receiver_enqueue compared source_time to detect a duplicate in an occupied
slot. The Advanced path stamps source_time with the arrival time, so that
comparison never matched a genuine duplicate and the duplicate counter
under-reported. Compare the sequence number instead, which is what the slot
index is derived from.
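Duplicate detection keyed on the sequence number rather than on source_time, sketched with a hypothetical slot struct. What the real code does with an occupied slot that holds a different sequence is outside this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct slot {
    bool occupied;
    uint32_t seq;
    uint64_t source_time;   /* on the Advanced path this is the arrival time */
};

enum enqueue_result { ENQ_STORED, ENQ_DUPLICATE, ENQ_OCCUPIED_OTHER };

static enum enqueue_result slot_store(struct slot *ring, size_t queue_max,
                                      uint32_t seq, uint64_t source_time)
{
    struct slot *s = &ring[seq & (queue_max - 1)];
    if (s->occupied)
        /* Only an equal sequence number means a duplicate; a second copy
         * arrives later, so an arrival-time comparison never matches. */
        return s->seq == seq ? ENQ_DUPLICATE : ENQ_OCCUPIED_OTHER;
    s->occupied = true;
    s->seq = seq;
    s->source_time = source_time;
    return ENQ_STORED;
}
```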
This does not change how Advanced buffers packets: it still buffers on
arrival time, so avg_buffer_time measures buffer residence rather than
source-to-output latency. Buffering on the on-wire timestamp is left as a
separate change.
The Advanced receive path passed the full datagram size as the ingest size
while the Main path passes the payload only, so received_bytes and the
derived bitrate were not comparable across profiles. Pass adv_data_len (the
delivered payload) instead. ts_null_bytes stays 0 on the Advanced path,
which performs no ts-null reinsertion on receive.
receiver_enqueue computed the expected next sequence number with
`& (UINT16_MAX - 1)` (0xFFFE), which clears bit 0 and forces the value
even, so every odd successor was mis-computed. The mask was meant to wrap
a 16-bit counter, i.e. `& UINT16_MAX`.
The error is gated by the `packet_time < last_packet_ts` guard, so it never
fires on normal in-order delivery. It bites in ARRIVAL timing mode, where a
per-path arrival timeline can legitimately regress for an in-order packet on
a bonded flow, sending roughly half of odd-successor packets into the
late-drop path.
Extract rist_seq_next() (symmetric with rist_seq_gap) so the successor wraps
correctly for both short_seq (16-bit) and 32-bit flows, and add a link-free
unit test covering the odd-successor regression and both wrap points.
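A sketch of the successor rule and the old mask's defect. seq_next() is a hypothetical stand-in written from the description of rist_seq_next(), not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Successor of seq: wraps at 2^16 for short_seq (16-bit) flows, at 2^32 otherwise. */
static uint32_t seq_next(uint32_t seq, bool short_seq)
{
    return short_seq ? ((seq + 1u) & UINT16_MAX) : seq + 1u;
}
```

For comparison, the old expression (seq + 1) & (UINT16_MAX - 1) maps 4 to 4 rather than 5, which is the odd-successor error described above.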
receiver_mark_missing() and receiver_enqueue() unconditionally masked the
forward sequence gap to 16 bits (& UINT16_MAX / & (UINT16_MAX-1)). That is
correct for Simple/Main (16-bit) flows but wrong for Advanced 32-bit flows:
a genuine >64k contiguous gap was truncated, distorting the NACK pacing math
and mis-classifying in-order packets in the arrival-timing reorder guard.
Reachable now that the Advanced recovery ring is resizable up to the full
32-bit space (?recovery-depth=).
Add a short_seq-aware rist_seq_gap() helper and use it at both sites. The
short_seq branch reproduces the original masks byte-for-byte, so 16-bit flows
are unchanged; only 32-bit flows now see the true gap. The wrap false-positive
cap is unified with the recovery-walk hole cap already in the tree
(short_seq ? UINT16_SIZE/2 : receiver_queue_max/2), finishing a conversion
that was only half-applied. Adds a pure, link-free unit test.
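A sketch of a width-aware forward gap and the unified cap, written from the description above. seq_gap() and seq_gap_cap() are hypothetical names, and the real rist_seq_gap() may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Forward distance from `from` to `to`, modulo the flow's sequence width. */
static uint32_t seq_gap(uint32_t from, uint32_t to, bool short_seq)
{
    return short_seq ? ((to - from) & UINT16_MAX) : to - from;
}

/* Gaps above this cap are treated as wrap false positives, not real loss. */
static uint32_t seq_gap_cap(bool short_seq, uint32_t receiver_queue_max)
{
    return short_seq ? 65536u / 2 : receiver_queue_max / 2;
}
```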
MR note: the 16-bit branch intentionally keeps the long-standing
`& (UINT16_MAX - 1)` (0xFFFE) mask in expected_seq, which forces the value
even and looks like it should be `& UINT16_MAX`. Preserving it keeps this
patch a zero-behavior-change for 16-bit flows; whether that mask is itself a
bug is a separate, older question and should be its own change.
Completes the default-profile sweep: risttunnel was still hardcoded to
Main while ristsender/ristreceiver/YAML now follow RIST_DEFAULT_PROFILE
(Advanced). risttunnel uses one profile for both its sender and receiver
side, so it now defaults to Advanced too and interoperates with a
Main-only peer via the TR-06-3 Section 9 fallback. Help text updated to
list advanced and the new default.
rist2rist is intentionally left at Simple (its documented purpose is to
receive Simple-profile input and re-emit Main); udp2udp does not select
a profile.
Changes the default RIST profile from Main to Advanced. This is now safe
because an Advanced endpoint interoperates with Main: it negotiates
Advanced framing with an Advanced peer and falls back to Main framing
with a Main-only peer (TR-06-3 Section 9, implemented in the preceding
commit), so existing Main deployments keep working without a flag.
- RIST_DEFAULT_PROFILE is now RIST_PROFILE_ADVANCED.
- ristsender, ristreceiver and the YAML config now take their default
profile from RIST_DEFAULT_PROFILE instead of a hardcoded
RIST_PROFILE_MAIN, so there is a single source of truth. Pass
-p 1 (or profile: 1 in YAML) to force Main as before.
- yamlparse.c gains an explicit <librist/peer.h> include for the macro.
The default only affects the profile a context starts in; the
?profile= URL override and the explicit profile argument to
rist_*_create() are unchanged. rist_peer_config still records the
default with profile_set = 0, so it is applied only when ?profile= is
present.
An Advanced-profile sender previously emitted Advanced (Type-8) framing
unconditionally, so a Main-only receiver could not decode the stream;
an Advanced receiver likewise assumed 32-bit framing for the whole flow.
This implements the spec's optional Main/Advanced interop method so a
single flow works across mismatched profiles.
Sender (udp.c): an Advanced device now starts in Main-Profile framing
and only switches to Advanced framing for a peer once that peer
advertises Advanced capability (I=1 in its Main keep-alives, tracked as
remote_supports_advanced). Until then it emits Main-conformant media a
Main-only peer can decode. Control and OOB are unchanged.
Receiver (rist-common.c): rist_receiver_recv_data() learns the actual
wire framing of each data packet (pkt_short_seq) and refines the flow's
short_seq from it. The two framings carry different sequence widths and
timestamp encodings, so a mid-stream framing change (the Main->Advanced
upgrade once a peer advertises I=1) is treated like a flow-id change:
the timing baseline is dropped so the next enqueue re-derives time_offset
and the seq->index mapping from the new framing instead of blending the
two. Without this the stale baseline corrupts delivery after the switch.
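The two halves of the interop method can be modelled in a few lines. This is a sketch with hypothetical structs and functions; only remote_supports_advanced, short_seq and pkt_short_seq are names taken from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Sender: per-peer framing for an Advanced context. */
struct peer_caps {
    bool remote_supports_advanced;   /* learned from I=1 in Main keep-alives */
};

static bool sender_uses_advanced_framing(bool ctx_is_advanced, const struct peer_caps *p)
{
    return ctx_is_advanced && p->remote_supports_advanced;  /* else Main framing */
}

/* Receiver: a change of wire framing invalidates the timing baseline. */
struct flow_framing {
    bool short_seq;            /* flow's current framing; true = 16-bit (Main) */
    bool baseline_valid;       /* time_offset and seq->index mapping derived */
    unsigned baseline_resets;
};

static void flow_on_data(struct flow_framing *f, bool pkt_short_seq)
{
    if (f->short_seq == pkt_short_seq)
        return;
    f->short_seq = pkt_short_seq;    /* refine from the actual wire framing */
    f->baseline_valid = false;       /* next enqueue re-derives, like a flow-id change */
    f->baseline_resets++;
}
```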
flow.c: Advanced flows default to 32-bit framing at create time;
recv_data refines it per wire. The recovery ring size stays fixed per
context.
Matched Advanced and matched Main pairs are unaffected. Adds a permanent
cross-profile interop suite (both directions, with and without 10% loss)
to test/rist/meson.build.
The Advanced-profile recovery/retransmission ring was a fixed
compile-time size, and Simple/Main flows over-allocated to that same
size. This makes the Advanced ring sizable and stops the 16-bit
profiles from paying the Advanced memory footprint.
- Recovery rings are now heap-allocated and profile-sized. Simple/Main
allocate the 16-bit cap (65536 entries); Advanced allocates the
configured depth. Sender and receiver rings are freed on teardown.
- New ?recovery-depth=<n> URL parameter and rist_recovery_depth_set(ctx,
depth) API size the Advanced ring. depth is a base-2 exponent: the
ring holds 65536 << depth packets. Default 3 (8x, the prior fixed
size); range 0..16 (16 == full 32-bit sequence space). Must be set
before rist_start().
- rist_peer_config grows a trailing recovery_depth field at the existing
RIST_PEER_CONFIG_VERSION 5; rist_peer_config_defaults_set_versioned()
sets it to RIST_RECOVERY_DEPTH_DEFAULT for version-5 callers.
Zero-initialised and older configs keep the default ring.
- A one-shot WARN fires at peer-config time when recovery-maxbitrate and
the max buffer would queue more packets than the recovery window can
address, because packets beyond the window cannot be retransmitted.
The Advanced hint points at ?recovery-depth=; the Simple/Main hint
points at the Advanced profile.
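The depth mapping and the capacity check are simple arithmetic. A sketch that assumes the bitrate is in bits per second, the buffer in milliseconds and a fixed payload size per packet; the real recovery-maxbitrate unit and the exact warning formula may differ, and depth_to_packets() and recovery_window_too_small() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEPTH_DEFAULT 3u    /* mirrors RIST_RECOVERY_DEPTH_DEFAULT as described */
#define DEPTH_MAX 16u       /* mirrors RIST_RECOVERY_DEPTH_MAX */

/* Ring size for a depth exponent: 65536 << depth, clamped to the valid range. */
static uint64_t depth_to_packets(unsigned depth)
{
    if (depth > DEPTH_MAX)
        depth = DEPTH_MAX;
    return (uint64_t)65536 << depth;
}

/* Would bitrate * buffer queue more packets than the ring can address?
 * Units are assumptions: bits/s, milliseconds, bytes per packet. */
static bool recovery_window_too_small(uint64_t bitrate_bps, uint32_t buffer_ms,
                                      uint32_t packet_bytes, uint64_t ring_packets)
{
    uint64_t bytes = bitrate_bps / 8 * buffer_ms / 1000;
    return bytes / packet_bytes > ring_packets;
}
```

For example, at 1 Gbit/s with a 10 s buffer and 1316-byte payloads about 950k packets are in flight: more than the default 524288-entry ring, but fewer than the 1048576 entries of depth 4.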
Tests: test_recovery_depth (depth->packet mapping, ring resize, receiver
capacity, Main behaviour, range clamping, NULL ctx) and
test_url_recovery_depth_parse (numeric parse, range and garbage
rejection).