1406 Commits

Author SHA1 Message Date
roman 2e8cd5072b Update tools/moo-ristsender.c 2026-06-22 10:18:59 +00:00
roman 0d416b9700 Update tools/moo-ristsender.c 2026-06-22 10:05:12 +00:00
roman e9fed8ac4e Add tools/moo-ristsender.c 2026-06-22 09:53:12 +00:00
roman c510a07e8f Update tools/meson.build 2026-06-22 09:48:57 +00:00
Sergio Ammirata 618207c7b0 Merge branch 'release/v0.2.18' into 'master'
release: v0.2.18

See merge request rist/librist!362
v0.2.18
2026-06-22 01:54:10 +00:00
Sergio Ammirata d8d9034777 fix(test): build NAT/multipath rebind tests under MSVC
The issue #188 NAT/socket-rebind and multipath cname integration
tests included <pthread.h> and initialised their tracker mutex with
PTHREAD_MUTEX_INITIALIZER.  MSVC has no system pthread.h, and the
Windows pthread-shim maps pthread_mutex_t to a CRITICAL_SECTION,
which cannot be initialised statically, so the whole librist VS
solution failed to build (only these tests; the library and tools
were fine).

Follow the convention already used by test_send_receive.c and
test_reflector.c: include "pthread-shim.h" (directly, or via
rist-private.h), initialise the tracker mutex at runtime with
pthread_mutex_init(), and declare the feeder threads with the
PTHREAD_START_FUNC() macro so they match the shim's __stdcall/LPVOID
signature.  POSIX builds are unchanged; the suite now also compiles
under MSVC.
2026-06-21 21:37:52 -04:00
Sergio Ammirata 2863e43834 NEWS: finalize 0.2.18 release
Drop the pre-release marker from the 0.2.18 heading and add the
canonical ABI 15:0:11 / API 4.12.0 summary lines, matching the
format of prior release sections.
2026-06-21 20:39:45 -04:00
Sergio Ammirata df64a2f1c0 meson: prepare 0.2.18-rc3
Bump API to 4.12.0, ABI to 15:0:11 (soversion 4 unchanged,
binary-compatible with 0.2.15/0.2.16/0.2.17 and rc2).  New interfaces
since rc2: rist_recovery_depth_set() with the
RIST_RECOVERY_DEPTH_MIN/DEFAULT/MAX macros, and the additive stats
fields (RIST_STATS_VERSION 3: profile / seq_bits / advanced_active).

Changes since rc2:
  - Advanced-profile correctness fixes (>64k sequence gaps, ring-index
    masking in receiver_mark_missing, expected-next-seq wrap mask,
    payload-only byte accounting, sequence-based duplicate detection,
    32-bit-safe merge-mode pairing).
  - Default profile is now Advanced with Main interoperability (TR-06-3
    Section 9); risttunnel follows RIST_DEFAULT_PROFILE.
  - Tunable Advanced recovery-ring depth via ?recovery-depth=.
  - Advanced-profile profile/framing visibility in the stats API and the
    Prometheus exporter.
v0.2.18-rc3
2026-06-21 16:46:17 -04:00
Sergio Ammirata 1e53f5ce09 NEWS: document changes since 0.2.18-rc2
Add 0.2.18 pre-release entries for the changes that landed since rc2:
  - Advanced-profile correctness fixes: >64k sequence gaps, ring-index
    masking in receiver_mark_missing, expected-next-seq wrap mask,
    payload-only byte accounting, sequence-based duplicate detection,
    and 32-bit-safe merge-mode pairing
  - Advanced-profile profile/framing visibility in the stats API
    (RIST_STATS_VERSION 2 -> 3) and the Prometheus exporter, plus the
    receiver counters that were previously JSON-only
  - risttunnel follows RIST_DEFAULT_PROFILE

Also normalize two em dashes in the section to ASCII ' - '.
2026-06-21 16:45:39 -04:00
Sergio Ammirata db88969ebf Merge branch 'fix/receiver-mark-missing-oob' into 'master'
fix(receiver): mask sequence into ring index in receiver_mark_missing

See merge request rist/librist!361
2026-06-21 18:46:56 +00:00
Sergio Ammirata f3a72d9ede fix(receiver): mask sequence into ring index in receiver_mark_missing
receiver_mark_missing() indexed receiver_queue[] with the raw last_seq_found
and current_seq, whereas every other queue access masks with
(receiver_queue_max - 1). For 16-bit flows the sequence range equals
receiver_queue_max, so a raw sequence was always a valid index. A 32-bit
(Advanced) sequence can exceed receiver_queue_max once the stream passes the
ring size: the access then reads off the end of the array and dereferences a
stale or NULL slot's packet_time, crashing the receiver. Mask both indices
like the rest of the queue accesses.
2026-06-21 14:29:36 -04:00
Sergio Ammirata 520d1b60d6 Merge branch 'feat/advanced-stats-visibility' into 'master'
feat(stats): expose RIST profile and complete Advanced receiver stats

See merge request rist/librist!360
2026-06-21 16:36:04 +00:00
Sergio Ammirata 8cf3c817b2 feat(stats): export collected receiver counters
Several receiver counters were present in the stats JSON but never surfaced
as Prometheus metrics. Export retries, dropped_late, dropped_full,
duplicates, and the nack-depth buckets (recovered after two, three, four, and
more nacks).
2026-06-21 12:28:30 -04:00
Sergio Ammirata 4d55974da5 feat(stats): expose RIST profile and Advanced framing
The stats output gave no way to tell which RIST profile a flow used or
whether Advanced framing was active on the wire. Add `profile` and
`advanced_active` to the sender peer stats, and `profile`, `seq_bits`, and
`advanced_active` to the receiver flow stats, with matching fields in the
JSON payloads. Bump RIST_STATS_VERSION to 3 and the JSON schema version to 4.

The Prometheus exporter emits new `rist_client_flow_info` and
`rist_sender_peer_info` series that carry these as labels, so a scrape can
identify an Advanced flow (and 16-bit vs 32-bit framing) without changing the
label set of the existing series. On the sender, advanced_active reflects
whether Advanced framing is in use toward the peer; on the receiver flow it
is true only when the context is Advanced and the flow is on 32-bit framing,
so it reads false when an Advanced context stays on Main framing because the
peer has not advertised Advanced support (TR-06-3 Section 9) and false for
Main and Simple contexts.
2026-06-21 12:27:27 -04:00
Sergio Ammirata 3bcf28caf7 fix(receiver): use rist_seq_next for merge-mode pairing
The merge-mode pairing test computed the next sequence number with
`& UINT16_MAX`, truncating it to 16 bits, which mis-pairs packets near a
16-bit boundary on a 32-bit Advanced flow. Use rist_seq_next(), which wraps
correctly for both short_seq (16-bit) and 32-bit flows.
2026-06-21 12:26:00 -04:00
Sergio Ammirata 83cfd7fd9e fix(receiver): detect duplicates by sequence number
receiver_enqueue compared source_time to detect a duplicate in an occupied
slot. The Advanced path stamps source_time with the arrival time, so that
comparison never matched a genuine duplicate and the duplicate counter
under-reported. Compare the sequence number instead, which is what the slot
index is derived from.

This does not change how Advanced buffers packets: it still buffers on
arrival time, so avg_buffer_time measures buffer residence rather than
source-to-output latency. Buffering on the on-wire timestamp is left as a
separate change.
2026-06-21 12:26:00 -04:00
Sergio Ammirata 81f8a50722 fix(receiver): match Advanced byte accounting to Main
The Advanced receive path passed the full datagram size as the ingest size
while the Main path passes the payload only, so received_bytes and the
derived bitrate were not comparable across profiles. Pass adv_data_len (the
delivered payload) instead. ts_null_bytes stays 0 on the Advanced path,
which performs no ts-null reinsertion on receive.
2026-06-21 12:25:38 -04:00
Sergio Ammirata d1b65ea6dd Merge branch 'fix/expected-seq-wrap-mask' into 'master'
fix(receiver): correct 16-bit wrap mask for expected-next-seq

See merge request rist/librist!359
2026-06-20 23:59:55 +00:00
Sergio Ammirata 639b388536 fix(receiver): correct 16-bit wrap mask for expected-next-seq
receiver_enqueue computed the expected next sequence number with
`& (UINT16_MAX - 1)` (0xFFFE), which clears bit 0 and forces the value
even, so every odd successor was mis-computed. The mask was meant to wrap
a 16-bit counter, i.e. `& UINT16_MAX`.

The error is gated by the `packet_time < last_packet_ts` guard, so it never
fires on normal in-order delivery. It bites in ARRIVAL timing mode, where a
per-path arrival timeline can legitimately regress for an in-order packet on
a bonded flow, sending roughly half of odd-successor packets into the
late-drop path.

Extract rist_seq_next() (symmetric with rist_seq_gap) so the successor wraps
correctly for both short_seq (16-bit) and 32-bit flows, and add a link-free
unit test covering the odd-successor regression and both wrap points.
2026-06-20 19:47:25 -04:00
Sergio Ammirata b688f44c7f Merge branch 'fix/advanced-32bit-gap' into 'master'
fix(advanced): don't truncate >64k sequence gaps on 32-bit flows

See merge request rist/librist!358
2026-06-20 23:18:17 +00:00
Sergio Ammirata 6ac6ce78b5 fix(advanced): don't truncate >64k sequence gaps on 32-bit flows
receiver_mark_missing() and receiver_enqueue() unconditionally masked the
forward sequence gap to 16 bits (& UINT16_MAX / & (UINT16_MAX-1)). That is
correct for Simple/Main (16-bit) flows but wrong for Advanced 32-bit flows:
a genuine >64k contiguous gap was truncated, distorting the NACK pacing math
and mis-classifying in-order packets in the arrival-timing reorder guard.
Reachable now that the Advanced recovery ring is resizable up to the full
32-bit space (?recovery-depth=).

Add a short_seq-aware rist_seq_gap() helper and use it at both sites. The
short_seq branch reproduces the original masks byte-for-byte, so 16-bit flows
are unchanged; only 32-bit flows now see the true gap. The wrap false-positive
cap is unified with the recovery-walk hole cap already in the tree
(short_seq ? UINT16_SIZE/2 : receiver_queue_max/2), finishing a conversion
that was only half-applied. Adds a pure, link-free unit test.

MR note: the 16-bit branch intentionally keeps the long-standing
`& (UINT16_MAX - 1)` (0xFFFE) mask in expected_seq, which forces the value
even and looks like it should be `& UINT16_MAX`. Preserving it keeps this
patch a zero-behavior-change for 16-bit flows; whether that mask is itself a
bug is a separate, older question and should be its own change.
2026-06-20 19:01:48 -04:00
Sergio Ammirata 5046e25965 Merge branch 'feat/advanced-main-interop' into 'master'
Advanced profile: Main interop, tunable recovery depth, Advanced default

See merge request rist/librist!357
2026-06-20 18:30:31 +00:00
Sergio Ammirata 4b1b264e94 feat(tunnel): default risttunnel to RIST_DEFAULT_PROFILE
Completes the default-profile sweep: risttunnel was still hardcoded to
Main while ristsender/ristreceiver/YAML now follow RIST_DEFAULT_PROFILE
(Advanced). risttunnel uses one profile for both its sender and receiver
side, so it now defaults to Advanced too and interoperates with a
Main-only peer via the TR-06-3 Section 9 fallback. Help text updated to
list advanced and the new default.

rist2rist is intentionally left at Simple (its documented purpose is to
receive Simple-profile input and re-emit Main); udp2udp does not select
a profile.
2026-06-20 14:18:16 -04:00
Sergio Ammirata e57e1e2ff7 feat(profile): default to the Advanced profile (was Main)
Changes the default RIST profile from Main to Advanced. This is now safe
because an Advanced endpoint interoperates with Main: it negotiates
Advanced framing with an Advanced peer and falls back to Main framing
with a Main-only peer (TR-06-3 Section 9, implemented in the preceding
commit), so existing Main deployments keep working without a flag.

- RIST_DEFAULT_PROFILE is now RIST_PROFILE_ADVANCED.
- ristsender, ristreceiver and the YAML config now take their default
  profile from RIST_DEFAULT_PROFILE instead of a hardcoded
  RIST_PROFILE_MAIN, so there is a single source of truth. Pass
  -p 1 (or profile: 1 in YAML) to force Main as before.
- yamlparse.c gains an explicit <librist/peer.h> include for the macro.

The default only affects the profile a context starts in; the
?profile= URL override and the explicit profile argument to
rist_*_create() are unchanged. rist_peer_config still records the
default with profile_set = 0, so it is applied only when ?profile= is
present.
2026-06-20 14:18:16 -04:00
Sergio Ammirata 615b01d2c4 feat(advanced): Main profile interoperability (TR-06-3 Section 9)
An Advanced-profile sender previously emitted Advanced (Type-8) framing
unconditionally, so a Main-only receiver could not decode the stream;
an Advanced receiver likewise assumed 32-bit framing for the whole flow.
This implements the spec's optional Main/Advanced interop method so a
single flow works across mismatched profiles.

Sender (udp.c): an Advanced device now starts in Main-Profile framing
and only switches to Advanced framing for a peer once that peer
advertises Advanced capability (I=1 in its Main keep-alives, tracked as
remote_supports_advanced). Until then it emits Main-conformant media a
Main-only peer can decode. Control and OOB are unchanged.

Receiver (rist-common.c): rist_receiver_recv_data() learns the actual
wire framing of each data packet (pkt_short_seq) and refines the flow's
short_seq from it. The two framings carry different sequence widths and
timestamp encodings, so a mid-stream framing change (the Main->Advanced
upgrade once a peer advertises I=1) is treated like a flow-id change:
the timing baseline is dropped so the next enqueue re-derives time_offset
and the seq->index mapping from the new framing instead of blending the
two. Without this the stale baseline corrupts delivery after the switch.

flow.c: Advanced flows default to 32-bit framing at create time;
recv_data refines it per wire. The recovery ring size stays fixed per
context.

Matched Advanced and matched Main pairs are unaffected. Adds a permanent
cross-profile interop suite (both directions, with and without 10% loss)
to test/rist/meson.build.
2026-06-20 14:18:16 -04:00
Sergio Ammirata 1ecc10c187 feat(advanced): tunable recovery ring depth (?recovery-depth=)
The Advanced-profile recovery/retransmission ring was a fixed
compile-time size, and Simple/Main flows over-allocated to that same
size. This makes the Advanced ring sizable and stops the 16-bit
profiles from paying the Advanced memory footprint.

- Recovery rings are now heap-allocated and profile-sized. Simple/Main
  allocate the 16-bit cap (65536 entries); Advanced allocates the
  configured depth. Sender and receiver rings are freed on teardown.
- New ?recovery-depth=<n> URL parameter and rist_recovery_depth_set(ctx,
  depth) API size the Advanced ring. depth is a base-2 exponent: the
  ring holds 65536 << depth packets. Default 3 (8x, the prior fixed
  size); range 0..16 (16 == full 32-bit sequence space). Must be set
  before rist_start().
- rist_peer_config grows a trailing recovery_depth field at the existing
  RIST_PEER_CONFIG_VERSION 5; rist_peer_config_defaults_set_versioned()
  sets it to RIST_RECOVERY_DEPTH_DEFAULT for version-5 callers.
  Zero-initialised and older configs keep the default ring.
- A one-shot WARN fires at peer-config time when recovery-maxbitrate and
  the max buffer would queue more packets than the recovery window can
  address, because packets beyond the window cannot be retransmitted.
  The Advanced hint points at ?recovery-depth=; the Simple/Main hint
  points at the Advanced profile.

Tests: test_recovery_depth (depth->packet mapping, ring resize, receiver
capacity, Main behaviour, range clamping, NULL ctx) and
test_url_recovery_depth_parse (numeric parse, range and garbage
rejection).
2026-06-20 14:18:00 -04:00
Sergio Ammirata 340fea4285 Merge branch 'fix/advanced-rtt-echo-scale' into 'master'
fix(advanced): scale RTT echo to NTP ticks

See merge request rist/librist!356
2026-06-17 13:37:10 +00:00
Thomas Symborski 9a30f7bad2 fix(advanced): scale RTT echo to NTP ticks
The RTT Echo Response handler converted the round-trip NTP interval with
a scale that inflated the measured RTT by a constant factor and left it in
microseconds rather than the NTP ticks the rest of the code expects.
peer->last_rtt drives the sender's retransmit re-queue suppression window,
so on a path with non-trivial RTT the inflated value made the sender
refuse the re-NACKs a twice-dropped packet needs, leaving packets
unrecovered under sustained loss; recovery on low-latency links was
unaffected, which is why it went unnoticed.

Derive the round-trip time with calculate_rtt_delay(), the same helper the
Simple and Main RTCP echo path uses: it returns NTP ticks and subtracts
the responder's processing delay in the correct units.
2026-06-16 01:58:17 -04:00
Sergio Ammirata a04f901d5c Merge branch 'fix/issue-188-cname-reassoc-srp-only' into 'master'
fix(security): restrict cname reassociation to authenticated SRP

See merge request rist/librist!355
2026-06-15 21:32:12 +00:00
Sergio Ammirata faa39c474a fix(security): restrict cname reassociation to authenticated SRP
Listener-side cname re-association (added for issue #188 NAT source-port
rebind recovery) treated a matching cname as proof of identity. Under a
shared PSK the cname is not a per-peer secret, so any peer holding the
group key could replay a victim's cname from a new source tuple and take
over its peer record. Gate the migration on an authenticated per-peer SRP
session; plaintext and shared-PSK callers instead recover through the
existing caller-side socket rebind.

Coverage:
- test_psk_cname_hijack: a shared-PSK caller forging a victim's cname is
  not absorbed into the victim's peer record
- test_nat_port_rebind: per-mode distinct-caller expectations
  (plaintext/psk stay separate, srp migrates)
- test_multipath_cname: same-cname bonded paths stay distinct under PSK
  and under SRP with shared credentials
- test_multipath_leg_rebind: one SRP leg rebinds mid-stream without
  merging into the surviving same-cname leg or disturbing it

The new tests use rist_receiver_data_callback_set2 and release each
delivered block via rist_receiver_data_block_free2, so they compile under
-Werror=deprecated-declarations on win64 and run clean under the ubuntu
LeakSanitizer build.

Update NEWS for the SRP-only reassociation and the caller-side rebind path.
2026-06-15 17:00:36 -04:00
Sergio Ammirata 667034da4d Merge branch 'release/v0.2.18-rc2' into 'master'
meson: prepare 0.2.18-rc2

See merge request rist/librist!354
v0.2.18-rc2
2026-06-15 14:52:53 +00:00
Sergio Ammirata a3027634cd meson: prepare 0.2.18-rc2
Bump API to 4.11.0, ABI to 14:0:10 (soversion 4 unchanged,
binary-compatible with 0.2.15/0.2.16/0.2.17).  New interfaces since
rc1: rist_peer_config_defaults_set_versioned() and the recovery_priority
peer-config field (RIST_PEER_CONFIG_VERSION 5).

Changes since rc1:
  - Per-peer retransmission routing via ?recovery-priority=.
  - Versioned peer-config initialisation closes an ABI-related
    out-of-bounds write for callers built against older struct revisions.
  - Fixed a global-logger use-after-free when an application frees its
    logging settings while the global logger still references them.
  - Receiver no longer aborts when an interpolated packet arrival time
    falls outside its neighbouring queue slots.
  - NAT source-port rebind recovery (issue #188).
2026-06-15 10:47:51 -04:00
Sergio Ammirata 62ad3df049 Merge branch 'repro/issue-188-nat-port-rebind' into 'master'
issue #188: NAT source-port rebind recovery

Closes #188

See merge request rist/librist!352
2026-06-15 13:27:42 +00:00
Sergio Ammirata 28c442a1e1 issue #188: NAT source-port rebind recovery (split by encryption)
A calling receiver behind a dynamic-IP NAT can have its external
source port change (DHCP renewal, NAT mapping rebind, etc.).  The
listening sender's peer match keys on the full caller tuple, so
the new tuple is treated as a brand-new caller, the existing
child peer record sits orphaned for session_timeout, and the
receiver-side state is not inherited.  The issue reports a manual
restart of ristreceiver as the workaround.

The recovery path is split by whether encryption is in play, so
cname (forgeable on the wire in plaintext) is only trusted to
identify a peer when an authenticated envelope is present.

Listener-side cname re-association (encrypted callers):
  Once a new listener-child peer authenticates and learns its
  cname from an RTCP SDES, look for an existing sibling with the
  same cname / is_rtcp / is_data / adv_flow_id and migrate the
  new source tuple into it.  Gated on PSK or SRP being
  configured, on exactly one matching candidate sibling that is
  either dead or has been silent for >= 2 * keepalive_interval,
  and on no other sibling sharing the cname being currently
  alive (protects multipath and same-NAT duplicate-cname cases).
  For SRP, kick a fresh EAPOL START on the surviving sibling so
  the periodic re-auth timer fires now instead of after up to
  EAP_REAUTH_PERIOD; the full EAP-SRP handshake still takes its
  normal multi-RTT exchange.  _librist_peer_match_peer_addr now
  skips dead children so an absorbed peer cannot shadow the
  surviving record that now owns the migrated source tuple.

Receiver-caller socket rebind (plaintext callers):
  From rist_timeout_check, immediately before the kill_peer
  that would normally fire on session_timeout, close and reopen
  the local UDP socket on a fresh ephemeral port and restart
  the handshake.  Gated on caller-mode receiver + MAIN+ profile
  + no encryption + peer->config.local_port == 0 + min_silence =
  max(session_timeout, 4 * keepalive_interval) so a misconfigured
  short session_timeout against a slower keepalive cadence does
  not flap-loop.  Linear backoff capped at REBIND_BACKOFF_CAP *
  session_timeout.  If rist_create_socket fails after the close,
  fall through to the normal kill_peer path.

Reproducers:
  * test/rist/test_nat_port_rebind.c -- two modes (plaintext /
    psk) covering both branches of the encryption gate.
  * test/rist/test_caller_socket_rebind.c -- destroys sender,
    asserts the receiver auto-rebinds to a new ephemeral local
    port and the next sender to come up sees the reconnected
    caller without operator intervention.
2026-06-15 09:20:34 -04:00
Sergio Ammirata 56ccd87fdf Merge branch 'fix/receiver-packet-time-assert' into 'master'
fix(receiver): never abort on non-monotonic packet_time interpolation

See merge request rist/librist!353
2026-06-15 13:09:22 +00:00
Sergio Ammirata 780ee59d93 fix(receiver): never abort on non-monotonic packet_time interpolation
The retry/gap interpolation path in receiver_enqueue() asserted that the
CBR-interpolated packet_time fell below next->packet_time.  Under
arrival-based timing a flow can be fed by peers on unsynchronised clocks,
so neighbouring queue slots may carry packet_times from different clock
domains and the interpolated estimate can land outside (previous, next) --
aborting the receiver in debug builds and silently corrupting queue
ordering under NDEBUG.

Replace the assert with explicit guards and clamping:
  - only interpolate when next->packet_time > previous->packet_time;
  - multiply before dividing so integer rounding cannot push the estimate
    up to or past next->packet_time;
  - handle gap/reorder/sequence-wrap (seq not strictly between the
    neighbours) by placing the packet just before next;
  - clamp the final result into [previous->packet_time, next->packet_time].
2026-06-15 08:55:45 -04:00
Sergio Ammirata 5e73e582f4 Merge branch 'feat/recovery-priority' into 'master'
feat(nack): route retransmissions by per-peer recovery_priority

See merge request rist/librist!350
2026-06-11 00:23:19 +00:00
Sergio Ammirata 8da9e7f737 feat(nack): route retransmissions by per-peer recovery_priority
When a receiver flow is carried by more than one RTCP-capable peer,
send_nack_group historically sent every NACK to the eligible peer
with the lowest RTT.  That is wrong when the lowest-RTT peer cannot
actually retransmit: e.g. a duplicate/relay feed that delivers the
same packets but keeps no retransmit buffer.  NACKs go to it, the
peer that holds the buffer never hears them, and recovery stalls.

Add a per-peer recovery_priority (new ?recovery-priority=<n> URL
parameter, rist_peer_config.recovery_priority).  NACK groups now go
to the eligible peer with the highest recovery_priority, ties broken
by lowest RTT.  The default of 0 on every peer reduces exactly to the
prior lowest-RTT selection, so existing deployments are unchanged.

The selection predicate is factored into src/rist-nack-select.h
(rist_nack_peer_preferred) so it can be unit-tested without building
peers or starting threads.

Also name the weight=0 duplicate sentinel: new public macro
RIST_PEER_WEIGHT_DUPLICATE makes the load-balancer's "duplicate to
all peers" branch read as intent rather than a magic 0.

ABI: RIST_PEER_CONFIG_VERSION 4 -> 5 (one new trailing field,
defaulted to 0 by rist_peer_config_defaults_set; zero-initialised
configs keep legacy behaviour).

Tests:
  - test/rist/unit/test_url_recovery_priority_parse.c: ?recovery-priority=
    parsing through the public rist_parse_address2 API.
  - test/rist/unit/test_nack_peer_select.c: priority/RTT selection,
    tie-breaks, legacy-equivalence and unmeasured-RTT edge cases.
  - meson test --suite unit: 8/8 OK.
  - meson test (full): 51 OK, 11 Expected Fail, 0 unexpected.
2026-06-10 17:07:37 -04:00
Sergio Ammirata 2bf4ad2ad8 Merge branch 'fix/abi-peer-buffer-overflow' into 'master'
core: implement versioned struct initialization and library guards to prevent ABI memory issues

See merge request rist/librist!349
2026-06-10 20:40:49 +00:00
Sergio Ammirata 250c11c119 test, news: cover versioned peer-config init and document legacy fallback
Add a unit test asserting rist_peer_config_defaults_set_versioned() writes
only the fields defined at or below the requested version (split_mode and
merge_mode from version 1, profile and profile_set from version 4) and that
the rist_peer_config_defaults_set() inline records the caller's compiled
RIST_PEER_CONFIG_VERSION.

Document in NEWS the new versioned initialiser, the version guards in
parse_url_options() / store_peer_settings(), and the retained exported
rist_peer_config_defaults_set() symbol, which assumes version 0 for binaries
linked against an earlier library and therefore leaves version-1+ fields
untouched.
2026-06-10 16:27:50 -04:00
RossWang b8dc4832f8 core: implement versioned struct initialization and library guards to prevent ABI memory issues
Introduces version checks and a versioned struct initialization mechanism for
`struct rist_peer_config` to protect against memory corruption and out-of-bounds
access when older client binaries run against a newer dynamic library.

Changes:
- Declared `rist_peer_config_defaults_set_versioned()` to initialize defaults
  conditionally based on the version of the calling client.
- Redefined `rist_peer_config_defaults_set()` as a static inline function in
  `peer.h` for external clients to capture their compiled version, while keeping
  the non-inline legacy wrapper in `rist.c` (gated by `LIBRIST_INTERNAL` and
  defaulted to version 0) for ABI linkage compatibility.
- Added version checks to `parse_url_options()` inside `src/rist-common.c` to prevent
  writing out-of-bounds (`multicast_ttl`, `multicast_source`, `reflector`,
  `local_port` require version >= 2; `srp_compat_legacy` requires version >= 3).
- Added version checks to `store_peer_settings()` inside `src/rist-common.c` to prevent
  reading out-of-bounds when copying configuration settings to internal structures
  (`reflector` requires version >= 2; `srp_compat_legacy` requires version >= 3).
2026-06-10 16:27:26 -04:00
Sergio Ammirata 8c23f24c86 Merge branch 'fix/log-cb-use-after-free' into 'master'
logging: unset global logs when settings or context are freed

See merge request rist/librist!348
2026-06-10 20:25:19 +00:00
Sergio Ammirata ea35c55b82 news: document global-logger use-after-free fix
Record the fix that clears the global logging settings when an
application frees its rist_logging_settings (directly or by destroying a
context that embeds them), preventing later logs from calling a freed
callback.
2026-06-10 16:02:42 -04:00
RossWang 2790539e88 logging: unset global logs when settings or context are freed
Fixes an access violation/use-after-free in client applications (such
as FFmpeg) when a stream is quickly closed and opened.

If a client allocates or embeds logging settings, they are copied into
the static global logging settings. When the stream is destroyed, the
memory hosting the logging callback and its user argument is freed.
If they are not unset from the global logger, subsequent library logs
routed through rist_log_priv3 (e.g., in background evsocket threads
or new context initialization) will attempt to use the dangling
callback argument, triggering a crash inside the logging callback.

This fixes the issue by unsetting the global logging configuration
if the settings structure being cleaned up matches the currently
configured global logging settings.

- Add rist_logging_unset_global_if_matches() to check and unset settings.
- Call it in rist_logging_settings_free2() when freeing settings directly.
- Call it in rist_receiver_destroy_local() and rist_sender_destroy_local()
  to handle clients like FFmpeg that embed the settings structure.
2026-06-10 20:37:23 +08:00
Sergio Ammirata 977ef1b329 Merge branch 'feature/prometheus-receiver-peer-url' into 'master'
prometheus-exporter: add peer_url label to rist_receiver_peer_*

See merge request rist/librist!347
2026-06-09 09:44:22 +00:00
Sergio Ammirata 460032f555 prometheus-exporter: add peer_url label to rist_receiver_peer_*
Mirrors the existing peer_url label on rist_sender_peer_*.  peer_id
is reissued (higher) on reconnect, so it isn't a stable per-peer
identifier; peer_url is.

Sender peers carry the URL through rist_prometheus_sender_add_peer.
Receiver peers are auto-discovered from the stats callback, so the
URL is added to the receiver-flow stats JSON ("url" field on each
peer object) and parsed at first registration in the exporter.
For listener-mode peers the URL lives on the parent (the inbound
child is created with url=NULL); the stats populator walks the
parent chain.
2026-06-09 05:38:49 -04:00
Sergio Ammirata 8461d46117 Merge branch 'fix/prometheus-unix-socket-reliability' into 'master'
fix(prometheus-exporter): six fixes for flaky --metrics-unixsock scrapes

See merge request rist/librist!346
2026-06-06 01:07:23 +00:00
Sergio Ammirata bae99c6c64 prometheus-exporter: fix flaky unix-socket scrapes
The unix-socket scraper exposed by ristreceiver / ristsender
(activated by --metrics-unixsock) returned empty or stub-only
bodies on a substantial fraction of scrapes in production.
Six separate defects in tools/prometheus-exporter.c contributed:

1. listen(fd, 1) -- a backlog of one is too small for any
   real deployment where Prometheus + an external collector
   may scrape within the same RTT. Replaced with SOMAXCONN.

2. accept() returning EINTR was treated as fatal and broke the
   thread out of the loop, silently disabling the exporter for
   the remainder of the process lifetime. EINTR is now retried.

3. The hot path issued a single unchecked write() with the
   result discarded via (void)!. Partial writes and transient
   EINTR therefore truncated the response, and an early peer
   close raised SIGPIPE on the exporter thread. Replaced with
   a write_all helper that loops on partial writes, retries
   EINTR, and uses MSG_NOSIGNAL where available.

4. shutdown(SHUT_RDWR) immediately before close() can behave
   as an abortive teardown on AF_UNIX SOCK_STREAM under some
   kernels, discarding bytes that write() had just queued.
   Switched to shutdown(SHUT_WR), which is the documented
   "no more writes -- let the receive queue drain" sequence.

5. close() with unread bytes in the receive queue is the
   biggest source of empty-body scrapes in the field. The
   exporter is a dump-on-connect protocol that never reads
   the client's request, but tools like nc / curl / the
   Prometheus scraper itself all send "GET /metrics HTTP/1.0
   \r\n\r\n". On Linux, close() on an AF_UNIX SOCK_STREAM
   with unread input is treated as an abortive teardown: the
   kernel sends RST and discards everything in the transmit
   queue -- including the 10 KB body we just wrote. The
   client sees ECONNRESET / EOF with zero bytes. We now
   shutdown(SHUT_WR) and drain the receive queue with a
   SO_RCVTIMEO-bounded recv loop before close, so the kernel
   does a clean FIN-FIN handshake and the body actually
   reaches the client. Confirmed by a Python probe that
   went from 5/5 ECONNRESET to 5/5 success in the same
   environment.

6. In single-stat-point mode (the default for ristreceiver
   and ristsender) rist_prometheus_stats_format() was zeroing
   container_count after every scrape. Any subsequent scrape
   that arrived before the next 1 Hz stats callback then
   emitted only HELP/TYPE/UNIT preambles with no data points,
   which manifested in the field as metrics flickering and
   downstream consumers parsing NaN / blank bandwidth. The
   ring is now only reset in --metrics-multipoint mode;
   single-point mode keeps the latest sample visible until
   the next callback overwrites it. cleanup_stale_locked()
   still ages stale flows / peers out after 15 s.
2026-06-05 21:02:54 -04:00
Sergio Ammirata dc4b993302 Merge branch 'fix/prometheus-single-stat-point-slot' into 'master'
fix(prometheus-exporter): write to container[0] in single-stat-point mode

See merge request rist/librist!345
2026-06-06 00:48:34 +00:00
Sergio Ammirata 70f5fc7089 fix(prometheus-exporter): write to container[0] in single-stat-point mode
In single-stat-point mode the writer set container_offset = 1 after
each callback, but the format() macro reads only container[0..count-1]
and container_count stays at 1.  container[0] was therefore frozen at
the value from the first callback while every subsequent callback
silently overwrote container[1], which the reader never touches.

Three sites had the same pattern:
- rist_prometheus_handle_client_stats (receiver flow stats)
- rist_prometheus_handle_client_stats (per-receiver-peer stats)
- rist_prometheus_handle_sender_peer_stats

NEWS entry added under 0.2.18 Bug Fixes.
2026-06-05 20:41:48 -04:00