1272 Commits

Author SHA1 Message Date
roman 56b7c3bff5 Update tools/meson.build 2026-05-24 16:43:27 +00:00
roman 467841d071 Add tools/moo-ristsender.c 2026-05-24 16:42:30 +00:00
Sergio Ammirata 812fd9dd8f Merge branch 'feat/pmtu-safety-net' into 'master'
udpsocket: set IP don't-fragment on every UDP socket

See merge request rist/librist!323
2026-05-24 08:00:38 +00:00
Sergio Ammirata 00ee7c4027 udp,gre: log actionable PMTU hint on EMSGSIZE
When a send fails with EMSGSIZE, log the packet size and suggest
reducing the application payload. Rate-limited to one message per
peer per 5 seconds. Both simple-profile and main-profile send paths
use the same helper.
2026-05-24 03:56:06 -04:00
Sergio Ammirata 0b8d3d849a udpsocket: set IP don't-fragment on every UDP socket
Set DF so the kernel returns EMSGSIZE instead of silently
IP-fragmenting. Covers Linux, Windows, and BSD for v4/v6.
Best-effort: platforms without the option keep legacy behaviour.
2026-05-24 03:55:59 -04:00
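Illustration for 0b8d3d849a: a minimal sketch of a best-effort don't-fragment setter for an IPv4 UDP socket on POSIX platforms. The function name set_dont_fragment_v4 is hypothetical and the code is not taken from librist; it only assumes the standard Linux IP_MTU_DISCOVER / IP_PMTUDISC_DO and BSD IP_DONTFRAG socket options. The Windows and IPv6 cases mentioned in the commit are left out.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Best-effort: returns 0 on success or when no option is available,
 * -1 if setsockopt itself failed. IPv4, POSIX platforms only. */
static int set_dont_fragment_v4(int fd)
{
#if defined(IP_MTU_DISCOVER) && defined(IP_PMTUDISC_DO)
    int val = IP_PMTUDISC_DO;             /* Linux: always set DF */
    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val));
#elif defined(IP_DONTFRAG)
    int val = 1;                          /* BSD-style option */
    return setsockopt(fd, IPPROTO_IP, IP_DONTFRAG, &val, sizeof(val));
#else
    (void)fd;
    return 0;                             /* keep legacy behaviour */
#endif
}
```

With DF set, an oversized send is expected to fail with EMSGSIZE rather than being fragmented, which is the condition the companion logging commit reports on.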
Sergio Ammirata 64d511a275 Merge branch 'fix/ristsender-rtp-seq-ts-uncomment' into 'master'
fix(ristsender): enable --rtp-sequence and --rtp-timestamp extraction

See merge request rist/librist!322
2026-05-23 19:58:25 +00:00
Sergio Ammirata aa141517a3 fix(ristsender): enable --rtp-sequence and --rtp-timestamp extraction
The RTP seq/timestamp extraction in input_udp_recv was disabled since
be32351 (2020) with a "TODO: Figure out why this does not work" comment.

The root cause was a buffer offset bug: recvfrom writes the UDP payload
at recv_buf + ipheader_bytes, but the extraction code read recv_buf[2..7]
— the reserved IP-header prefix area — returning garbage.
2026-05-23 15:54:49 -04:00
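Illustration for aa141517a3: a minimal sketch of the offset issue described above, assuming the standard RTP header layout (sequence number in bytes 2..3, timestamp in bytes 4..7, both big-endian). The helper name rtp_peek_seq_ts and the parameter payload_offset are hypothetical; in the commit's terms payload_offset plays the role of ipheader_bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not librist code: read the RTP sequence number
 * and timestamp from a receive buffer whose UDP payload starts at
 * payload_offset rather than at the start of the buffer. */
static int rtp_peek_seq_ts(const uint8_t *recv_buf, size_t buf_len,
                           size_t payload_offset,
                           uint16_t *seq, uint32_t *ts)
{
    if (payload_offset > buf_len || buf_len - payload_offset < 8)
        return -1; /* not enough bytes for the fields we need */

    const uint8_t *rtp = recv_buf + payload_offset; /* not recv_buf itself */
    *seq = (uint16_t)((rtp[2] << 8) | rtp[3]);
    *ts  = ((uint32_t)rtp[4] << 24) | ((uint32_t)rtp[5] << 16) |
           ((uint32_t)rtp[6] << 8)  |  (uint32_t)rtp[7];
    return 0;
}
```

Reading the same indices from recv_buf directly would return whatever sits in the reserved prefix area, which matches the "garbage" behaviour the commit describes.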
Sergio Ammirata 88467c0e4e Merge branch 'security/audit2-followup' into 'master'
Security follow-ups to v0.2.15 (audit, part 2)

See merge request rist/librist!319
2026-05-18 17:23:25 +00:00
Sergio Ammirata 91d666fb66 NEWS: document audit2 follow-up fixes for pre-v0.2.16
Add a "Post-0.2.15 audit follow-ups" subsection under the existing
0.2.16 hardening notes. Groups the changes by component (receiver
flow accounting, EAP authenticator, PSK, SRP, stats JSON, build
portability) so a downstream reader can scan to the bit they care
about.
2026-05-18 12:52:49 -04:00
Sergio Ammirata df2834f906 build(srp): drop dead !USE_SHA_RET branch, require mbedTLS >= 2.7
a66b7bf fixed the !USE_SHA_RET branch so it would compile against
mbedTLS versions prior to 2.7 (March 2018), where the SHA-256 calls
returned void and could not signal an internal allocation failure
back to the caller. The branch is otherwise dead: the bundled
mbedTLS in contrib/ is 2.28.x, every distro of interest is well
past 2.7, and no CI job builds against anything older. A latent
bug regressing into that branch would never be caught.

Drop the branch, emit an explicit #error if anyone builds against
mbedTLS < 2.7 with directions to either upgrade their system
mbedTLS or use the bundled tree. Surface area down, untestable
dead code gone.

Nettle has always returned int and is unaffected.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/info-02).
2026-05-18 12:45:42 -04:00
Sergio Ammirata feabe74762 fix(stats): add schema_version field to JSON payload
86e2a16 changed the sender-stats JSON shape from duplicate "peer"
keys (technically valid, practically broken) to a single "peers"
array. The change is correct, but it was made without any way for
downstream consumers to know which shape they're parsing. A
monitoring script that did .["sender-stats"].peer.id silently
started getting null after the upgrade. The prometheus exporter
shipped inside the tools tree was rewritten for the new shape
(3ea6c49), so a librist sender on the new tree paired with an
external exporter pinned to the old shape sees no peers at all,
again silently.

Add a top-level "schema_version" integer to the three places we
emit stats JSON (sender flow stats, sender per-peer wrapper,
receiver flow stats). Set it to 2 - version 1 was the duplicate-key
shape, version 2 is the array shape we ship now. Receiver stats has
always been array-shaped, but tagging it lets consumers branch on a
single field regardless of which side they're parsing.

Any future incompatible shape change must bump this number and call
it out in NEWS.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/medium-04).
2026-05-18 12:44:47 -04:00
Sergio Ammirata 88c1bcc446 fix(srp): reserve former NG_512 / NG_768 enum slots to avoid silent ABI shift
eefe26e dropped the sub-1024-bit RFC 5054 groups (correct call -
they're below current minimum-strength thresholds) but did so by
deleting the enum entries outright, which shifted every remaining
group's integer value down by two. librist_get_ng_constants is
RIST_API-exported and external callers that hardcoded the integer
2 to mean NG_1024 silently started getting NG_4096 instead. Same
idea, different group, equally silent and equally wrong.

Restore the original integer values by reserving slots 0 and 1 with
explicit LIBRIST_SRP_NG_RESERVED_0 / RESERVED_1 names, and make
librist_get_ng_constants reject them with -1 (the existing
out-of-range error). Code using the symbolic enum names is
unaffected; code using literal integers gets either the right group
or a clean failure.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/low-05).
2026-05-18 12:43:52 -04:00
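Illustration for 88c1bcc446: a minimal sketch of reserving retired enum slots so that the remaining values keep their integers. All identifiers are hypothetical stand-ins (the real names are LIBRIST_SRP_NG_* and librist_get_ng_constants), and the list of groups is shortened for illustration.

```c
/* Hypothetical enum; only the reserved-slot idea and the "reject with
 * -1" behaviour are taken from the commit message. */
enum srp_ng_sketch {
    SKETCH_NG_RESERVED_0 = 0, /* formerly the 512-bit group */
    SKETCH_NG_RESERVED_1 = 1, /* formerly the 768-bit group */
    SKETCH_NG_1024       = 2, /* keeps its original integer value */
    SKETCH_NG_2048       = 3,
    SKETCH_NG_COUNT
};

/* Returns 0 and sets *bits for a usable group, -1 otherwise. */
static int sketch_get_ng_bits(int ng, int *bits)
{
    switch (ng) {
    case SKETCH_NG_1024: *bits = 1024; return 0;
    case SKETCH_NG_2048: *bits = 2048; return 0;
    default:             return -1; /* reserved or out of range */
    }
}
```

A caller that passes the literal 2 still gets the 1024-bit group, and a caller that passes 0 or 1 gets a clean failure instead of a different group.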
Sergio Ammirata ace6eab94e fix(stats): NUL-terminate sender peer cname explicitly
strncpy does not guarantee NUL termination when the source is at
least n bytes long. The defensive cname[0] = '\0' before the copy
in rist_stats_sender_peer_stats did nothing - strncpy overwrites
it. Today the bound is safe in practice (the SDES bounds fix in
c308e10 limits peer->receiver_name to 127 bytes + NUL within its
128-byte field, so the implicit NUL is copied) but the contract is
fragile and is the same pattern the udpsocket fix in 38cb58f
already excised elsewhere.

Use a copy length one byte shy of the destination and write the
terminator unconditionally.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/low-03).
2026-05-18 12:43:14 -04:00
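Illustration for ace6eab94e: a minimal sketch of the copy pattern the commit describes (copy one byte less than the destination, then write the terminator unconditionally). The helper name copy_cname is hypothetical.

```c
#include <string.h>

/* Copy src into dst (dst_size bytes) and always NUL-terminate. */
static void copy_cname(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1); /* one byte shy of the destination */
    dst[dst_size - 1] = '\0';        /* terminator written unconditionally */
}
```

Writing the terminator after the copy is what makes the result independent of the source length; writing it before the copy, as the old code did, has no effect.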
Sergio Ammirata 859224faa3 fix(flow): make the RIST_MAX_FLOWS cap race-free
rist_receiver_associate_flow counted existing flows under
flows_lock, released the lock, called create_flow, and create_flow
re-took the lock to insert. Two concurrent callers could each see
count == RIST_MAX_FLOWS - 1, each release the lock, and each insert
- pushing the actual flow count past the cap by (racers - 1). The
practical impact was small because the cap still bounds memory
growth to O(N + racers), but the documented invariant "at most
RIST_MAX_FLOWS flows" did not hold under concurrency.

Move the authoritative count + cap check into create_flow under
the same flows_lock acquisition that does the append. The early
count in associate_flow stays as an allocation-avoidance
optimization (skip the calloc + pthread_init dance when the cap is
already definitively reached), but is no longer relied on for
correctness. If the early count is stale and create_flow trips the
cap, we tear down the freshly-allocated flow cleanly and return
NULL - the caller's existing NULL handling kicks in.

Move the RIST_MAX_FLOWS / RIST_MAX_PEERS_PER_FLOW defines above
create_flow so the cap check sees them.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/low-02).
2026-05-18 12:42:55 -04:00
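Illustration for 859224faa3: a minimal sketch of doing the cap check and the insert under a single lock acquisition. The struct, the function name and the cap value are hypothetical; the real code appends a flow to a list rather than bumping a counter.

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_FLOWS 4 /* hypothetical cap for illustration */

struct flow_table {
    pthread_mutex_t lock;
    size_t count;
};

/* Returns 0 on insert, -1 when the cap is reached. The check and the
 * insert happen under one lock acquisition, so two racing callers
 * cannot both pass the check against the same stale count. */
static int flow_table_insert(struct flow_table *t)
{
    int ret = -1;
    pthread_mutex_lock(&t->lock);
    if (t->count < SKETCH_MAX_FLOWS) {
        t->count++;
        ret = 0;
    }
    pthread_mutex_unlock(&t->lock);
    return ret;
}
```

An earlier unlocked count can still be kept as a cheap early-out, as the commit does, as long as this locked check is the one relied on for correctness.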
Sergio Ammirata d5c1ccc1e9 fix(psk): drop the AES-128 fallthrough in the nettle session dispatch
The session-path AES-CTR dispatch carried the same default-falls-
through-to-AES-128 shape that audit/high-13 spotted on the
standalone _librist_crypto_aes_ctr. 9e3703e fixed the standalone
function but missed the session sibling. Today it is not reachable
(key_size always comes from SRP and is 128/192/256) but the pattern
is a one-typo regression away - a future caller that passes
key_size = 0 (e.g. a use-before-init) would silently run AES-128
over an uninitialized key schedule.

Replicate the fixed shape: explicit cases for 128 / 192 / 256, and
return on default rather than encrypting under whatever key happens
to be in key->nettle_ctx.u.ctx128.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/low-01).
2026-05-18 12:41:47 -04:00
Sergio Ammirata 0501b30d22 fix(psk): skip PBKDF2 work for nonces while the key is locked out
_librist_crypto_psk_decrypt ran a full PBKDF2 + AES re-keying on
every attacker-chosen GRE nonce, even after the key had been locked
out for excessive decryption failures. 1c1ae12 correctly closed the
audit's lockout-reset path by preserving the bad_decryption flag
across nonce rolls, but the PBKDF2 itself still ran on every packet
with a new nonce. Each iteration costs a few milliseconds of CPU
against an attacker that needs to flip four bytes of GRE header per
packet, so a modest packet rate could pin the receiver CPU.

Move the lockout test before the rekey: while bad_decryption is
set, attacker-supplied nonces are ignored without doing any work.
The flag is still only cleared by passphrase rotation or by a
successful decrypt + RTP validation upstream, so the high-15 fix
property is preserved.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/medium-01).
2026-05-18 12:41:26 -04:00
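Illustration for 0501b30d22: a minimal sketch of testing the lockout flag before any re-keying work. The struct and function are hypothetical; rekey_count stands in for the cost of the PBKDF2 + AES re-keying so the ordering can be observed.

```c
#include <stdbool.h>
#include <stdint.h>

struct psk_key_sketch {
    bool     bad_decryption; /* lockout flag, as named in the commit */
    uint32_t nonce;          /* nonce the current key was derived for */
    unsigned rekey_count;    /* stands in for the PBKDF2 + AES rekey cost */
};

/* Returns 0 if the packet may proceed to decryption, -1 if dropped. */
static int psk_prepare_decrypt(struct psk_key_sketch *k, uint32_t pkt_nonce)
{
    if (k->bad_decryption)
        return -1;            /* locked out: no key derivation at all */
    if (pkt_nonce != k->nonce) {
        k->nonce = pkt_nonce;
        k->rekey_count++;     /* expensive derivation would run here */
    }
    return 0;
}
```

While the flag is set, a new nonce changes neither the stored nonce nor the derivation count, so a flood of fresh nonces costs only the flag test.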
Sergio Ammirata 8ae996c0a7 fix(flow): align peer_lst lock discipline on f->mutex, cap peers per flow
flow->peer_lst is reallocated under f->mutex on the add path
(rist_receiver_associate_flow), but several other writers and
readers were not taking the same lock:

  - remove_peer_from_flow took no lock at all on the shrink path.
  - send_nack_group took peerlist_lock (not f->mutex) around the
    peer_lst walk, so a concurrent realloc-move from the protocol
    thread left it walking a freed array. Reproduced under TSan and
    under ASan with audit2/high-01/poc-h1-race-harness.c.
  - The session-timeout deletion path (rist_receiver_pthread_protocol)
    walked peer_lst under peerlist_lock only after dropping f->mutex,
    same class of race.

Take f->mutex everywhere peer_lst is read or written. Lock order is
peerlist_lock outside, f->mutex inside, matching the existing
convention elsewhere in the file. Both shrink call-sites
(remove_peer_from_flow and the inline writer in rist_peer_remove)
now also check the realloc return: a shrink that returns NULL
legitimately means "could not move" and we keep the oversized
allocation rather than overwriting peer_lst with NULL while
peer_lst_len is still positive.

Add RIST_MAX_PEERS_PER_FLOW = 256, checked under f->mutex in
rist_receiver_associate_flow. peer_lst grew unbounded before this;
256 is generous for any legitimate multipath deployment and bounds
the worst case for both attack and misconfiguration.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/high-01).
2026-05-18 12:40:59 -04:00
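Illustration for 8ae996c0a7: a minimal sketch of the checked shrink described above, where a failed realloc keeps the existing (oversized) allocation instead of overwriting the pointer with NULL. The struct and function names are hypothetical, and the locking that the commit adds around this path is omitted.

```c
#include <stdlib.h>

struct peer_list_sketch {
    void **items;
    size_t len;
};

/* Drop the last entry and try to shrink the array. */
static void peer_list_pop(struct peer_list_sketch *pl)
{
    if (pl->len == 0)
        return;
    pl->len--;
    if (pl->len == 0) {
        free(pl->items);
        pl->items = NULL;
        return;
    }
    void **shrunk = realloc(pl->items, pl->len * sizeof(*pl->items));
    if (shrunk != NULL)
        pl->items = shrunk; /* on NULL the old block is still valid */
}
```

The invariant being protected is that items is never NULL while len is still positive.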
Sergio Ammirata 91da8b5e06 fix(crypto): stop poking at mbedTLS internal entropy struct fields on Windows
The b05ff88 Windows CSPRNG fix reached into mbedtls_entropy_context's
internal source_count and source[] fields to drop the default
CryptGenRandom source and substitute BCryptGenRandom. Those fields
are not part of the mbedTLS public API. mbedTLS 3.x has started
hiding internal struct members behind MBEDTLS_PRIVATE() and reserves
the right to reorder the layout, so the existing approach will fail
to compile (or, worse, compile but read wrong offsets) on any distro
that builds librist against system mbedTLS 3.x.

Rework the Windows path to bypass the entropy context entirely: feed
BCryptGenRandom directly into mbedtls_ctr_drbg_seed via the f_rng
interface, which is part of the stable public API and does not care
how the random bytes are produced. The entropy context is now
declared and initialised only on non-Windows; the seed-failure
plumbing (ctr_drbg_seed_ret, the fail-closed property in
_librist_crypto_ramdom_get_bytes) is unchanged.

Bundled contrib mbedtls 2.28.10 is unaffected, but this removes a
build hazard for downstream packagers building against system
mbedtls 3.x.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/info-03).
2026-05-18 12:38:39 -04:00
Sergio Ammirata 0089625aa2 fix(crypto): propagate CSPRNG failure through the nettle SRP wrapper
_librist_srp_nettle_wrap_random is the random-bytes callback nettle
invokes for the SRP private exponents (a on the authenticatee side,
b on the authenticator side) and for the verifier salt. Its
signature is void, so when _librist_crypto_ramdom_get_bytes failed
the wrapper used to discard the error and leave the destination
buffer at whatever was on the stack. mpz_import then read those
stack bytes as the SRP private exponent. On a build using the
nettle backend during a sustained entropy outage that meant the
SRP session key could be derived from uninitialized memory.

8f22b8e made the mbedTLS path fail-closed because the mbedTLS
wrapper returns int and propagates the error naturally. This commit
applies the equivalent property to the nettle path: thread the
caller's `int ret` through the otherwise-unused void* context arg
of nettle_mpz_random / nettle_mpz_random_size, and have the wrapper
write -1 plus zero the buffer on CSPRNG failure. The existing
`if (ret != 0)` checks at the two BIGNUM_RANDOM call sites and at
the salt-generation site then abort the SRP operation cleanly.

The buffer-zero on failure is defense in depth: even if a caller
were added that ignored ret, the bignum would be zero rather than
attacker-influenced stack, and SRP's mod-N checks already reject
0 as A or B.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/high-05).
2026-05-18 12:37:40 -04:00
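Illustration for 0089625aa2: a minimal sketch of reporting failure from a void callback by passing the caller's int through the context pointer. The callback type mirrors the shape of a void random-bytes callback; csprng_get_bytes and simulate_failure are hypothetical stand-ins, and the placeholder bytes are not random.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as a void random-bytes callback: nothing can be returned. */
typedef void random_cb(void *ctx, size_t length, uint8_t *dst);

/* Hypothetical CSPRNG stand-in; fails when simulate_failure is set. */
static int simulate_failure;
static int csprng_get_bytes(uint8_t *dst, size_t length)
{
    if (simulate_failure)
        return -1;
    memset(dst, 0xA5, length); /* placeholder bytes, NOT random */
    return 0;
}

/* ctx points at the caller's int ret, so failure can be reported. */
static void wrap_random(void *ctx, size_t length, uint8_t *dst)
{
    int *ret = ctx;
    if (csprng_get_bytes(dst, length) != 0) {
        memset(dst, 0, length); /* never leave stale bytes behind */
        *ret = -1;
    }
}
```

The caller initialises ret to 0, passes &ret as the context, and checks it after the call, which corresponds to the `if (ret != 0)` checks mentioned in the commit.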
Sergio Ammirata 4e8c08e64e fix(crypto): fail PSK encrypt/decrypt closed when CSPRNG is unavailable
prand_u32 silently substituted (uint32_t)timestampNTP_u64() when the
CSPRNG returned non-zero, and the PSK GRE nonce generator
(_librist_crypto_psk_generate_nonce) consumed that fallback as the
PBKDF2 salt. The result was a 32-bit wall-clock-truncated nonce
feeding key derivation whenever the entropy source failed -
predictable to anyone who could observe wire timing, and prone to
nonce collisions across restarts that would reuse the AES-CTR
keystream. 8f22b8e was supposed to make the crypto stack fail-closed
on entropy failure; this commit closes the prand_u32 hole that
defeated that property.

Split the API so the fallback only lives where it is genuinely
benign:

  - prand_u32() keeps the wall-clock fallback. Used for SSRC,
    flow-id, and peer-id where unpredictability is not a security
    property and the caller just needs "some non-zero u32".

  - _librist_crypto_random_u32(uint32_t *out) is the new fail-closed
    sibling. Returns the underlying mbedTLS / gnutls error on
    failure, leaves *out untouched, and is the only entry point used
    by security-critical sites.

Route the PSK nonce generator through the fail-closed call and add a
csprng_failed flag on struct rist_key: when set, both
_librist_crypto_psk_encrypt and _librist_crypto_psk_decrypt
short-circuit. The encrypt path zeroes the output so a caller that
ignores the lockout never emits ciphertext under a weak key. The
flag is cleared on the next successful passphrase install (a fresh
nonce attempt is then made).

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/high-03).
2026-05-18 12:36:51 -04:00
Sergio Ammirata fceba02731 fix(eap): cap attacker-supplied SRP modulus and generator lengths
process_eap_request_srp_challenge took the bignum lengths straight
from the wire and only bounded them against the remaining EAPOL
frame size (~10 KB after d71d96e). mbedtls_mpi_read_binary plus the
subsequent exp_mod / inv_mod are super-linear in input size, so a
single CHALLENGE with a ~10 KB N pinned the CPU pre-auth on the
authenticatee side without ever completing the handshake.

Add an explicit cap at EAP_MAX_MODULUS_BYTES (1024 = 8192 bits, the
largest RFC 5054 group librist supports). Anything beyond that is
neither legitimate SRP nor decodable into a usable group, and is
rejected with EAP_LENERR. eefe26e already removed the sub-1024-bit
groups; this is the corresponding upper bound.

Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/medium-03).
2026-05-18 12:34:48 -04:00
Sergio Ammirata 985cde4c6b fix(eap): harden REQUEST_IDENTITY handler against pre-auth abuse
Two small follow-ups on process_eap_request_identity surfaced by the
audit2 review:

1. Role gate (audit2/low-04). The handler ran on any role, so an
   authenticator that received a spoofed IDENTITY REQUEST would echo
   whatever happens to be in config.username (typically empty on a
   server peer) and burn an eap_reset_data() cycle. The
   process_eap_response_identity counterpart and the SRP REQUEST
   dispatch already carry an authenticator-only / authenticatee-only
   gate; this is the same pattern applied to the IDENTITY REQUEST.

2. Rate-limit on the pre-auth reply path (audit2/medium-02).
   b2915a7 already blocks IDENTITY REQUEST once the peer is SUCCESS,
   but pre-auth the original behaviour was unchanged: one EAPOL
   frame from an attacker wiped any in-flight SRP state and echoed
   the configured SRP username back to the spoof source. The
   username is operationally sensitive (often the production SRP
   account name) and the wipe is a free DoS on the in-flight
   handshake. Cap to one reply per EAP_IDENTITY_REPLY_INTERVAL
   (200 ms). Legitimate authenticator retransmits run at
   EAP_AUTH_TIMEOUT (500 ms) so the real handshake is unaffected;
   a flooded peer leaks the username at most 5x/s rather than at
   line rate.

Found by Thomas Guillem during the post-v0.2.15 re-audit.
2026-05-18 12:34:12 -04:00
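Illustration for 985cde4c6b: a minimal sketch of capping replies to one per interval. The struct and function names are hypothetical; the 200 ms value is the one quoted for EAP_IDENTITY_REPLY_INTERVAL, and now_ms is assumed to come from a monotonic clock.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REPLY_INTERVAL_MS 200 /* value quoted in the commit */

struct reply_limiter {
    bool     replied_before;
    uint64_t last_reply_ms;
};

/* Returns true if a reply may be sent at now_ms. */
static bool reply_allowed(struct reply_limiter *l, uint64_t now_ms)
{
    if (l->replied_before &&
        now_ms - l->last_reply_ms < SKETCH_REPLY_INTERVAL_MS)
        return false; /* drop: too soon after the previous reply */
    l->replied_before = true;
    l->last_reply_ms = now_ms;
    return true;
}
```

A retransmit arriving 500 ms after the previous request always passes, while a flood is limited to one reply per 200 ms, that is, at most five per second.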
Sergio Ammirata bdbe4fb123 fix(eap): gate FAILURE on in-flight identifier; recover from soft FAILED state
EAP_CODE_FAILURE was honored regardless of which identifier it
carried, so four spoofed FAILURE packets to a peer's listening UDP
port were enough to bump tries past EAP_AUTH_RETRY_MAX, set the state
to EAP_AUTH_STATE_FAILED, and permanently silence the peer on
EAP-SRP until the process restarted. d71d96e bounded the reset storm
but left the terminal state reachable from any unauthenticated
source.

Two changes (audit2/high-02):

1. Identifier gating on the FAILURE handler. A legitimate FAILURE
   always carries the identifier of the in-flight exchange. To make
   that property usable on both sides we also set last_identifier
   from the incoming REQUEST in process_eap_request — previously the
   field was maintained only on outgoing REQUESTs and the
   authenticatee side had no notion of "current exchange identifier".
   An off-path attacker now has at best a 1/256 chance per spoof of
   landing a matching identifier; mismatches are dropped silently
   with no state mutation and no tries++.

2. Recovery from soft-FAILED. The eap_periodic path now watches for
   EAP_AUTH_STATE_FAILED entered via the increment path (tries past
   the retry max but below the permanent sentinel set by SRP-failure
   paths) and resets to UNAUTH after EAP_AUTH_FAILED_RECOVERY ms of
   quiet. A new failed_state_timestamp field carries the recovery
   clock; it is cleared on every SUCCESS / LOGOFF transition.

Together these reduce the single-direction DoS from "trivial with
four packets, permanent until restart" to "needs to guess the
identifier and sustain the spoof faster than 30 s of quiet". On-path
attackers who can observe the identifier still win the spoof, but
recovery means they cannot make the silence permanent without
continuous traffic.

Found by Thomas Guillem during the post-v0.2.15 re-audit.
2026-05-18 12:33:02 -04:00
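Illustration for bdbe4fb123: a minimal sketch of the identifier gate on the FAILURE handler. The struct and function are hypothetical; only the field name last_identifier and the "no state mutation, no tries++" behaviour on mismatch come from the commit message.

```c
#include <stdint.h>

struct eap_sketch {
    uint8_t  last_identifier; /* identifier of the in-flight exchange */
    unsigned tries;
};

/* Returns 1 if the FAILURE was accepted, 0 if silently dropped. */
static int handle_failure(struct eap_sketch *ctx, uint8_t pkt_identifier)
{
    if (pkt_identifier != ctx->last_identifier)
        return 0; /* mismatch: no state mutation, no tries++ */
    ctx->tries++;
    return 1;
}
```

Because the identifier is a single byte, a blind guess matches with probability 1/256, which is the figure the commit gives for an off-path attacker.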
Sergio Ammirata 2f1cd95368 fix(eap): refuse off-path LOGOFF on authenticated peers; saturate tries counter
Two issues in eap_process_eapol's EAPOL_TYPE_LOGOFF case that together
let an off-path attacker tear down or destabilize an established
EAP-SRP session, surfaced by the audit2 follow-up (high-04).

1. A spoofed LOGOFF would unconditionally drop an authenticated peer
   to UNAUTH state. EAPOL frames carry no origin authentication, so a
   single packet to the listening UDP port was enough to deauth a
   peer. Mirror the b2915a7 guard already applied to REQUEST_IDENTITY:
   once the peer is SUCCESS or REAUTH, refuse the LOGOFF as
   EAP_UNEXPECTEDREQUEST. Re-auth is driven by the timers in
   eap_periodic, never by an attacker-supplied LOGOFF.

2. On the legitimate (pre-auth) LOGOFF path, the retry counter was
   left untouched. Combined with the 255-sentinel that several
   SRP-failure paths set on ctx->tries, a LOGOFF could move the peer
   out of FAILED while tries was still primed at 255, and the next
   spoofed FAILURE would wrap (uint8_t 255 + 1 = 0), bypassing the
   FAILED-state gate at process_eap_pkt and putting the attacker
   back in business. Reset tries=0 on the legitimate LOGOFF, and
   widen tries from uint8_t to unsigned int with a saturating
   eap_tries_inc() helper so the wrap primitive is closed for good.

The 255 sentinel is replaced by EAP_AUTH_TRIES_PERMANENT = UINT_MAX,
which is a fixed point under the saturating increment.

Found by Thomas Guillem during the post-v0.2.15 re-audit.
2026-05-18 12:32:16 -04:00
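Illustration for 2f1cd95368: a minimal sketch of a saturating increment under which UINT_MAX is a fixed point. The commit names the helper eap_tries_inc(); the signature used here (value in, value out) and the name tries_inc are assumptions.

```c
#include <limits.h>

#define SKETCH_TRIES_PERMANENT UINT_MAX /* sentinel, as in the commit */

/* Saturating increment: UINT_MAX stays UINT_MAX instead of wrapping. */
static unsigned int tries_inc(unsigned int tries)
{
    return tries == UINT_MAX ? UINT_MAX : tries + 1;
}
```

By contrast, an 8-bit counter primed at 255 wraps to 0 on the next increment, which is the primitive the commit closes.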
Sergio Ammirata f9176ed0af Merge branch 'tests/ci-modernization' into 'master'
ci+build: JUnit on Linux, link threads dep for ristsrppasswd

Closes #210

See merge request rist/librist!318
2026-05-17 22:34:34 +00:00
Sergio Ammirata 4e294b56bc NEWS: document the Windows CSPRNG fix and CI improvements
Bug fix: BCryptGenRandom-backed seeding unblocks encryption on
Windows runtimes where the legacy CryptoAPI is missing (wine,
sandboxed containers). Test/CI: SRP and AES suites now run on
Windows under wine, and per-test results are visible directly in the
GitLab MR UI via JUnit.
2026-05-17 18:30:13 -04:00
Sergio Ammirata 526b13585c ci: publish JUnit test reports from test-ubuntu
Meson writes testlog.junit.xml alongside testlog.txt on every test
run. Point GitLab at it so reviewers see per-test pass/fail directly
in the MR's "Tests" tab instead of fishing through job logs. The
text log is still uploaded as a fallback.
2026-05-17 18:30:13 -04:00
Sergio Ammirata 2a1652cd72 ci: build the Windows test binaries with mbedTLS
The Windows cross build was using -Duse_mbedtls=false, which silently
disabled the SRP and AES test suites under wine. Build with the same
crypto configuration as the released binaries so test-win64 actually
exercises those code paths. Coverage goes from 19/27 to 27/27 tests.
2026-05-17 18:30:13 -04:00
Sergio Ammirata 3bacd5f180 build: link the threads dependency into ristsrppasswd
ristsrppasswd compiles in contrib/pthread-shim.c, which calls
pthread_cond_timedwait under have_mingw_pthreads. The other tools
inherit the threads dependency transitively via prometheus and
microhttpd; ristsrppasswd does not, and so fails to link on Windows
with `undefined reference to pthread_cond_timedwait` as soon as
mbedTLS is enabled in the cross build.

Add threads to its dependency list. The variable resolves to [] on
platforms that don't need it, so the change is a no-op elsewhere.
2026-05-17 18:30:13 -04:00
Sergio Ammirata b05ff88388 fix(crypto): seed CSPRNG via BCryptGenRandom on Windows
mbedTLS auto-installs an entropy source on Windows that uses the
legacy CryptoAPI (CryptAcquireContext + CryptGenRandom). It silently
depends on the RSA crypto provider being installed and reachable,
which is not the case in wine, in several common container images,
or on some hardened Windows deployments. On those systems
mbedtls_ctr_drbg_seed fails, and after 0.2.15's CSPRNG hardening
librist refuses to run any encryption code path at all.

Replace the source with BCryptGenRandom (CNG): same cryptographic
strength, present on every supported Windows release (Vista+), and
reliably implemented by wine. Add bcrypt to the Windows link line.
No behaviour change on Linux or macOS.

Fixes #210.
2026-05-17 18:30:13 -04:00
Sergio Ammirata e2b2581729 Merge branch 'tests/regression-coverage' into 'master'
tests: regression test for #208 + run Windows meson suite under wine

See merge request rist/librist!317
2026-05-17 16:44:44 +00:00
Sergio Ammirata 650212df73 NEWS: document the new Windows test coverage
The 0.2.16 pre-release section now lists what the new test infra
catches: every protocol test that gates Linux releases also gates
Windows releases, so a #208-class winsock-runtime regression cannot
ship without being seen first.
2026-05-17 12:40:34 -04:00
Sergio Ammirata 9a274f4a53 ci: run the full Windows test suite under wine
CI infrastructure improvement. Until now build-win64 only verified
that the code cross-compiles; the test suite ran on Linux only, so
18 of the 19 protocol tests had zero Windows signal. #208 (the
0.2.15 winsock-init bug) made it into a release because of this gap.

Splits build-win64 into a pure build job plus a new test-win64 job
that runs the same `meson test` suite against the cross-compiled
.exe artifacts, transparently launched under wine via the meson
cross-file's exe_wrapper. The wine runtime was already provisioned
in the CI image; only the test invocation was missing.

Does not replace a native-Windows runner (wine does not reproduce
every Windows kernel behaviour - see #209's ICMP-unreach interaction)
but it would have caught #208 on the first MR run, which is the
class of bug we want to stop shipping.
2026-05-17 12:40:29 -04:00
Sergio Ammirata d4cfb379d6 test: regression test for rist_logging_set remote-address path
Test coverage gap. Catches #208 (winsock not initialised before
rist_logging_set on Windows 11) and any future regression in the
public logging API's startup ordering.

The bug shipped in 0.2.15 because no test exercised the
rist_logging_set(.., address, ..) path before any rist_*_create call.
On POSIX this test is a baseline smoke check; under wine (see the
companion .gitlab-ci.yml change) it fails with WSANOTINITIALISED if
winsock init regresses.

Standalone executable (no cmocka dependency) so it runs on every
build that has -Dtest=true.
2026-05-17 12:22:23 -04:00
Sergio Ammirata 3c502814e7 Merge branch 'release-0.2.16' into 'master'
release-0.2.16: Windows fixes (#208, #209) and 0.2.15 follow-ups

See merge request rist/librist!316
2026-05-17 16:07:42 +00:00
Sergio Ammirata 546a22b5d0 NEWS: open the 0.2.16 maintenance release section
Document the Windows fixes (#208, #209) and the 0.2.15 follow-ups
that have accumulated so far. 0.2.16 is binary-compatible with 0.2.15;
downstream consumers will not need to recompile. More MRs are expected
before the release goes out.
2026-05-17 12:05:13 -04:00
Sergio Ammirata d3263ffc0c fix(udpsocket): Windows EAP authentication fails when bonding peers
Bug fix - Windows only, 0.2.15 regression. Reported and confirmed
fixed by Roman Dissertori (@moo) in #209.

Setting up multiple peers on a Windows receiver with EAP-SRP
authentication (Linux ristsender -> Windows ristreceiver) fails with
"Failed to process EAPOL pkt, return code: -1". Ethernet peers fail
consistently; wifi peers sometimes succeed depending on timing.
Worked in 0.2.14, broken in 0.2.15.

Root cause: two 0.2.15 changes interact badly on Windows.

  1. Windows enqueues a synthetic WSAECONNRESET indication on any UDP
     socket whose outbound packets provoke an ICMP port-unreachable
     reply. The 0.2.15 hardening added a recvfrom(drain[1], ...) call
     to clear that indication, but on Windows that recvfrom either
     consumes a real datagram outright or returns WSAEMSGSIZE and
     silently drops one. librist then handed a truncated buffer to
     its EAPOL handler.

  2. Also in 0.2.15, eap_process_eapol started rejecting anything
     shorter than a complete EAPOL frame. A truncated buffer that
     0.2.14 would have tolerated now fails outright.

Together they break the authenticator on Windows whenever ICMP
unreachables happen to align with the EAP handshake.

Fix: disable the synthetic WSAECONNRESET indication on every UDP
socket via WSAIoctl(SIO_UDP_CONNRESET, FALSE), and drop the unsafe
drain. ICMP unreachables on a connectionless socket are advisory
only, and librist detects peer loss via RTCP keepalive timeouts.
2026-05-17 12:05:13 -04:00
Sergio Ammirata f61c1560e2 diag(udpsocket): show actual Windows error code on socket failures
Diagnostics only - no behaviour change.

On Windows, errno carries POSIX values that do not match what the
socket layer actually reported, so strerror(errno) in our existing
error logs is misleading. End-user bug reports from Windows users
were missing the WSA error code needed to identify the real cause.

Add WSAGetLastError() to the error log lines in
udpsocket_open_connect, udpsocket_open_bind and
udpsocket_set_nonblocking. Mirrors the existing handling in
udpsocket_open().
2026-05-17 12:05:13 -04:00
Sergio Ammirata 09cead14b3 fix(udpsocket): Windows tools fail to start when -r remote log is used
Bug fix - Windows only. Reported by Roman Dissertori (@moo) in #208.

Running `ristreceiver -r 127.0.0.1:port ...` on Windows 11 fails
immediately with "Failed to open logsocket / Failed to setup logging!"
and the tool exits before doing any work. Same failure path for any
caller that uses rist_logging_set() before rist_sender_create() /
rist_receiver_create().

Root cause: WSAStartup ran inside init_common_ctx(), which is reached
only via rist_*_create. The public logging API opens a UDP socket
before that, so socket() returns INVALID_SOCKET and the log socket is
never created.

Fix: initialise winsock from udpsocket.c itself, guarded by
InitOnceExecuteOnce / pthread_once so the first call from any thread
to udpsocket_open() or udpsocket_resolve_host() is enough. WSAStartup
is ref-counted by Windows so the existing init_common_ctx() call
stays in place for out-of-tree consumers that may call rist-common
directly. POSIX path is a no-op.
2026-05-17 12:05:13 -04:00
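Illustration for 09cead14b3: a minimal sketch of the pthread_once side of the one-time initialisation guard. The names net_init, ensure_net_init and net_init_calls are hypothetical; on Windows the initialiser would be where WSAStartup is called, which is not shown here.

```c
#include <pthread.h>

static pthread_once_t net_once = PTHREAD_ONCE_INIT;
static int net_init_calls; /* counts how often the initialiser ran */

static void net_init(void)
{
    /* On Windows the real code would call WSAStartup here. */
    net_init_calls++;
}

/* Hypothetical entry point: every socket helper calls this first. */
static void ensure_net_init(void)
{
    pthread_once(&net_once, net_init);
}
```

Calling ensure_net_init() at the top of each socket helper means the order in which the public API is used no longer matters; the initialiser runs exactly once regardless of which call comes first.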
Sergio Ammirata 8f22b8e5b5 fix(crypto/random): silent crypto downgrade when entropy source missing
Security hardening. Not externally observed; the failure mode requires
a deployment with no kernel entropy source (sandboxed container,
embedded image without /dev/urandom, broken mbedTLS build).

What was wrong: if the CSPRNG seeding call failed at startup, the
return value was silently dropped. Every subsequent crypto-random
request would then return zeros / failure codes, and the
wall-clock-derived fallback inside prand_u32() would be used for the
entire process lifetime - including for SRP nonces and AES-CTR IVs.
The 0.2.15 PRNG hardening was therefore quietly absent in exactly the
environments where it matters most.

Fix: capture the seed return code in a static, log a loud error if
seeding failed, and short-circuit subsequent crypto-random calls so
the failure is visible at every caller instead of papered over.

The Nettle path already retries gnutls_rnd up to 10 times; added a
single error log on the final failure for parity.

No behaviour change when the CSPRNG is healthy.
2026-05-17 12:05:13 -04:00
Sergio Ammirata 5222b3c325 fix(eap): authentication retry exhaustion never dropped failed peer
Bug fix. Latent since 0.2.14 - not externally reported but exploitable:
an EAP-SRP authentication that fails every retry was supposed to mark
the peer as permanently failed and drop it; instead the peer was kept
around in a half-authenticated state and kept consuming retry attempts.

Root cause: rist-common's EAPOL handler checked `if (eapret == 255)` to
detect the permanent-failure path, but eap_process_eapol returns
*negative* error codes (-255 in the old numbering). The comparison
could never match. Fixed by introducing named error constants
(EAP_INTERNALERR, EAP_AUTH_FAILED, EAP_AUTH_TERMINATED) and comparing
against EAP_AUTH_TERMINATED instead.

Also caught while adding the names: EAP_SRP_WRONGSUBTYPE and
EAP_UNEXPECTEDREQUEST were both defined as -4. No current caller
discriminates on the value so the collision had no observed effect,
but it was a trap for any future per-error handling. Moved
EAP_SRP_WRONGSUBTYPE to -5.

No wire-format change. Error codes are internal to librist.
2026-05-17 11:57:21 -04:00
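
A small sketch of why the old comparison in the entry above could never match. The values given to `EAP_INTERNALERR`, `EAP_AUTH_FAILED` and `EAP_AUTH_TERMINATED` are illustrative assumptions, not librist's actual numbering. Only the -4 and -5 values come from the commit message. `process_eapol_stub` and the two `should_drop_peer_*` helpers are hypothetical.

```c
#include <stdbool.h>

/* Named error codes. The first three values are assumptions made for
 * this sketch; the last two follow the commit message. */
enum {
    EAP_INTERNALERR       = -1,
    EAP_AUTH_FAILED       = -2,
    EAP_UNEXPECTEDREQUEST = -4,
    EAP_SRP_WRONGSUBTYPE  = -5,
    EAP_AUTH_TERMINATED   = -255
};

/* Guard against the kind of collision the commit found (C11). */
_Static_assert(EAP_SRP_WRONGSUBTYPE != EAP_UNEXPECTEDREQUEST,
               "EAP error codes must be distinct");

/* Hypothetical stand-in for the EAPOL processing result:
 * negative codes, terminated once the retries run out. */
int process_eapol_stub(int retries_left)
{
    return retries_left > 0 ? EAP_AUTH_FAILED : EAP_AUTH_TERMINATED;
}

/* Old check: compares a negative error code against +255, never true. */
bool should_drop_peer_old(int eapret) { return eapret == 255; }

/* New check: compares against the named constant. */
bool should_drop_peer_new(int eapret) { return eapret == EAP_AUTH_TERMINATED; }
```

With the retries exhausted, `should_drop_peer_old` still returns false while `should_drop_peer_new` returns true, which is the behavioural difference the fix describes.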
Sergio Ammirata e7c1fd3b0f docs(eap): clarify why IDENTITY-response handler rejects authenticatee
Documentation only - no behaviour change, no bug fix.

The original guard comment was labelled "Defensive:" which made it look
like a paranoia check that could be removed. It is not: on the
authenticatee side the lookup callback is NULL by construction, so the
handler MUST refuse the IDENTITY response or it will dereference NULL.
Reword the comment so a future reader does not mistakenly delete the
check.
2026-05-17 11:57:21 -04:00
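
To illustrate why the guard in the entry above is load-bearing, here is a hedged sketch with hypothetical types and names (`eap_ctx_sketch`, `handle_identity_response`, `accept_any`). It assumes only what the message states: the lookup callback is NULL on the authenticatee side.

```c
#include <stddef.h>

typedef int (*lookup_fn)(const char *username, void *arg);

/* Hypothetical context; only the authenticator installs a lookup callback. */
struct eap_ctx_sketch {
    lookup_fn lookup;     /* NULL by construction on the authenticatee side */
    void *lookup_arg;
};

/* Example lookup used on the authenticator side of this sketch. */
int accept_any(const char *username, void *arg)
{
    (void)username;
    (void)arg;
    return 0;
}

/* Returns -1 when the IDENTITY response must be refused. */
int handle_identity_response(struct eap_ctx_sketch *ctx, const char *username)
{
    /* Required, not defensive: without this check the call below
     * would go through a NULL function pointer. */
    if (ctx->lookup == NULL)
        return -1;
    return ctx->lookup(username, ctx->lookup_arg);
}
```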
Sergio Ammirata 561c2536e6 Merge branch 'hotfix/srp-nettle-macro' into 'master'
hotfix: SRP Nettle backend build failure (BIGNUM_WRITE_BYTES_OR_GOTO)

See merge request rist/librist!315
v0.2.15
2026-05-15 04:48:53 +00:00
Sergio Ammirata 443f6d71e4 fix(srp): drop invalid (void)(lbl) cast in Nettle BIGNUM_WRITE_BYTES_OR_GOTO
The macro tried to silence an "unused argument" warning by casting lbl
to void, but lbl is a goto label, not an expression. GCC and clang both
reject this with "undeclared identifier".

Every Nettle backend build (Linux ARM, Windows ARM/LLVM) fails at
src/crypto/srp.c:230, where BIGNUM_WRITE_BYTES_OR_GOTO is invoked.
Drop the useless cast; an unused macro argument does not trigger an
unused-parameter warning, so there is nothing to suppress.
2026-05-15 00:35:07 -04:00
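
The point about labels in the entry above can be shown with a small sketch. The macro and helpers here (`WRITE_BYTES_OR_GOTO`, `write_bytes`, `copy_pair`) are hypothetical and are not the macro from src/crypto/srp.c. A label name is only meaningful after `goto`, so writing `(void)(lbl)` makes the compiler look for a variable of that name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst if it fits; 0 on success. */
static int write_bytes(unsigned char *dst, size_t dst_len,
                       const unsigned char *src, size_t src_len)
{
    if (src_len > dst_len)
        return -1;
    memcpy(dst, src, src_len);
    return 0;
}

/* lbl is a goto label, not an expression. It may appear after `goto`
 * and nowhere else; a variant of this macro that does not need the
 * label can simply leave the argument unmentioned. */
#define WRITE_BYTES_OR_GOTO(dst, dst_len, src, src_len, lbl)          \
    do {                                                              \
        if (write_bytes((dst), (dst_len), (src), (src_len)) != 0)     \
            goto lbl;                                                 \
    } while (0)

/* Writes a followed by b into out; returns -1 if either does not fit. */
int copy_pair(unsigned char *out, size_t out_len,
              const unsigned char *a, size_t a_len,
              const unsigned char *b, size_t b_len)
{
    WRITE_BYTES_OR_GOTO(out, out_len, a, a_len, fail);
    /* a_len <= out_len is guaranteed here, so the subtraction cannot wrap. */
    WRITE_BYTES_OR_GOTO(out + a_len, out_len - a_len, b, b_len, fail);
    return 0;
fail:
    return -1;
}
```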
Sergio Ammirata d59491fc2f Merge branch 'release-0.2.15' into 'master'
Release 0.2.15: security audit fixes, Windows build, reported bug fixes

See merge request rist/librist!314
2026-05-14 22:37:31 +00:00
Sergio Ammirata e137d0a4a1 release: prepare 0.2.15
Security release on top of 0.2.14. ABI bumps to 4.6.0 / 4:3:0,
binary-compatible with 0.2.14 (no symbol additions or removals,
soversion stays at 4).

Combines:

  * a full receive-side, EAP/SRP and PSK/AES-CTR audit
  * a follow-up audit shared by VideoLAN that closes the residual
    gaps (EAP role enforcement, SRP cleanup paths, NPD scratch
    sizing, RTP header guard for the FULL/EAPOL paths, hardened TUN
    helpers, SDES length clamp, qsort/median fixes)
  * the Windows / MinGW build fixes (clock_gettime probe, the
    PTHREAD_START_FUNC return type, WSAECONNRESET drain, strtok_r
    shim header)
  * three reported bugs picked up while the branch was open: invalid
    JSON for multi-peer sender stats (#206) plus the prometheus
    parser landing for the sender side, the UDP log socket
    condition (!313) and the infinite socket-error loop on hostname
    resolution failure (!312).

NEWS lists the per-fix detail.
2026-05-14 18:29:13 -04:00
Sergio Ammirata bfcac70ddc tools/udp2udp: warn that --metrics has no rist_* series to export
udp2udp uses librist for URL parsing and the Prometheus httpd shell,
not for actual data flow, so the exporter context comes up but no
sender/receiver callbacks ever feed it. People enabling --metrics on
udp2udp expecting RIST stats end up scratching their heads at an
empty scrape endpoint.

Print a one-shot warning right after rist_setup_prometheus_stats so
the limitation is obvious from the logs.
2026-05-14 18:28:22 -04:00
Sergio Ammirata 1f5299f2fc tools/ristsender: drop the duplicate sender-stats callback registration
setup_rist_peer was calling rist_sender_stats_callback_set twice on the
same ctx: once with (void*)w->id as the callback arg, then a second
time a few lines down with NULL. The second call wins (the library
just overwrites sender_stats_callback_argument), so the JSON callback
was running with arg = NULL and the Prometheus parser couldn't tell
which sender instance the JSON came from when more than one was
running in the same process.

Drop the second registration; the first one is correct.
2026-05-14 18:27:59 -04:00
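
The overwrite behaviour described in the entry above can be modelled with a toy context. Everything here (`sender_ctx_sketch`, `stats_callback_set`, `deliver_stats`, `record_arg`) is a hypothetical stand-in for the library and tool code. The only assumption taken from the message is that a second registration replaces the stored argument.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*stats_cb)(void *arg, const char *json);

/* Toy sender context holding one callback and one argument. */
struct sender_ctx_sketch {
    stats_cb cb;
    void *cb_arg;
};

/* Last registration wins: both fields are simply overwritten. */
void stats_callback_set(struct sender_ctx_sketch *ctx, stats_cb cb, void *arg)
{
    ctx->cb = cb;
    ctx->cb_arg = arg;
}

/* Records the argument the callback was invoked with. */
void *last_arg = NULL;

int record_arg(void *arg, const char *json)
{
    (void)json;
    last_arg = arg;
    return 0;
}

/* Delivers one stats blob to whatever is currently registered. */
int deliver_stats(struct sender_ctx_sketch *ctx, const char *json)
{
    return ctx->cb ? ctx->cb(ctx->cb_arg, json) : -1;
}
```

Registering once with `(void *)(uintptr_t)id` and then again with NULL leaves the callback running with a NULL argument, so the instance id is lost. Registering only once keeps it.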
Sergio Ammirata 3ea6c49ca9 feat(prometheus-exporter): parse sender-side stats JSON
ristsender uses rist_sender_stats_callback_set, which only delivers
the JSON blob to the callback. The Prometheus exporter's parser for
that path was a (void)... // TODO stub, so sender-side metrics from
ristsender simply never showed up.

Walk the "sender-stats" -> "peers" array introduced in 0.2.15
(86e2a16), pull each peer's id, cname and stats numbers, and feed
them through the existing rist_prometheus_handle_sender_peer_stats
helper so the gauge/counter shape stays exactly what already shipped
on the receiver-callback path.

While here, factor the stale-peer cleanup block out of
rist_prometheus_parse_stats into rist_prometheus_cleanup_stale_locked
so both parsers run it. Without that, peers added with
from_callback=true on the JSON path would never expire.

JSON rtt is already in milliseconds (last_rtt / RIST_CLOCK in
src/stats.c) and the handler converts ms -> seconds, so the parser
feeds the value straight through with no extra scaling.

Closes the rist_prometheus_parse_sender_stats half of #206.
Builds on the parser shape from manueldev's mr/206 work, reworked
against the 0.2.15 JSON schema.

Co-authored-by: Manuel <malejandrodev@gmail.com>
2026-05-14 18:27:46 -04:00
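
A rough sketch of the two ideas in the entry above: a single ms -> seconds conversion in the handler, and a shared stale-peer cleanup that every parser path runs. The table layout, the function names and the timeout handling are all hypothetical and much simpler than the exporter's real data structures. The caller is assumed to hold whatever lock protects the table.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PEERS 8

struct peer_entry {
    int in_use;
    uint32_t id;
    uint64_t last_seen_ms;
    double rtt_seconds;
};

struct peer_entry peers[MAX_PEERS];

/* Drops entries not refreshed within timeout_ms; returns how many were removed. */
size_t cleanup_stale_locked(uint64_t now_ms, uint64_t timeout_ms)
{
    size_t removed = 0;
    for (size_t i = 0; i < MAX_PEERS; i++) {
        if (peers[i].in_use && now_ms - peers[i].last_seen_ms > timeout_ms) {
            peers[i].in_use = 0;
            removed++;
        }
    }
    return removed;
}

/* Inserts or refreshes a peer. The rtt arrives in milliseconds and is
 * converted to seconds exactly once, here. */
int handle_peer_stats(uint32_t id, double rtt_ms, uint64_t now_ms)
{
    struct peer_entry *slot = NULL;
    for (size_t i = 0; i < MAX_PEERS; i++) {
        if (peers[i].in_use && peers[i].id == id) {
            slot = &peers[i];        /* an existing entry takes priority */
            break;
        }
        if (!peers[i].in_use && slot == NULL)
            slot = &peers[i];        /* remember the first free slot */
    }
    if (slot == NULL)
        return -1;                   /* table full */
    slot->in_use = 1;
    slot->id = id;
    slot->last_seen_ms = now_ms;
    slot->rtt_seconds = rtt_ms / 1000.0;
    return 0;
}

/* A parser path: feed the value straight through, then run the shared cleanup. */
size_t parse_json_path(uint32_t id, double rtt_ms, uint64_t now_ms, uint64_t timeout_ms)
{
    handle_peer_stats(id, rtt_ms, now_ms);
    return cleanup_stale_locked(now_ms, timeout_ms);
}
```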
Sergio Ammirata b07fc5f13e fix(build): always compile random.c, no-op when no CSPRNG backend
prand_u32() now routes through _librist_crypto_ramdom_get_bytes()
unconditionally, but random.c was only added to the build when
have_srp was true (mbedTLS or nettle). The mingw-w64 -Duse_mbedtls=false
job legitimately has neither, so the librist.dll link fails with an
undefined reference.

Move random.c out of the have_srp block so the symbol always exists,
and guard the body so it returns -1 when no backend is compiled in.
prand_u32 already falls back to the wall clock on that path.

Reported by upstream CI on !314.
2026-05-14 17:42:18 -04:00
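
The shape of the build fix in the entry above, sketched under assumptions: `HAVE_CSPRNG_BACKEND`, `backend_fill`, `crypto_random_get_bytes_sketch` and `prand_u32_sketch` are hypothetical names, not the symbols in random.c. The point is that the function always exists at link time and reports -1 when no backend was compiled in, leaving the caller to fall back.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(HAVE_CSPRNG_BACKEND)
/* Provided by the crypto backend when one is compiled in (hypothetical). */
int backend_fill(uint8_t *buf, size_t len);
#endif

/* Always compiled, so the symbol always links; the body is a no-op
 * returning -1 when there is no backend. */
int crypto_random_get_bytes_sketch(uint8_t *buf, size_t len)
{
#if defined(HAVE_CSPRNG_BACKEND)
    return backend_fill(buf, len);
#else
    (void)buf;
    (void)len;
    return -1;
#endif
}

/* Caller with a weak wall-clock fallback for the no-backend case. */
uint32_t prand_u32_sketch(void)
{
    uint32_t v;
    if (crypto_random_get_bytes_sketch((uint8_t *)&v, sizeof v) == 0)
        return v;
    return (uint32_t)time(NULL);     /* not cryptographically secure */
}
```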