When a send fails with EMSGSIZE, log the packet size and suggest
reducing the application payload. Rate-limited to one message per
peer per 5 seconds. Both simple-profile and main-profile send paths
use the same helper.
Set DF so the kernel returns EMSGSIZE instead of silently
IP-fragmenting. Covers Linux, Windows, and BSD for v4/v6.
Best-effort: platforms without the option keep legacy behaviour.
The RTP seq/timestamp extraction in input_udp_recv has been disabled since
be32351 (2020) with a "TODO: Figure out why this does not work" comment.
The root cause was a buffer offset bug: recvfrom writes the UDP payload
at recv_buf + ipheader_bytes, but the extraction code read recv_buf[2..7]
— the reserved IP-header prefix area — returning garbage.
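A minimal sketch of the corrected read, assuming the standard RTP fixed header layout (sequence number in bytes 2..3, timestamp in bytes 4..7 of the payload). The helper name and its parameters are hypothetical and are not the actual input_udp_recv code; the point is only that the offsets are applied after skipping the reserved prefix.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the RTP sequence number and timestamp from a
 * receive buffer whose first prefix_len bytes are a reserved area that
 * recvfrom never writes into (the ipheader_bytes prefix in the text). */
static int sketch_rtp_peek(const uint8_t *recv_buf, size_t prefix_len,
                           size_t payload_len, uint16_t *seq, uint32_t *ts)
{
    if (payload_len < 8)
        return -1; /* too short to hold seq + timestamp */

    /* The payload starts after the prefix, not at recv_buf[0]. */
    const uint8_t *rtp = recv_buf + prefix_len;

    *seq = (uint16_t)(((uint16_t)rtp[2] << 8) | rtp[3]);
    *ts = ((uint32_t)rtp[4] << 24) | ((uint32_t)rtp[5] << 16) |
          ((uint32_t)rtp[6] << 8) | (uint32_t)rtp[7];
    return 0;
}
```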
Add a "Post-0.2.15 audit follow-ups" subsection under the existing
0.2.16 hardening notes. Groups the changes by component (receiver
flow accounting, EAP authenticator, PSK, SRP, stats JSON, build
portability) so a downstream reader can scan to the bit they care
about.
a66b7bf fixed the !USE_SHA_RET branch so it would compile against
mbedTLS versions prior to 2.7 (March 2018), where the SHA-256 calls
returned void and could not signal an internal allocation failure
back to the caller. The branch is otherwise dead: the bundled
mbedTLS in contrib/ is 2.28.x, every distro of interest is well
past 2.7, and no CI job builds against anything older. A latent
bug regressing into that branch would never be caught.
Drop the branch and emit an explicit #error for builds against
mbedTLS < 2.7, with directions to either upgrade the system
mbedTLS or use the bundled tree. This shrinks the surface area and
removes untestable dead code.
Nettle has always returned int and is unaffected.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/info-02).
86e2a16 changed the sender-stats JSON shape from duplicate "peer"
keys (technically valid, practically broken) to a single "peers"
array. The change is correct, but it was made without any way for
downstream consumers to know which shape they're parsing. A
monitoring script that did .["sender-stats"].peer.id silently
started getting null after the upgrade. The prometheus exporter
shipped inside the tools tree was rewritten for the new shape
(3ea6c49), so a librist sender on the new tree paired with an
external exporter pinned to the old shape sees no peers at all,
again silently.
Add a top-level "schema_version" integer to the three places we
emit stats JSON (sender flow stats, sender per-peer wrapper,
receiver flow stats). Set it to 2 - version 1 was the duplicate-key
shape, version 2 is the array shape we ship now. Receiver stats has
always been array-shaped, but tagging it lets consumers branch on a
single field regardless of which side they're parsing.
Any future incompatible shape change must bump this number and call
it out in NEWS.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/medium-04).
eefe26e dropped the sub-1024-bit RFC 5054 groups (correct call -
they're below current minimum-strength thresholds) but did so by
deleting the enum entries outright, which shifted every remaining
group's integer value down by two. librist_get_ng_constants is
RIST_API-exported and external callers that hardcoded the integer
2 to mean NG_1024 silently started getting NG_4096 instead. Same
idea, different group, equally silent and equally wrong.
Restore the original integer values by reserving slots 0 and 1 with
explicit LIBRIST_SRP_NG_RESERVED_0 / RESERVED_1 names, and make
librist_get_ng_constants reject them with -1 (the existing
out-of-range error). Code using the symbolic enum names is
unaffected; code using literal integers gets either the right group
or a clean failure.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/low-05).
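A sketch of the reserved-slot approach, under stated assumptions: only the LIBRIST_SRP_NG_RESERVED_0 / RESERVED_1 names come from the text above. The SKETCH_* entries, their order, and the lookup function are illustrative stand-ins, not the real librist enum or librist_get_ng_constants.

```c
#include <stddef.h>

/* Illustrative enum. Slots 0 and 1 stay occupied so that every later
 * entry keeps the integer value it had before the weak groups were
 * removed. */
typedef enum {
    LIBRIST_SRP_NG_RESERVED_0 = 0, /* formerly a sub-1024-bit group */
    LIBRIST_SRP_NG_RESERVED_1 = 1, /* formerly a sub-1024-bit group */
    SKETCH_NG_1024 = 2,            /* assumed name and position */
    SKETCH_NG_2048 = 3,            /* assumed name and position */
    SKETCH_NG_4096 = 4,            /* assumed name and position */
    SKETCH_NG_COUNT
} sketch_ng_type;

static const int sketch_ng_bits[SKETCH_NG_COUNT] = {0, 0, 1024, 2048, 4096};

/* Hypothetical lookup: reserved and out-of-range values both fail with -1. */
static int sketch_get_ng_bits(int ng, int *bits_out)
{
    if (ng < 0 || ng >= SKETCH_NG_COUNT)
        return -1;
    if (ng == LIBRIST_SRP_NG_RESERVED_0 || ng == LIBRIST_SRP_NG_RESERVED_1)
        return -1;
    *bits_out = sketch_ng_bits[ng];
    return 0;
}
```

With this shape a caller that hardcoded the literal 2 gets the 1024-bit group again, and a caller that hardcoded 0 or 1 gets a clean error instead of a different group.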
strncpy does not guarantee NUL termination when the source is at
least n bytes long. The defensive cname[0] = '\0' before the copy
in rist_stats_sender_peer_stats did nothing - strncpy overwrites
it. Today the bound is safe in practice (the SDES bounds fix in
c308e10 limits peer->receiver_name to 127 bytes + NUL within its
128-byte field, so the implicit NUL is copied) but the contract is
fragile and is the same pattern the udpsocket fix in 38cb58f
already excised elsewhere.
Use a copy length one byte shy of the destination and write the
terminator unconditionally.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/low-03).
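The pattern described above, as a small sketch. The helper name is hypothetical; the real change is made inline in rist_stats_sender_peer_stats.

```c
#include <string.h>

/* Copy src into dst (capacity dst_size bytes) and always NUL-terminate.
 * strncpy alone leaves dst unterminated when strlen(src) >= n. */
static void sketch_copy_cname(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1); /* one byte shy of the destination */
    dst[dst_size - 1] = '\0';        /* terminator written unconditionally */
}
```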
rist_receiver_associate_flow counted existing flows under
flows_lock, released the lock, called create_flow, and create_flow
re-took the lock to insert. Two concurrent callers could each see
count == RIST_MAX_FLOWS - 1, each release the lock, and each insert
- pushing the actual flow count past the cap by (racers - 1). The
practical impact was small because the cap still bounds memory
growth to O(N + racers), but the documented invariant "at most
RIST_MAX_FLOWS flows" did not hold under concurrency.
Move the authoritative count + cap check into create_flow under
the same flows_lock acquisition that does the append. The early
count in associate_flow stays as an allocation-avoidance
optimization (skip the calloc + pthread_init dance when the cap is
already definitively reached), but is no longer relied on for
correctness. If the early count is stale and create_flow trips the
cap, we tear down the freshly-allocated flow cleanly and return
NULL - the caller's existing NULL handling kicks in.
Move the RIST_MAX_FLOWS / RIST_MAX_PEERS_PER_FLOW defines above
create_flow so the cap check sees them.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/low-02).
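A simplified sketch of the check-and-insert under a single lock acquisition. All names here (sketch_ctx, sketch_create_flow, SKETCH_MAX_FLOWS) are stand-ins; the real create_flow does considerably more setup and teardown than this.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_FLOWS 4 /* stand-in for RIST_MAX_FLOWS */

struct sketch_flow {
    struct sketch_flow *next;
};

struct sketch_ctx {
    pthread_mutex_t flows_lock;
    struct sketch_flow *head;
    size_t count;
};

/* The cap check and the insert happen under the same lock acquisition,
 * so two racing callers cannot both pass the check and then both insert. */
static struct sketch_flow *sketch_create_flow(struct sketch_ctx *ctx)
{
    struct sketch_flow *f = calloc(1, sizeof(*f));
    if (!f)
        return NULL;

    pthread_mutex_lock(&ctx->flows_lock);
    if (ctx->count >= SKETCH_MAX_FLOWS) {
        pthread_mutex_unlock(&ctx->flows_lock);
        free(f); /* tear down the freshly allocated flow */
        return NULL;
    }
    f->next = ctx->head;
    ctx->head = f;
    ctx->count++;
    pthread_mutex_unlock(&ctx->flows_lock);
    return f;
}
```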
The session-path AES-CTR dispatch carried the same default-falls-
through-to-AES-128 shape that audit/high-13 spotted on the
standalone _librist_crypto_aes_ctr. 9e3703e fixed the standalone
function but missed the session sibling. Today it is not reachable
(key_size always comes from SRP and is 128/192/256) but the pattern
is a one-typo regression away - a future caller that passes
key_size = 0 (e.g. a use-before-init) would silently run AES-128
over an uninitialized key schedule.
Replicate the fixed shape: explicit cases for 128 / 192 / 256, and
return on default rather than encrypting under whatever key happens
to be in key->nettle_ctx.u.ctx128.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/low-01).
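The dispatch shape, sketched without any crypto backend. The enum and function are hypothetical; they only illustrate explicit cases for 128 / 192 / 256 with a failing default.

```c
#include <stdint.h>

enum sketch_aes_variant {
    SKETCH_AES_NONE = 0,
    SKETCH_AES_128,
    SKETCH_AES_192,
    SKETCH_AES_256
};

/* Hypothetical selector: unknown key sizes are refused rather than
 * silently treated as AES-128. */
static int sketch_select_aes(uint32_t key_size, enum sketch_aes_variant *out)
{
    switch (key_size) {
    case 128:
        *out = SKETCH_AES_128;
        return 0;
    case 192:
        *out = SKETCH_AES_192;
        return 0;
    case 256:
        *out = SKETCH_AES_256;
        return 0;
    default:
        return -1; /* e.g. key_size == 0 from a use-before-init */
    }
}
```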
_librist_crypto_psk_decrypt ran a full PBKDF2 + AES re-keying on
every attacker-chosen GRE nonce, even after the key had been locked
out for excessive decryption failures. 1c1ae12 correctly closed the
audit's lockout-reset path by preserving the bad_decryption flag
across nonce rolls, but the PBKDF2 itself still ran on every packet
with a new nonce. Each rekey costs a few milliseconds of CPU,
whereas the attacker only needs to flip four bytes of GRE header per
packet, so a modest packet rate could pin the receiver CPU.
Move the lockout test before the rekey: while bad_decryption is
set, attacker-supplied nonces are ignored without doing any work.
The flag is still only cleared by passphrase rotation or by a
successful decrypt + RTP validation upstream, so the high-15 fix
property is preserved.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/medium-01).
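A sketch of the reordering, with the expensive rekey replaced by a counter so the effect is observable. The struct, the field names other than bad_decryption, and both functions are hypothetical stand-ins for the real PSK code.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_key {
    bool bad_decryption;  /* lockout flag, as in the text */
    uint32_t gre_nonce;   /* nonce the current key was derived from */
    unsigned rekey_count; /* instrumentation for this sketch only */
};

/* Stand-in for the PBKDF2 + AES re-keying step. */
static void sketch_rekey(struct sketch_key *k, uint32_t nonce)
{
    k->gre_nonce = nonce;
    k->rekey_count++;
}

static int sketch_decrypt(struct sketch_key *k, uint32_t nonce)
{
    /* Lockout test first: while locked out, no nonce triggers any work. */
    if (k->bad_decryption)
        return -1;
    if (nonce != k->gre_nonce)
        sketch_rekey(k, nonce);
    /* The actual AES-CTR decrypt is elided in this sketch. */
    return 0;
}
```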
flow->peer_lst is reallocated under f->mutex on the add path
(rist_receiver_associate_flow), but several other writers and
readers were not taking the same lock:
- remove_peer_from_flow took no lock at all on the shrink path.
- send_nack_group took peerlist_lock (not f->mutex) around the
  peer_lst walk, so a concurrent realloc-move from the protocol
  thread left it walking a freed array. Reproduced under TSan and
  under ASan with audit2/high-01/poc-h1-race-harness.c.
- The session-timeout deletion path (rist_receiver_pthread_protocol)
  walked peer_lst under peerlist_lock only after dropping f->mutex,
  same class of race.
Take f->mutex everywhere peer_lst is read or written. Lock order is
peerlist_lock outside, f->mutex inside, matching the existing
convention elsewhere in the file. Both shrink call-sites
(remove_peer_from_flow and the inline writer in rist_peer_remove)
now also check the realloc return: a shrink that returns NULL
legitimately means "could not move" and we keep the oversized
allocation rather than overwriting peer_lst with NULL while
peer_lst_len is still positive.
Add RIST_MAX_PEERS_PER_FLOW = 256, checked under f->mutex in
rist_receiver_associate_flow. peer_lst grew unbounded before this;
256 is generous for any legitimate multipath deployment and bounds
the worst case for both attack and misconfiguration.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/high-01).
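The shrink-path realloc check can be sketched as follows. Locking is omitted, and the struct and function are hypothetical; freeing the array when the list becomes empty is an assumption of this sketch, not something the text above states.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_peer_list {
    void **lst;
    size_t len;
};

/* Remove entry idx and try to shrink the allocation. If realloc fails,
 * keep the old (oversized) block instead of storing NULL in lst while
 * len is still positive. */
static void sketch_remove_at(struct sketch_peer_list *pl, size_t idx)
{
    if (idx >= pl->len)
        return;

    memmove(&pl->lst[idx], &pl->lst[idx + 1],
            (pl->len - idx - 1) * sizeof(*pl->lst));
    pl->len--;

    if (pl->len == 0) {
        /* Assumption: release the array once it is empty. */
        free(pl->lst);
        pl->lst = NULL;
        return;
    }

    void **shrunk = realloc(pl->lst, pl->len * sizeof(*pl->lst));
    if (shrunk)
        pl->lst = shrunk;
    /* else: the old block is still valid, so leave pl->lst unchanged */
}
```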
The b05ff88 Windows CSPRNG fix reached into mbedtls_entropy_context's
internal source_count and source[] fields to drop the default
CryptGenRandom source and substitute BCryptGenRandom. Those fields
are not part of the mbedTLS public API. mbedTLS 3.x has started
hiding internal struct members behind MBEDTLS_PRIVATE() and reserves
the right to reorder the layout, so the existing approach will fail
to compile (or, worse, compile but read wrong offsets) on any distro
that builds librist against system mbedTLS 3.x.
Rework the Windows path to bypass the entropy context entirely: feed
BCryptGenRandom directly into mbedtls_ctr_drbg_seed via the f_rng
interface, which is part of the stable public API and does not care
how the random bytes are produced. The entropy context is now
declared and initialised only on non-Windows; the seed-failure
plumbing (ctr_drbg_seed_ret, the fail-closed property in
_librist_crypto_ramdom_get_bytes) is unchanged.
Bundled contrib mbedtls 2.28.10 is unaffected, but this removes a
build hazard for downstream packagers building against system
mbedtls 3.x.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/info-03).
_librist_srp_nettle_wrap_random is the random-bytes callback nettle
invokes for the SRP private exponents (a on the authenticatee side,
b on the authenticator side) and for the verifier salt. Its
signature is void, so when _librist_crypto_ramdom_get_bytes failed
the wrapper used to discard the error and leave the destination
buffer at whatever was on the stack. mpz_import then read those
stack bytes as the SRP private exponent. On a build using the
nettle backend during a sustained entropy outage that meant the
SRP session key could be derived from uninitialized memory.
8f22b8e made the mbedTLS path fail-closed because the mbedTLS
wrapper returns int and propagates the error naturally. This commit
applies the equivalent property to the nettle path: thread the
caller's `int ret` through the otherwise-unused void* context arg
of nettle_mpz_random / nettle_mpz_random_size, and have the wrapper
write -1 and zero the buffer on CSPRNG failure. The existing
`if (ret != 0)` checks at the two BIGNUM_RANDOM call sites and at
the salt-generation site then abort the SRP operation cleanly.
The buffer-zero on failure is defense in depth: even if a caller
were added that ignored ret, the bignum would be zero rather than
attacker-influenced stack, and SRP's mod-N checks already reject
0 as A or B.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/high-05).
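A self-contained sketch of threading an error through a void callback. The callback type mirrors the shape of nettle's random callback (context pointer, length, destination) but is redeclared here so the example builds without nettle headers. The CSPRNG hook and both stand-in generators are hypothetical and exist only to exercise the two paths.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the nettle random callback, redeclared for this sketch. */
typedef void sketch_random_func(void *ctx, size_t length, uint8_t *dst);

/* Hypothetical CSPRNG hook; returns 0 on success. */
static int (*sketch_csprng)(uint8_t *buf, size_t len);

/* Test stand-ins only: one always fails, one fills a fixed pattern
 * (not random) so the success path is observable. */
static int sketch_csprng_fail(uint8_t *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -1;
}

static int sketch_csprng_ok(uint8_t *buf, size_t len)
{
    memset(buf, 0xA5, len);
    return 0;
}

static sketch_random_func sketch_wrap_random;

/* ctx carries a pointer to the caller's int ret. On failure the wrapper
 * zeroes the buffer and reports -1 through that pointer. */
static void sketch_wrap_random(void *ctx, size_t length, uint8_t *dst)
{
    int *ret = ctx;
    if (sketch_csprng == NULL || sketch_csprng(dst, length) != 0) {
        memset(dst, 0, length);
        if (ret)
            *ret = -1;
    }
}
```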
prand_u32 silently substituted (uint32_t)timestampNTP_u64() when the
CSPRNG returned non-zero, and the PSK GRE nonce generator
(_librist_crypto_psk_generate_nonce) consumed that fallback as the
PBKDF2 salt. The result was a 32-bit wall-clock-truncated nonce
feeding key derivation whenever the entropy source failed -
predictable to anyone who could observe wire timing, and prone to
nonce collisions across restarts that would reuse the AES-CTR
keystream. 8f22b8e was supposed to make the crypto stack fail-closed
on entropy failure; this commit closes the prand_u32 hole that
defeated that property.
Split the API so the fallback only lives where it is genuinely
benign:
- prand_u32() keeps the wall-clock fallback. Used for SSRC,
  flow-id, and peer-id where unpredictability is not a security
  property and the caller just needs "some non-zero u32".
- _librist_crypto_random_u32(uint32_t *out) is the new fail-closed
  sibling. Returns the underlying mbedTLS / gnutls error on
  failure, leaves *out untouched, and is the only entry point used
  by security-critical sites.
Route the PSK nonce generator through the fail-closed call and add a
csprng_failed flag on struct rist_key: when set, both
_librist_crypto_psk_encrypt and _librist_crypto_psk_decrypt
short-circuit. The encrypt path zeroes the output so a caller that
ignores the lockout never emits ciphertext under a weak key. The
flag is cleared on the next successful passphrase install (a fresh
nonce attempt is then made).
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/high-03).
process_eap_request_srp_challenge took the bignum lengths straight
from the wire and only bounded them against the remaining EAPOL
frame size (~10 KB after d71d96e). mbedtls_mpi_read_binary plus the
subsequent exp_mod / inv_mod are super-linear in input size, so a
single CHALLENGE with a ~10 KB N pinned the CPU pre-auth on the
authenticatee side without ever completing the handshake.
Add an explicit cap at EAP_MAX_MODULUS_BYTES (1024 = 8192 bits, the
largest RFC 5054 group librist supports). Anything beyond that is
neither legitimate SRP nor decodable into a usable group, and is
rejected with EAP_LENERR. eefe26e already removed the sub-1024-bit
groups; this is the corresponding upper bound.
Found by Thomas Guillem during the post-v0.2.15 re-audit
(audit2/medium-03).
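A sketch of the two-stage length check. EAP_MAX_MODULUS_BYTES and its value come from the text above; the numeric value of the error code, the rejection of zero-length fields, and the function itself are assumptions made for illustration.

```c
#include <stddef.h>

#define EAP_MAX_MODULUS_BYTES 1024 /* 8192 bits */
#define SKETCH_EAP_LENERR (-1)     /* numeric value is an assumption */

/* Hypothetical check applied to each wire-supplied bignum length before
 * any big-number parsing or modular arithmetic is attempted. */
static int sketch_check_bignum_len(size_t wire_len, size_t remaining)
{
    if (wire_len == 0 || wire_len > remaining)
        return SKETCH_EAP_LENERR; /* does not fit in the frame */
    if (wire_len > EAP_MAX_MODULUS_BYTES)
        return SKETCH_EAP_LENERR; /* larger than any supported group */
    return 0;
}
```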
Two small follow-ups on process_eap_request_identity surfaced by the
audit2 review:
1. Role gate (audit2/low-04). The handler ran on any role, so an
   authenticator that received a spoofed IDENTITY REQUEST would echo
   whatever happens to be in config.username (typically empty on a
   server peer) and burn an eap_reset_data() cycle. The
   process_eap_response_identity counterpart and the SRP REQUEST
   dispatch already carry an authenticator-only / authenticatee-only
   gate; this is the same pattern applied to the IDENTITY REQUEST.
2. Rate-limit on the pre-auth reply path (audit2/medium-02).
   b2915a7 already blocks IDENTITY REQUEST once the peer is SUCCESS,
   but pre-auth the original behaviour was unchanged: one EAPOL
   frame from an attacker wiped any in-flight SRP state and echoed
   the configured SRP username back to the spoof source. The
   username is operationally sensitive (often the production SRP
   account name) and the wipe is a free DoS on the in-flight
   handshake. Cap to one reply per EAP_IDENTITY_REPLY_INTERVAL
   (200 ms). Legitimate authenticator retransmits run at
   EAP_AUTH_TIMEOUT (500 ms) so the real handshake is unaffected;
   a flooded peer leaks the username at most 5x/s rather than at
   line rate.
Found by Thomas Guillem during the post-v0.2.15 re-audit.
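The reply rate limit can be sketched as a timestamp comparison. The 200 ms interval is the EAP_IDENTITY_REPLY_INTERVAL value quoted above; the struct, the function, and the assumption of a monotonic millisecond clock are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IDENTITY_REPLY_INTERVAL_MS 200

struct sketch_eap {
    uint64_t last_reply_ms;
    bool replied_once;
};

/* Returns true if an IDENTITY reply may be sent at now_ms, and records
 * the reply time. now_ms is assumed to come from a monotonic clock. */
static bool sketch_may_reply(struct sketch_eap *ctx, uint64_t now_ms)
{
    if (ctx->replied_once &&
        now_ms - ctx->last_reply_ms < SKETCH_IDENTITY_REPLY_INTERVAL_MS)
        return false; /* drop: too soon after the previous reply */
    ctx->replied_once = true;
    ctx->last_reply_ms = now_ms;
    return true;
}
```

A legitimate retransmit arriving 500 ms after the previous request passes the check, while a flood is held to one reply per interval.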
EAP_CODE_FAILURE was honored regardless of which identifier it
carried, so four spoofed FAILURE packets to a peer's listening UDP
port were enough to bump tries past EAP_AUTH_RETRY_MAX, set the state
to EAP_AUTH_STATE_FAILED, and permanently silence the peer on
EAP-SRP until the process restarted. d71d96e bounded the reset storm
but left the terminal state reachable from any unauthenticated
source.
Two changes (audit2/high-02):
1. Identifier gating on the FAILURE handler. A legitimate FAILURE
   always carries the identifier of the in-flight exchange. To make
   that property usable on both sides we also set last_identifier
   from the incoming REQUEST in process_eap_request; previously the
   field was maintained only on outgoing REQUESTs and the
   authenticatee side had no notion of "current exchange identifier".
   An off-path attacker now has at best a 1/256 chance per spoof of
   landing a matching identifier; mismatches are dropped silently
   with no state mutation and no tries++.
2. Recovery from soft-FAILED. The eap_periodic path now watches for
   EAP_AUTH_STATE_FAILED entered via the increment path (tries past
   the retry max but below the permanent sentinel set by SRP-failure
   paths) and resets to UNAUTH after EAP_AUTH_FAILED_RECOVERY ms of
   quiet. A new failed_state_timestamp field carries the recovery
   clock; it is cleared on every SUCCESS / LOGOFF transition.
Together these reduce the single-direction DoS from "trivial with
four packets, permanent until restart" to "needs to guess the
identifier and keep spoofing often enough that the peer never gets
30 s of quiet". On-path
attackers who can observe the identifier still win the spoof, but
recovery means they cannot make the silence permanent without
continuous traffic.
Found by Thomas Guillem during the post-v0.2.15 re-audit.
Two issues in eap_process_eapol's EAPOL_TYPE_LOGOFF case that together
let an off-path attacker tear down or destabilize an established
EAP-SRP session, surfaced by the audit2 follow-up (high-04).
1. A spoofed LOGOFF would unconditionally drop an authenticated peer
   to UNAUTH state. EAPOL frames carry no origin authentication, so a
   single packet to the listening UDP port was enough to deauth a
   peer. Mirror the b2915a7 guard already applied to REQUEST_IDENTITY:
   once the peer is SUCCESS or REAUTH, refuse the LOGOFF as
   EAP_UNEXPECTEDREQUEST. Re-auth is driven by the timers in
   eap_periodic, never by an attacker-supplied LOGOFF.
2. On the legitimate (pre-auth) LOGOFF path, the retry counter was
   left untouched. Combined with the 255-sentinel that several
   SRP-failure paths set on ctx->tries, a LOGOFF could move the peer
   out of FAILED while tries was still primed at 255, and the next
   spoofed FAILURE would wrap (uint8_t 255 + 1 = 0), bypassing the
   FAILED-state gate at process_eap_pkt and putting the attacker
   back in business. Reset tries=0 on the legitimate LOGOFF, and
   widen tries from uint8_t to unsigned int with a saturating
   eap_tries_inc() helper so the wrap primitive is closed for good.
The 255 sentinel is replaced by EAP_AUTH_TRIES_PERMANENT = UINT_MAX,
which is a fixed point under the saturating increment.
Found by Thomas Guillem during the post-v0.2.15 re-audit.
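A sketch of the saturating increment. The text above names eap_tries_inc() and EAP_AUTH_TRIES_PERMANENT = UINT_MAX but does not give a signature, so the value-returning form and the sketch_ prefix here are assumptions.

```c
#include <limits.h>

#define SKETCH_EAP_AUTH_TRIES_PERMANENT UINT_MAX

/* Saturating increment: once tries reaches UINT_MAX it stays there, so
 * the permanent sentinel can never wrap back to zero. */
static unsigned int sketch_eap_tries_inc(unsigned int tries)
{
    return tries == UINT_MAX ? UINT_MAX : tries + 1;
}
```

Because the sentinel is a fixed point of the increment, a further FAILURE after a permanent failure leaves the counter where it is, unlike the old uint8_t counter, which wrapped from 255 to 0.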
Bug fix: BCryptGenRandom-backed seeding unblocks encryption on
Windows runtimes where the legacy CryptoAPI is missing (wine,
sandboxed containers). Test/CI: SRP and AES suites now run on
Windows under wine, and per-test results are visible directly in the
GitLab MR UI via JUnit.
Meson writes testlog.junit.xml alongside testlog.txt on every test
run. Point GitLab at it so reviewers see per-test pass/fail directly
in the MR's "Tests" tab instead of fishing through job logs. The
text log is still uploaded as a fallback.
The Windows cross build was using -Duse_mbedtls=false, which silently
disabled the SRP and AES test suites under wine. Build with the same
crypto configuration as the released binaries so test-win64 actually
exercises those code paths. Coverage goes from 19/27 to 27/27 tests.
ristsrppasswd compiles in contrib/pthread-shim.c, which calls
pthread_cond_timedwait under have_mingw_pthreads. The other tools
inherit the threads dependency transitively via prometheus and
microhttpd; ristsrppasswd does not, and so fails to link on Windows
with `undefined reference to pthread_cond_timedwait` as soon as
mbedTLS is enabled in the cross build.
Add threads to its dependency list. The variable resolves to [] on
platforms that don't need it, so the change is a no-op elsewhere.
mbedTLS auto-installs an entropy source on Windows that uses the
legacy CryptoAPI (CryptAcquireContext + CryptGenRandom). It silently
depends on the RSA crypto provider being installed and reachable,
which is not the case in wine, in several common container images,
or on some hardened Windows deployments. On those systems
mbedtls_ctr_drbg_seed fails, and after 0.2.15's CSPRNG hardening
librist refuses to run any encryption code path at all.
Replace the source with BCryptGenRandom (CNG): same cryptographic
strength, present on every supported Windows release (Vista+), and
reliably implemented by wine. Add bcrypt to the Windows link line.
No behaviour change on Linux or macOS.
Fixes #210.
The 0.2.16 pre-release section now lists what the new test infra
catches: every protocol test that gates Linux releases also gates
Windows releases, so a #208-class winsock-runtime regression cannot
ship without being seen first.
CI infrastructure improvement. Until now build-win64 only verified
that the code cross-compiles; the test suite ran on Linux only, so
18 of the 19 protocol tests had zero Windows signal. #208 (the
0.2.15 winsock-init bug) made it into a release because of this gap.
Split build-win64 into a pure build job plus a new test-win64 job
that runs the same `meson test` suite against the cross-compiled
.exe artifacts, transparently launched under wine via the meson
cross-file's exe_wrapper. The wine runtime was already provisioned
in the CI image; only the test invocation was missing.
This does not replace a native-Windows runner (wine does not reproduce
every Windows kernel behaviour - see #209's ICMP-unreach interaction)
but it would have caught #208 on the first MR run, which is the
class of bug we want to stop shipping.
Test coverage gap. Catches #208 (winsock not initialised before
rist_logging_set on Windows 11) and any future regression in the
public logging API's startup ordering.
The bug shipped in 0.2.15 because no test exercised the
rist_logging_set(.., address, ..) path before any rist_*_create call.
On POSIX this test is a baseline smoke check; under wine (see the
companion .gitlab-ci.yml change) it fails with WSANOTINITIALISED if
winsock init regresses.
Standalone executable (no cmocka dependency) so it runs on every
build that has -Dtest=true.
Document the Windows fixes (#208, #209) and the 0.2.15 follow-ups
that have accumulated so far. 0.2.16 is binary-compatible with 0.2.15;
downstream consumers will not need to recompile. More MRs are expected
before the release goes out.
Bug fix - Windows only, 0.2.15 regression. Reported and confirmed
fixed by Roman Dissertori (@moo) in #209.
Setting up multiple peers on a Windows receiver with EAP-SRP
authentication (Linux ristsender -> Windows ristreceiver) fails with
"Failed to process EAPOL pkt, return code: -1". Ethernet peers fail
consistently; wifi peers sometimes succeed depending on timing.
Worked in 0.2.14, broken in 0.2.15.
Root cause: two 0.2.15 changes interact badly on Windows.
1. Windows enqueues a synthetic WSAECONNRESET indication on any UDP
   socket whose outbound packets provoke an ICMP port-unreachable
   reply. The 0.2.15 hardening added a recvfrom(drain[1], ...) call
   to clear that indication, but on Windows that recvfrom either
   consumes a real datagram outright or returns WSAEMSGSIZE and
   silently drops one. librist then handed a truncated buffer to
   its EAPOL handler.
2. Also in 0.2.15, eap_process_eapol started rejecting anything
   shorter than a complete EAPOL frame. A truncated buffer that
   0.2.14 would have tolerated now fails outright.
Together they break the authenticator on Windows whenever ICMP
unreachables happen to align with the EAP handshake.
Fix: disable the synthetic WSAECONNRESET indication on every UDP
socket via WSAIoctl(SIO_UDP_CONNRESET, FALSE), and drop the unsafe
drain. ICMP unreachables on a connectionless socket are advisory
only, and librist detects peer loss via RTCP keepalive timeouts.
Diagnostics only - no behaviour change.
On Windows, errno carries POSIX values that do not match what the
socket layer actually reported, so strerror(errno) in our existing
error logs is misleading. End-user bug reports from Windows users
were missing the WSA error code needed to identify the real cause.
Add WSAGetLastError() to the error log lines in
udpsocket_open_connect, udpsocket_open_bind and
udpsocket_set_nonblocking. Mirrors the existing handling in
udpsocket_open().
Bug fix - Windows only. Reported by Roman Dissertori (@moo) in #208.
Running `ristreceiver -r 127.0.0.1:port ...` on Windows 11 fails
immediately with "Failed to open logsocket / Failed to setup logging!"
and the tool exits before doing any work. Same failure path for any
caller that uses rist_logging_set() before rist_sender_create() /
rist_receiver_create().
Root cause: WSAStartup ran inside init_common_ctx(), which is reached
only via rist_*_create. The public logging API opens a UDP socket
before that, so socket() returns INVALID_SOCKET and the log socket is
never created.
Fix: initialise winsock from udpsocket.c itself, guarded by
InitOnceExecuteOnce / pthread_once so the first call from any thread
to udpsocket_open() or udpsocket_resolve_host() is enough. WSAStartup
is ref-counted by Windows so the existing init_common_ctx() call
stays in place for out-of-tree consumers that may call rist-common
directly. POSIX path is a no-op.
Security hardening. Not externally observed; the failure mode requires
a deployment with no kernel entropy source (sandboxed container,
embedded image without /dev/urandom, broken mbedTLS build).
What was wrong: if the CSPRNG seeding call failed at startup, the
return value was silently dropped. Every subsequent crypto-random
request would then return zeros / failure codes, and the
wall-clock-derived fallback inside prand_u32() would be used for the
entire process lifetime - including for SRP nonces and AES-CTR IVs.
The 0.2.15 PRNG hardening was therefore quietly absent in exactly the
environments where it matters most.
Fix: capture the seed return code in a static, log a loud error if
seeding failed, and short-circuit subsequent crypto-random calls so
the failure is visible at every caller instead of papered over.
The Nettle path already retries gnutls_rnd up to 10 times; add a
single error log on the final failure for parity.
No behaviour change when the CSPRNG is healthy.
Bug fix. Latent since 0.2.14 - not externally reported but exploitable:
an EAP-SRP authentication that fails every retry was supposed to mark
the peer as permanently failed and drop it; instead the peer was kept
around in a half-authenticated state and kept consuming retry attempts.
Root cause: rist-common's EAPOL handler checked `if (eapret == 255)` to
detect the permanent-failure path, but eap_process_eapol returns
*negative* error codes (-255 in the old numbering). The comparison
could never match. Fixed by introducing named error constants
(EAP_INTERNALERR, EAP_AUTH_FAILED, EAP_AUTH_TERMINATED) and comparing
against EAP_AUTH_TERMINATED instead.
Also caught while adding the names: EAP_SRP_WRONGSUBTYPE and
EAP_UNEXPECTEDREQUEST were both defined as -4. No current caller
discriminates on the value so the collision had no observed effect,
but it was a trap for any future per-error handling. Moved
EAP_SRP_WRONGSUBTYPE to -5.
No wire-format change. Error codes are internal to librist.
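The sign mismatch can be shown in a few lines. The constant names and the -4 / -5 values follow the text above; taking -255 as the value of EAP_AUTH_TERMINATED is an assumption based on the "old numbering" remark, and both predicate functions are hypothetical.

```c
/* Values for the first two constants come from the text; the third is
 * assumed to keep the old -255 value. */
#define SKETCH_EAP_UNEXPECTEDREQUEST (-4)
#define SKETCH_EAP_SRP_WRONGSUBTYPE  (-5)
#define SKETCH_EAP_AUTH_TERMINATED   (-255)

/* Old check: compares against a positive literal, so it can never match
 * a negative error code. */
static int sketch_is_terminated_old(int eapret)
{
    return eapret == 255;
}

/* New check: compares against the named (negative) constant. */
static int sketch_is_terminated(int eapret)
{
    return eapret == SKETCH_EAP_AUTH_TERMINATED;
}
```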
Documentation only - no behaviour change, no bug fix.
The original guard comment was labelled "Defensive:" which made it look
like a paranoia check that could be removed. It is not: on the
authenticatee side the lookup callback is NULL by construction, so the
handler MUST refuse the IDENTITY response or it will dereference NULL.
Reword the comment so a future reader does not mistakenly delete the
check.
The macro tried to silence an "unused argument" warning by casting lbl
to void, but lbl is a goto label, not an expression. GCC and clang both
reject this with "undeclared identifier".
The Nettle backend builds (Linux ARM, Windows ARM/LLVM) all fail at
src/crypto/srp.c:230 calling BIGNUM_WRITE_BYTES_OR_GOTO. Drop the
useless cast; macro arguments don't need referencing to suppress
unused-parameter warnings.
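A sketch of why the cast is unnecessary. The macro and function names are hypothetical and much simpler than the real BIGNUM_WRITE_BYTES_OR_GOTO; the macro body here cannot fail, so it never references its label argument, and that alone is fine for the preprocessor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* lbl is deliberately not referenced. Writing (void)lbl here would not
 * compile, because a goto label is not an expression. An unreferenced
 * macro parameter produces no warning in the first place. */
#define SKETCH_WRITE_BYTES_OR_GOTO(dst, src, len, lbl) \
    do {                                               \
        memcpy((dst), (src), (len));                   \
    } while (0)

static int sketch_export(uint8_t *out, size_t out_len,
                         const uint8_t *in, size_t in_len)
{
    int ret = -1;

    if (in_len > out_len)
        goto done;
    SKETCH_WRITE_BYTES_OR_GOTO(out, in, in_len, done);
    ret = 0;
done:
    return ret;
}
```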
Security release on top of 0.2.14. ABI bumps to 4.6.0 / 4:3:0,
binary-compatible with 0.2.14 (no symbol additions or removals,
soversion stays at 4).
Combines:
* a full receive-side, EAP/SRP and PSK/AES-CTR audit
* a follow-up audit shared by VideoLAN that closes the residual
  gaps (EAP role enforcement, SRP cleanup paths, NPD scratch
  sizing, RTP header guard for the FULL/EAPOL paths, hardened TUN
  helpers, SDES length clamp, qsort/median fixes)
* the Windows / MinGW build fixes (clock_gettime probe, the
  PTHREAD_START_FUNC return type, WSAECONNRESET drain, strtok_r
  shim header)
* three reported bugs picked up while the branch was open: invalid
  JSON for multi-peer sender stats (#206) plus the prometheus
  parser landing for the sender side, the UDP log socket
  condition (!313) and the infinite socket-error loop on hostname
  resolution failure (!312).
NEWS lists the per-fix detail.
udp2udp uses librist for URL parsing and the Prometheus httpd shell,
not for actual data flow, so the exporter context comes up but no
sender/receiver callbacks ever feed it. People enabling --metrics on
udp2udp expecting RIST stats end up scratching their heads at an
empty scrape endpoint.
Print a one-shot warning right after rist_setup_prometheus_stats so
the limitation is obvious from the logs.
setup_rist_peer was calling rist_sender_stats_callback_set twice on the
same ctx: once with (void*)w->id as the callback arg, then a second
time a few lines down with NULL. The second call wins (the library
just overwrites sender_stats_callback_argument), so the JSON callback
was running with arg = NULL and the Prometheus parser couldn't tell
which sender instance the JSON came from when more than one was
running in the same process.
Drop the second registration; the first one is correct.
ristsender uses rist_sender_stats_callback_set, which only delivers
the JSON blob to the callback. The Prometheus exporter's parser for
that path was a (void)... // TODO stub, so sender-side metrics from
ristsender simply never showed up.
Walk the "sender-stats" -> "peers" array introduced in 0.2.15
(86e2a16), pull each peer's id, cname and stats numbers, and feed
them through the existing rist_prometheus_handle_sender_peer_stats
helper so the gauge/counter shape stays exactly what already shipped
on the receiver-callback path.
While here, factor the stale-peer cleanup block out of
rist_prometheus_parse_stats into rist_prometheus_cleanup_stale_locked
so both parsers run it. Without that, peers added with
from_callback=true on the JSON path would never expire.
JSON rtt is already in milliseconds (last_rtt / RIST_CLOCK in
src/stats.c) and the handler converts ms -> seconds, so the parser
feeds the value straight through with no extra scaling.
Closes the rist_prometheus_parse_sender_stats half of #206.
Builds on the parser shape from manueldev's mr/206 work, reworked
against the 0.2.15 JSON schema.
Co-authored-by: Manuel <malejandrodev@gmail.com>
prand_u32() now routes through _librist_crypto_ramdom_get_bytes()
unconditionally, but random.c was only added to the build when
have_srp was true (mbedTLS or nettle). The mingw-w64 -Duse_mbedtls=false
job legitimately has neither, so the librist.dll link fails with an
undefined reference.
Move random.c out of the have_srp block so the symbol always exists,
and guard the body so it returns -1 when no backend is compiled in.
prand_u32 already falls back to the wall clock on that path.
Reported by upstream CI on !314.