Summary
We’ve been running mesh performance tests with three MM8108 radios (R1, R2, R3) in a linear topology, where R2 is the intermediate forwarding radio between R1 and R3. RSSI between radios that can see each other directly is healthy, around -70 to -75 dBm. We’re running a standard 802.11s mesh with mesh11sd, firmware version 0-rel_1_16_4_2025_Sep_18.
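For context, here’s the quick sanity check we run on each node before testing (the interface name mesh0 is ours; the fields are from iw’s station dump):

```
# Topology check: R1 and R3 should each list only R2 as a peer; R2 should
# list both. Expect "mesh plink: ESTAB" and "signal avg" around -70 dBm.
iw dev mesh0 station dump | grep -E "^Station|signal avg|mesh plink"
```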
What works
Point-to-point traffic behaves well. When we run iperf3 between two directly peered radios (R1 ↔ R2), MMRC handles the MCS level appropriately. When we push the offered rate above the link’s capacity, rate control scales down gracefully, packets drop as expected, and the link remains usable. This is the behavior we’d expect from a healthy rate control algorithm.
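For reference, the point-to-point runs are plain iperf3, with the UDP offered rate deliberately pushed past link capacity (addresses and rates here are ours):

```
# On R2 (server): iperf3 -s
# On R1 (client): offer 20 Mbit/s UDP for 60 s, above what the link can
# sustain, and watch MMRC scale the MCS down gracefully
iperf3 -c 10.42.0.2 -u -b 20M -t 60
```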
What doesn’t work
Multi-hop traffic through an intermediate radio behaves very differently. When we run iperf3 from R1 to R3 through R2 (one forwarding hop), R2 is highly sensitive to even moderate traffic levels. As the offered rate approaches R2’s sustainable forwarding capacity, we observe the following (captured with the station-dump loop sketched after this list):
- MCS plummets on R2’s link to R3 (7 → 5 → 3 → 2 → 0 within a few seconds)
- tx failed spikes heavily on R2’s peer entry for R3
- tx retries stop incrementing — R2 stops attempting transmission to R3
- The R2 peer entry for R1 remains healthy throughout; only the outbound link to R3 is affected
- Signal remains healthy (-66 to -78 dBm) throughout the failure
- mesh plink stays ESTAB even after the link is functionally dead
- iperf3 eventually reports “server has terminated” as no traffic reaches R3
This happens reproducibly at any offered rate above approximately 4 Mbps for R1 → R3 multi-hop traffic, which is well below the per-link PHY capacity even allowing for the roughly 50% throughput halving expected when a single half-duplex radio must both receive and forward each frame.
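The counters above come from iw’s station dump on R2. A minimal way to watch the failure unfold (the interface name and R3’s MAC are placeholders):

```
# Poll R2's peer entry for R3 during an R1 -> R3 iperf3 run. During the
# failure, "tx bitrate" steps down, "tx failed" spikes, then "tx retries"
# stops incrementing while "mesh plink" stays ESTAB.
R3_MAC=02:00:00:00:00:03   # placeholder MAC
watch -n 1 "iw dev mesh0 station get $R3_MAC | grep -E 'tx bitrate|tx retries|tx failed|signal avg|mesh plink'"
```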
Recovery
After the link wedges, recovery is inconsistent:
- Sometimes R2’s link to R3 recovers on its own after idle time
- Often the link stays locked at low MCS indefinitely until R2 is rebooted
- Lighter interventions (interface bounce, mesh leave/rejoin, mpath delete) sometimes recover it and sometimes don’t (commands sketched after this list)
- Repeated wedge events on the same radio appear to cause cumulative degradation, with each event making recovery less likely
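For completeness, the lighter interventions are roughly the following (mesh0, the mesh ID, and R3’s MAC are placeholders; mesh11sd normally manages join/leave, so the manual commands approximate what it does and may race with it):

```
# Interface bounce
ip link set mesh0 down && sleep 2 && ip link set mesh0 up

# Delete the possibly-stale mesh path to R3 so HWMP rediscovers it
iw dev mesh0 mpath del 02:00:00:00:00:03

# Leave and rejoin the mesh
iw dev mesh0 mesh leave
iw dev mesh0 mesh join MESH_ID
```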
Apparent cause
It looks like R2’s combined receive-from-R1 and transmit-to-R3 load on a single half-duplex radio creates enough retry pressure that MMRC steps down MCS into a state it can’t recover from. Because the TX pipeline effectively stops attempting frames to R3, MMRC has no samples to probe higher rates and stays locked.
This looks similar to the MCS recovery issue documented in Thread 1485, where @michael.mccandless confirmed MMRC sampling has issues under low-traffic conditions and that fixes are being worked on for the 1.17 release and beyond. Our trigger is different (forwarding overload rather than mobility), but the terminal state looks like the same class of bug.
Questions
- Is this known / expected to be addressed by the upcoming 1.17 firmware improvements to MMRC?
- Is there an approximate timeline for 1.17 on MM8108?
- Is there a supported way to reset MMRC state for a single peer from userspace, without bouncing the whole interface or rebooting?
- Are there any firmware tuning parameters or tc/qdisc configurations that Morse would recommend to keep the forwarding radio out of this wedged state under bursty load? (Our interim tc workaround is sketched after this list.)
- For deployments with expected multi-hop mesh traffic, is there a recommended design pattern for avoiding this failure mode?
- Is there anything else I’m missing in my configuration?
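On the tc/qdisc question: our interim workaround is to shape R1’s egress below the ~4 Mbps trigger with a token bucket filter. This is a sketch with our interface name and a rate chosen under the measured threshold, not a recommendation:

```
# On R1: cap egress on the mesh interface below the rate at which R2
# wedges (~4 Mbps measured here), with a small burst allowance
tc qdisc replace dev mesh0 root tbf rate 3mbit burst 32kb latency 50ms
```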
Happy to share full iperf3 and station dump captures if useful for
reproducing.