Summary
We’ve been running mesh performance tests with three MM8108 radios (R1, R2, R3) in a linear topology, where R2 is the intermediate forwarding radio between R1 and R3. RSSI between radios that can see each other directly is healthy, around -70 to -75 dBm. We’re running a standard 802.11s mesh with mesh11sd, firmware version 0-rel_1_16_4_2025_Sep_18.
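For context, here’s the quick sanity check we run on each node before testing (the interface name mesh0 is ours; the fields are from iw’s station dump):

```
# Topology check: R1 and R3 should each list only R2 as a peer; R2 should
# list both. Expect "mesh plink: ESTAB" and "signal avg" around -70 dBm.
iw dev mesh0 station dump | grep -E "^Station|signal avg|mesh plink"
```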
What works
Point-to-point traffic behaves well. When we run iperf3 between two directly peered radios (R1 ↔ R2), MMRC handles the MCS level appropriately. When we push the offered rate above the link’s capacity, rate control scales down gracefully, packets drop as expected, and the link remains usable. This is the behavior we’d expect from a healthy rate control algorithm.
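For reference, the point-to-point runs are plain iperf3, with the UDP offered rate deliberately pushed past link capacity (addresses and rates here are ours):

```
# On R2 (server): iperf3 -s
# On R1 (client): offer 20 Mbit/s UDP for 60 s, above what the link can
# sustain, and watch MMRC scale the MCS down gracefully
iperf3 -c 10.42.0.2 -u -b 20M -t 60
```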
What doesn’t work
Multi-hop traffic through an intermediate radio behaves very differently. When we run iperf3 from R1 to R3 through R2 (one forwarding hop), R2 is highly sensitive to even moderate traffic levels. As the offered rate approaches R2’s sustainable forwarding capacity, we observe the following (captured with the station-dump loop sketched after this list):
- MCS plummets on R2’s link to R3 (7 → 5 → 3 → 2 → 0 within a few seconds)
- tx failed spikes heavily on R2’s peer entry for R3
- tx retries stop incrementing — R2 stops attempting transmission to R3
- The R2 peer entry for R1 remains healthy throughout; only the outbound link to R3 is affected
- Signal remains healthy (-66 to -78 dBm) throughout the failure
- mesh plink stays ESTAB even after the link is functionally dead
- iperf3 eventually reports “server has terminated” as no traffic reaches R3
This happens reproducibly at any offered rate above approximately 4 Mbps for R1 → R3 multi-hop traffic, which is well below the per-link PHY capacity even allowing for the roughly 50% throughput halving expected when a single half-duplex radio must both receive and forward each frame.
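The counters above come from iw’s station dump on R2. A minimal way to watch the failure unfold (the interface name and R3’s MAC are placeholders):

```
# Poll R2's peer entry for R3 during an R1 -> R3 iperf3 run. During the
# failure, "tx bitrate" steps down, "tx failed" spikes, then "tx retries"
# stops incrementing while "mesh plink" stays ESTAB.
R3_MAC=02:00:00:00:00:03   # placeholder MAC
watch -n 1 "iw dev mesh0 station get $R3_MAC | grep -E 'tx bitrate|tx retries|tx failed|signal avg|mesh plink'"
```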
Recovery
After the link wedges, recovery is inconsistent:
- Sometimes R2’s link to R3 recovers on its own after idle time
- Often the link stays locked at low MCS indefinitely until R2 is rebooted
- Lighter interventions (interface bounce, mesh leave/rejoin, mpath delete) sometimes recover it and sometimes don’t (commands sketched after this list)
- Repeated wedge events on the same radio appear to cause cumulative degradation, with each event making recovery less likely
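For completeness, the lighter interventions are roughly the following (mesh0, the mesh ID, and R3’s MAC are placeholders; mesh11sd normally manages join/leave, so the manual commands approximate what it does and may race with it):

```
# Interface bounce
ip link set mesh0 down && sleep 2 && ip link set mesh0 up

# Delete the possibly-stale mesh path to R3 so HWMP rediscovers it
iw dev mesh0 mpath del 02:00:00:00:00:03

# Leave and rejoin the mesh
iw dev mesh0 mesh leave
iw dev mesh0 mesh join MESH_ID
```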
Apparent cause
It looks like R2’s combined receive-from-R1 and transmit-to-R3 load on a single half-duplex radio creates enough retry pressure that MMRC steps down MCS into a state it can’t recover from. Because the TX pipeline effectively stops attempting frames to R3, MMRC has no samples to probe higher rates and stays locked.
This looks similar to the MCS recovery issue documented in Thread 1485, where @michael.mccandless confirmed MMRC sampling has issues under low-traffic conditions and that fixes are being worked on for the 1.17 release and beyond. Our trigger is different (forwarding overload rather than mobility), but the terminal state looks like the same class of bug.
Questions
- Is this known / expected to be addressed by the upcoming 1.17 firmware improvements to MMRC?
- Is there an approximate timeline for 1.17 on MM8108?
- Is there a supported way to reset MMRC state for a single peer from userspace, without bouncing the whole interface or rebooting?
- Are there any firmware tuning parameters or tc/qdisc configurations that Morse would recommend to keep the forwarding radio out of this wedged state under bursty load? (Our interim tc workaround is sketched after this list.)
- For deployments with expected multi-hop mesh traffic, is there a recommended design pattern for avoiding this failure mode?
- Is there anything else I’m missing in my configuration?
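On the tc/qdisc question: our interim workaround is to shape R1’s egress below the ~4 Mbps trigger with a token bucket filter. This is a sketch with our interface name and a rate chosen under the measured threshold, not a recommendation:

```
# On R1: cap egress on the mesh interface below the rate at which R2
# wedges (~4 Mbps measured here), with a small burst allowance
tc qdisc replace dev mesh0 root tbf rate 3mbit burst 32kb latency 50ms
```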
Happy to share full iperf3 and station dump captures if useful for
reproducing.