Mesh network not behaving as expected

Hi!

I am setting up an 802.11s mesh but it is not behaving as expected in a multi‑hop scenario. Here is my setup:

  • Mesh Gate (MG): This is either a Heltec or Alfa Halow router set up as a Mesh Gate through the UI wizard. It runs a DHCP server to hand out IP addresses to all devices on the network.
  • Laptop: It is connected to the Mesh Gate via Ethernet. There is no other router or upstream network.
  • Mesh Point 1 (MP1): This is a Heltec router. It is in range of the Mesh Gate via Halow.
  • Mesh Point 2 (MP2): This is my custom hardware. It uses a static IP. It is basically a camera that transmits video via Halow. It uses AsiaRF’s Halow module, and is controlled by an i.MX8M-Mini running Yocto.

What works

  • If MP1 and MP2 are both in range of the Mesh Gate, they associate directly to MG and everything works.
  • I can stream video from MP2 to the laptop via MG without issues.

What fails

When I move MP2 so that:

  • MP2 is out of range of MG,
  • but MP2 is in range of MP1

then:

  • MP2 associates to MP1 (shows up in MP1’s associated stations list),
  • but MG cannot ping MP2 and no traffic is forwarded through MP1.
  • On MG/MP1, MP2’s MAC shows up, but the UI never learns an IP for it (“?”)

Below is the wpa_supplicant_s1g configuration file used on MP2.

 ctrl_interface=/var/run/wpa_supplicant_s1g
 update_config=1
 country=US
 pmf=2
 sae_pwe=1
 max_peer_links=10
 mesh_fwding=1

 network={
    ssid="[SSID]"
    key_mgmt=SAE
    sae_password="[PASSWORD]"
    pairwise=CCMP
    ieee80211w=2

    # Mesh mode
    mode=5

    # 904.5 MHz, 1 MHz BW, US
    channel=5
    op_class=68
    country="US"
    s1g_prim_chwidth=0
    s1g_prim_1mhz_chan_index=0

    beacon_int=1000
    dtim_period=1
    mesh_rssi_threshold=-85
}

I have tried a lot of different settings and configurations, but I am unable to get it to work. Perhaps my understanding of 802.11s is flawed, or somehow the hardware does not support it.

Any help is much appreciated!

Hi @SanderV

I’m definitely no mesh expert myself, but will run this by someone internally who may know more. There are a few things that stand out to me, and some things to consider in the meantime.

  • mesh_fwding might need to be included in the network block. Other configs I’ve seen place it there rather than the global section.
  • traffic will need to flow before MG builds the ARP table to map the IP to the MAC address. This could even be a DHCP exchange, but a ping will suffice (once the routing is working).
  • On the other nodes running OpenWrt, you can find their mesh configuration at /var/run/wpa_supplicant-wlanX.conf. It would be worth inspecting this and comparing your configuration.
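As a rough way to check the ARP point above, the sketch below reports the neighbour-table state that a node holds for a given IP. It assumes the iproute2 (or BusyBox) ip tool is available on MG. neigh_state is a hypothetical helper name, and the address in the usage comment is only a placeholder for MP2's address.

```shell
#!/bin/sh
# neigh_state IP
# Reads `ip neigh show` output on stdin and prints the state of the entry
# for IP (the last field, e.g. REACHABLE, STALE, INCOMPLETE, FAILED),
# or "none" if the table has no entry for that address.
neigh_state() {
    awk -v ip="$1" '
        $1 == ip { print $NF; found = 1 }
        END      { if (!found) print "none" }
    '
}

# Usage on the mesh gate (placeholder address):
#   ping -c 3 192.168.3.11
#   ip neigh show | neigh_state 192.168.3.11
```

If the result is FAILED, INCOMPLETE, or none right after a ping, that would suggest ARP requests or replies are not making it across the mesh.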

The other thing I’m uncertain about is how mesh11sd may be interacting. It shouldn’t cause what you’re seeing, so have a look at the above first and then we can consider mesh11sd.

Hi @ajudge

Thanks for your reply!

I moved the mesh_fwding property to the network block, but the behavior remains unchanged.

During testing I noticed that if I first connect MP2 to MG so that it gets an IP address, and then move MP2 out of range of MG but within range of MP1, I can ping it just fine. So your theory about the ARP table seems likely.

Below is the wpa_supplicant configuration used on the routers. The only difference between MG and MP1 here is the dot11MeshGateAnnouncements property.

country=US
ctrl_interface=/var/run/wpa_supplicant_s1g
sae_pwe=1
max_peer_links=10
mesh_fwding=1
network={

    ssid="RescueRatMesh"
    key_mgmt=SAE
    mode=5
    channel=5
    op_class=68
    country="US"
    s1g_prim_chwidth=0
    s1g_prim_1mhz_chan_index=0
    dtim_period=1
    mesh_rssi_threshold=-85
    dot11MeshHWMPRootMode=0
    dot11MeshGateAnnouncements=1
    mbca_config=1
    mbca_min_beacon_gap_ms=25
    mbca_tbtt_adj_interval_sec=60
    dot11MeshBeaconTimingReportInterval=10
    mbss_start_scan_duration_ms=2048
    mesh_beaconless_mode=0
    mesh_dynamic_peering=0
    sae_password="[PASSWORD]"
    pairwise=CCMP
    ieee80211w=2
    beacon_int=1000
    mac_addr=3
    mac_value=e4:38:19:1f:f0:17

}
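
To compare a router configuration like the one above with the MP2 configuration key by key, something along these lines may help. It is a minimal sketch: normalise_conf and conf_diff are hypothetical helper names, and the file names in the usage comment are placeholders. Because the lines are sorted, the output does not show whether a key sits in the global section or inside the network block, so placement still has to be checked by eye.

```shell
#!/bin/sh
# normalise_conf FILE
# Drop comment lines and blank lines, strip indentation and trailing
# whitespace, and sort, so two configs can be compared regardless of
# key order.
normalise_conf() {
    sed -e '/^[[:space:]]*#/d' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' "$1" | grep -v '^$' | sort
}

# conf_diff A B
# Print lines found only in A (prefixed "<") or only in B (prefixed ">").
# Returns diff's exit status: 0 if identical, 1 if they differ.
conf_diff() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    normalise_conf "$1" > "$tmp_a"
    normalise_conf "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Usage (placeholder file names):
#   conf_diff mp2_wpa_supplicant_s1g.conf router_wpa_supplicant.conf
```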

I haven’t had a chance to test anything yet, but can you make sure your kernel has the “mac80211: Mesh support” patch, as per MorseMicro/linux@272137c on GitHub (which targets the 6.12.21 kernel, for example)?

Without this, path selection may not function correctly on a link with management frame protection enabled (ieee80211w).

Thanks! I did not apply that patch when building the image.

I am running the kernel version below. Is that patch available for it?

root@ucm-imx8m-mini:~# uname -r
5.10.35-ucm-imx8m-mini-2.2.1+ge47d436930ca

We have patches for the 5.10.11 kernel, which should be close enough to yours that they will apply. For the mesh support, see “mac80211: Mesh support” (MorseMicro/linux@77e5dc0 on GitHub).

You might want to consider examining the full list. For patches provided as of our 1.16 release, see mm/linux-5.10.11/1.16.x
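
Whether a patch written for 5.10.11 applies to a slightly different 5.10.x tree can be checked without modifying anything. The sketch below assumes GNU patch and a patch file already saved locally (GitHub can usually serve one if .patch is appended to a commit URL). patch_applies is a hypothetical helper name and the paths in the usage comment are placeholders. How the patch is then carried in a Yocto build is outside the scope of this sketch.

```shell
#!/bin/sh
# patch_applies TREE PATCHFILE
# Return 0 if PATCHFILE would apply cleanly to the source tree at TREE.
#   --dry-run  do not modify any files
#   -N         treat an already-applied (reversed) patch as a failure
#              instead of prompting
#   -s         stay silent unless there is an error
#   -d TREE    change into TREE before applying
# The patch is fed on stdin so a relative PATCHFILE path still resolves
# from the caller's working directory.
patch_applies() {
    patch -p1 --dry-run -N -s -d "$1" < "$2"
}

# Usage (placeholder paths):
#   patch_applies /path/to/kernel-source mesh-support.patch \
#       && echo "applies cleanly" || echo "needs manual rework"
```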

Great, I found a few that might be worth applying. I won’t have time in the next couple of days, but I will check back in once I have done this with the results.

@ajudge

I applied the patches below, but unfortunately the behavior remains unchanged.

Do you have any other suggestions on what to try next?

Thanks!

I’ll have to try to reproduce this I think.

In the meantime, do you see the same behaviour if you put your custom hardware (MP2) in the middle? i.e. move MP1 around instead.
I’m curious if there is something awry with the Heltec router when it needs to forward frames in the mesh.

I replaced MP2 with another Heltec router, so MP1 and MP2 are now both Heltec mesh points, and it works as expected. From my laptop I can ping MP2 even when it is outside the range of MG.

For your information, this is how I initialize the mesh point on my custom hardware. Maybe the static IP throws a wrench in things, but I doubt it.

#!/bin/sh

LOG=/var/log/rc.local.log
echo "rc.local starting at $(date)" >> "$LOG"

# Wait for wlan0
sleep 5

# Start S1G wpa_supplicant in background
wpa_supplicant_s1g -i wlan0 -c /etc/morse/wpa_supplicant_s1g.conf -D nl80211 -d -B \
  || echo "wpa_supplicant_s1g failed" >> "$LOG"

# Set static IP (ignore if already set)
ip addr add 192.168.3.11/24 dev wlan0 2>>"$LOG" \
  || echo "ip addr add 192.168.3.11 failed (maybe already exists)" >> "$LOG"

# Start your Python app with error logging
python3 /home/root/Main.py >> "$LOG" 2>&1 \
  || echo "python3 /home/root/Main.py failed" >> "$LOG"

echo "rc.local finished at $(date)" >> "$LOG"

exit 0
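
The fixed "sleep 5" in the script above could be replaced by polling until the interface actually exists. This is a minimal sketch, assuming a Linux sysfs mounted at /sys/class/net. wait_for_iface and the SYSFS_NET override are hypothetical names introduced here and are not part of the original script. The interface existing is only a necessary condition: it does not prove the driver has finished initialising.

```shell
#!/bin/sh
# Directory that holds one entry per network interface. Overridable so
# the function can be exercised without real hardware.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# wait_for_iface IFACE [TIMEOUT_S]
# Return 0 as soon as IFACE appears, or 1 if it has not appeared after
# roughly TIMEOUT_S seconds (default 10).
wait_for_iface() {
    iface=$1
    remaining=${2:-10}
    while [ ! -e "$SYSFS_NET/$iface" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage in rc.local, in place of the fixed sleep:
#   wait_for_iface wlan0 15 || echo "wlan0 did not appear" >> "$LOG"
```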

Still haven’t been able to reproduce this yet.

I highly doubt the IP address configuration would cause a problem here; static or DHCP shouldn’t matter. To be sure, though, please share the output of ip route show on your custom hardware so we can check whether anything is wrong with the routing config.

Can you also share the output of iw dev wlan0 mpath dump when in the “failing” state?

It may also be helpful to see the output of iw dev wlan0 station dump and iw dev wlan0 mpp dump

Lastly, if possible, it would be nice to see a tcpdump/Wireshark capture of the wlan0 interface of your failing device, with a concurrent capture on the mesh gate’s interface.
Attempt to run a ping while the captures are running; this will tell us whether layer 2 traffic (ARP) is getting through the mesh.
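
The outputs requested above could be gathered in one go with a small script along these lines. It is only a sketch: collect_mesh_diag and start_capture are hypothetical helper names, the output file names are arbitrary, and it assumes iw, ip and tcpdump are installed on the device. Errors from each command are written into the corresponding file rather than stopping the run.

```shell
#!/bin/sh
# collect_mesh_diag IFACE OUTDIR
# Save the routing table and the mesh path, proxy path and station
# tables for IFACE into OUTDIR, one file per command.
collect_mesh_diag() {
    iface=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    ip route show                > "$outdir/ip_route.txt" 2>&1
    iw dev "$iface" mpath dump   > "$outdir/mpath.txt"    2>&1
    iw dev "$iface" mpp dump     > "$outdir/mpp.txt"      2>&1
    iw dev "$iface" station dump > "$outdir/station.txt"  2>&1
    return 0
}

# start_capture IFACE FILE
# Capture only ARP and ICMP on IFACE into FILE, in the background.
# Stop it later with: kill "$!"
start_capture() {
    tcpdump -i "$1" -w "$2" 'arp or icmp' &
}

# Usage while in the failing state (placeholder paths and address):
#   collect_mesh_diag wlan0 /tmp/mesh-diag
#   start_capture wlan0 /tmp/mesh-diag/wlan0.pcap
#   ping -c 10 192.168.3.1
#   kill "$!"
```

Running the same capture on the mesh gate at the same time makes it possible to see on which hop an ARP request or reply disappears.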