I’m experiencing an issue that could be bufferbloat-related, or something similar.
We have robots 100 to 1000 m out in corn fields and are using a HalowLink1 in a mesh network configuration.
Each robot sends about 500 kbps (62.5 kB/s) upstream; roughly 95% of that traffic is UDP and 5% is TCP.
This works well when the robot is close enough for the higher MCS rates at a 2 MHz channel width, but near MCS1/MCS0 the available bandwidth is insufficient for the traffic, and at that point essentially all traffic from the robot stops arriving.
What I believe is happening is a traffic jam: at that range, retransmissions consume a sizable share of the available airtime, and the radio’s queue management can’t cope. It’s also possible that our PC (a standard x86 Debian-based server distro connected to the HalowLink 1 via Ethernet) is the one mishandling the queues.
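A rough airtime budget makes the retry theory concrete. This is a sketch under stated assumptions: the PHY rates are nominal 802.11ah figures for 2 MHz, one spatial stream and long guard interval; the 60% MAC efficiency, the 30% per-attempt loss, and dividing airtime by hop count (ignoring spatial reuse in the mesh) are all guesses to be replaced with measured values.

```python
# Rough airtime budget for a HaLow link. ASSUMPTIONS: nominal 802.11ah
# PHY rates (2 MHz, 1 spatial stream, long GI); MAC efficiency and
# per-attempt loss below are guesses, not measurements.
PHY_RATE_KBPS = {"MCS0": 650.0, "MCS1": 1300.0, "MCS2": 1950.0}


def goodput_kbps(phy_kbps: float, mac_efficiency: float,
                 attempt_loss: float, hops: int = 1) -> float:
    """Estimate deliverable goodput in kbps.

    attempt_loss: probability that one transmission attempt fails.
    Each frame then needs 1 / (1 - attempt_loss) attempts on average,
    so only (1 - attempt_loss) of the airtime carries new data.
    hops: simplification for a single-channel mesh, where each relay
    re-sends the frame on the same channel and divides the airtime.
    """
    if not 0.0 <= attempt_loss < 1.0:
        raise ValueError("attempt_loss must be in [0, 1)")
    return phy_kbps * mac_efficiency * (1.0 - attempt_loss) / hops


OFFERED_KBPS = 500.0
for mcs, phy in PHY_RATE_KBPS.items():
    usable = goodput_kbps(phy, mac_efficiency=0.6, attempt_loss=0.3)
    verdict = "fits" if usable > OFFERED_KBPS else "OVERLOADED"
    print(f"{mcs}: ~{usable:.0f} kbps usable vs {OFFERED_KBPS:.0f} offered: {verdict}")
```

With these assumed numbers MCS0 cannot carry 500 kbps even over a single hop, and MCS1 only barely can; one relay hop at MCS1 would already halve the usable rate below the offered load.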
If I reduce the upload rate, it works as expected, but I don’t want to limit bandwidth all the time: we have a large amount of telemetry to collect, and ideally we’d keep the average data rate as high as we safely can. Additionally, because of the terrain, signal strength is not a simple function of distance.
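Since the usable rate depends on terrain rather than distance, one option is to adapt the send rate on the PC instead of fixing it. Below is a minimal AIMD (additive-increase, multiplicative-decrease) sketch. It assumes a hypothetical feedback path in which the receiving side reports, once per interval, the fraction of telemetry packets it received (for example by counting sequence numbers); the step, backoff and loss-threshold values are illustrative, not tuned.

```python
from dataclasses import dataclass


@dataclass
class AimdRateController:
    """Additive-increase / multiplicative-decrease send rate.

    ASSUMPTION: delivered_fraction comes from a hypothetical receiver
    report (e.g. sequence-number counting); the HaLowLink does not
    provide this itself.
    """
    rate_kbps: float = 500.0
    min_kbps: float = 50.0
    max_kbps: float = 500.0        # the robot never needs more than this
    step_kbps: float = 25.0        # additive increase per good interval
    backoff: float = 0.7           # multiplicative decrease on loss
    loss_threshold: float = 0.05   # loss tolerated before backing off

    def update(self, delivered_fraction: float) -> float:
        """Adjust the rate from one interval's feedback and return it."""
        loss = 1.0 - delivered_fraction
        if loss > self.loss_threshold:
            self.rate_kbps *= self.backoff
        else:
            self.rate_kbps += self.step_kbps
        # Keep the rate within configured bounds.
        self.rate_kbps = min(self.max_kbps, max(self.min_kbps, self.rate_kbps))
        return self.rate_kbps


# Example: the link degrades, then recovers.
ctl = AimdRateController()
for delivered in [1.0, 0.6, 0.5, 0.9, 0.98, 1.0, 1.0]:
    print(f"delivered={delivered:.2f} -> send at {ctl.update(delivered):.0f} kbps")
```

AIMD backs off quickly when loss appears and probes back up slowly, which suits a link whose capacity can drop suddenly as a robot moves behind terrain.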
Is it possible to enable SQM, or some other traffic-shaping/queue-management system, on the OpenWRT router? I’ve tried installing both luci-app-sqm and sqm-scripts on the HalowLink, but the packages fail to install their required dependencies due to a kernel version mismatch (at least on the 2.7.4 and 2.7.6 firmware).
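One possible workaround while SQM can’t be installed on the radio is to shape on the Debian PC instead. Because the PC’s Ethernet link is much faster than the HaLow link, the PC’s own queue never fills and the backlog builds inside the radio; shaping the PC’s egress slightly below the radio’s real capacity moves that queue onto the PC, where CAKE (one of the qdiscs sqm-scripts can use, in mainline Linux since 4.19) keeps it short. This sketch only builds and optionally runs a standard tc command; the interface name eth0 and the 300 kbit rate are placeholders.

```python
import shlex
import subprocess


def cake_command(iface: str, rate_kbps: int) -> list[str]:
    """Build a tc command that shapes egress on iface with CAKE.

    "replace" works whether or not a root qdisc is already installed.
    """
    if rate_kbps <= 0:
        raise ValueError("rate_kbps must be positive")
    return ["tc", "qdisc", "replace", "dev", iface, "root",
            "cake", "bandwidth", f"{rate_kbps}kbit"]


def apply_shaper(iface: str, rate_kbps: int, dry_run: bool = True) -> str:
    """Optionally run the command; always return it as a shell string."""
    cmd = cake_command(iface, rate_kbps)
    if not dry_run:
        # Needs root (CAP_NET_ADMIN) and the sch_cake kernel module.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


# "eth0" is a placeholder for the interface facing the HalowLink.
print(apply_shaper("eth0", 300))
```

A fixed shaped rate only helps while it stays below what the radio can actually deliver; if the link drops below it, the radio’s queue fills again, so in practice the rate would need to be updated periodically from some estimate of the current link capacity.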
Any other ideas? I can scale the rate on the computer, but estimating the effective upstream bandwidth on a mesh network doesn’t seem trivial. Does the HalowLink1 expose its queue depth, and make it configurable or at least queryable somehow?
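If the radio doesn’t report its queue depth, a common alternative is to infer the backlog from RTT inflation, the same idea TCP Vegas and BBR build on. This is a sketch assuming RTT samples are available, for example from periodic pings across the mesh or timestamps echoed in the telemetry protocol (both assumptions here). The minimum RTT over a sliding window approximates the empty-queue path delay, the excess is treated as queueing delay, and queueing delay times delivery rate approximates the bytes queued somewhere along the path. It cannot tell which hop’s queue is growing.

```python
from collections import deque


class QueueDelayEstimator:
    """Infer a bottleneck queue from RTT inflation.

    The minimum RTT in a sliding window stands in for the path delay
    with an empty queue; anything above it is counted as queueing delay.
    """

    def __init__(self, window: int = 100):
        self.samples: deque = deque(maxlen=window)

    def add_rtt(self, rtt_ms: float) -> None:
        self.samples.append(rtt_ms)

    def queue_delay_ms(self) -> float:
        """Latest RTT minus the windowed minimum (0 if no samples)."""
        if not self.samples:
            return 0.0
        return self.samples[-1] - min(self.samples)

    def backlog_bytes(self, delivery_kbps: float) -> float:
        """Approximate queued bytes: kbit/s * ms = bits, / 8 = bytes."""
        return delivery_kbps * self.queue_delay_ms() / 8.0


# Example: RTT climbs as the radio's queue fills.
est = QueueDelayEstimator()
for rtt in [40.0, 42.0, 41.0, 300.0, 900.0]:
    est.add_rtt(rtt)
print(f"queueing delay ~{est.queue_delay_ms():.0f} ms, "
      f"backlog ~{est.backlog_bytes(250.0):.0f} bytes at 250 kbps")
```

A rising queueing delay is an early congestion signal that could drive the send rate down before packets start being dropped.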