Quectel FGH100M-H throughput and SPI Test Mode Results

Resolution

Thanks for all the help!
After a lot of improvements to the system I was able to get the SPI bus throughput test on the RK3588 to run consistently at 21.5 Mbit/s.

Wrote 233600 bytes in 86 ms
Estimated IO upper bound: 21728 kbps

Wrote 233600 bytes in 86 ms
Estimated IO upper bound: 21728 kbps

Wrote 233600 bytes in 87 ms
Estimated IO upper bound: 21480 kbps
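As a sanity check on the estimator's arithmetic: 233600 bytes is exactly 1460-byte packets × 16 batches × 10 rounds, and the bitrate follows directly. The small difference from the reported 21728 kbps is just sub-millisecond timing that the whole-ms figure hides:

```shell
# Sanity-check the estimator's numbers (taken from the test output above).
bytes=$(( 1460 * 16 * 10 ))   # packet size x batches x rounds = 233600
ms=86                         # reported elapsed time
kbps=$(( bytes * 8 / ms ))
echo "$kbps kbps"             # ~21730 kbps, within rounding of the reported 21728
```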

Mostly documenting this for anyone who runs into similar issues in the future.

The trace shared by @ajudge above was hugely valuable as a baseline for me to know where I should be looking for optimizations.

I should note that in my earlier testing there was a lot of variability in the throughput reported by the SPI bus test, ranging between 10 Mbps and 15 Mbps, with most of the results around 12 Mbps.

UDP iperf3 tests were getting up to 10-15 Mbps TX, with RX being particularly bad (around 6-8 Mbps).

first tranche of changes

| system | improvement |
| --- | --- |
| SPI driver | disable all power management functionality |
| SPI driver | enable `rockchip,rt` DT parameter |
| DMAC driver | disable all power management functionality |
| Morse driver | align SPI transactions that use DMA |
| performance governor | use `performance` instead of `schedutil` (duh) |
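Of these, the governor switch is a plain sysfs write. A minimal sketch, assuming the big cores sit on cpufreq policy4 (RK3588 typically exposes policy0/policy4/policy6; check `ls /sys/devices/system/cpu/cpufreq/` for your layout):

```shell
# Pin the big-core cluster to its maximum frequency (run as root).
# policy4 is an assumption about which policy covers the big cores.
echo performance > /sys/devices/system/cpu/cpufreq/policy4/scaling_governor
cat /sys/devices/system/cpu/cpufreq/policy4/scaling_governor   # should now read "performance"
```

Note this doesn't persist across reboots; a systemd unit or boot script is needed to make it stick.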

At this point I was seeing anywhere between 16-18 Mbps, with the occasional 20 Mbps.
And iperf UDP tests were showing similar results, around 17-18 Mbps both RX and TX.

Most of the problems here were spikes in transaction delays, plus unnecessary delays between CS going active/inactive and the clock starting/stopping, and delays between CS-active periods (scheduling related).

second tranche of changes

| system | improvement |
| --- | --- |
| IRQ pinning | move the DMA and SPI IRQs each to a dedicated big core |
| IRQ pinning | move other IRQs off cores 4 and 5 |
| CPU idle states | disable sleep states on CPUs 4 and 5 |
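A sketch of how that pinning can be done from userspace. The IRQ numbers are placeholders (look up the real ones in `/proc/interrupts`), and it assumes cores 4 and 5 are the big cores:

```shell
# Run as root. IRQ numbers below are placeholders:
#   grep -E 'spi|dma' /proc/interrupts
SPI_IRQ=123
DMA_IRQ=124

# Dedicate one big core each to the SPI and DMA interrupts.
echo 4 > /proc/irq/$SPI_IRQ/smp_affinity_list
echo 5 > /proc/irq/$DMA_IRQ/smp_affinity_list

# Steer other IRQs toward cores 0-3 (0xf). Note this mask only applies to
# IRQs requested after the write; already-active IRQs (and irqbalance, if
# running) need to be handled individually.
echo f > /proc/irq/default_smp_affinity

# Disable cpuidle sleep states on cores 4 and 5.
for cpu in 4 5; do
  for st in /sys/devices/system/cpu/cpu$cpu/cpuidle/state*/disable; do
    echo 1 > "$st"
  done
done
```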

This got me to a consistent 21 Mbps in SPI bus tests,

and 20 Mbps TX and RX in iperf tests.

last improvement

This one took the iperf results to 21.5 Mbps.


All the 16-byte transactions beyond the 1536-byte block transfer waste a lot of time on the bus.

This may be cheating, but changing the MTU to 1486 instead of 1500 eliminates an entire SPI transaction after the block transfer, which helps a little bit. (Not sure how well that holds up in real-world scenarios.)
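For reference, the MTU change itself is a one-liner; `wlan0` here is an assumption, substitute whatever your HaLow interface is called:

```shell
# Shrink the MTU so a full frame fits without the trailing 16-byte SPI
# transactions (run as root; interface name is an assumption).
ip link set dev wlan0 mtu 1486
ip link show dev wlan0   # verify: should report "mtu 1486"
```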

our final iperf3 test now shows:

$ iperf3 -c 192.168.69.1 -u -b 30M -t 30 -p 5202
Connecting to host 192.168.69.1, port 5202
[  5] local 192.168.69.205 port 39012 connected to 192.168.69.1 port 5202
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  2.56 MBytes  21.5 Mbits/sec  1874
[  5]   1.00-2.00   sec  2.60 MBytes  21.8 Mbits/sec  1898
[  5]   2.00-3.00   sec  2.52 MBytes  21.2 Mbits/sec  1844
[  5]   3.00-4.00   sec  2.52 MBytes  21.1 Mbits/sec  1841
[  5]   4.00-5.00   sec  2.55 MBytes  21.4 Mbits/sec  1866
[  5]   5.00-6.00   sec  2.42 MBytes  20.3 Mbits/sec  1772
[  5]   6.00-7.00   sec  2.42 MBytes  20.3 Mbits/sec  1768
[  5]   7.00-8.00   sec  2.59 MBytes  21.7 Mbits/sec  1893
[  5]   8.00-9.00   sec  2.52 MBytes  21.1 Mbits/sec  1843
[  5]   9.00-10.00  sec  2.49 MBytes  20.9 Mbits/sec  1821
[  5]  10.00-11.00  sec  2.50 MBytes  21.0 Mbits/sec  1827
[  5]  11.00-12.00  sec  2.54 MBytes  21.3 Mbits/sec  1855
[  5]  12.00-13.00  sec  2.53 MBytes  21.2 Mbits/sec  1848
[  5]  13.00-14.00  sec  2.58 MBytes  21.7 Mbits/sec  1888
[  5]  14.00-15.00  sec  2.52 MBytes  21.1 Mbits/sec  1841
[  5]  15.00-16.00  sec  2.60 MBytes  21.8 Mbits/sec  1899
[  5]  16.00-17.00  sec  2.49 MBytes  20.9 Mbits/sec  1822
[  5]  17.00-18.00  sec  2.51 MBytes  21.1 Mbits/sec  1838
[  5]  18.00-19.00  sec  2.52 MBytes  21.1 Mbits/sec  1842
[  5]  19.00-20.00  sec  2.56 MBytes  21.5 Mbits/sec  1873
[  5]  20.00-21.00  sec  2.52 MBytes  21.2 Mbits/sec  1844
[  5]  21.00-22.00  sec  2.50 MBytes  20.9 Mbits/sec  1825
[  5]  22.00-23.00  sec  2.50 MBytes  21.0 Mbits/sec  1829
[  5]  23.00-24.00  sec  2.52 MBytes  21.1 Mbits/sec  1840
[  5]  24.00-25.00  sec  2.58 MBytes  21.6 Mbits/sec  1884
[  5]  25.00-26.00  sec  2.59 MBytes  21.8 Mbits/sec  1897
[  5]  26.00-27.00  sec  2.50 MBytes  21.0 Mbits/sec  1829
[  5]  27.00-28.00  sec  2.51 MBytes  21.1 Mbits/sec  1835
[  5]  28.00-29.00  sec  2.50 MBytes  21.0 Mbits/sec  1831
[  5]  29.00-30.00  sec  2.49 MBytes  20.9 Mbits/sec  1821
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-30.00  sec  75.7 MBytes  21.2 Mbits/sec  0.000 ms  0/55388 (0%)  sender
[  5]   0.00-30.04  sec  75.7 MBytes  21.2 Mbits/sec  0.578 ms  0/55388 (0%)  receiver

further improvements

As was mentioned above, the inter-block delay could be reduced, but that likely won't make a huge impact.

The MM6108 only needs 30 us of delay, but the host allots 40 us.
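To put a rough number on why that's a small win: assuming one such delay per packet and the ~1850 packets/s the iperf run above sustained, trimming 40 us to 30 us only frees a sliver of bus time each second:

```shell
# Back-of-envelope: bus time recoverable by shrinking the inter-block delay.
# pkts_per_s is an estimate taken from the iperf3 datagram counts above.
saved_us=$(( 40 - 30 ))                     # 10 us recoverable per transaction
pkts_per_s=1850
bus_saved_us=$(( saved_us * pkts_per_s ))
echo "${bus_saved_us} us freed per second"  # 18500 us, i.e. under 2% of the bus
```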

Full spi test:

 Bus IO write estimator
     packet size (bytes): 1460
     overhead (bytes):    102
     padding (bytes):     2
     batch(es):           16
     rounds:              10
     Wrote 233600 bytes in 86 ms
     Estimated IO upper bound: 21728 kbps
 Bus timing profiler
     packet size (bytes): 1460
     overhead (bytes):    102
     padding (bytes):     2
     rounds:              16
     timing (us)
     bus claim  :    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     bus release:    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     read 32    :   26   27   26   26   26   26   24   26   26   26   26   26   26   26   25   26
     read bulk  :  458  509  462  456  453  454  453  510  503  455  459  455  455  454  463  455
     write 32   :   28   27   26   27   26   27   26   26   26   26   26   26   39  593   36   48
     write bulk :  458  454  552  457  455  452  451  451  452  451  457  496  544  456  457  507
 SKB allocation profiler (100 skbs w/ 1562 bytes)
     alloc: 44 us
     free:  45 us

 Bus IO write estimator
     packet size (bytes): 1460
     overhead (bytes):    102
     padding (bytes):     2
     batch(es):           16
     rounds:              10
     Wrote 233600 bytes in 86 ms
     Estimated IO upper bound: 21728 kbps
 Bus timing profiler
     packet size (bytes): 1460
     overhead (bytes):    102
     padding (bytes):     2
     rounds:              16
     timing (us)
     bus claim  :   42    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     bus release:    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     read 32    :   31   27   26   26   26   26   26   26   25   26   26   26   26   26   22   26
     read bulk  :  459  459  469  506  475  518  459  508  458  486  464  461  514  458  473  454
     write 32   :   29   29   29   43   29   29   29   29   43   30   29   29   29   71   27   27
     write bulk :  455  452  468  463  452  540  455  499  455  456  455  457  464  466  497  456
 SKB allocation profiler (100 skbs w/ 1562 bytes)
     alloc: 49 us
     free:  45 us

 Bus IO write estimator
     packet size (bytes): 1460
     overhead (bytes):    102
     padding (bytes):     2
     batch(es):           16
     rounds:              10
     Wrote 233600 bytes in 87 ms
     Estimated IO upper bound: 21480 kbps
 Bus timing profiler
     packet size (bytes): 1460
     overhead (bytes):    102
     padding (bytes):     2
     rounds:              16
     timing (us)
     bus claim  :    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     bus release:    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
     read 32    :   27   27   26   26   26   26   26   26   26   26   26   26   26   26   25   26
     read bulk  :  457  454  454  453  454  452  453  454  453  454  454  454  453  454  506  503
     write 32   :   27   26   26   26   26   26   26   26   26   27   26   26   26   26   26   40
     write bulk :  455  451  452  452  451  452  451  452  451  451  452  451  451  451  452  451
 SKB allocation profiler (100 skbs w/ 1562 bytes)
     alloc: 49 us
     free:  43 us

Glad you got there in the end! As mentioned earlier, SPI host controllers can be particularly variable in the way they function, and unfortunately it's not uncommon to need this level of debugging for optimisation :see_no_evil_monkey:

haha - this is probably a small enough difference that it will be okay.