Extremely slow Ethernet on A20-OLinuXino-LIME2

Started by StupidBeard, July 16, 2015, 10:01:57 PM


StupidBeard

Hi,

Last week I got an A20-OLinuXino-LIME2 and have been having issues with the Ethernet. HTTP transfers over the Gigabit LAN are limited to approximately 70 - 100KByte/sec, and an SSH shell is frequently laggy. The issues are not limited to HTTP or SSH; these were just the most convenient things to test. In all cases the board was powered from a 5V 2.5A wall wart via the DC jack.

I initially thought this was just my Internet connection being crap, but the transfer rate is still vastly below what it should be. In any case, to rule that out, all of the following was done over a Gigabit switch to my Linux desktop. I have also seen similar issues when transferring to my Mac and over the internet.

The following tests were done on the 4.1.2 kernel version of jessie from Igor Pecovnik's site. I have also verified the same behaviour on the official Olimex wheezy image and Debian's jessie installer as downloaded from their site directly.

To establish a baseline I have been downloading a 10MB file over HTTP from my Debian box:

Total wall clock time: 3m 11s
Downloaded: 1 files, 10M in 3m 11s (53.6 KB/s)
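
As a sanity check on the figure wget reports, the rate follows directly from the size and the wall clock time. A small sketch, assuming wget's "10M" means 10 MiB (10485760 bytes); the function name is made up for illustration:

```shell
# Hypothetical helper: average throughput in KiB/s from a byte count and a duration in seconds.
throughput_kib() {
    bytes="$1"
    seconds="$2"
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.1f\n", b / 1024 / s }'
}

# 10 MiB in 3m 11s (191 seconds):
throughput_kib 10485760 191
```

For comparison, the 723 Mbits/sec that iperf3 measures on the same link works out to roughly 90 MBytes/sec, so the download is running more than a thousand times slower than the link can carry.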


Copying the same file over with scp results in a comparable transfer speed.

To rule out SD card write speed, I wrote 10MB from /dev/zero to a file:

root@lime2:~# dd if=/dev/zero of=test count=10 bs=1M
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.134305 s, 78.1 MB/s
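
A caveat on the dd figure: without a sync option, dd can return as soon as the data is in the page cache, so 78.1 MB/s for a 10MB write may reflect RAM speed rather than the SD card. A hedged variant that asks dd to flush to storage before it reports its timing, assuming GNU or BusyBox dd (the function name and scratch file name are made up here):

```shell
# Hypothetical helper: write 10 MiB of zeros and fsync before dd prints its summary.
sd_write_test() {
    target="${1:-./ddtest.bin}"   # scratch file; this name is an assumption
    # conv=fsync makes dd flush the output file to storage before exiting,
    # so the reported rate includes the time to reach the card.
    LC_ALL=C dd if=/dev/zero of="$target" bs=1M count=10 conv=fsync 2>&1
    rm -f "$target"
}
```

Calling `sd_write_test` in a directory on the SD card prints the usual dd summary. Whatever the number turns out to be, it only needs to be well above the 53.6 KB/s seen on the download to rule the card out.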


To rule out the network, I used iperf3 to benchmark the LAN:

root@lime2:~# iperf3 -c 10.0.0.10
Connecting to host 10.0.0.10, port 5201
[  4] local 10.0.0.170 port 59548 connected to 10.0.0.10 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.01   sec  83.7 MBytes   693 Mbits/sec    0    144 KBytes       
[  4]   1.01-2.01   sec  88.3 MBytes   742 Mbits/sec    0    174 KBytes       
[  4]   2.01-3.00   sec  87.4 MBytes   738 Mbits/sec    0    192 KBytes       
[  4]   3.00-4.00   sec  88.1 MBytes   739 Mbits/sec    0    238 KBytes       
[  4]   4.00-5.01   sec  88.7 MBytes   740 Mbits/sec    0    284 KBytes       
[  4]   5.01-6.00   sec  87.5 MBytes   739 Mbits/sec    0    284 KBytes       
[  4]   6.00-7.01   sec  88.8 MBytes   742 Mbits/sec    0    308 KBytes       
[  4]   7.01-8.01   sec  75.7 MBytes   629 Mbits/sec    0    332 KBytes       
[  4]   8.01-9.00   sec  86.2 MBytes   730 Mbits/sec    0    392 KBytes       
[  4]   9.00-10.01  sec  88.8 MBytes   739 Mbits/sec    0    392 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.01  sec   863 MBytes   723 Mbits/sec    0             sender
[  4]   0.00-10.01  sec   863 MBytes   723 Mbits/sec                  receiver
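
One caveat about this iperf3 run: by default the iperf3 client sends and the server receives, so it only exercises the LIME2-to-desktop direction, while the wget and scp downloads flow the other way. A minimal sketch that tests both directions, assuming an iperf3 server is still listening on 10.0.0.10 (the helper names `run` and `iperf_both_ways` are made up for illustration):

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Measure the link in both directions against the same iperf3 server.
iperf_both_ways() {
    server="${1:-10.0.0.10}"
    run iperf3 -c "$server"       # client sends: LIME2 transmit path
    run iperf3 -c "$server" -R    # -R reverses: server sends, LIME2 receive path
}
```

Running `iperf_both_ways 10.0.0.10` on the board and comparing the two summaries would show whether only the receive path is slow, which would narrow down where to look.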


And finally, I repeated the above tests using a USB wifi adapter. The wget test completed at 2.69MB/sec so I didn't bother with running the others over wifi.

It is pretty clear that there's some issue with the Ethernet, but I am not sure if it's a software issue or hardware. Has anyone seen this issue before or have any suggestions on how to sort it?

Thanks in advance.

Gerrit

Check that it is using the right driver (gmac, not emac) and that the link is running at full speed:

# dmesg | grep gmac
[    1.577908] [gmac]: sun6i_gmac platform driver registration completed
[   17.697307] sunxi_gmac: probed
[   17.705767] eth0: PHY ID 001cc912 at 1 IRQ 0 (sunxi_gmac-0:01) active
[   19.715542] PHY: sunxi_gmac-0:01 - Link is Up - 1000/Full



# ethtool eth0
Settings for eth0:
Supported ports: [ TP AUI BNC MII FIBRE ]
Supported link modes:   10baseT/Half 10baseT/Full
                        100baseT/Half 100baseT/Full
                        1000baseT/Half 1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full
                        100baseT/Half 100baseT/Full
                        1000baseT/Half 1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: MII
PHYAD: 1
Transceiver: external
Auto-negotiation: on
Current message level: 0x0000003f (63)
       drv probe link timer ifdown ifup
Link detected: yes

StupidBeard

Thanks for replying. There was nothing gmac related in dmesg, but a grep on eth0 found:

root@lime2:~# dmesg|grep eth0
[    1.881348] eth0: PHY ID 001cc912 at 1 IRQ POLL (stmmac-0:01) active
[   12.097433] stmmaceth 1c50000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx


That was with kernel 4.1.2; I am going to try the image with the 3.4.108 kernel in a sec to see if that is using gmac.

How do I go about changing this to use the gmac driver instead?

StupidBeard

I just tried the 3.4.108 image, which is definitely using the gmac driver and it is exhibiting the same problem. Actually it seems to be worse, currently getting 12KB/sec on the 10MB wget.

Edit:

I manually changed the link speed to 100Mbit full duplex using ethtool and now I'm getting 11MB/sec on the 10MB test file. This sounds to me like the driver is not able to keep up at gigabit speed.
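
For anyone wanting to repeat this workaround, the post does not give the exact command, so the following is only one plausible way to do it with ethtool's `-s` option. It needs root, and the function name and the DRY_RUN switch are made up for illustration:

```shell
# Hypothetical helper: force an interface to 100 Mbit/s full duplex, or hand it back to auto-negotiation.
set_link_mode() {
    iface="$1"
    mode="$2"
    case "$mode" in
        100)  set -- ethtool -s "$iface" speed 100 duplex full autoneg off ;;
        auto) set -- ethtool -s "$iface" autoneg on ;;
        *)    echo "usage: set_link_mode IFACE 100|auto" >&2; return 2 ;;
    esac
    # With DRY_RUN=1 only print the command; otherwise run it (requires root).
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

`set_link_mode eth0 100` applies the workaround and `set_link_mode eth0 auto` re-enables auto-negotiation. The setting does not survive a reboot, so it is a diagnostic step rather than a fix.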

Gerrit

Let's go for the obvious then: replace the cable or try another port on your switch.

StupidBeard

Quote from: Gerrit on July 16, 2015, 11:48:29 PM
Let's go for the obvious then: replace the cable or try another port on your switch.

Those were actually the first things I tried, including another switch entirely. The only thing that has made any difference was forcing the link speed to 100Mbit, which I mentioned in the edit to my previous post, probably posted at the same time you were writing this :)

soenke

You already tested that your LAN works fine: to 10.0.0.10 you get > 700Mbit.
My guess is your internet router. Their interfaces are often only 100Mbit, so maybe there is a lot of packet dropping when you use 1Gbit on your LIME, which then slows down the TCP connection. Can you test your internet speed from your 10.0.0.10 machine using its Gbit interface?

Try using something like Wireshark to see if you get TCP retransmissions or duplicate ACKs on traffic through your router.
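
Before reaching for a packet capture, the kernel's own per-interface counters can hint at whether frames are being damaged or dropped at the NIC. A minimal sketch, assuming a Linux system that exposes the usual statistics files under /sys/class/net; the function name and the optional second argument (an alternate base directory, handy for testing) are made up here:

```shell
# Hypothetical helper: print the error and drop counters the kernel keeps for an interface.
show_nic_errors() {
    iface="${1:-eth0}"
    base="${2:-/sys/class/net}"
    for counter in rx_errors rx_crc_errors rx_dropped tx_errors tx_dropped; do
        file="$base/$iface/statistics/$counter"
        # Skip counters that are missing or unreadable on this system.
        if [ -r "$file" ]; then
            printf '%s=%s\n' "$counter" "$(cat "$file")"
        fi
    done
}
```

Running `show_nic_errors eth0` before and after a slow transfer and comparing the numbers is the idea. Counters that climb at gigabit but stay flat at 100Mbit would point at the link itself rather than at the router.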

StupidBeard

Quote from: soenke on July 17, 2015, 12:11:21 PM
You already tested that your LAN works fine: to 10.0.0.10 you get > 700Mbit.
My guess is your internet router. Their interfaces are often only 100Mbit, so maybe there is a lot of packet dropping when you use 1Gbit on your LIME, which then slows down the TCP connection. Can you test your internet speed from your 10.0.0.10 machine using its Gbit interface?

Try using something like Wireshark to see if you get TCP retransmissions or duplicate ACKs on traffic through your router.

It's not the internet connection. I did spend a bunch of time initially going down that rabbit hole, but I ruled that out entirely by restricting testing to the LAN only, with Gigabit only.

iperf3 does show that the LIME and the LAN are capable of reasonable speed, even if it's not the full gigabit, but as soon as you try to do anything useful with it (e.g. wget, scp) it gets bogged down in molasses and is essentially unusable.

Repeating the same tests with eth0 forced to 100mbit full duplex, with all other things remaining the same, results in roughly expected speeds.

I think that I have conclusively ruled out everything outside of the LIME, which just leaves a software or hardware issue.

Wireshark is a good suggestion though, I will look into that tonight.

StupidBeard

I did some more digging, and came across https://linux-sunxi.org/Ethernet

Under GMAC, it states:

Quote: For reliable Gigabit networking (1000Mbit operation), several sunxi devices require an important tweak that adjusts the relative timing of the clock and data signals to the PHY, in order to compensate for differing trace lengths on the PCB (details). [snip] Recent mainline U-Boot uses CONFIG_GMAC_TX_DELAY to initialize these devices accordingly. If a necessary GMAC TX delay isn't set, then GBit Ethernet operation might be unreliable or won't work at all. 10/100 Mbit/sec negotiation is unaffected, so misconfigured devices could actually work (faster) when connected to a Fast Ethernet port instead of a GBit Ethernet port.

This is basically exactly what I'm seeing, so will look into that tonight.

soenke

Well, I still think it is not a communication problem between your LAN interface and the switch, or a LIME hardware problem; otherwise it would not be possible to get 700MBit to your local server. So physical communication works.
So it has to be a problem above layer 1. Maybe your Linux sends jumbo packets when it runs on gigabit and your router discards/fragments them? Maybe you could also check the MTU and look for fragmented packets in Wireshark.

Btw, are you using VLANs? They add some bytes to all packets and some interfaces can't handle this.

Try to use your local server as router for your lime-box and see if that changes anything. Put your lime into a separate subnet for that to be sure the packets are not going to your router directly.
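
The MTU check suggested in this post can also be done without Wireshark. A rough sketch, assuming IPv4 and the Linux iputils ping, whose `-M do` option forbids fragmentation; the largest ICMP payload that fits in one packet is the MTU minus 28 bytes (20-byte IP header plus 8-byte ICMP header). The function names are hypothetical:

```shell
# Hypothetical helper: largest ICMP echo payload that fits in one IPv4 packet of the given MTU.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))   # 20 bytes IPv4 header + 8 bytes ICMP header
}

# Hypothetical helper: send full-size, unfragmentable pings from an interface's configured MTU.
probe_mtu() {
    iface="${1:-eth0}"
    host="${2:-10.0.0.10}"
    mtu="$(cat "/sys/class/net/$iface/mtu")"
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}
```

If `probe_mtu eth0 10.0.0.10` gets replies, full-size packets cross the path intact. Replies to small pings alongside errors or silence for full-size ones would support the MTU theory.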

ThibG

Have you resolved your issue? I'm experiencing something similar on my two LIME2. I measured 500Mbps+ between them, but then, a few hours later, one of them started to drop packets, as much as 10%! Rebooting solved the issue, but still...

Another issue I had twice but can't reproduce is my second LIME2 (now plugged to a 100Mbps port) completely stopped being reachable through the ethernet port (unplugging/replugging the cable did not change anything, but did confirm the system was not dead, as this would appear in the logs). That might be unrelated, though. In both cases, a reboot solved it. This LIME2 is plugged to both AC and a LiPo battery.

My two LIME2 are running vanilla Debian Jessie with a 4.0 kernel and 2015-04 u-boot from testing.

StupidBeard

Quote from: ThibG on July 22, 2015, 01:28:06 AM
Have you resolved your issue? I'm experiencing something similar on my two LIME2. I measured 500Mbps+ between them, but then, a few hours later, one of them started to drop packets, as much as 10%! Rebooting solved the issue, but still...

Another issue I had twice but can't reproduce is my second LIME2 (now plugged to a 100Mbps port) completely stopped being reachable through the ethernet port (unplugging/replugging the cable did not change anything, but did confirm the system was not dead, as this would appear in the logs). That might be unrelated, though. In both cases, a reboot solved it. This LIME2 is plugged to both AC and a LiPo battery.

My two LIME2 are running vanilla Debian Jessie with a 4.0 kernel and 2015-04 u-boot from testing.

No, but from everything I could find it looks like it's related to CONFIG_GMAC_TX_DELAY in U-Boot. Fixing it will probably require building a custom U-Boot and fiddling with that setting to find a value that works for the LIME2.

The links I found are on another computer right now, but if you search for CONFIG_GMAC_TX_DELAY then you should find more information on the issue.
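
As a starting point for that experiment, here is a sketch of a helper that sets the option in a U-Boot .config file so different values can be tried one build at a time. Several things here are assumptions rather than facts from this thread: that the option takes a small integer (0 to 7 is assumed), the function name, and the defconfig name in the comments. No known-good value for the LIME2 is implied.

```shell
# Hypothetical helper: set CONFIG_GMAC_TX_DELAY in a U-Boot .config file.
set_gmac_tx_delay() {
    config="$1"
    delay="$2"
    # Assumption: valid delay values are the single digits 0..7.
    case "$delay" in
        [0-7]) ;;
        *) echo "delay must be between 0 and 7" >&2; return 1 ;;
    esac
    if grep -q '^CONFIG_GMAC_TX_DELAY=' "$config"; then
        # Replace the existing setting, writing via a temporary file.
        sed "s/^CONFIG_GMAC_TX_DELAY=.*/CONFIG_GMAC_TX_DELAY=$delay/" "$config" > "$config.tmp" &&
            mv "$config.tmp" "$config"
    else
        # Option not present yet: append it.
        echo "CONFIG_GMAC_TX_DELAY=$delay" >> "$config"
    fi
}

# Possible use inside a U-Boot source tree (board defconfig name is an assumption):
#   make A20-OLinuXino-Lime2_defconfig
#   set_gmac_tx_delay .config 3
#   make
```

Each resulting build would then be written to the SD card and checked with the same 10MB download at gigabit speed.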

I meant to look into sorting it out at the weekend, but I got sidetracked with other things so I probably won't get around to it until next weekend now.

soenke

Quote from: ThibG on July 22, 2015, 01:28:06 AM
Another issue I had twice but can't reproduce is my second LIME2 (now plugged to a 100Mbps port) completely stopped being reachable through the ethernet port (unplugging/replugging the cable did not change anything, but did confirm the system was not dead, as this would appear in the logs). That might be unrelated, though. In both cases, a reboot solved it. This LIME2 is plugged to both AC and a LiPo battery.

My two LIME2 are running vanilla Debian Jessie with a 4.0 kernel and 2015-04 u-boot from testing.

You can try to connect via UART (the pins beside the Ethernet port) and see if the local console still works.
I had similar effects, but in the end they were caused by a randomly failing microSD card controller. Some systems stopped after some hours, others after some months, with MMC errors. The only way to pinpoint this error was a live connection to UART0 to watch the kernel output. New SD card (SDSDSQAE-032G) --> problem solved!

dario

Unfortunately I have the same problem. Has anybody found a fix?

JohnS

Look at the post before yours. Use UART output / new SD card.

John