[SOLVED] U-boot "Retry count exceeded"

Started by blinkenlight, August 07, 2015, 12:16:41 am


Okay, first off? The registration / posting "captcha" is juuust insidious - frankly I couldn't care less how many cores the RK3188 may or may not have - IT'S NOT WHAT I CAME HERE FOR.

That said, I recently purchased an RT5350F module to use as a stand-alone OpenWRT server (I DO have a router. It's busy doing other stuff). After copious amounts of fumbling about, not understanding why it wouldn't remember any setting at all, I finally realized I needed to update it to the latest OpenWRT .bin image, because the one it shipped with had the memory partition sizes wrong, making it useless. So far so good, but I never planned to have a physical RJ45 Ethernet jack on the module, so I had to hack one up with what I had at hand, then tried to use U-boot to flash the latest .bin image over TFTP. The result? Failure with "Retry count exceeded". Why? Well, read on...

There are three factors contributing to this failure (if you have the same problem) - each of them harmless on its own, but fatal when combined:

- The "hacked up" Ethernet connection, even if really, really short, might have significant UDP packet loss [1]
- Current TFTP servers DO NOT answer a repeated ACK (which is what a lost packet ends up producing) with a retransmission, due to a problem called the "Sorcerer's Apprentice Syndrome" [2]
- U-boot will ABORT THE TFTP TRANSFER after 10 errors TOTAL (not sequential errors - total errors!) [3]
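
To put rough numbers on why a budget of 10 errors in total is so tight, here is a back-of-the-envelope sketch. The 512-byte block size is the TFTP default; the 4 MiB image size and the loss rates are made-up illustration values, the exact point at which U-boot gives up (on the 10th error here) is my assumption, and treating every block as an independent trial is a simplification.

```python
import math

TFTP_BLOCK_SIZE = 512  # default TFTP data block size in bytes


def blocks_needed(image_bytes, block_size=TFTP_BLOCK_SIZE):
    # A transfer always ends with a short (possibly empty) block, hence the +1.
    return image_bytes // block_size + 1


def p_transfer_survives(n_blocks, loss_rate, error_budget=10):
    """Probability that fewer than `error_budget` block exchanges fail in total.

    Assumption: each exchange fails independently with probability `loss_rate`,
    and the transfer is aborted on the `error_budget`-th failure.
    """
    return sum(
        math.comb(n_blocks, k) * loss_rate**k * (1 - loss_rate) ** (n_blocks - k)
        for k in range(error_budget)
    )


image_bytes = 4 * 1024 * 1024       # hypothetical 4 MiB firmware image
n = blocks_needed(image_bytes)
for loss in (0.0001, 0.001, 0.01):  # hypothetical per-block loss rates
    print(f"loss {loss:.2%}: about {n * loss:.1f} errors expected, "
          f"P(survive) = {p_transfer_survives(n, loss):.3f}")
```

The point is only the trend: with thousands of blocks per image, even a loss rate well under 1% is expected to burn through a total budget of 10.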

Now, here's what you can do about all of this:

[1] - try to make your physical Ethernet connection as lossless as possible. In my case, even though it was only a 10cm "cable" between the module and the laptop I was trying to configure it with, apparently -NOT- connecting the "device GND" to the RJ45 connector's shielding pins made all the difference (granted, I used a "common mode choke -> transformer / transformer -> common mode choke" magjack instead of the more common "common mode choke -> transformer / common mode choke -> transformer" configuration, simply because that's what I had at hand). At any rate, just DON'T CONNECT SHIELDING AT ALL. Also, powering up directly into U-boot (as opposed to "rebooting" from a fully booted OpenWRT session) may or may not help - it's worth a try.

[2] - Current TFTP servers may have a good reason to ignore retries (see above) but the fact remains: packets might still occasionally get lost and most of the popular TFTP servers (like the oh-so-popular tftpd32) will do NOTHING to address that - they'll even brazenly ignore the repeated ACKs (see their own logs reporting "Ack block XXXX ignored (received twice)"). You may want to try using a more forgiving TFTP server such as http://sourceforge.net/projects/tftputil - the mentioned "Sorcerer's Apprentice Syndrome" is indeed a problem when packets get DELAYED, but it is NOT what is going on when packets get JUST SIMPLY FSCKING LOST, and in that case ignoring the retries is a full-on SHOWSTOPPER.
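
For reference, this is roughly what the "Sorcerer's Apprentice" fix asks of the side that sends the data, as I understand it from RFC 1123: never resend because of a duplicate ACK, only because of the sender's own timeout. It is a minimal sketch, not the code of tftpd32 or any other real server; the function names and the `max_retries` value are hypothetical.

```python
def on_ack(ack_block, last_sent_block):
    """Decide what the data sender does when an ACK arrives."""
    if ack_block == last_sent_block:
        return "send_next"  # fresh ACK: move on to the next block
    # Duplicate or stale ACK: resending here is exactly what doubles the
    # traffic when packets are merely delayed, so it gets ignored.
    return "ignore"


def on_timeout(retries_so_far, max_retries=5):
    """Recovery path for a packet that really was lost: the sender's own timer."""
    # max_retries is a hypothetical limit, not taken from any real server.
    return "resend_last" if retries_so_far < max_retries else "abort"
```

So a lost packet is only ever recovered through the timeout path, and how quickly (or whether) a given server takes it is up to that server. If its timer is slower than U-boot's, every lost packet costs U-boot at least one of its precious errors.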

[3] - U-boot seems to be hard-coded to allow no more than 10 retransmission errors in TOTAL, and there's not much you can do about that. Your best bet is to get rid of as many retransmission errors as you can, as advised in [1], and to get over the remaining few the best you can, as advised in [2].
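
To make the "total, not sequential" distinction concrete, here is a small sketch of the two counting policies. It is not U-boot's actual code, just an illustration; the function name, the event list and the exact abort threshold are assumptions of mine.

```python
def aborts(events, limit=10, count_total=True):
    """Return True if the transfer would be aborted.

    events: iterable of booleans, True meaning a block exchange failed.
    count_total=True  -> errors accumulate over the whole transfer.
    count_total=False -> a successful block resets the counter, so only
                         `limit` failures in a row abort the transfer.
    """
    errors = 0
    for failed in events:
        if failed:
            errors += 1
            if errors >= limit:
                return True
        elif not count_total:
            errors = 0  # success resets a consecutive-error counter
    return False


# Hypothetical transfer: 2000 blocks, one isolated failure every 100 blocks.
events = [(i % 100 == 99) for i in range(2000)]
print("total counter aborts:      ", aborts(events, count_total=True))
print("consecutive counter aborts:", aborts(events, count_total=False))
```

With isolated losses like these, a consecutive counter would never trip, while a total counter trips as soon as the tenth one comes along.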