A10 LIME microSD card corruption

Started by DangeMask, January 21, 2016, 04:04:35 PM

DangeMask

As I wrote before, we have about 170 devices based on the OLinuXino A10 LIME in our company.

All devices are installed from a single bit-for-bit image of a microSD card; we only change the static IP and the configuration file of our application. The OS is Debian, running a wifi connection, an FTP server and our own Java app.

The issue is that, from time to time, some devices stop working and we need to change the card.
Sometimes a device loses some functions (FTP server not working, GPIO not working), and if we then reboot it, it won't boot up.
Sometimes there is no visible problem until the device is next started, and then it just won't boot.

Replacing the microSD card helps every time, and the device then works fine.

We tried to test the cards; all I can say is that the card is corrupted. Mainly I find a "circular directory structure" error when checking the card. If I rewrite the whole card from the source image, it can be used again.
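
A rough way to see where a failed card differs from the source image is to dump the card to a file and compare the two block by block. The Python sketch below assumes both are ordinary files; the function name and the 4096-byte block size are illustrative choices, not something from our setup. A card that has been running will legitimately differ in places (logs, config), so the interesting part is whether differences show up in regions nobody should have written.

```python
def differing_blocks(path_a, path_b, block_size=4096):
    """Return the indices of blocks whose contents differ between two files.

    If one file is shorter, the blocks beyond its end count as different.
    """
    bad = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files are exhausted
            if a != b:
                bad.append(index)
            index += 1
    return bad
```

Called as differing_blocks("source.img", "card_dump.img") with hypothetical file names, it returns a list of block numbers; multiplying a block number by the block size gives the byte offset in the image.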

All I found on this forum was this old topic without a solution:
https://www.olimex.com/forum/index.php?topic=3838.0

Our app saves logs in text files, and the write rate varies from device to device: sometimes about 2 writes per minute, sometimes less. I also found corruption in an old log file that had not been opened for 2 days; after a reboot, some of its characters were garbled. So I assume the corruption does not affect only the file currently being written. When it hits a system file, the device can't boot.
We download the last 2 log files from each device every 30 minutes, and the FTP server (vsftpd) logs every connection. Maybe the total amount of log writes (ours, the system's, other applications') is too much for a flash card.
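
If the number of small writes turns out to matter, one option is to collect log lines in memory and append them to the file in batches. The sketch below is in Python for brevity (our app is Java, so this only illustrates the idea); the class name, the 50-line limit and the 5-minute limit are made-up values. The trade-off is that lines still in memory are lost if power is cut before the next flush.

```python
import os
import time


class BatchedLog:
    """Collect log lines in memory and append them to a file in batches."""

    def __init__(self, path, max_lines=50, max_age_s=300.0, clock=time.monotonic):
        self.path = path
        self.max_lines = max_lines    # flush once this many lines are pending
        self.max_age_s = max_age_s    # or once this long has passed since the last flush
        self._clock = clock
        self._pending = []
        self._last_flush = clock()

    def write(self, line):
        self._pending.append(line.rstrip("\n") + "\n")
        too_many = len(self._pending) >= self.max_lines
        too_old = self._clock() - self._last_flush >= self.max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self):
        self._last_flush = self._clock()
        if not self._pending:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            f.writelines(self._pending)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the card now
        self._pending.clear()
```

The app would call write() for every log line and flush() once on shutdown, so each device produces a handful of larger writes per hour instead of a couple per minute.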

The cards are Kingston 4GB, class 10. We are waiting for better microSD cards (single layer, wear-leveling), but the price is crazy.
Future plans are to replace the Java app with a C++ one and to create our own Debian distribution limited to just what we use, but that is a long-term effort.

Yesterday I found out that there is one device, connected via ethernet instead of wifi, which has been working since September without any issue. Maybe that is a clue, maybe it's just a miracle.

We are not real Linux experts and have been suffering from this behaviour for a few months now. Can anybody suggest where to look for the cause of this issue?

JohnS

Could be as simple (in terms of the cause, not the fix) as a marginal voltage or noise problem.  Finding the cause may be hard.  As it varies it is probably hardware, whether noise, voltage, or whatever.

You might start by checking data sheets very carefully as well as checking whether everything meets them.  Bear in mind that multiple voltages are set by software, for example (so could be changed).

In case it's noise / glitch you'll likely need hi-speed data logger or the like so you can find it.

Have you tried lab-quality PSUs in case the problem then goes away or is far less frequent?  (PSU is a classic source of noise etc but lab ones should be OK if good quality.)

RFI/EMC - more to think/worry about...

John

DangeMask

The devices are powered from 24V and each has its own 24V-to-5V converter. Some are connected to their own PSU in the assembly line power rack; others are powered by 24V 1A adapters from an ordinary 220V socket.
We haven't found any pattern pointing to one specific power source. Cards become corrupted on different machines and with different power sources.

The device I use for SW development lies on my desk, always turned on, and has had no issue in months. We cannot even reproduce the corruption.

Next week, a Linux expert will come to check our Debian distro and help us create our own, trimmed down to only the features we need. I really hope it will help.

I tried the same microSD card in my Android phone. It repeatedly connected and disconnected the card and asked to format it, without success. Maybe we are just using the wrong type :-P

JohnS

With one being reliable it does not sound like a software problem.

John

MBR

Try to minimize writing to the SD card. The optimal setup is the rootfs mounted read-only, with all writes directed to memory-backed filesystems such as tmpfs. You can even use OverlayFS, which makes the work easier because you don't have to worry whether some process wants to write somewhere: all writes simply go to the overlaid tmpfs. The only drawback comes when you need to keep data (configurations, logs and such), but there is always the option of another storage device or a network filesystem or, at least for logging, a remote logging facility.
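
To check whether a device still writes to the card after such a change, one can look at the mount table. The Python sketch below parses text in the /proc/mounts format and lists the mounts that are backed by a /dev/ device and mounted read-write. The function name is made up for illustration, and treating "source starts with /dev/" as "lives on the card" is a simplifying assumption.

```python
def writable_block_mounts(mounts_text):
    """List (mount_point, fs_type) for read-write mounts backed by a /dev/ device.

    Expects text in the /proc/mounts format:
    source mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        source, mount_point, fs_type, options = fields[:4]
        # tmpfs and overlay mounts have sources such as "tmpfs" or "overlay",
        # so they are not reported here.
        if source.startswith("/dev/") and "rw" in options.split(","):
            result.append((mount_point, fs_type))
    return result
```

On a device this would be fed the contents of /proc/mounts. With a read-only rootfs and tmpfs for everything else, the returned list should be empty.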

BMK

I posted about this before.
https://www.olimex.com/forum/index.php?topic=4918.msg20432#msg20432
I've used many A20 micro and LIME boards in automated test equipment and industrial SCADA. No special precautions were taken, and they are all still alive. I always used the SanDisk cards bought from Olimex with pre-built images. Perhaps you have duff SD cards? I agree that tmpfs/NFS is the way to go to keep write activity to a minimum. I haven't bothered, and so far I have not paid the price.

DangeMask

Thank you, we all hope it's only a card model issue. I just received 20 new cards (ADATA class 4) and will try them. If none of the new cards gets corrupted in a week or two, we have a winner :-)