Crashing of A20-Olinuxino-Micro-n8GB Linux Image and its filesystem.

Started by bpir1, April 06, 2018, 11:10:39 AM


bpir1

Hello,

We have purchased 50 A20-OLinuXino-MICRO-n8GB development boards from your
authorized vendor in India, Logsun Systems, and have copied this email to them.

Issue:

I have downloaded and installed the Debian Jessie Linux image for the A20-OLinuXino-MICRO-n8GB
(A20-OLinuXino-MICRO Debian Jessie with kernel 3.4.103+ release 15). I am using a power supply
adaptor rated at 12V, 2A.

It worked normally for a few days, then crashed (only the power LED is on). Usually the only way to
recover is to rewrite the NAND flash with a new image.

Kindly answer the following questions,

1) What is the solution to above problem? (Other than reflashing a Linux OS).

2) When we reboot the A20-OLinuXino-MICRO board, the files on the NAND sometimes get corrupted
with the following error.

    "cannot access files: input/output error."

   When we get this error, we cannot remove those files.
   If this error occurs, how can we repair the file system
   without using the following procedure?
    a) Boot the A20-OLinuXino-MICRO board from an SD card.
    b) Run the command e2fsck -y -b 32768 /dev/nandb.
    c) Power off, remove the SD card, then reboot.
 
3) In what ways can the Linux image on the A20-OLinuXino-MICRO NAND become corrupted, and how can we avoid them?

4) Are there any Linux OS or kernel settings that prevent the system from crashing?

5) How do I ascertain that the Olimex board is working 100% correctly? Pointers on which tests to run
would help.

Thanks

LubOlimex

Do you have a battery attached? Do the boards under test lose all power suddenly (an unexpected hardware power-off)?
Technical support and documentation manager at Olimex

LubOlimex

This is almost certainly file system corruption, which occurs frequently after an unexpected hardware power-down. It typically happens when you disconnect all power from the board before performing a software shutdown.

This is a very typical problem and it is not related to the hardware. I recommend reading the article "Preventing Filesystem Corruption in Embedded Linux" and the solutions it suggests for complete information on the matter: https://www.embeddedarm.com/about/resource/preventing-filesystem-corruption-in-embedded-linux

> 1) What is the solution to above problem? (Other than reflashing a Linux OS).

By default, using the official image, there is no recovery, except for partial or complete re-write of the initial image.

> 3) What are the ways of crashing a A20-Olinuxino- Micro Linux NAND image and how to avoid it?

Preventing sudden shutdowns is the easiest and fastest way to avoid file system corruption. The best way to do that is to attach a small Li-Po battery to each board; even if the main power supply gets disconnected, the board stays operational, so there are NO sudden power-downs.

To improve on this further, if you expect the main power supply to be absent for a long time, you can use software means (available in the official image) to detect both the loss of main power and the charge level of the battery, and perform a software shutdown when the battery is about to drain completely. A20-OLinuXino-MICRO boards come with a built-in advanced Li-Po charger circuit and connector. Using a battery is highly recommended at all times. You can use these batteries as a reference for what to look for:

https://www.olimex.com/Products/Power/BATTERY-LIPO1400mAh/
https://www.olimex.com/Products/Power/BATTERY-LIPO4400mAh/
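
The battery-backed shutdown described above could be sketched roughly as follows. This is only an illustration, not the mechanism shipped in the official image: the two sysfs paths, the 15% threshold, and the function names are assumptions, so check /sys/class/power_supply/ on your board for the names your kernel actually exposes.

```shell
#!/bin/sh
# Sketch: shut down cleanly when mains power is gone and the battery is low.
# ASSUMPTION: these sysfs paths are hypothetical defaults; verify them on your image.
AC_ONLINE_FILE="${AC_ONLINE_FILE:-/sys/class/power_supply/ac/online}"
BATTERY_CAPACITY_FILE="${BATTERY_CAPACITY_FILE:-/sys/class/power_supply/battery/capacity}"
MIN_CAPACITY="${MIN_CAPACITY:-15}"   # percent; arbitrary example threshold

# Succeeds (status 0) only when mains is absent AND the battery charge
# is at or below the threshold.
# Arguments: <ac_online 0|1> <capacity percent> <threshold percent>
should_shutdown() {
    ac_online="$1"
    capacity="$2"
    threshold="$3"
    [ "$ac_online" -eq 0 ] && [ "$capacity" -le "$threshold" ]
}

# Read the current state once and shut down if needed.
# Intended to be called periodically, for example from cron.
check_power_once() {
    ac_online=$(cat "$AC_ONLINE_FILE") || return 1
    capacity=$(cat "$BATTERY_CAPACITY_FILE") || return 1
    if should_shutdown "$ac_online" "$capacity" "$MIN_CAPACITY"; then
        echo "mains lost, battery at ${capacity}%: shutting down"
        shutdown -h now
    fi
}
```

Keeping the decision in should_shutdown, separate from the sysfs reads, lets you try the logic with made-up values before wiring check_power_once into a periodic job.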

Another idea is to test whether the problem persists with the eMMC version of the board (A20-OLinuXino-MICRO-e4GB). Some people report that eMMC flash memories are less prone to corruption than NAND flash memories.

Of course, there are software workarounds that might reduce the chance of such corruption: using another type of file system, using read-only file systems for the important parts, reducing the amount of disk writes, and setting up frequent backups. These are some of the software options to explore and consider.
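
As a starting point for the read-only and reduced-write options above, it helps to see how a file system is currently mounted. The helper below is a small sketch that reads text in /proc/mounts format; the function names are made up for illustration, and "ro" and "noatime" are the standard mount options for read-only mounting and for skipping access-time updates.

```shell
#!/bin/sh
# Print the mount options of mount point $1, reading /proc/mounts-style
# text on stdin. If the mount point is listed more than once, the last
# entry wins, since later mounts shadow earlier ones.
mount_options() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }'
}

# Succeeds when the comma-separated option list $1 contains option $2.
has_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a running system:
#   opts=$(mount_options / < /proc/mounts)
#   has_option "$opts" ro      && echo "root is mounted read-only"
#   has_option "$opts" noatime && echo "access-time updates are disabled"
```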

> 5) How do I ascertain that the Olimex board is working 100% correctly, pointers on which tests to be run will help?

I will give you instructions on how you can test the hardware at your side.

Log in as root with the password olimex and run a stress test with tools like "stress" and "memtester"; both are installed by default. I use the following two commands to stress the processor and the RAM (you can copy and paste them):

memtester 200 &
stress --cpu 2 --io 2 &

Then monitor the activity of the processor and the RAM with the command "top -d1". Leave the stress tests running for some time and check whether the board under test hangs. To stop a process that was started in the background with &, bring it to the foreground with the command "fg" and press CTRL+C.
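
For unattended runs it can be easier to give memtester a loop count and inspect its exit status instead of watching the screen. The helper below is a sketch with a made-up function name; it relies on the exit codes documented in the memtester man page (0 on success, otherwise the bitwise OR of 1, 2 and 4).

```shell
#!/bin/sh
# Translate a memtester exit status into readable messages.
# Per the memtester man page: 0 means success; otherwise the status is the
# bitwise OR of 1 (allocation, locking or invocation error),
# 2 (stuck address test failed) and 4 (one of the other tests failed).
describe_memtester_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "pass"
        return 0
    fi
    [ $(( status & 1 )) -ne 0 ] && echo "setup or allocation error"
    [ $(( status & 2 )) -ne 0 ] && echo "stuck address test failed"
    [ $(( status & 4 )) -ne 0 ] && echo "other memory test failed"
    return 1
}

# Example use: test 200 MB for one loop, then report the result.
#   memtester 200 1
#   describe_memtester_status $?
```

Note that a board that freezes outright never returns an exit status, so this only helps with runs that complete.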

Best regards,
Lub/OLIMEX

bpir1

Dear Sir,

Thank you for your reply.

1. As per your reply, I used the "memtester 200 &" command to stress-test the RAM of the A20-OLinuXino-MICRO-n8GB hardware.
The test results for the memtester command are given below.

The RAM stress test on the A20-OLinuXino-MICRO-n8GB board has always failed.

I used only the latest official Linux image: https://www.olimex.com/wiki/images/f/fc/A20_OLinuxino_Micro_debian_Jessie_34_103_2G_NAND_release_15.torrent
While running memtester, nothing other than the above Linux OS was installed on the A20-OLinuXino-MICRO-n8GB.

Since the RAM stress test failed, we have not run the CPU stress test. We need the board to be stable.

2. Is there any way to preserve the date and time of the Debian Jessie Linux OS across reboots on the A20-OLinuXino-MICRO-n8GB? That is, can the date and time be set in the CPU RTC so that they survive a reboot, and the board does not revert to Jan 01 2010 05:30:00 on the next boot when no internet connection is available?

Please get back to us.

"memtester" Command Results
RAM stress test on the A20-OLinuXino-MICRO-n8GB board

1) Test results for the "memtester 200 &" command (A20-OLinuXino-MICRO-n8GB)
----------------------------------------------------------------------
Board No: 1
----------------------------------------------------------------------
=> 1st time: board froze (directly displays the white screen).
=> 2nd time: worked properly for 24 hours without any issues.
----------------------------------------------------------------------
Board No: 2
----------------------------------------------------------------------
=> 1st time: board froze (directly displays the white screen).
=> 2nd time: board froze (displays the failure error list)
Failure: 0x00005000 != 0x000a000 at offset 0x001e6e80
=> 3rd time: board froze (displays the failure error list)
Failure: 0xffffffff != 0x0000ffff at offset 0x00d87ffc
=> 4th time: board froze (displays the failure error list)
Failure: 0x00000000 != 0x0000ffff at offset 0x014970bc
=> 5th time: board froze (displays the failure error list)
Failure: 0xffffffff != 0x0000ffff at offset 0x014970c0
=> 6th time: board froze (displays the failure error list)
----------------------------------------------------------------------
Board No: 3
----------------------------------------------------------------------
=> 1st time: board froze (directly displays the white screen).
=> 2nd time: board froze (directly displays the white screen).
=> 3rd time: board froze (displays the failure error list)
Failure: 0xffffffff != 0x0000ffff at offset 0x010397fc
=> 4th time: board froze (displays the failure error list)
Failure: 0xffffffff != 0x0000ffff at offset 0x01327a84
=> 5th time: board froze (displays the failure error list)
=> 6th time: board froze (displays the failure error list)
----------------------------------------------------------------------
Board No: 4
----------------------------------------------------------------------
=> 1st time: board froze (displays the failure error list)
=> 2nd time: board froze (displays the failure error list)
=> 3rd time: board froze (displays the failure error list)
=> 4th time: board froze (displays the failure error list)
=> 5th time: board froze (displays the failure error list)
=> 6th time: board froze (displays the failure error list)

2) Test results for the "memtester 200 8 &" command (A20-OLinuXino-MICRO-n8GB)
----------------------------------------------------------------------
Board No: 1
----------------------------------------------------------------------
=> 1st time: worked properly for 24 hours without any issues ("memtester 200 &").
=> 2nd time:
# loop 1 worked properly
# loop 2 Error:
Bit Flip: Failure: 0x0400ffff != 0x04000000 at offset 0x00045b880
Failure: 0xffffffff != 0fbfffffff at offset 0x00045b884
# loop 3 Error:
Bit Flip: Failure: 0x0400ffff != 0fbfffffff at offset 0x0001e4ec0
Failure: 0fbfffffff != 0x04000000 at offset 0x0001e4ec4
# loop 4 worked properly
# loop 5 worked properly
# loop 6 worked properly
# loop 7 Error:
Bit Flip: Failure: 0x7fffffff != 0x80000000 at offset 0x0001e8100
Failure: 0x80000000 != 0x7fffffff at offset 0x0001e8104
# loop 8 worked properly
----------------------------------------------------------------------
Board No: 2
----------------------------------------------------------------------
=> 1st time: board froze (directly displays the white screen).
=> 2nd time: board froze (displays the failure error list)
# loop 1 Error:
Solid Bit: Failure: 0xffffbffb != 0xffffffff at offset 0x00046430000
Bit Spread: Failure: 0x00000000 != 0x000001400 at offset 0x0016808000
Bit Flip: Failure: 0xffvb001 != 0x00000001 at offset 0x003c1544
Failure: 0x00114fffe != 0xffffffff at offset 0x003c1548
# loop 2 Error:
Bit Flip: Failure: 0x0000ffff != 0x00000000 at offset 0x003280000
Failure: 0xffff0000 != 0xffffffff at offset 0x003280004
=> 3rd time: board froze (directly displays the white screen).
=> 4th time: board froze (directly displays the white screen).
=> 5th time: board froze (directly displays the white screen).
----------------------------------------------------------------------
Board No: 3
----------------------------------------------------------------------
=> 1st time: board froze (directly displays the white screen).
=> 2nd time: board froze (directly displays the white screen).
=> 3rd time: board froze (displays the failure error list)
=> 4th time: board froze (directly displays the white screen).
=> 5th time: board froze (directly displays the white screen).
----------------------------------------------------------------------
Board No: 4
----------------------------------------------------------------------
=> 1st time: board froze (displays the failure error list)
# loop 1 Error:
Solid Bit: Failure: 0xffffffff != 0x000000000 at offset 0x0103089fc
=> 2nd time: board froze (directly displays the white screen).
=> 3rd time: board froze (displays the failure error list)
# loop 1 Error:
Solid Bit: Failure: 0xffffbffb != 0xffffffff at offset 0x00046430000
Bit Flip: Failure: 0xfffffffe != 0x000000ffe at offset 0x00f3cffe
Failure: 0x000000000 != 0xffff0001 at offset 0x00f3d000
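
To make failure lines like the ones above easier to compare, the two values on each line can be XORed to show exactly which bits differ. This is a small sketch with a made-up function name; it only reformats the reported numbers and does not by itself identify the cause.

```shell
#!/bin/sh
# Print, as a 32-bit hex mask, the bits that differ between the two
# values reported on a memtester "Failure: A != B" line.
differing_bits() {
    printf '0x%08x\n' $(( $1 ^ $2 ))
}

# Example with a pair from the Board No: 1 results:
#   differing_bits 0x0400ffff 0x04000000    # prints 0x0000ffff
```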


Thanks

bpir1

Dear Sir,

Thank you for your reply.

1. Booting Linux OS from SD Card
   I used the "memtester 200 &" and "stress --cpu 2 --io 2 &" commands to stress the RAM and CPU of the A20-OLinuXino-MICRO-n8GB hardware with the Linux OS booted from a micro SD card. In this scenario, both commands worked properly without any errors or system hangs.

2. Booting Linux OS from NAND flash
   The RAM stress test ("memtester 200 &") on the A20-OLinuXino-MICRO-n8GB board has always failed. The CPU stress test did not give any error (I monitored the CPU stress command for more than one hour and did not observe any error or system freeze).

So the A20-OLinuXino-MICRO-n8GB board fails when we boot from NAND flash. Since our system runs from NAND flash, we need the board to work properly when booted from it.

Is this issue related to the NAND chip of the A20-OLinuXino-MICRO-n8GB board?

Thank You
Shrikant
EdisonBro Smart Labs