Olimex Support Forum

OLinuXino Android / Linux boards and System On Modules => iMX233 => Topic started by: Kean on September 23, 2012, 02:06:28 PM

Title: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: Kean on September 23, 2012, 02:06:28 PM
Hi All,

Just wondering how many of you are working with the iMX233-OLinuXino-MICRO ?

I've got a couple of iMX233-OLinuXino-MAXI boards that we've used for initial development, and after a few minor issues have them working well running our code.  I've even re-compiled the kernel on the MAXI, which took a couple of days, and plenty of swap space.  But we need a smaller low cost solution, with just SPI, I2C, and a USB WiFi adapter, so we are now setting up everything to run on the MICRO.

The issue: We've found that the exact same SD card image that works great on the MAXI - running for days at a time with no issues - only runs our code for an hour or two before we get an Oops message on the console, and the program hangs (usually ends up a zombie, and can't be killed).

To confirm it isn't anything I've caused by modifying the board or SD image, I've taken a new MICRO out of the box and used a new Olimex microSD card as shipped from Olimex via Mouser.  We had the same issue, and that wasn't even running any of our code - just the standard image with just power and serial console connected.  This has happened on 3 different MICRO boards, running the old Olimex distribution as well as Arch.

I am using a good quality lab power supply, and custom made FTDI console cables with a Schottky diode to stop leakage current (killed an SD card on the first day and learnt my lesson there).  We have even tried enabling swap space on the SD card to ensure it isn't an out of memory condition.

A typical Oops looks like this (but every one is different):
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c385c000
[00000000] *pgd=43dc4031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT
last sysfs file: /sys/class/gpio/gpio65/value
Modules linked in:
CPU: 0    Not tainted  (2.6.35.3_OLinuXinoR4 #11)
PC is at kmem_cache_alloc_node+0x8/0x7c
LR is at prepare_creds+0x28/0x130
pc : [<c0090134>]    lr : [<c0058d9c>]    psr: 40000013
sp : c3dbff68  ip : 00000000  fp : bed1db44
r10: 4a3d7574  r9 : c3dbe000  r8 : c0026f04
r7 : ffffff9c  r6 : 00000004  r5 : c3d321c0  r4 : 00000000
r3 : c038834c  r2 : ffffffff  r1 : 000000d0  r0 : 00000000
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 4385c000  DAC: 00000015
Process sleep (pid: 1892, stack limit = 0xc3dbe270)
Stack: (0xc3dbff68 to 0xc3dc0000)
ff60:                   c038834c 00000000 c3d321c0 c0058d9c 00000000 4a3cc23c
ff80: 00000004 c009141c ffffff9c 4a3cc23c 00000000 4a3cc23c 4a3d7954 00000021
ffa0: c0026f04 c0026d80 00000000 4a3cc23c 4a3cc23c 00000004 00000000 4a3d6d70
ffc0: 00000000 4a3cc23c 4a3d7954 00000021 00000001 4a3d7968 4a3d7574 bed1db44
ffe0: 4a3d7954 bed1da84 4a3b27ec 4a3c813c 60000010 4a3cc23c 00000000 00000000
[<c0090134>] (kmem_cache_alloc_node+0x8/0x7c) from [<c0058d9c>] (prepare_creds+0x28/0x130)
[<c0058d9c>] (prepare_creds+0x28/0x130) from [<c009141c>] (sys_faccessat+0x1c/0x178)
[<c009141c>] (sys_faccessat+0x1c/0x178) from [<c0026d80>] (ret_fast_syscall+0x0/0x2c)
Code: c0389a58 c02f2abe e92d4038 e1a04000 (e5900000)
---[ end trace e714e7a61f35d43d ]---
Segmentation fault


Although there are a number of differences between the MICRO and MAXI, the primary one that I suspect is causing this is the routing of the EMI (DDR RAM) signals, as mentioned in this Olimex blog post: http://olimex.wordpress.com/2012/05/28/imx233-olinuxino-micro-doube-side-design-works-at-full-speed/

1) Is anyone else seeing these issues ?
2) Is it possible to reduce the DDR RAM clock speed to improve reliability ?

I've got a big demo planned pretty soon and really need something working...

Kean
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: davidjf2001 on September 23, 2012, 05:24:15 PM

If you suspect DRAM timing is on some critical edge, try heating and cooling the board. Does this make any difference? Otherwise it could be a power supply issue.  Despite your lab supply, solder some bulk capacitance on the board itself.
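One way to make the heating/cooling experiment measurable is to run a memory pattern test while changing the temperature, so that marginal DRAM shows up as counted bad bytes instead of a random Oops. Below is a minimal sketch, assuming a Python 3.3+ interpreter on the board. The buffer size and patterns are arbitrary choices, and a userspace test only exercises the pages the kernel happens to hand this process, so it complements rather than replaces a dedicated tool such as memtester.

```python
import sys
import time

# Fixed byte patterns: all zeros, all ones, and two alternating-bit patterns.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def pattern_pass(buf, value):
    """Fill buf with one byte value, read it back and return how many bytes differ."""
    buf[:] = bytes([value]) * len(buf)
    return len(buf) - buf.count(value)


def run(size_mb=8, rounds=10):
    """Repeat every pattern over a size_mb buffer; return the total bad bytes seen."""
    buf = bytearray(size_mb * 1024 * 1024)
    errors = 0
    for r in range(rounds):
        for value in PATTERNS:
            bad = pattern_pass(buf, value)
            if bad:
                errors += bad
                print("round {}: pattern 0x{:02x}: {} bad bytes".format(r, value, bad), flush=True)
    return errors


if __name__ == "__main__":
    size = int(sys.argv[1]) if len(sys.argv) > 1 else 8
    start = time.time()
    total = run(size_mb=size)
    print("{} bad bytes in {:.1f} s".format(total, time.time() - start))
```

A buffer of 8 MB or more is far larger than the ARM926's small data cache, so most readbacks should actually come from DRAM. Keep it well under the 64 MB on the board to avoid pushing the system into swap.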

Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: Fadil Berisha on September 23, 2012, 07:36:02 PM
 
Quote: We had the same issue, and that wasn't even running any of our code - just the standard image with just power and serial console connected.

Probably the problem is with the power supply. Nobody else has reported a similar problem. Also, did you use ksymoops, a utility to decode Linux kernel Oops? It may help identify the source of the failure.
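The traces posted above already include symbol names, so a good part of the decoding can be done by reading the dump directly: the call chain, and the instruction in parentheses on the "Code:" line, which is the one at the faulting PC. The following is a minimal Python sketch of that. The regular expressions are written against the trace format shown in this thread, and the instruction decoder is deliberately partial: it handles only ARM LDR/STR with an immediate offset.

```python
import re

# One ARM backtrace frame as printed by these kernels, e.g.
# [<c0090134>] (kmem_cache_alloc_node+0x8/0x7c) from [<c0058d9c>] (prepare_creds+0x28/0x130)
FRAME_RE = re.compile(
    r"\[<(?P<pc>[0-9a-f]{8})>\]\s+"
    r"\((?P<func>[^+)\s]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)\)"
    r"\s+from\s+\[<[0-9a-f]{8}>\]"
)
# The word in parentheses on the "Code:" line is the instruction at the faulting PC.
CODE_RE = re.compile(r"Code:.*\((?P<word>[0-9a-f]{8})\)")


def call_chain(oops_text):
    """Return (function, offset, size) for each frame, innermost first."""
    chain = []
    for line in oops_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            chain.append((m.group("func"), int(m.group("off"), 16), int(m.group("size"), 16)))
    return chain


def faulting_word(oops_text):
    """Return the faulting ARM instruction word from the Code: line, or None."""
    m = CODE_RE.search(oops_text)
    return int(m.group("word"), 16) if m else None


def describe_ldr_str(word):
    """Decode only ARM LDR/STR with an immediate offset; return None otherwise.

    Ignores the condition field and the pre/post-index and writeback bits."""
    if ((word >> 26) & 0b11) != 0b01 or (word >> 25) & 1:
        return None
    op = "ldr" if (word >> 20) & 1 else "str"
    if (word >> 22) & 1:
        op += "b"
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    sign = "" if (word >> 23) & 1 else "-"
    return "{} r{}, [r{}, #{}{}]".format(op, rd, rn, sign, word & 0xFFF)
```

Applied to Kean's first Oops, this gives the chain kmem_cache_alloc_node <- prepare_creds <- sys_faccessat. It also decodes e5900000 as a load through r0, and the register dump shows r0 : 00000000, which fits the reported NULL pointer dereference at virtual address 00000000.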

Regards

Fadil Berisha
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: redfox74 on September 24, 2012, 03:29:27 PM
Hi, I had the same problem: lots of problems on the Olimex MICRO, while the MAXI works quite well. I see that the main problem is USB devices that need a lot of power to work, for example a 3G dongle.
So I think the circuit uses the same 5V for the MICRO and for the dongle, and the main problem is that they share the same power source.
When a 3G dongle requires more power, a glitch on the supply can cause a reset in extreme situations; in others it could cause a kernel panic because the DRAM contents change.
Now I have solved the main problem by putting a powered USB hub on the USB port.
Best
Roberto
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: olimex on September 24, 2012, 04:35:47 PM
Note also that some GSM/3G modems radiate a lot of power, which causes inductive feedback on the tracks and may drive the processor crazy. We are doing some examples now with MOD-GSM and found that during a call, MOD-GSM placed less than 10 cm from the iMX233 board causes a reboot.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: redfox74 on September 24, 2012, 07:13:30 PM
@Kean,
I see the same kernel panic with a WiFi dongle. The main reason is that the WiFi gateway is too far away.
The WiFi dongle then requests more energy and something goes wrong: it could be a problem on USB VCC, or an RF problem from the dongle being too near the micro.
If your WiFi gateway is near the board, you don't have the problem.
Check it, that was my problem.
Best
Roberto
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: andersop on September 25, 2012, 12:19:34 AM
I used to get a lot of these, here are some samples:

This one actually happened during the boot process (after udev but before runlevel 5):
Internal error: Oops - undefined instruction: 0 [#1] PREEMPT
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in:
CPU: 0    Not tainted  (2.6.35.3_OLinuXino #1)
PC is at 0xc1b3b7c8
LR is at __find_get_block+0x258/0x27c
pc : [<c1b3b7c8>]    lr : [<c00b43b0>]    psr: 60000013
sp : c3c8fdd0  ip : 00000000  fp : c388c780
r10: c344e800  r9 : 00000280  r8 : 00000000
r7 : 00000000  r6 : 00000000  r5 : c344ea80  r4 : c3cc5490
r3 : c3cc546c  r2 : c344eabc  r1 : c3cc5408  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 43be4000  DAC: 00000015
Process rc (pid: 367, stack limit = 0xc3c8e270)
Stack: (0xc3c8fdd0 to 0xc3c90000)
fdc0:                                     000000d0 00000008 00000000 c387eeb8
fde0: 00000280 00000024 c3871b04 00011dfe c3871a90 c3d26140 c3871b04 c3888380
fe00: 00000000 c3871a90 c38e9000 c00e787c c3871a90 c3738418 c3c8fec8 c3c8fe60
fe20: c3d26140 c0097370 0000000f c3c8fec8 00000000 c3c8e000 c3bea00b 00000001
fe40: c3871a90 00000006 0000000b c0098018 000f47ec 8165e699 0000000f c3bea00b
fe60: c3888380 c38e9000 000f47ec c3c8fec8 c3c8fe90 c3c8e000 c3bea000 00000000
fe80: c3c8e000 000cf640 000e9a68 c00985a0 c3888380 c3879000 ffffff9c c3c8fec8
fea0: c3bea000 00000000 ffffff9c 00000001 000cf640 c00989f8 c3bea000 bebcc388
fec0: c3c8ff38 c0099408 c3888380 c38e9000 c3c8ffb0 c0026644 00000011 c3888380
fee0: c3879000 00000001 00000001 00000000 0000081f c0361ddc 000f47ec c3c8ffb0
ff00: ffffffff 00000000 ffffffff c002321c 00000007 00000008 c3c8ff60 c3c8e000
ff20: c3c8ff50 bebcc388 00000004 000000c3 c0023f04 c0092708 ffffff9c 000f7848
ff40: c3c8ff50 bebcc388 bebcc388 c00928e8 4eac2000 000000ae c0023f04 bebcbdac
ff60: 00000014 00000002 000f2e24 000000af c3c8e000 c0045de8 00000000 c0045f00
ff80: 00000000 c00486cc 00010000 00000000 00000000 00000000 00000008 ffffffff
ffa0: 000f7848 c0023d80 000f7848 bebcc388 000f7848 bebcc388 bebcc388 00000065
ffc0: 000f7848 bebcc388 00000004 000000c3 00000002 00000000 000cf640 000e9a68
ffe0: 000cb3f4 bebcc350 00086b68 4ea4fad4 20000010 000f7848 00000000 00000000
[<c00b43b0>] (__find_get_block+0x258/0x27c) from [<c3871a90>] (0xc3871a90)
Code: ffffffff ffffffff ffffffff ffffffff (ffffffff)
---[ end trace c4fd3b1bd08bfafd ]---


Another one, more typical of those I encounter while running my application:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c3ce4000
[00000000] *pgd=43eae031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT
last sysfs file: /sys/class/gpio/gpio92/value
Modules linked in:
CPU: 0    Not tainted  (2.6.35.3_OLinuXino #1)
PC is at __remove_hrtimer+0x9c/0xa4
LR is at __hrtimer_start_range_ns+0x68/0x27c
pc : [<c0053208>]    lr : [<c0053774>]    psr: 40000013
sp : c3ea9ea8  ip : 0000c350  fp : 00016af8
r10: 001e8480  r9 : 00000000  r8 : 00000000
r7 : c3ea9f40  r6 : 00000001  r5 : c3ea9f40  r4 : 00000000
r3 : 00000000  r2 : 00000000  r1 : 00000001  r0 : c3ea9f40
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 43ce4000  DAC: 00000015
Process myproc.bin (pid: 773, stack limit = 0xc3ea8270)
Stack: (0xc3ea9ea8 to 0xc3eaa000)
9ea0:                   c0366fc0 c3ea9f40 001e8480 3b9aca00 001e8480 c0053774
9ec0: a0000093 c01757a4 fffffdfd c3caf1c0 bed92b54 c3ea9f40 c3ea8000 00000001
9ee0: 3b9aca00 001e8480 001e8480 c00539d0 0000c350 00000001 00000001 00000000
9f00: c3ea9f40 c0271b5c 0000c350 00000001 001f47d0 0000c350 00000000 00000000
9f20: 00000000 00000001 c3ea8000 c0054098 0000c350 00000000 001f47d0 00000000
9f40: 00000000 00000000 00000000 00000000 001f47d0 00000000 001e8480 00000000
9f60: c0052fe4 c0366fc0 00000000 00000000 c3caf740 00016af8 00000000 00016af8
9f80: 00016b08 000000a2 c0023f04 0000000f 00016af8 c00541a4 00000000 001e8480
9fa0: 00000000 c0023d80 00000000 00016af8 bed92b58 00000000 001e8480 00000000
9fc0: 00000000 00016af8 00016b08 000000a2 00000000 00000000 0000000f 00016af8
9fe0: 00000000 bed92b58 4ea59dcc 4ea2e31c 60000010 bed92b58 00000000 00000000
[<c0053208>] (__remove_hrtimer+0x9c/0xa4) from [<c0053774>] (__hrtimer_start_range_ns+0x68/0x27c)
[<c0053774>] (__hrtimer_start_range_ns+0x68/0x27c) from [<c00539d0>] (hrtimer_start_range_ns+0x20/0x28)
[<c00539d0>] (hrtimer_start_range_ns+0x20/0x28) from [<c0271b5c>] (do_nanosleep+0x7c/0xf4)
[<c0271b5c>] (do_nanosleep+0x7c/0xf4) from [<c0054098>] (hrtimer_nanosleep+0x94/0x118)
[<c0054098>] (hrtimer_nanosleep+0x94/0x118) from [<c00541a4>] (sys_nanosleep+0x88/0xa0)
[<c00541a4>] (sys_nanosleep+0x88/0xa0) from [<c0023d80>] (ret_fast_syscall+0x0/0x2c)
Code: e1a00007 e2861008 eb040377 e5878028 (e8bd81f0)
---[ end trace a3e7968b8036dc63 ]---
./run_app: line 11:   773 Segmentation fault      /usr/bin/myproc.bin -nowdt


I generally use a Belkin F2U047 USB-ethernet adapter for development (using kernel driver compiled with "CONFIG_USB_NET_AX8817X=y") and I thought that might be the problem, so I ran without it. But I still got the errors. I kept seeing the "last sysfs file" line relating to the GPIO file; at the time my user application was doing shell calls to query GPIO status (read and parse the /sys/class/gpio/* files), which required a number of child processes.
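The thread doesn't show the original shell calls. As a hypothetical illustration, a poll loop that runs something like `cat /sys/class/gpio/gpioN/value` through a shell forks at least one child per read. Keeping the value file open and re-reading it from offset 0 does the same job without any child processes. Here is a minimal Python 3 sketch, assuming the pin is already exported and configured as an input. `GpioValue` is a hypothetical helper name, not an existing library.

```python
import os


class GpioValue:
    """Keep /sys/class/gpio/gpioN/value open and re-read it in-process.

    Hypothetical helper: avoids a fork/exec of a shell on every poll."""

    def __init__(self, gpio, root="/sys/class/gpio"):
        self.path = os.path.join(root, "gpio{}".format(gpio), "value")
        self.fd = os.open(self.path, os.O_RDONLY)

    def read(self):
        # A sysfs attribute is regenerated when read from offset 0, so rewind first.
        os.lseek(self.fd, 0, os.SEEK_SET)
        return int(os.read(self.fd, 8).strip())

    def close(self):
        os.close(self.fd)
```

Usage would be along the lines of `pin = GpioValue(65)` once at startup, then `pin.read()` in the loop. This only removes the process churn; it is not a fix for whatever is corrupting kernel memory.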

After I switched to the mmap'd GPIO interface (posted on the mailing list a few weeks ago), I have seen this error much less often. It is NOT gone completely, but it definitely happens less often.

I am running on a custom baseboard PCB that takes 9-24 VAC/DC and drops it to 5 VDC using a switching regulator. I have used the same regulator topology in other products (many with similar ARM chips) and never encountered this type of issue, so I am not certain it is power related. Also note that it happens regardless of whether the USB Ethernet adapter is connected.

Some other examples I recorded:

In this case my app did not seem to be affected, it was still responding after the error and I could shut it down cleanly:
Unable to handle kernel paging request at virtual address c383bc40
pgd = c3d2c000
[c383bc40] *pgd=4380041e(bad)
Internal error: Oops: 80d [#1] PREEMPT
last sysfs file: /sys/class/gpio/gpio5/value
Modules linked in:
CPU: 0    Not tainted  (2.6.35.3_OLinuXino #1)
PC is at fput+0x148/0x228
LR is at security_file_free+0x14/0x1c
pc : [<c008fdc4>]    lr : [<c012d204>]    psr: 80000013
sp : c3ce5f60  ip : 0000003f  fp : 00000000
r10: c3bd0960  r9 : c3ce4000  r8 : c383bc40
r7 : 00000000  r6 : 00000008  r5 : c3cb9618  r4 : c385abc0
r3 : 00002000  r2 : 00000000  r1 : c3838be0  r0 : c03953e0
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 43d2c000  DAC: 00000015
Process sh (pid: 1584, stack limit = 0xc3ce4270)
Stack: (0xc3ce5f60 to 0xc3ce6000)
5f60: 00000000 00000000 00000000 c385abc0 00000000 c385c4a0 00000006 c0023f04
5f80: 4e995000 c008d33c c385c4a0 c385abc0 000200f4 c008d40c 000d09e0 000cf634
5fa0: 00000000 c0023d80 000cf634 00000000 00000005 00000005 00000000 000d09e0
5fc0: 000cf634 00000000 000200f4 00000006 00000000 00000000 4e995000 00000000
5fe0: 00000000 bec3cbb8 0001e954 4ea50d3c 60000010 00000005 00000000 00000000
[<c008fdc4>] (fput+0x148/0x228) from [<c008d33c>] (filp_close+0x64/0x70)
[<c008d33c>] (filp_close+0x64/0x70) from [<c008d40c>] (sys_close+0xc4/0x11c)
[<c008d40c>] (sys_close+0xc4/0x11c) from [<c0023d80>] (ret_fast_syscall+0x0/0x2c)
Code: e3530a02 1a000003 e59500f0 e3500000 (0a000000)
---[ end trace d220d5c97d19c019 ]---


Will post any others that I am able to capture as well as the circumstances that caused them.

For the time being, my suggestion would be to move to the MMIO implementation instead of the /sys/class/gpio/* files and see if that helps. Maybe it's just a placebo effect, but I really think it helped my situation.

---
Quote from: davidjf2001 on September 23, 2012, 05:24:15 PM
If you suspect DRAM timing is on some critical edge, try heating and cooling the board. Does this make any difference? Otherwise it could be a power supply issue.  Despite your lab supply, solder some bulk capacitance on the board itself.

Any suggestions on where to place this? I'm having some other serious issues with noise on the audio output (will post a thread on here shortly), and I'm interested in trying to add some more capacitance to see if it would help.

Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: davidjf2001 on September 25, 2012, 02:57:34 AM
Lead inductance is something to be concerned about with modern processors and high transient currents. You may have a supply with high output current, but the inductance of wires or traces can still wreak havoc with high-speed transients. I would try putting bulk capacitors as close to the sdram bypass caps on the olimex board as possible. Before doing this, though: do you have issues with other software? Can you compile large files with GCC? Did you try temperature tests? Add a fan, or check with a heat gun?
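To put rough numbers on the lead-inductance point, here are two textbook relations with purely illustrative values; none of these were measured on the OLinuXino. The first is the voltage across an inductance L when the current changes by di in time dt. The second is the droop of a local capacitor C supplying a current I for a time Δt.

```latex
V_L = L\,\frac{di}{dt}
    = (10\,\mathrm{nH})\cdot\frac{100\,\mathrm{mA}}{1\,\mathrm{ns}}
    = 1\,\mathrm{V}

\Delta V = \frac{I\,\Delta t}{C}
         = \frac{(100\,\mathrm{mA})(1\,\mu\mathrm{s})}{10\,\mu\mathrm{F}}
         = 10\,\mathrm{mV}
```

The first line shows why a stiff lab supply at the end of a cable cannot help with nanosecond-scale current steps: even a small series inductance produces a large voltage error. The second shows why charge stored right next to the load can hold the rail during longer events. The fastest edges are still handled by the small bypass capacitors, since a bulk capacitor has its own lead inductance.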

Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: dpwhittaker on September 25, 2012, 05:03:42 PM
I wanted to mention that I have the same problem.  9 times out of 10, the kernel recovers, and I can continue working without a reboot.  However, I have never been able to successfully compile the kernel on the micro.  I can't even get it to run under load long enough to "make scripts" in the kernel sources.

I have a 5V 2A radio shack adjustable wall wart power supply, so I don't think it has anything to do with lack of power.

I've let the micro sit idle for the past 30 minutes.  During that time, several Oops have happened:


[ 3500.470000] Unable to handle kernel paging request at virtual address bec469f8
[ 3500.470000] pgd = c3a78000
[ 3500.480000] [bec469f8] *pgd=43ac3831, *pte=00000000, *ppte=00000000
[ 3500.480000] Internal error: Oops: 80000005 [#1] ARM
[ 3500.480000] Modules linked in:
[ 3500.480000] CPU: 0    Not tainted  (3.6.0-rc2-09647-gddee6b1-dirty #2)
[ 3500.480000] PC is at 0xbec469f8
[ 3500.480000] LR is at 0x0
[ 3500.480000] pc : [<bec469f8>]    lr : [<00000000>]    psr: 00000013
[ 3500.480000] sp : c3a2de9c  ip : 00000018  fp : 00000000
[ 3500.480000] r10: 00000008  r9 : ffc4ab77  r8 : 00000000
[ 3500.480000] r7 : 501ea721  r6 : 00000000  r5 : 50525977  r4 : 00000000
[ 3500.480000] r3 : 00000008  r2 : ffffff88  r1 : c3a2df20  r0 : bec4ab70
[ 3500.480000] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 3500.480000] Control: 0005317f  Table: 43a78000  DAC: 00000015
[ 3500.480000] Process crond (pid: 326, stack limit = 0xc3a2c270)
[ 3500.480000] Stack: (0xc3a2de9c to 0xc3a2e000)
[ 3500.480000] de80:                                                                bec4ab10
[ 3500.480000] dea0: 50525977 00000000 00001000 bec4ab10 00000068 00000000 c3a2df50 c00c9000
[ 3500.480000] dec0: 0000b304 00000000 00000000 00004372 000081a4 00000002 00000000 00000000
[ 3500.480000] dee0: 00000000 00000000 00000000 00000000 00000de7 00000000 00001000 00000000
[ 3500.480000] df00: 00000008 00000000 50525977 00000000 501ea721 00000000 5051ecec 00000000
[ 3500.480000] df20: 00004372 00000000 c3a2df50 bec4ab10 00000000 00000000 000000c3 c000e9c8
[ 3500.480000] df40: c3a2c000 00000000 bec4ab9c c00c9480 00004372 00000000 0b300004 c3a281a4
[ 3500.480000] df60: 00000002 00000000 00000000 00000000 00000de7 00000000 50525977 00000000
[ 3500.480000] df80: 501ea721 00000000 5051ecec 00000000 00001000 c000e918 00000008 00000000
[ 3500.480000] dfa0: b6f52228 c000e820 b6f52228 00000000 b6f52228 bec4ab10 bec4ab10 00000000
[ 3500.480000] dfc0: b6f52228 00000000 00000000 000000c3 00000001 b6f66000 ffffba92 bec4ab9c
[ 3500.480000] dfe0: 00000000 bec4aab0 b6ec2328 b6ef23b8 20000010 b6f52228 00000000 00000000
[ 3500.660000] Code: 00000000 00000000 00000000 00000000 (00000000)
[ 3500.660000] ---[ end trace 7f11451ac6811422 ]---
[ 4256.890000] Internal error: Oops - undefined instruction: 0 [#2] ARM
[ 4256.890000] Modules linked in:
[ 4256.890000] CPU: 0    Tainted: G      D       (3.6.0-rc2-09647-gddee6b1-dirty #2)
[ 4256.890000] PC is at 0xc19d9c88
[ 4256.890000] LR is at 0xb6f35a10
[ 4256.890000] pc : [<c19d9c88>]    lr : [<b6f35a10>]    psr: 20000093
[ 4256.890000] sp : c3bc3fb0  ip : b6f35a40  fp : b6f39814
[ 4256.890000] r10: b6f422a0  r9 : b6f39820  r8 : b6f422a0
[ 4256.890000] r7 : 00000000  r6 : ffffffff  r5 : 20000010  r4 : b6f35a10
[ 4256.890000] r3 : ffffffff  r2 : 00000010  r1 : 00000009  r0 : c05021ac
[ 4256.890000] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 4256.890000] Control: 0005317f  Table: 43b9c000  DAC: 00000015
[ 4256.890000] Process mandb (pid: 15501, stack limit = 0xc3bc2270)
[ 4256.890000] Stack: (0xc3bc3fb0 to 0xc3bc4000)
[ 4256.890000] 3fa0:                                     ffffffff 00000009 00000010 00b13188
[ 4256.890000] 3fc0: 00000000 00d1ccb0 00000f9e 00000000 b6f422a0 b6f39820 b6f422a0 b6f39814
[ 4256.890000] 3fe0: b6f35a40 beaab290 b6e16f98 b6f35a10 20000010 ffffffff 532e0a2e 74222053
[ 4256.890000] Code: 0004017b 00000000 00000017 00000035 (ffffffff)
[ 4256.890000] ---[ end trace 7f11451ac6811423 ]---
[ 4431.640000] Unable to handle kernel paging request at virtual address 20000010
[ 4431.640000] pgd = c3bf8000
[ 4431.640000] [20000010] *pgd=00000000
[ 4431.640000] Internal error: Oops: 805 [#3] ARM
[ 4431.640000] Modules linked in:
[ 4431.640000] CPU: 0    Tainted: G      D       (3.6.0-rc2-09647-gddee6b1-dirty #2)
[ 4431.640000] PC is at 0xc339e02c
[ 4431.640000] LR is at 0xb6f35a10
[ 4431.640000] pc : [<c339e02c>]    lr : [<b6f35a10>]    psr: 20000093
[ 4431.640000] sp : c3bc3fb0  ip : 00000000  fp : b6f39814
[ 4431.640000] r10: b6f422a0  r9 : b6f39820  r8 : b6f422a0
[ 4431.640000] r7 : 3f0c2356  r6 : ffffffff  r5 : 20000010  r4 : b6f35a10
[ 4431.640000] r3 : ffffffff  r2 : 00000000  r1 : 00000001  r0 : c05021ac
[ 4431.640000] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 4431.640000] Control: 0005317f  Table: 43bf8000  DAC: 00000015
[ 4431.640000] Process mandb (pid: 19738, stack limit = 0xc3bc2270)
[ 4431.640000] Stack: (0xc3bc3fb0 to 0xc3bc4000)
[ 4431.640000] 3fa0:                                     ffffffff 00000001 00000000 00d285a8
[ 4431.640000] 3fc0: 00000000 00d89898 00002296 00000000 b6f422a0 b6f39820 b6f422a0 b6f39814
[ 4431.640000] 3fe0: 00000000 beaab5b8 b6f359a0 b6f35a10 20000010 ffffffff 65637865 20736465
[ 4431.640000] Code: 43093631 675f5f09 705f756e 3a736462 (7465643a)
[ 4431.640000] ---[ end trace 7f11451ac6811424 ]---


These seem to be the usual Oops I get.  Since they happen both at idle and under load, they seem to be only marginally related to temperature (higher loads do seem to increase the frequency).  I am a little worried about pointing my 300 °C hot air station at it to confirm or deny the impact of heat... maybe I'll try a hair dryer.

I did find that when I unplugged my RTL8187B wlan adapter, the frequency and severity of the Oops decreased greatly.  However, I don't see any correlation between /sys/class/gpio usage or my mmap gpio and the number of Oops.

After switching to only DUART for development, leaving nothing plugged in to USB, I still get messages similar to the above on a fairly consistent basis, both idle and under load, so I think there is still an issue.

I am running an up-to-date Arch Linux ARM distro with the USB Ethernet inet scripts for the MAXI disabled, and a 3.6-rc2 mainline kernel with patches from koliqi (sp?).  I do have several additional features configured in the kernel, but almost all were built as modules and none are loaded at the time of the Oops.  I saw it on the old 2.6 kernel too.

I have 2 more micros that I haven't pulled out of the box yet.  When I get an opportunity (or when I get frustrated enough by the one I'm working on), I'll pull them out and see if they fare any better with the same SD card.

I also suspect a hardware issue somewhere, given the seemingly random nature of the Oops.  Is there an easy way to turn down the memory clock in linux, or am I going to need to mmap the clock registers and backdoor it to test that theory?
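Before poking the clock registers through mmap, a lower-effort experiment may be the standard cpufreq sysfs interface. The 2.6.35 boot log in this thread prints "mxs_cpu_init: cpufreq init finished" and "Bus freq driver module loaded". Whether that driver also lowers the HBUS/EMI (DDR) clock along with the CPU clock is an unverified assumption. On the 3.6-rc2 mainline kernel, the cpufreq directory may not exist at all. Here is a minimal Python sketch; writing `scaling_max_freq` needs root.

```python
import os

# Standard cpufreq sysfs directory; it only exists if the running kernel has a
# cpufreq driver for this SoC.
CPUFREQ = "/sys/devices/system/cpu/cpu0/cpufreq"


def available_freqs(root=CPUFREQ):
    """Frequencies (kHz) the driver advertises, lowest first."""
    with open(os.path.join(root, "scaling_available_frequencies")) as f:
        return sorted(int(x) for x in f.read().split())


def cap_max_freq(khz, root=CPUFREQ):
    """Limit the governor to at most `khz` kHz (requires root)."""
    with open(os.path.join(root, "scaling_max_freq"), "w") as f:
        f.write(str(khz))


def use_lowest(root=CPUFREQ):
    """Cap the CPU at its lowest advertised frequency and return that value."""
    lowest = available_freqs(root)[0]
    cap_max_freq(lowest, root)
    return lowest
```

If the Oops rate drops noticeably after `use_lowest()`, that would be a hint in favour of a timing margin problem. Confirming that the EMI clock actually changed would still require checking the clock registers.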

I've got plenty of small ceramic and larger electrolytic capacitors here, but not sure where to solder them to test the capacitance theories... if anybody can give me a pointer, I'll try that one as well.  I'm more of a software guy.  If I understand "try putting bulk capacitors as close to the sdram bypass caps on the olimex board" correctly, I think I could solder one of my 10 uF electrolytic capacitors in parallel across C36 (making sure I get the negative lead on the VSS - have to look at the board layout to see which side that is).  Is that correct?

Any other ideas?  I should have some time to try in the next week or two, if Olimex doesn't do it themselves before I do.  Has anyone at Olimex seen this issue?
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: fred on September 25, 2012, 10:11:58 PM
Same problems on 2 iMX233-OLinuXino-MICRO boards.

I am using the latest Arch Linux ARM distro.

Compiling the kernel on the iMX233-OLinuXino-MICRO
is not possible (a swapfile was enabled).

Running "top -d 0" crashes after a few minutes.


The power supply and serial adapter work correctly
on another system.


The iMX233-OLinuXino-MICRO board does not work correctly.
Is there any activity from Olimex toward solving this problem?




PowerPrep start initialize power...
Battery Voltage = 0.68V
No battery or bad battery                                       
detected!!!.Disabling battery   voltage measurements./r/nLLCAug 22 201215:25:39
EMI_CTRL 0x1C084040
FRAC 0x92926192
init_ddr_mt46v32m16_133Mhz
power 0x00820710
Frac 0x92926192
start change cpu freq
hbus 0x00000003
cpu 0x00010001
LLLLLLLFCLJUncompressing Linux... done, booting the kernel.
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.35-6-ARCH+ (kiril@plug) (gcc version 4.7.1 20120721 (prerelease) (GCC) ) #1 PREEMPT Fri Aug 31 14:22:01 EEST 2012
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
CPU: VIVT data cache, VIVT instruction cache
Machine: iMX233-OLinuXino low cost board
Memory policy: ECC disabled, Data cache writeback
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 16256
Kernel command line: console=ttyAMA0,115200 root=/dev/mmcblk0p2 rw rootwait ssp1=mmc lcd_panel=tvenc_pal no_console_suspend
PID hash table entries: 256 (order: -2, 1024 bytes)
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
allocated 327680 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Memory: 64MB = 64MB total
Memory: 56392k/56392k available, 9144k reserved, 0K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    DMA     : 0xfde00000 - 0xffe00000   (  32 MB)
    vmalloc : 0xc4800000 - 0xf0000000   ( 696 MB)
    lowmem  : 0xc0000000 - 0xc4000000   (  64 MB)
    modules : 0xbf000000 - 0xc0000000   (  16 MB)
      .init : 0xc0008000 - 0xc0028000   ( 128 kB)
      .text : 0xc0028000 - 0xc03b3000   (3628 kB)
      .data : 0xc03ce000 - 0xc03f9140   ( 173 kB)
Hierarchical RCU implementation.
        RCU-based detection of stalled CPUs is disabled.
        Verbose stalled-CPUs detection is disabled.
NR_IRQS:224
Console: colour dummy device 80x30
console [ttyAMA0] enabled
Calibrating delay loop... 226.09 BogoMIPS (lpj=1130496)
pid_max: default: 32768 minimum: 301
Security Framework initialized
Mount-cache hash table entries: 512
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
CPU: Testing write buffer coherency: ok
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
regulator: vddd: 800 <--> 1575 mV at 1550 mV fast normal
regulator: vdddbo: 800 <--> 1575 mV fast normal
regulator: vdda: 1500 <--> 2275 mV at 1750 mV fast normal
regulator: vddio: 2800 <--> 3575 mV at 3300 mV fast normal
regulator: overall_current: fast normal
regulator: mxs-duart-1: fast normal
regulator: mxs-bl-1: fast normal
regulator: mxs-i2c-1: fast normal
regulator: mmc_ssp-1: fast normal
regulator: mmc_ssp-2: fast normal
regulator: charger-1: fast normal
regulator: power-test-1: fast normal
regulator: cpufreq-1: fast normal
i.MX IRAM pool: 28 KB@0xc4808000
usb: DR gadget (utmi) registered
bio: create slab <bio-0> at 0
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Advanced Linux Sound Architecture Driver Version 1.0.23.
Switching to clocksource mxs clock source
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 2048 (order: 2, 16384 bytes)
TCP bind hash table entries: 2048 (order: 1, 8192 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
TCP reno registered
UDP hash table entries: 256 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
NET: Registered protocol family 1
Trying to unpack rootfs image as initramfs...
rootfs image is not initramfs (junk in compressed archive); looks like an initrd
Freeing initrd memory: 4096K
Bus freq driver module loaded
mxs_cpu_init: cpufreq init finished
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
msgmni has been set to 118
alg: No test for stdrng (krng)
cryptodev: driver loaded.
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
Console: switching to colour frame buffer device 90x36
mxs-duart.0: ttyAMA0 at MMIO 0x80070000 (irq = 0) is a DebugUART
mxs-auart.1: ttySP1 at MMIO 0x8006c000 (irq = 24) is a mxs-auart.1
Found APPUART 3.0.0
brd: module loaded
loop: module loaded
usbcore: registered new interface driver smsc95xx
usbmon: debugfs is not available
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
fsl-ehci fsl-ehci: Freescale On-Chip EHCI Host Controller
fsl-ehci fsl-ehci: new USB bus registered, assigned bus number 1
fsl-ehci fsl-ehci: irq 11, io base 0x80080000
fsl-ehci fsl-ehci: USB 2.0 started, EHCI 1.00
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: Freescale On-Chip EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.35-6-ARCH+ ehci_hcd
usb usb1: SerialNumber: fsl-ehci
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usbcore: registered new interface driver libusual
ARC USBOTG Device Controller driver (1 August 2005)
udc: request mem region for fsl-usb2-udc failed
fsl-usb2-udc: probe of fsl-usb2-udc failed with error -16
mice: PS/2 mouse device common for all mice
MXS RTC driver v1.0 hardware v2.0.0
mxs-rtc mxs-rtc.0: rtc core: registered mxs-rtc as rtc0
i2c /dev entries driver
WARNING : No battery connected !
Aborting power driver initialization
mxs-battery: probe of mxs-battery.0 failed with error 1
mxs watchdog: initialized, heartbeat 19 sec
mxs-mmc: MXS SSP Controller MMC Interface driver
ssp_set_rate: error -110
mxs-mmc mxs-mmc.0: mmc0: MXS SSP MMC DMAIRQ 14 ERRIRQ 15
dcp dcp.0: DCP crypto enabled.!
mxs-adc-audio mxs-adc-audio.0: MXS ADC/DAC Audio Codec
No device for DAI mxs adc/dac
No device for DAI mxs adc/dac
asoc: mxs adc/dac <-> mxs adc/dac mapping ok
ALSA device list:
  #0: MXS EVK (mxs adc/dac)
TCP cubic registered
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
registered taskstats version 1
mxs-rtc mxs-rtc.0: setting system clock to 1970-01-01 00:00:10 UTC (10)
RAMDISK: Couldn't find valid RAM disk image starting at 0.
Waiting for root device /dev/mmcblk0p2...
mmc0: new high speed SDHC card at address e624
mmcblk0: mmc0:e624 SU16G 14.8 GiB
mmcblk0: p1 p2
EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext2 filesystem) on device 179:2.
devtmpfs: mounted
Freeing init memory: 128K
INIT: version 2.88 booting

> Arch Linux ARM

> http://www.archlinuxarm.org

   ------------------------------
:: Mounting Root Read-Only                                               [BUSY] EXT4-fs (mmcblk0p2): re-mounted. Opts: barrier=1,data=ordered
                                                                         [DONE]
:: Adjusting system time and setting kernel timezone                     [DONE]
:: Starting UDev Daemon                                                  [BUSY] <30>systemd-udevd[63]: starting version 186
                                                                         [DONE]
:: Triggering UDev uevents                                               [DONE]
:: Loading User-specified Modules                                        [DONE]
:: Waiting for UDev uevents to be processed                              [DONE]
:: Configuring Virtual Consoles                                          [DONE]
:: Bringing up loopback interface                                        [DONE]
:: Unlocking encrypted volumes                                           [DONE]
:: Checking Filesystems                                                  [DONE]
:: Remounting Root and API filesystems                                   [BUSY] EXT4-fs (mmcblk0p2): re-mounted. Opts: barrier=1,data=ordered
                                                                         [DONE]
:: Mounting Local Filesystems                                            [DONE]
:: Activating Swap                                                       [DONE]
:: Configuring Time Zone                                                 [DONE]
:: Initializing Random Seed                                              [DONE]
:: Removing Leftover Files                                               [DONE]
:: Setting Hostname: alarm                                               [DONE]
:: Saving dmesg Log                                                      [DONE]
INIT: Entering runlevel: 3
:: Starting Syslog-NG                                                    [DONE]
:: Starting Network                                                      [BUSY]
Error: unknown interface in /etc/rc.conf: `usb0'
                                                                         [DONE]
:: Mounting Network Filesystems                                          [DONE]
:: Starting crond daemon                                                 [DONE]
:: Starting Secure Shell Daemon                                          [DONE]

Arch Linux 2.6.35-6-ARCH+ (ttyAMA0)

alarm login: root
Password:



[root@alarm ~]# cat /proc/cpuinfo
Processor       : ARM926EJ-S rev 5 (v5l)
BogoMIPS        : 226.09
Features        : swp half thumb fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant     : 0x0
CPU part        : 0x926
CPU revision    : 5

Hardware        : iMX233-OLinuXino low cost board
Revision        : 0000
Serial          : 0000000000000000
[root@alarm ~]# cat /proc/version
Linux version 2.6.35-6-ARCH+ (kiril@plug) (gcc version 4.7.1 20120721 (prerelease) (GCC) ) #1 PREEMPT Fri Aug 31 14:22:01 EEST 2012
[root@alarm ~]# cat /proc/meminfo
MemTotal:          60616 kB
MemFree:           35924 kB
Buffers:            3996 kB
Cached:            12780 kB
SwapCached:            0 kB
Active:             7864 kB
Inactive:          11900 kB
Active(anon):       2996 kB
Inactive(anon):       48 kB
Active(file):       4868 kB
Inactive(file):    11852 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                52 kB
Writeback:             0 kB
AnonPages:          3000 kB
Mapped:             4084 kB
Shmem:                60 kB
Slab:                  0 kB
SReclaimable:          0 kB
SUnreclaim:            0 kB
KernelStack:         288 kB
PageTables:          204 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:       30308 kB
Committed_AS:       7224 kB
VmallocTotal:     712704 kB
VmallocUsed:          52 kB
VmallocChunk:     712260 kB



...


[root@alarm linux-2.6.35.3]# make clean
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
pgd = c0004000
[0000000c] *pgd=00000000
Internal error: Oops: 17 [#1] PREEMPT
last sysfs file: /sys/class/gpio/gpio65/value
Modules linked in:
CPU: 0    Not tainted  (2.6.35-6-ARCH+ #1)
PC is at unmap_vmas+0x290/0x6a8
LR is at unmap_vmas+0x240/0x6a8
pc : [<c0094104>]    lr : [<c00940b4>]    psr: 60000013
sp : c1615e90  ip : 00040495  fp : 405ad3cf
r10: c0b3bcc0  r9 : c1655ec4  r8 : 00000000
r7 : 00004000  r6 : c03d2260  r5 : c3bbcf48  r4 : 405b0000
r3 : 00000000  r2 : 405ad3cf  r1 : 405b0000  r0 : c041c5a0
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 4165c000  DAC: 00000015
Process as (pid: 1606, stack limit = 0xc1614270)
Stack: (0xc1615e90 to 0xc1616000)
5e80:                                     c041e780 00000000 405cb000 405cb000
5ea0: 405cb000 c1614000 c165d014 c1615f10 c165d014 c1655ec0 405ac3cf 00000000
5ec0: 00000000 ffffffff 00000001 405cafff 00100100 00000000 fffffffd 00000000
5ee0: c38922f8 c0b3bcc0 00000000 c0b3bcc0 c3cc1200 00000001 c1614000 03c1a13f
5f00: 0009030c c0098820 c1615f14 00000000 c03d2260 000003ba c0b3bcc0 00000000
5f20: 00000000 c0b3bcf4 00000001 c0040e7c 00000001 c0b3bcc0 c0b3b700 c004671c
5f40: c0b3b700 c03dd7d8 c0b3b700 c0b3b700 c1614000 00000000 000000f8 c00468c4
5f60: 00000000 00095a70 c0b3b864 c0b3ba80 00000000 c1614000 000000f8 c0028f44
5f80: c1614000 03c1a13f 0009030c c00470a8 00095a84 00000000 4025173c c00470f4
5fa0: 00000000 c0028dc0 00095a84 00000000 00000000 00095a70 00000008 00000000
5fc0: 00095a84 00000000 4025173c 000000f8 0008fc90 0008fcf0 03c1a13f 0009030c
5fe0: 00000001 bed1d9f0 40154f9c 401bd614 60000010 00000000 40c0a9a0 00000000
[<c0094104>] (unmap_vmas+0x290/0x6a8) from [<c0098820>] (exit_mmap+0xd0/0x204)
[<c0098820>] (exit_mmap+0xd0/0x204) from [<c0040e7c>] (mmput+0x38/0x108)
[<c0040e7c>] (mmput+0x38/0x108) from [<c004671c>] (exit_mm+0x13c/0x140)
[<c004671c>] (exit_mm+0x13c/0x140) from [<c00468c4>] (do_exit+0x1a4/0x64c)
[<c00468c4>] (do_exit+0x1a4/0x64c) from [<c00470a8>] (do_group_exit+0xac/0xe8)
[<c00470a8>] (do_group_exit+0xac/0xe8) from [<c00470f4>] (__wake_up_parent+0x0/0x18)
Code: e59b2014 e5981008 e1520001 3a000067 (e598100c)
---[ end trace 5fb667f282fd4036 ]---

Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: olimex on September 25, 2012, 11:03:28 PM
Hi
May I ask which SD card you use with the MICRO? The cards we sell, or your own?
There are many fake SD cards on the market. Here is a blog post about this by the Chumby author, whose device also uses the iMX233: http://www.bunniestudios.com/blog/?p=918
Let me know what we need to do to reproduce your results, as we have never encountered such problems in our lab.
Tsvetan
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: dpwhittaker on September 26, 2012, 12:38:07 AM
Admittedly, I am using some older microSD cards pulled from old smartphones. However, I have tried two different microSD cards with the same issues: an older 2GB card that was in service for about two years before being shelved for a year, and a newer 16GB card that was only used for a few months.  I'll add the brands when I get home tonight.  What brand are the cards you sell?  Would you recommend AmazonBasics?

I bought three of the very first OLinuXino-MICROs released, before the switch to hardware I2C as the default jumper position. Were there any other hardware changes in that batch?

I use the DUART for all my communication needs, even ZMODEM for file transfers. Again, though, the system was sitting idle, without even any console messages, when the Oops in my previous post occurred.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: andersop on September 26, 2012, 02:12:34 AM
Quote from: olimex on September 25, 2012, 11:03:28 PM
may I ask what SD card do you use with MICRO? The cards we sell or your own?
there are many fake SD cards on the market and here is blog from Chumby author which also uses iMX233 about this http://www.bunniestudios.com/blog/?p=918

Yes, I am familiar with the topic; I have been buying SD cards in bulk for several products for some time now. While collecting the data to post, I got another interesting kernel panic, this time during the boot process (just after init):

INIT: version 2.88 booting
Unable to handle kernel NULL pointer dereference at virtual address 0000016a
pgd = c3be4000
[0000016a] *pgd=438cb031, *pte=00000000, *ppte=00000000
Internal error: Oops: 1 [#1] PREEMPT
last sysfs file:
Modules linked in:
CPU: 0    Not tainted  (2.6.35.3_OLinuXino #1)
PC is at cfq_should_idle+0x68/0xb4
LR is at cfq_dispatch_requests+0x550/0x7d4
pc : [<c014908c>]    lr : [<c014c0e0>]    psr: 00000093
sp : c38c3f68  ip : c38bb028  fp : 00000000
r10: 00000001  r9 : 00000000  r8 : 00000002
r7 : c3bec1b0  r6 : ffff8c8a  r5 : c3bec1b0  r4 : c3bec1b0
r3 : 000001a1  r2 : 00000001  r1 : 00000002  r0 : c3bec1b0
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 43be4000  DAC: 00000017
Process mmcqd (pid: 360, stack limit = 0xc38c2270)
Stack: (0xc38c3f68 to 0xc38c4000)
3f60:                   c3bec0c8 c014c0e0 c014bb90 c38c2000 c38b79a0 00000000
3f80: c38c2000 c38b79a0 00000000 00000000 00000000 c01419d4 c38b79a0 00000001
3fa0: c38c2000 c38bcc84 c38bcc8c c01419fc c38c2000 c01e198c 00000000 c3c85e30
3fc0: c01e1918 c38bcc84 00000013 00000000 00000000 c004f95c 00000000 00000000
3fe0: c38c3fe0 c38c3fe0 c3c85e30 c004f8e4 c00247c4 c00247c4 00000000 00000000
[<c014908c>] (cfq_should_idle+0x68/0xb4) from [<c014c0e0>] (cfq_dispatch_requests+0x550/0x7d4)
[<c014c0e0>] (cfq_dispatch_requests+0x550/0x7d4) from [<c01419d4>] (blk_peek_request+0x194/0x1b4)
[<c01419d4>] (blk_peek_request+0x194/0x1b4) from [<c01419fc>] (blk_fetch_request+0x8/0x1c)
[<c01419fc>] (blk_fetch_request+0x8/0x1c) from [<c01e198c>] (mmc_queue_thread+0x74/0x108)
[<c01e198c>] (mmc_queue_thread+0x74/0x108) from [<c004f95c>] (kthread+0x78/0x80)
[<c004f95c>] (kthread+0x78/0x80) from [<c00247c4>] (kernel_thread_exit+0x0/0x8)
Code: e5913004 e3130020 0a000006 e5901000 (e5911168)
---[ end trace 8b42fe0adb163e2c ]---
note: mmcqd[360] exited with preempt_count 1
Unable to handle kernel NULL pointer dereference at virtual address 00000234
pgd = c0004000
[00000234] *pgd=00000000
Internal error: Oops: 17 [#2] PREEMPT
last sysfs file:
Modules linked in:
CPU: 0    Tainted: G      D      (2.6.35.3_OLinuXino #1)
PC is at cfq_set_request+0x38/0x4e0
LR is at elv_set_request+0x1c/0x2c
pc : [<c014aed0>]    lr : [<c013f040>]    psr: 20000013
sp : c3c21df8  ip : 00000000  fp : 00000000
r10: 00000001  r9 : c3bec558  r8 : 010c0001
r7 : 00000001  r6 : c38bb020  r5 : c3bec558  r4 : c38b79a0
r3 : 00000000  r2 : 00000010  r1 : 010c0001  r0 : 00000010
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 43be4000  DAC: 00000017
Process sync_supers (pid: 138, stack limit = 0xc3c20270)
Stack: (0xc3c21df8 to 0xc3c22000)
1de0:                                                       00200200 00000001
1e00: 010c0001 c3c20000 00040001 c38b79a0 c3bec558 00000000 00000001 010c0001
1e20: 00000010 00000001 00000000 c013f040 00011200 c014120c c38b79a0 c38b79a0
1e40: 00040001 00000000 00040001 00000000 000f1f00 c01414c8 c3bec078 c3bec4e0
1e60: c3c21ea4 000085e2 00000000 c013e9b8 c38b79a0 c38b79a0 000004c1 00000400
1e80: 00000000 00000000 00000000 000f1f00 00000000 c0141e50 c38b79a0 c3bec4e0
1ea0: 000f1efe c0141b84 c3bec4e0 c38b79a0 00000002 c3c20000 00000002 c01403c0
1ec0: 000e98b0 00000000 00000002 00000000 00000002 c3c40dc0 c3c52b00 00011210
1ee0: c3c21f18 00000010 00000000 c0050038 00000000 00000000 00011200 c3bec4e0
1f00: c3bec4e0 00000001 00000001 00000000 00000000 00000000 00000000 c0140528
1f20: c03655b0 c3886330 c3c40da0 000004c1 00000001 00000010 00000000 c00b6e68
1f40: c3bec4e0 c00b7690 00000010 0000000f c3886330 c3886330 c3bec4e0 000004c1
1f60: 00000001 00000000 00000000 c00b1a74 c3886330 00000002 c3400400 c00b5b88
1f80: 00000000 c3886330 c3c8c640 c00eb520 00000001 c3c8c640 c3c20000 c3c8c680
1fa0: c036c8a4 c00eb5b4 c3c8c640 c009122c c3c7d4c0 c3c20000 00000001 00000000
1fc0: 00000013 c007c9d8 00000000 c3c0ff78 c007c9a4 c004f95c 00000000 00000000
1fe0: c3c21fe0 c3c21fe0 c3c0ff78 c004f8e4 c00247c4 c00247c4 00000000 00000000
[<c014aed0>] (cfq_set_request+0x38/0x4e0) from [<c013f040>] (elv_set_request+0x1c/0x2c)
[<c013f040>] (elv_set_request+0x1c/0x2c) from [<c014120c>] (get_request.isra.35+0x1cc/0x2a8)
[<c014120c>] (get_request.isra.35+0x1cc/0x2a8) from [<c01414c8>] (get_request_wait.isra.36+0x20/0x120)
[<c01414c8>] (get_request_wait.isra.36+0x20/0x120) from [<c0141e50>] (__make_request+0x2cc/0x434)
[<c0141e50>] (__make_request+0x2cc/0x434) from [<c01403c0>] (generic_make_request+0x22c/0x298)
[<c01403c0>] (generic_make_request+0x22c/0x298) from [<c0140528>] (submit_bio+0xfc/0x118)
[<c0140528>] (submit_bio+0xfc/0x118) from [<c00b1a74>] (submit_bh+0x17c/0x1b4)
[<c00b1a74>] (submit_bh+0x17c/0x1b4) from [<c00b5b88>] (sync_dirty_buffer+0xa0/0x12c)
[<c00b5b88>] (sync_dirty_buffer+0xa0/0x12c) from [<c00eb520>] (ext4_commit_super+0x100/0x17c)
[<c00eb520>] (ext4_commit_super+0x100/0x17c) from [<c00eb5b4>] (ext4_write_super+0x18/0x24)
[<c00eb5b4>] (ext4_write_super+0x18/0x24) from [<c009122c>] (sync_supers+0xb4/0x114)
[<c009122c>] (sync_supers+0xb4/0x114) from [<c007c9d8>] (bdi_sync_supers+0x34/0x48)
[<c007c9d8>] (bdi_sync_supers+0x34/0x48) from [<c004f95c>] (kthread+0x78/0x80)
[<c004f95c>] (kthread+0x78/0x80) from [<c00247c4>] (kernel_thread_exit+0x0/0x8)
Code: e58d3004 e5963000 e58d1008 e1a00002 (e5931234)
---[ end trace 8b42fe0adb163e2d ]---
------------[ cut here ]------------
WARNING: at kernel/exit.c:896 do_exit+0x30/0x674()
Modules linked in:
[<c00280fc>] (unwind_backtrace+0x0/0xe0) from [<c0039f00>] (warn_slowpath_common+0x4c/0x64)
[<c0039f00>] (warn_slowpath_common+0x4c/0x64) from [<c0039f30>] (warn_slowpath_null+0x18/0x1c)
[<c0039f30>] (warn_slowpath_null+0x18/0x1c) from [<c003d164>] (do_exit+0x30/0x674)
[<c003d164>] (do_exit+0x30/0x674) from [<c0026f2c>] (die+0x2c4/0x304)
[<c0026f2c>] (die+0x2c4/0x304) from [<c026ec04>] (__do_kernel_fault.part.4+0x54/0x74)
[<c026ec04>] (__do_kernel_fault.part.4+0x54/0x74) from [<c0028fc4>] (do_page_fault+0x1ec/0x204)
[<c0028fc4>] (do_page_fault+0x1ec/0x204) from [<c002321c>] (do_DataAbort+0x34/0x98)
[<c002321c>] (do_DataAbort+0x34/0x98) from [<c002392c>] (__dabt_svc+0x4c/0x60)
Exception stack(0xc3c21db0 to 0xc3c21df8)
1da0:                                     00000010 010c0001 00000010 00000000
1dc0: c38b79a0 c3bec558 c38bb020 00000001 010c0001 c3bec558 00000001 00000000
1de0: 00000000 c3c21df8 c013f040 c014aed0 20000013 ffffffff
[<c002392c>] (__dabt_svc+0x4c/0x60) from [<c014aed0>] (cfq_set_request+0x38/0x4e0)
[<c014aed0>] (cfq_set_request+0x38/0x4e0) from [<c013f040>] (elv_set_request+0x1c/0x2c)
[<c013f040>] (elv_set_request+0x1c/0x2c) from [<c014120c>] (get_request.isra.35+0x1cc/0x2a8)
[<c014120c>] (get_request.isra.35+0x1cc/0x2a8) from [<c01414c8>] (get_request_wait.isra.36+0x20/0x120)
[<c01414c8>] (get_request_wait.isra.36+0x20/0x120) from [<c0141e50>] (__make_request+0x2cc/0x434)
[<c0141e50>] (__make_request+0x2cc/0x434) from [<c01403c0>] (generic_make_request+0x22c/0x298)
[<c01403c0>] (generic_make_request+0x22c/0x298) from [<c0140528>] (submit_bio+0xfc/0x118)
[<c0140528>] (submit_bio+0xfc/0x118) from [<c00b1a74>] (submit_bh+0x17c/0x1b4)
[<c00b1a74>] (submit_bh+0x17c/0x1b4) from [<c00b5b88>] (sync_dirty_buffer+0xa0/0x12c)
[<c00b5b88>] (sync_dirty_buffer+0xa0/0x12c) from [<c00eb520>] (ext4_commit_super+0x100/0x17c)
[<c00eb520>] (ext4_commit_super+0x100/0x17c) from [<c00eb5b4>] (ext4_write_super+0x18/0x24)
[<c00eb5b4>] (ext4_write_super+0x18/0x24) from [<c009122c>] (sync_supers+0xb4/0x114)
[<c009122c>] (sync_supers+0xb4/0x114) from [<c007c9d8>] (bdi_sync_supers+0x34/0x48)
[<c007c9d8>] (bdi_sync_supers+0x34/0x48) from [<c004f95c>] (kthread+0x78/0x80)
[<c004f95c>] (kthread+0x78/0x80) from [<c00247c4>] (kernel_thread_exit+0x0/0x8)
---[ end trace 8b42fe0adb163e2e ]---


Anyway, a quick and easy way to read this data while the card is running in the OLinuXino is to look under /sys/block/mmcblk0/device: you will see a number of files that you can cat to read back the data. The two cards I am using are labelled SanDisk, 512MB. Here's what I get from reading the card data:

Card #1: (primary development card, sporadic crashes, frequency varies)
date: 11/2007
fwrev: 0x0
hwrev: 0x8
manfid: 0x000003
name: SU512
oemid: 0x5344 ("SD" in ASCII)
scr: 0125000000000000
serial: 0x2088ebae

Card #2: (backup card; similar markings, but the back side looks a bit different. Both say "Made in Taiwan". I haven't used this card much, but it did crash on the first load when I put it in to read the data.)
date: 06/2008
fwrev: 0x0
hwrev: 0x8
manfid: 0x000003
name: SU512
oemid: 0x5344
scr: 0225000000000000
serial: 0x501681c4

Compared with the info in the BunnieStudios post, the OEMID and names suggest more or less genuine SanDisk cards (compare with his "Sample #6"), but then again even he's not sure on that count...
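The fields above are the card's CID register contents (plus the SCR) as exported by the kernel's MMC core, one file each. A minimal Python sketch for collecting them in one go, assuming the card shows up as `mmcblk0`; `read_card_info` and `decode_oemid` are hypothetical helper names, and `oemid` is decoded as two ASCII bytes, matching the "SD" annotation on Card #1:

```python
import os

# Attribute files listed in the post; each holds one value as plain text.
CARD_FIELDS = ("date", "fwrev", "hwrev", "manfid", "name", "oemid", "scr", "serial")


def decode_oemid(oemid_hex):
    """Turn the 16-bit OEM/application ID (e.g. '0x5344') into two ASCII characters."""
    value = int(oemid_hex, 16)
    return bytes([(value >> 8) & 0xFF, value & 0xFF]).decode("ascii", errors="replace")


def read_card_info(device_dir="/sys/block/mmcblk0/device"):
    """Return a dict of whichever card ID fields exist under device_dir."""
    info = {}
    for field in CARD_FIELDS:
        path = os.path.join(device_dir, field)
        if os.path.exists(path):
            with open(path) as f:
                info[field] = f.read().strip()
    return info


if __name__ == "__main__":
    info = read_card_info()
    for field, value in info.items():
        print(f"{field}: {value}")
    if "oemid" in info:
        print(f"oemid as ASCII: {decode_oemid(info['oemid'])!r}")
```

As the next reply points out, these values can be faked too, so this only helps spot obvious mismatches between cards.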

Tsvetan (or anyone else who has one): what is the manufacturer data on the cards Olimex sells?

Quote from: dpwhittaker on September 26, 2012, 12:38:07 AM
I'll add the brands when I get home tonight.  What brand are the cards you sell?  Would you recommend AmazonBasics?

As the linked article at BunnieStudios demonstrates, the brand printed on the card is not a reliable indicator of the card's origin. I myself have ordered trays of several hundred of the "same" cards and received a batch containing a handful of slightly different logos, back-of-card adhesives, "Made in Taiwan" vs "Made in China" vs no label, etc. In this case, I'm sure that AmazonBasics does not actually manufacture cards; they are simply rebranding cards from one of the other OEMs (Samsung, SanDisk, or Kingston, or maybe none of the above?). In any case, there's no real way to tell: reading the card ID data can help, but even so there's no guarantee that data isn't also faked...

Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: davidjf2001 on September 26, 2012, 04:03:38 AM
One possible argument against the SD card contributing to the problem in this thread: unless you have a swap file on it or are doing disk I/O, there should not be much activity on the card interface at the time of failure. The other odd thing is that I would expect more severe failures if it were a power supply issue. Perhaps there are floating pins, spurious interrupts, etc. Maybe take Fadil's advice and look into ksymoops to determine exactly what the messages indicate. Arch shows the memtest86+ package as available; I have not tried it with Arch ARM, but maybe it would help.
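Decoding the faulting instruction by hand shows which register held the bad pointer, without needing any symbols. On ARM, the `Code:` line dumps the instruction words leading up to the PC, with the faulting word in parentheses. A minimal Python sketch, assuming the A32 LDR/STR immediate-offset encoding from the ARM Architecture Reference Manual (`decode_single_transfer` and `fault_address` are hypothetical helper names; register-offset and PC-relative forms are deliberately not handled):

```python
def decode_single_transfer(word):
    """Decode an A32 LDR/STR whose offset is a 12-bit immediate.

    Returns a dict of fields, or None for anything else (register-offset
    forms, PC-relative accesses, other instruction classes).
    """
    if ((word >> 26) & 0b11) != 0b01:   # bits 27:26 == 01: single data transfer
        return None
    if (word >> 25) & 1:                # I bit set: register offset, not handled here
        return None
    rn = (word >> 16) & 0xF             # base register
    if rn == 15:                        # PC-relative: base would be PC + 8, not handled
        return None
    offset = word & 0xFFF               # 12-bit unsigned immediate
    if not ((word >> 23) & 1):          # U bit clear: offset is subtracted
        offset = -offset
    return {
        "op": ("ldr" if (word >> 20) & 1 else "str") + ("b" if (word >> 22) & 1 else ""),
        "rd": (word >> 12) & 0xF,       # loaded (LDR) or stored (STR) register
        "rn": rn,
        "offset": offset,
        "pre_indexed": bool((word >> 24) & 1),  # P bit: address = Rn + offset
    }


def fault_address(word, regs):
    """Address touched by the instruction, given register values from the oops.

    regs maps register number -> value, e.g. {8: 0x00000000} for "r8 : 00000000".
    """
    insn = decode_single_transfer(word)
    if insn is None:
        return None
    base = regs[insn["rn"]]
    # Post-indexed forms access [Rn] first and only then apply the offset.
    addr = base + insn["offset"] if insn["pre_indexed"] else base
    return addr & 0xFFFFFFFF


# The "make clean" oops earlier in the thread: Code "(e598100c)", r8 : 00000000,
# reported fault address 0000000c.
insn = decode_single_transfer(0xE598100C)
print(f"{insn['op']} r{insn['rd']}, [r{insn['rn']}, #{insn['offset']:#x}]")  # ldr r1, [r8, #0xc]
print(f"{fault_address(0xE598100C, {8: 0x00000000}):#010x}")                 # 0x0000000c
```

The same arithmetic matches the other two NULL dereferences posted above: `e5911168` (ldr r1, [r1, #0x168]) with r1 = 00000002 gives 0x0000016a in `cfq_should_idle`, and `e5931234` (ldr r1, [r3, #0x234]) with r3 = 00000000 gives 0x00000234 in `cfq_set_request`. So each crash is a structure-field load through a pointer that was 0 (or, in one case, 2).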
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: dpwhittaker on September 26, 2012, 06:17:46 AM
Alright, I've plugged in my second OLinuXino-MICRO. While the first failed running "top -d 0" within 5-30 seconds every single time, with a signal 4 (SIGILL) or a segmentation fault, the second ran for over 5 minutes and still hasn't failed. I've upgraded my Arch Linux distro with my RTL8187 dongle plugged in (it fails after 5 or 10 minutes, but I know that's a driver issue - the dongle itself is obviously overheating).

I'll run top -d 0 overnight tonight as a full stress test.

So, this does seem to affect some boards and not others.

---

Well, I kept testing before posting this, and after 30 minutes of basic activity (mostly pacman plugging away at the upgrade), my second board finally did throw a kernel oops. So I plugged in my third board, and it behaved much like the first: an oops after 5-30 seconds of "top -d 0", once an illegal instruction and once "unable to handle paging request". So it looks like this issue affects some boards worse than others. However, the kernel oops on the second board came shortly after an out-of-memory error, so maybe that one was justified.

Still testing... will post my findings as I find more.

David Whittaker
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: dpwhittaker on September 26, 2012, 07:30:45 AM
The second micro does seem to be having the same issue after more extensive testing.  I guess I just got lucky at first:


[  181.210000] Unable to handle kernel paging request at virtual address fffffaab
[  181.220000] pgd = c2cbc000
[  181.220000] [fffffaab] *pgd=43ffe831, *pte=00000000, *ppte=00000000
[  181.230000] Internal error: Oops: 17 [#1] ARM
[  181.230000] Modules linked in:
[  181.230000] CPU: 0    Not tainted  (3.6.0-rc2-09647-gddee6b1-dirty #2)
[  181.230000] PC is at fget_raw_light+0x78/0x118
[  181.230000] LR is at lockdep_init_map+0x3c/0x484
[  181.230000] pc : [<c00c6404>]    lr : [<c0056d9c>]    psr: 60000013
sp : c3a2be88  ip : 00000000  fp : 00000041
[  181.230000] r10: ffffff9c  r9 : c3a2a000  r8 : c3b04000
[  181.230000] r7 : c04dfadc  r6 : c3b4f800  r5 : 00000000  r4 : c2c8b3c0
[  181.230000] r3 : c2c8b474  r2 : c2c8b46c  r1 : c041da58  r0 : 00000000
[  181.230000] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  181.230000] Control: 0005317f  Table: 42cbc000  DAC: 00000015
[  181.230000] Process top (pid: 371, stack limit = 0xc3a2a270)
[  181.230000] Stack: (0xc3a2be88 to 0xc3a2c000)
[  181.230000] be80:                   c3a2bf78 c3a2bf00 00000001 00000000 c3a2bf78 c00d2998
[  181.230000] bea0: 00000024 3342967e 00000400 be9fef40 00000068 00000000 c3a2bf50 00000000
[  181.230000] bec0: c3828f14 c3a2a000 00000000 00000000 00000002 c3a2bf78 00000001 c3b04000
[  181.230000] bee0: ffffff9c ffffff9c c3a2a000 00000000 0001f510 c00d3094 00000041 000000d0
[  181.230000] bf00: 60000013 00000000 c3a2a000 c3828f04 00000002 00000000 c3828f04 00000000
[  181.230000] bf20: 00000000 c3828f04 c2c89040 00000000 c3828f04 00000000 00000008 00000008
[  181.230000] bf40: 0001f510 c03457b4 c3828ee0 c00de69c c3b04000 00000000 b6f5fb50 c3b04000
[  181.230000] bf60: 00000000 00000008 00000001 c00c41b4 00009bd4 ef000000 00000000 c0050000
[  181.230000] bf80: 00000024 00000100 00bb19a0 b6f5f650 00bb19a0 b6f5fb50 00000005 c000e9c8
[  181.230000] bfa0: 00000000 c000e820 b6f5f650 00bb19a0 b6f5f650 00000000 00000000 b6f5f65d
[  181.230000] bfc0: b6f5f650 00bb19a0 b6f5fb50 00000005 b6f5e8c8 00009bd4 00ba73c8 0001f510
[  181.230000] bfe0: 00000000 be9fef94 b6f4fc5c b6e5a68c 60000010 b6f5f650 43ffe831 43ffec31
[  181.230000] [<c00c6404>] (fget_raw_light+0x78/0x118) from [<be9fef40>] (0xbe9fef40)
[  181.230000] Code: e1560002 2a000017 e5933004 e7934106 (e3540000)
[  181.420000] ---[ end trace e700b39da7a4b8f3 ]---
[ 1235.480000] Unable to handle kernel paging request at virtual address bf836d40
[ 1235.480000] pgd = c2cbc000
[ 1235.480000] [bf836d40] *pgd=00000000
[ 1235.480000] Internal error: Oops: 80000005 [#2] ARM
[ 1235.480000] Modules linked in:
[ 1235.480000] CPU: 0    Tainted: G      D       (3.6.0-rc2-09647-gddee6b1-dirty #2)
[ 1235.480000] PC is at 0xbf836d40
[ 1235.480000] LR is at 0xbeaa00d0
[ 1235.480000] pc : [<bf836d40>]    lr : [<beaa00d0>]    psr: 00000013
sp : c384feb0  ip : 00000000  fp : 00000041
[ 1235.480000] r10: ffffff9c  r9 : 41000000  r8 : c39fc000
[ 1235.480000] r7 : 00000400  r6 : 3342967e  r5 : 00000024  r4 : 30610000
[ 1235.480000] r3 : 00000000  r2 : c3bfcbec  r1 : c041da58  r0 : 00000000
[ 1235.480000] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 1235.480000] Control: 0005317f  Table: 42cbc000  DAC: 00000015
[ 1235.480000] Process top (pid: 372, stack limit = 0xc384e270)
[ 1235.480000] Stack: (0xc384feb0 to 0xc3850000)
[ 1235.480000] fea0:                                     00000068 00000000 c384ff50 00000000
[ 1235.480000] fec0: c38288f4 c384e000 00000000 00000000 00000002 c384ff78 00000001 c39fc000
[ 1235.480000] fee0: ffffff9c ffffff9c c384e000 00000000 0001f510 c00d3094 00000041 00000000
[ 1235.480000] ff00: 60000013 00000000 00000024 c38288e4 00000002 00000000 c38288e4 00000000
[ 1235.480000] ff20: 00000000 c38288e4 c2c89040 00000000 c38288e4 00000000 00000008 00000008
[ 1235.480000] ff40: 0001f510 c03457b4 c38288c0 c00de69c c39fc000 00000000 b6f41b50 c39fc000
[ 1235.480000] ff60: 00000000 00000008 00000001 c00c41b4 00000000 00000000 00000000 33420000
[ 1235.480000] ff80: 00000024 00000100 001e4490 b6f41650 001e4490 b6f41b50 00000005 c000e9c8
[ 1235.480000] ffa0: 00000000 c000e820 b6f41650 001e4490 b6f41650 00000000 00000000 b6f4165d
[ 1235.480000] ffc0: b6f41650 001e4490 b6f41b50 00000005 b6f408c8 00009bd4 001d93c8 0001f510
[ 1235.480000] ffe0: 00000000 bea9ff94 b6f31c5c b6e3c68c 60000010 b6f41650 55555555 55555555
[ 1235.480000] Code: bad PC value
[ 1235.480000] ---[ end trace e700b39da7a4b8f4 ]---


Whatever is happening, it seems to be pretty universal, at least on the batch I got.
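Since every oops in this thread looks different, reducing a captured console log to its headline fields (fault type, address, PC, process) can make failures easier to compare across boards and kernels. A minimal sketch, assuming the message format shown in the logs above; `summarize_oopses` is a hypothetical helper, not an existing tool:

```python
import re

# Header lines as they appear in the oopses posted in this thread.
FAULT_RE = re.compile(r"Unable to handle kernel (.+?) at virtual address ([0-9a-f]{8})")
PC_RE = re.compile(r"PC is at (\S+)")
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")


def summarize_oopses(log_text):
    """Return one dict per "Internal error:" block found in a console log."""
    records = []
    pending_fault = None  # the "Unable to handle ..." line that precedes an oops header
    for line in log_text.splitlines():
        fault = FAULT_RE.search(line)
        if fault:
            pending_fault = (fault.group(1), fault.group(2))
            continue
        marker = line.find("Internal error: ")
        if marker != -1:
            # Oopses such as "undefined instruction" have no "Unable to handle" line.
            kind, address = pending_fault or (None, None)
            records.append({
                "error": line[marker + len("Internal error: "):].strip(),
                "fault": kind,
                "address": address,
                "pc": None,
                "process": None,
                "pid": None,
            })
            pending_fault = None
            continue
        if not records:
            continue
        pc = PC_RE.search(line)
        if pc and records[-1]["pc"] is None:
            records[-1]["pc"] = pc.group(1)
        proc = PROC_RE.search(line)
        if proc and records[-1]["process"] is None:
            records[-1]["process"] = proc.group(1)
            records[-1]["pid"] = int(proc.group(2))
    return records
```

Timestamped lines such as `[  181.230000] Internal error: ...` are handled because the patterns are searched anywhere in the line.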
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: Kean on September 26, 2012, 12:47:28 PM
Thanks to everyone who has contributed to this discussion. I've had to concentrate the last few days on reworking our demo hardware so that we can run on the Maxi, which has proven to be very stable. In fact, I am not sure I've ever seen an oops on the Maxi, and certainly not during heavy development on two boards over the last week. I've just got another 5 Maxis to replace the Micros that I can't use.

I agree that power issues can and do cause problems, and I will try adding some additional low ESR bulk capacitance on the board tonight and leave it running.  But I'm not convinced that is going to fix the issue - there appears to be significant onboard capacitance for the included circuitry.  I've seen the problem on a board straight out of the box with no extra software or hardware attached apart from serial cable and power.

For the same reason I am not sure mmap'd GPIO makes a difference.  I am using that anyway, but as I mentioned I see this problem even without running any additional code.  I guess the demo blinking light in the Olimex rootfs image does use non-mmap'd GPIO.
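For comparison, "non-mmap'd" GPIO means driving the pin through the kernel's sysfs GPIO interface, one file write per edge, rather than mapping the GPIO registers into user space (typically through /dev/mem). A minimal blink sketch, assuming GPIO 65 (the pin named in the `last sysfs file` line of the oopses above) has already been exported and set as an output; `blink` and its timing parameters are hypothetical:

```python
import time


def blink(gpio=65, times=5, period_s=0.5, root="/sys/class/gpio"):
    """Toggle a GPIO through its sysfs value file (no mmap of GPIO registers).

    Assumes the pin was already exported and configured as an output, e.g. by
    writing to /sys/class/gpio/export and gpioN/direction.
    """
    value_path = f"{root}/gpio{gpio}/value"
    for _ in range(times):
        for level in ("1", "0"):
            # Each write is a system call handled by the kernel's GPIO driver,
            # unlike mmap'd access, which writes the registers directly.
            with open(value_path, "w") as value_file:
                value_file.write(level)
            time.sleep(period_s / 2)
```

The `root` parameter only exists so the loop can be pointed at a scratch directory for testing.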

In regard to microSD cards, I've seen the oops problem occur when using a microSD card from Olimex. I've had many more serious problems when using some other cheap microSDs, causing boot problems on power-up that required ejecting and reinserting the card, then a reset. Since switching to better-quality cards, the boot issue has gone away.

I have built a copy of ksymoops, but I don't have the kernel symbols to make use of it. Most of the oopses I've recorded are from an older kernel than the one I am now developing with on the Maxi. The oopses seem to be totally random, and often I just get a lockup with no oops data.
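On 2.6 kernels built with kallsyms, as these are, the oops is already symbolized (`unmap_vmas+0x290/0x6a8` reads as function+offset/length), and the kernel's Documentation/oops-tracing.txt of that era states that ksymoops is not useful on 2.6. A sketch that extracts the call chain from a pasted backtrace, assuming the `(sym+off/size) from [<addr>] (sym+off/size)` line format seen in this thread; `call_chain` is a hypothetical helper:

```python
import re

# One frame line from the ARM backtraces in this thread, e.g.
# [<c0094104>] (unmap_vmas+0x290/0x6a8) from [<c0098820>] (exit_mmap+0xd0/0x204)
# Group 1 is the function that was executing, group 2 its caller.
FRAME_RE = re.compile(
    r"\(([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\) from \[<[0-9a-f]+>\] \(([\w.]+)\+0x"
)


def call_chain(trace_text):
    """Return function names for one pasted backtrace, innermost frame first."""
    chain = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # register dumps, stack words, raw-address frames, etc.
        if not chain:
            chain.append(match.group(1))
        chain.append(match.group(2))
    return chain
```

Run over the `mmcqd` trace posted earlier, it yields `cfq_should_idle`, `cfq_dispatch_requests`, `blk_peek_request`, `blk_fetch_request`, `mmc_queue_thread`, `kthread`, `kernel_thread_exit`: the CFQ I/O scheduler faulting inside the MMC queue thread.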

Please keep the discussion and testing going.  This forum is great.

Kean
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: olimex on September 26, 2012, 01:25:09 PM
Hi
We managed to reproduce these kernel oops issues in our lab, and now we are going to investigate the root cause.
Tsvetan
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: olimex on September 26, 2012, 02:42:37 PM
Can you please check your boards: with the kernel built with LTIB we can't see such problems; they occur only with the new kernel built with OE.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: dpwhittaker on September 26, 2012, 04:29:07 PM
I downloaded Raivis's original LTIB package from his Google Drive, dd'd the kernel, unpacked the rootfs, and then ran top -d 0.

The first thing that I noticed was that this older version of top redraws the whole screen every time, where the newer versions only redraw the lines that change.  This means the older version spends much more time waiting on the serial port, and much less time stressing the processor and memory.
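A rough lower bound on the cost of a full-screen redraw, assuming the console runs at 115200 baud 8N1 on an 80x24 terminal (the thread states neither setting): each character takes 10 bit times, so one screen of 1920 characters needs about 167 ms of line time, capping a full-redraw top at roughly six refreshes per second however small `-d` is.

```python
def full_redraw_seconds(baud, cols=80, rows=24, bits_per_char=10):
    """Time to push one full screen of text over an 8N1 UART.

    bits_per_char = 1 start + 8 data + 1 stop bit. Escape sequences are
    ignored, so the real figure is somewhat higher.
    """
    chars = cols * rows
    return chars * bits_per_char / baud


t = full_redraw_seconds(115200)
print(f"{t * 1000:.0f} ms per redraw, at most {1 / t:.0f} redraws per second")
# prints: 167 ms per redraw, at most 6 redraws per second
```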

Nevertheless, after a few minutes of stopping and starting top and looking at its settings, I finally tried "top -d 0 -n 0", which seemed to do exactly what "top -d 0" does. Either way, this run eventually failed with this Oops:


Internal error: Oops - undefined instruction: 0 [#2] PREEMPT
Modules linked in:
CPU: 0    Tainted: G      D     (2.6.31-626-g602af1c_OLinuXino #6)
PC is at 0xbee2ef9c
LR is at vsnprintf+0xc1c/0xdd8
pc : [<bee2ef9c>]    lr : [<c0141638>]    psr: 20000013
sp : c3873b70  ip : c3873ce4  fp : ffffffff
r10: c3873d30  r9 : c02c94f6  r8 : c3c62000
r7 : 00000000  r6 : 00000001  r5 : c3873d34  r4 : 00000010
r3 : 00000000  r2 : 00000001  r1 : c3c63000  r0 : c3c62000
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 43c08000  DAC: 00000015
Process top (pid: 675, stack limit = 0xc3872270)
Stack: (0xc3873b70 to 0xc3874000)
3b60:                                     00000010 00000002 ffffffff 0000000a
3b80: ffffffff ffffffff 00000000 00001000 c3c62000 c3c63000 00000000 ffffffff
3ba0: 00000002 ffffffff c3cb30a0 c015807c c3873f88 c3873e60 00000000 c3897c60
3bc0: 00000000 c009c43c 00000001 bee2fac4 00000000 00000000 ffffffff c3873e60
3be0: 00000000 00000001 00000000 00000001 000200da c009cff0 ffffffff 00000000
3c00: c38dc6e0 00000000 00000000 00000000 00000001 00000001 ffffffff c034b38c
3c20: 1e65fb80 000200da 000200da 00000000 c034b944 c034b948 00000001 c384a340
3c40: 00000002 c0070cf8 00000002 00000041 c034b38c 00000002 c3c63000 00000000
3c60: 00000000 000200da 00000002 ffffffff 00000000 37350000 00000036 c3cb3760
3c80: 00000000 00000000 00000000 c034b38c c3872000 00000000 c3873cce c3873f23
3ca0: c3873cce c3873f23 00000086 c00281ec c3873f30 c01408f8 00000000 00000000
3cc0: 00000000 00000000 c3809ff8 c380a000 c380e000 00000010 00000002 ffffffff
3ce0: 0000000a ffffffff ffffffff c02c94f6 c003ea70 c3882860 00000000 00000000
3d00: 00000000 00000000 00000000 00000000 c3849bc0 c00a7418 c0024904 c3873d30
3d20: 00000001 00000000 c00d1318 c02c94f4 00000001 c3873ea4 00000053 00000000
3d40: 00000001 00000001 00000000 ffffffff 00400100 00000077 000023cb 0000000a
3d60: 00000010 00000000 0000005a 00000022 00000055 00000014 00000000 00000001
3d80: 0000001a 00000000 001f9000 0000007f ffffffff 00008000 00090a94 bea1ef00
3da0: bea1eae8 400ae9c8 00000000 00000000 00000000 00084a07 c003ea70 00000000
3dc0: 00000000 00000000 00000000 00000000 00000000 c0070814 00000000 00000000
3de0: 00000000 00000000 c3c44dc0 c009ed90 00000001 c3809880 c3882860 00000001
3e00: 00000014 00000000 00000001 00400100 00000000 0000005a 00000022 00000055
3e20: 0000007f 00008000 00090a94 bea1ef00 00000001 00000000 00000000 001f9000
3e40: 400ae9c8 bea1eae8 c003ea70 ffffffff 00000000 00000053 00000000 00000001
3e60: 00000001 00000001 000023cb 00000010 00000022 00000055 00000000 ffffffff
3e80: 0000001a 00000000 0000000a 00000077 00000000 0000005a cd0a3c00 00000000
3ea0: c034b38c 74696e69 00000000 00000000 00000000 00084a07 00000000 00000000
3ec0: 00000000 a0000013 00000000 c380a000 fffffffd c3882860 c3fb4ca8 000003ff
3ee0: c3873f80 00000000 c3882600 c00d135c 00000001 c032ec44 c3809880 c00ce48c
3f00: c3882600 c3882860 bee2f5d8 00000001 00000000 c00a79d8 000003c3 bee2f5d8
3f20: 00000000 c3882888 bee2f9d8 c034b38c 00000000 00000000 c38011a0 c3882600
3f40: bee2f5d8 c3873f80 00000000 000003ff c3872000 bee2fa49 00000005 c008e098
3f60: 00000000 c3882600 c3882600 fffffff7 00000000 00000000 c0023f64 c008e484
3f80: 00000000 00000000 00000000 00000000 bee2f5d8 ffffffff 00000004 bee2f5d8
3fa0: 00000003 c0023de0 ffffffff 00000004 00000004 bee2f5d8 000003ff 00000000
3fc0: ffffffff 00000004 bee2f5d8 00000003 00003633 00003633 bee2fa49 00000005
3fe0: 00000000 bee2f5c8 00075538 400d17ac 60000010 00000004 aaaaaaaa aaaaaa8a
Code: 00000000 00000000 00000000 00000000 (ffffffff)
---[ end trace 6587df8e926aeb33 ]---


It also managed to turn off echo on the serial port, but I was able to type dmesg blindly to pull up a clean copy of the Oops.

I found a simpler way to reproduce it on that kernel as well:

stty rows 5
top -d 0

And got another Oops:


Unable to handle kernel paging request at virtual address bebeb274
pgd = c38e8000
[bebeb274] *pgd=43c2b031, *pte=00000000, *ppte=00000000
Internal error: Oops: 0 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (2.6.31-626-g602af1c_OLinuXino #6)
PC is at 0xbebeb274
LR is at vsnprintf+0xc1c/0xdd8
pc : [<bebeb274>]    lr : [<c0141638>]    psr: 20000013
sp : c3843b70  ip : c3843ce4  fp : ffffffff
r10: c3843d30  r9 : c02c94f6  r8 : c3c2e000
r7 : 00000000  r6 : 00000001  r5 : c3843d34  r4 : 00000010
r3 : 00000000  r2 : 00000001  r1 : c3c2f000  r0 : c3c2e000
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 438e8000  DAC: 00000015
Process top (pid: 660, stack limit = 0xc3842270)
Stack: (0xc3843b70 to 0xc3844000)
3b60:                                     00000010 00000002 ffffffff 0000000a
3b80: ffffffff ffffffff 00000000 00001000 c3c2e000 c3c2f000 00000000 ffffffff
3ba0: 00000002 ffffffff c3c7f260 c015807c c3843f88 c3843e60 00000000 c3c6e360
3bc0: 00000000 c009c43c 00000001 beb1dac4 00000000 00000000 ffffffff c3843e60
3be0: 00000000 00000001 00000000 00000001 000200da c009cff0 ffffffff 00000000
3c00: c3c90520 00000000 00000000 00000000 00000001 00000001 ffffffff c034b38c
3c20: 23c34600 000200da 000200da 00000000 c034b944 c034b948 00000001 c38f41c0
3c40: 00000002 c0070cf8 00000002 00000041 c034b38c 00000002 c3c2f000 00000000
3c60: 00000000 000200da 00000002 ffffffff 232b29f2 3630005f 00000036 c0351080
3c80: 00000103 00000081 00000001 c3c7f660 00000000 00000000 00000000 00000001
3ca0: c3842000 c380a000 00000086 c00281ec c3842000 c032f3cc c3c7f660 00000000
3cc0: 00000001 00000000 c3809ff8 c380a000 c380e000 00000010 00000002 ffffffff
3ce0: 0000000a ffffffff ffffffff c02c94f6 c003ea70 c3c42360 00000000 00000000
3d00: 00000000 00000000 00000000 00000000 c3c84780 c00a7418 c0024904 c3843d30
3d20: 00000001 00000000 c00d1318 c02c94f4 00000001 c3843ea4 00000053 00000000
3d40: 00000001 00000001 00000000 ffffffff 00400100 00000077 0000246f 0000000a
3d60: 00000010 00000000 0000005a 00000021 00000061 00000014 00000000 00000001
3d80: 0000001a 00000000 001f9000 0000007f ffffffff 00008000 00090a94 bed8df00
3da0: bed8dae8 400ae9c8 00000000 00000000 00000000 00084a07 c003ea70 00000000
3dc0: 00000000 00000000 00000000 00000000 00000000 c0070814 00000000 00000000
3de0: 00000000 00000000 c3c25608 c009ed90 00000001 c3809880 c3c42360 00000001
3e00: 00000014 00000000 00000001 00400100 00000000 0000005a 00000021 00000061
3e20: 0000007f 00008000 00090a94 bed8df00 00000001 00000000 00000000 001f9000
3e40: 400ae9c8 bed8dae8 c003ea70 ffffffff 00000000 00000053 00000000 00000001
3e60: 00000001 00000001 0000246f 00000010 00000021 00000061 00000000 ffffffff
3e80: 0000001a 00000000 0000000a 00000077 00000000 0000005a cda2d280 00000000
3ea0: c034b38c 74696e69 00000000 00000000 00000000 00084a07 00000000 00000000
3ec0: 00000000 a0000013 00000000 c380a000 fffffffd c3c42360 c38ecb08 000003ff
3ee0: c3843f80 00000000 c3c42280 c00d135c 00000001 c032ec44 c3809880 c00ce48c
3f00: c3c42280 c3c42360 beb1d5d8 00000001 00000000 c00a79d8 000003c2 beb1d5d8
3f20: 00000000 c3c42388 beb1d9d8 c034b38c 00000000 00000000 c38011a0 c3c42280
3f40: beb1d5d8 c3843f80 00000000 000003ff c3842000 beb1da49 00000003 c008e098
3f60: 00000000 c3c42280 c3c42280 fffffff7 00000000 00000000 c0023f64 c008e484
3f80: 00000000 00000000 00000000 00000000 beb1d5d8 ffffffff 00000004 beb1d5d8
3fa0: 00000003 c0023de0 ffffffff 00000004 00000004 beb1d5d8 000003ff 00000000
3fc0: ffffffff 00000004 beb1d5d8 00000003 00003633 00003633 beb1da49 00000003
3fe0: 00000000 beb1d5c8 00075538 400d17ac 60000010 00000004 c0248bc4 c0248bd4
Code: bad PC value.
---[ end trace 3788ceb25656dbec ]---


The idea behind that method is to limit the number of rows top can show, so it spends more time stressing the CPU and/or memory and less time shoving bytes down the DUART.  Give that a try and see if you can reproduce it on LTIB.
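A cruder way to stress the memory directly, rather than through top's redraw loop, is to repeatedly write known patterns into a large buffer and read them back. The sketch below is a pure-Python illustration under stated assumptions, not the method used in this thread and no substitute for a dedicated tool such as memtester; the buffer should be much larger than the CPU caches so the accesses actually reach DRAM, and the function names are hypothetical.

```python
def pattern_pass(buf: bytearray, pattern: int) -> int:
    """Fill buf with a single byte value, read it back, return the number of bad bytes."""
    n = len(buf)
    buf[:] = bytes([pattern]) * n
    # bytearray.count() with an int argument counts bytes equal to that value.
    return n - buf.count(pattern)


def counting_pass(buf: bytearray) -> int:
    """Fill buf with 0, 1, ..., 255, 0, 1, ... so neighbouring bytes differ; return bad bytes."""
    n = len(buf)
    ref = bytes(range(256)) * (n // 256) + bytes(range(n % 256))
    buf[:] = ref
    if buf == ref:
        return 0
    return sum(1 for got, want in zip(buf, ref) if got != want)


def stress(size_bytes: int, passes: int = 1,
           patterns: tuple = (0x00, 0xFF, 0x55, 0xAA)) -> int:
    """Run every fixed pattern plus the counting pass `passes` times; return total bad bytes."""
    buf = bytearray(size_bytes)
    errors = 0
    for _ in range(passes):
        for p in patterns:
            errors += pattern_pass(buf, p)
        errors += counting_pass(buf)
    return errors
```

On the board, looping on something like `stress(8 * 1024 * 1024)` in one session while `top -d 0` runs in another keeps both the CPU and DRAM busy; any non-zero return means data read back differed from what was written.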

EDIT: With all that said, I think it goes without saying that "use LTIB", a distro that you have considered obsolete from day one, is not a solution.  Good luck on the lab testing - I hope you find a simple solution.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: olimex on September 26, 2012, 06:14:54 PM
Perhaps you misunderstood my previous post. I am not saying "use LTIB"; we have to identify where the problem comes from. Here in the lab, the MICRO which had a kernel oops with the ARCH image runs mplayer stably for hours with the LTIB kernel, so we are wondering whether the problem is somehow related to the kernel image. We will write the ARCH image to the SD card that now has LTIB, so we can be sure the problem is not related to the SD card media either.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: dpwhittaker on September 26, 2012, 08:46:05 PM
You are right, I did misinterpret your message.  Being a software guy, I'm used to people saying "it works fine on my software, so it must be something wrong with YOUR software".  Please forgive me for jumping to that conclusion here.

I also realize I am probably coming across as saying "it is broken on all my software, so it must be something wrong with YOUR hardware".  Please don't take it that way either - there is a lot of shared code between all the kernels, so there is still room for a software issue.  I'm kind of hoping for a software issue, so that I can just apply a patch and carry on with the hardware I have, but I can't see any common threads that point to a specific software issue, so I'm worried it could be something more fundamental.

I have now seen this issue on LTIB, OE, and linux-mainline.  LTIB does seem to take longer to show an issue, but it also seems to be all-around slower... I can't really put my finger on it, but perhaps it is just stressing the hardware less somehow, or perhaps its default configuration does not include some of the modules that are causing the issue.  Maybe busybox just isn't hitting the same code paths that standard Linux packages are hitting.  There are so many variables in a software package the size of Linux.

Have you tried taking the arch image which easily showed the Oops and plugging that same SD card into a Maxi?  If it gives no errors for a reasonable amount of time, then that basically narrows the problem to the differences between the Maxi and the Micro:

Board layout - trace lengths, capacitance, EMI between the CPU and memory sitting so close together, and who knows what else
The USB hub/Ethernet chip - maybe something in the bootloader or kernel is trying to initialize or talk to it and causing errors when it doesn't respond
The USB/Ethernet/audio headers - it's a difference, though I don't know how leaving these unplugged would cause a problem
What other differences are there between the Maxi and the Micro?

Any other suggestions on things to try?  I'll help wherever I can, especially on software-related things, although it will probably be tomorrow evening (GMT-6) before I get any more free time.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: Kean on September 26, 2012, 10:24:09 PM
@olimex @dpwhittaker

I've been seeing the oops on the Micro using ARCH - so it isn't restricted to OE.  I've not seen a problem on the Maxi using the same image, or even the same SD card, so I think it is related to a hardware difference.  Other than the EMI/DDR traces, maybe a non-terminated/floating input?

The oops (or lockup) will occur even if you don't try to "stress" the system (e.g. by running top), but it can take 2 or 3 hours.

It is a pity the Micro has this problem, but we love the Maxi!

Kean
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: LubOlimex on September 27, 2012, 04:25:07 PM
Guys,

So far we've been able to increase the stability by decreasing the EMI speed to 96MHz down from 133MHz, using the old fsl image. We are still testing the stability but decreasing the EMI speed actually decreased the load on the chip by 50% (while doing top -d 0).

We will continue testing.

Edit: Thanks for the correction, it's the EMI speed of course.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: dpwhittaker on September 28, 2012, 06:03:08 AM
I believe you mean the EMI (DRAM) clock, not the CPU clock.  The CPU clock is set at 454MHz by default (and seems to be hardcoded with no other easily configurable option, at least in Freescale's bootlets bootloader).  On the other hand, the EMI clock can easily be configured to either 96MHz or 133MHz with a simple define.

While I can't find my way around the fsl toolchain to save my life, I have figured out how to implement this change on koliqi's linux-mainline toolchain.  Simply go into his boot/imx-bootlets-src-10.05.02/boot_prep/init-mx23.c file and uncomment line 34 (#define EMI_96M).  Switch back to the imx-bootlets-src-10.05.02 directory and run:

make CROSS_COMPILE=arm-linux-gnueabi- clean (or arm-none-eabi- if you are on Ubuntu and went that route)
make CROSS_COMPILE=arm-linux-gnueabi-

dd if=sd_mmc_bootstream.raw of=/dev/sdX1  (where X is your SD card's letter)

Put your sd card back in the micro and reboot.

I've been running top -d 0 for over 30 minutes while writing this post, and have not had any failures yet, so, if indeed this is the equivalent action to what you've done only on linux-mainline, then I can corroborate your testing and say that this workaround does indeed seem to increase stability.

I don't necessarily like that I have to run my memory at 75% speed to keep everything stable, but I suppose there is a price to pay for getting linux into this size and price range.  Still, I'm holding out hope that you can find a solution that allows the micro to run at full speed.
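For reference, the arithmetic behind the slower setting: the clock period is 1/f, so dropping the EMI clock from 133MHz to 96MHz lengthens each DRAM clock cycle from about 7.52 ns to about 10.42 ns, and the clock rate falls to 96/133, about 72% of the original, i.e. roughly the three quarters mentioned above. A few extra nanoseconds per cycle leave more timing margin for marginal signals, which is consistent with (though not proof of) the improved stability. The tiny helper below just reproduces those numbers; its name is hypothetical.

```python
def emi_clock_comparison(f_fast_mhz: float = 133.0, f_slow_mhz: float = 96.0) -> dict:
    """Clock periods (ns) for two EMI frequencies given in MHz, plus the slow/fast ratio."""
    return {
        # 1 / (frequency in MHz) is a period in microseconds; x1000 converts to ns.
        "period_fast_ns": 1000.0 / f_fast_mhz,
        "period_slow_ns": 1000.0 / f_slow_mhz,
        "speed_ratio": f_slow_mhz / f_fast_mhz,
    }
```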

EDIT: 1:24 and top is still going strong.  Will leave it running overnight and see if or when it fails in the morning, and again tomorrow evening (GMT -6) if it is still going in the morning.

EDIT 2: 8 Hours and still going strong at 96MHz.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: Kean on September 28, 2012, 06:46:10 AM
Looking forward to hearing more reports of testing with the slower EMI DRAM speed.  If someone can make an image available with that, I can do some testing here - I don't have time to rebuild the kernel at the moment.

FWIW, I added a 100uF tantalum cap to the 5V input near the power connector (ESR of 85 mOhm).  I also changed the power connections from the lab supply from 24AWG to 16AWG.  I left top running so I could see the uptime when it failed.

It ran for 9 hrs 28 mins before oops.  Not sure if that is a useful datapoint - only one sample.
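When a board hangs hard, the only clue to when it died is whatever last made it to persistent storage. As an alternative to watching top's uptime on the console, a small logger like the hypothetical Python sketch below can append the uptime from /proc/uptime to a file at a fixed interval and fsync it, so the last line left after a lockup brackets the failure time to within one interval. It assumes a Linux /proc filesystem; logging every minute to the SD card adds some wear, so a longer interval or a log on another machine may be preferable.

```python
import os
import time
from typing import Optional


def read_uptime_seconds(text: str) -> float:
    """Parse the contents of /proc/uptime; the first field is seconds since boot."""
    return float(text.split()[0])


def format_uptime(seconds: float) -> str:
    """Format a number of seconds as 'Dd HH:MM:SS'."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


def log_uptime(log_path: str, interval_s: float = 60.0,
               iterations: Optional[int] = None) -> None:
    """Append the current uptime to log_path every interval_s seconds.

    Each line is flushed and fsync()ed so it should survive a later hard hang.
    iterations=None means run until interrupted.
    """
    done = 0
    while iterations is None or done < iterations:
        with open("/proc/uptime") as f:
            up = read_uptime_seconds(f.read())
        with open(log_path, "a") as log:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} up {format_uptime(up)}\n")
            log.flush()
            os.fsync(log.fileno())
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
```

Run in the background, e.g. `log_uptime("/root/uptime.log", interval_s=60)`; after a crash, the last line of the file shows the last recorded uptime (9 hrs 28 mins would appear as `0d 09:28:00`).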

Kean
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: LubOlimex on September 28, 2012, 10:51:10 AM
I will be testing stability with different capacitors today. As a start, I will replace some of the 100nF ones with 220nF and see how it goes with different images.

Lub/OLIMEX
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: davidjf2001 on September 28, 2012, 06:28:00 PM
I still find it strange that memory issues can be the direct cause of this.  Memory issues like this typically cause unrecoverable failures.  Maybe crosstalk from the memory bus is influencing other signals to cause the errors that the CPU can recover from.  I would leave the memory at high speed to increase the rate of the errors and do my best at tracing through what the messages indicate.  Also rebuilding the kernel to remove all modules not absolutely necessary may shed some light.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: LubOlimex on September 28, 2012, 07:39:01 PM
A little update: so far, increasing the capacitors around the memory didn't improve stability (the problem still appears after ~30 min of top -d 0), but removing R17 seems to improve it. So far it has been running for 4 hours without a problem. Will continue testing tomorrow.

Lub/OLIMEX
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: dpwhittaker on September 29, 2012, 06:12:52 PM
Alright, I've got some more free time this weekend, and have logged over 36 hours of uptime with top -d 0 running with the memory at 96MHz.  So if nothing else, this seems to solve the problem.  I'll also try popping off R17 and running at full speed.  Hopefully that will prove to be the simple modification that fixes the problem permanently.

EDIT: The hardware mod was easier than I imagined.  R17 is at the end of a row of resistors and capacitors, and even despite its tiny size, was relatively simple to remove with a fine-tip soldering iron.  It may be a little more difficult to put back if these tests fail, but I think I can handle it.

So, I'll work with it today without the resistor.  I'm writing an LRADC driver (from scratch, not based off the IIO version from Marek - I wanted one that would support continuous readings, delay channels, and oversampling, and I don't have time to learn the IIO subsystem, so I'm writing a generic character device driver - /dev/lradc0-8).  When I finish for the day, I'll kick off top and let it continue to stress test the system.  This should give us a mix of real-world and artificial tests to ensure that this fix really does solve the problem before everyone else goes off and mods their board.

top -d 0 has been running continuously for 10 minutes while writing this post, so it looks like we are on the right track.

EDIT 2: I was able to "make scripts" in my kernel source folder without error for the first time with the resistor removed.  I was also able to build two kernel modules onboard.  I had to enable swap to get the module to build (don't worry, I'm using a million-write-cycle SanDisk SD), so I know I'm stressing the memory.  More testing to come, but I think this does the trick.

EDIT 3: After a full day of development with the resistor removed, there were no unexplained kernel oops (several were caused by bugs in my module, but all happened as a direct result of reading or writing to the character device connected to my driver, so they were obviously caused by bugs in my software and not memory timing issues).  Today's test included many iterations of editing, compiling, inserting, and testing my kernel module, so there was a good bit of real-world stress put on the micro, and it worked like a champ.  I'll run top -d 0 overnight to get a good artificial test as well, but I'd be willing to bet it will still be going strong in the morning.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: jap on September 30, 2012, 04:30:50 PM
Arch Linux was constantly crashing on the MICRO board. Removing R17 fixed this issue for me. No oops was seen after the fix.

Thank you!
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: dpwhittaker on October 01, 2012, 12:52:50 AM
15 hours of top -d 0 with the resistor removed.  This seems to be a good fix.  I'd recommend doing this on any future production runs.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: Kean on October 01, 2012, 07:18:25 AM
This sounds like an easy fix, and I will test it tomorrow.

@Lub/OLIMEX - can you explain what R17 is for?  It is also used on the Maxi and doesn't cause problems.

Kean
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: LubOlimex on October 01, 2012, 10:47:36 AM
Hey guys,

At the end of differential clock signals it is usually good to have a terminating resistor to stop signal bouncing and interference (electrical termination).
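To put a number on "signal bouncing": when a wave travelling down a trace of characteristic impedance Z0 reaches a load ZL, the fraction reflected back is Γ = (ZL - Z0) / (ZL + Z0). A matched termination (ZL = Z0) gives Γ = 0, an unterminated (open) end gives Γ close to +1, and a short gives -1. The Python sketch below only illustrates that formula; the thread does not state R17's value or the trace impedance, so the example numbers are assumptions.

```python
def reflection_coefficient(z_load_ohm: float, z0_ohm: float) -> float:
    """Voltage reflection coefficient at a load: (ZL - Z0) / (ZL + Z0)."""
    if z_load_ohm + z0_ohm == 0:
        raise ValueError("ZL + Z0 must be non-zero")
    return (z_load_ohm - z0_ohm) / (z_load_ohm + z0_ohm)


# Illustrative values only (assumed, not taken from the board):
# a 100-ohm differential pair with various end terminations.
for label, zl in [("matched 100 ohm", 100.0),
                  ("120 ohm resistor", 120.0),
                  ("nearly open (1 Mohm input)", 1e6)]:
    print(f"{label}: gamma = {reflection_coefficient(zl, 100.0):+.3f}")
```

The formula alone does not say whether a particular termination helps on a given board; on the MICRO that question was settled by testing.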

As far as I can see, everybody here agrees that removing R17 improved stability, and we will suggest to the board designers that they remove it in the next revision of the board.

Thank you guys for the help and the testing!

Cheers,
Lub

Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: fred on October 01, 2012, 06:31:40 PM
Hey guys from olimex,

Is that all ?

- Is the board really stable ?

- Who informs the other users regarding this error ?

- What happens to the boards at the resellers ?

...



Fred
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: stefan on October 02, 2012, 10:49:08 AM
An application note from Micron states that a terminating resistor is not required with a single DDR IC.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: olimex on October 02, 2012, 11:20:10 AM
@Fred we are still doing tests to be sure that removing the termination resistor solves this problem.
Once this is confirmed, we will change the user manual, wiki and schematic in the next run.
The boards already at the distributors will of course be delivered with R17, and customers should make a small intervention to remove it if this turns out to be the cause of the problem.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: andersop on October 05, 2012, 02:40:14 AM
After I removed R17 I have not seen the kernel oops. I am running a stress tester on my application and the board does occasionally restart - but it simply reboots with no oops message, so I am not 100% sure if this is a hardware stability problem or some issue with my application... just my experience so far.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: Kean on October 05, 2012, 06:29:58 PM
Since removing R17 on one of my MICRO boards, I can report 2.5 days perfect uptime.  Fantastic!
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: cnoviello on October 09, 2012, 12:39:05 PM
Hi,
I'm experiencing stability issues with some Olinuxino-Maxi boards. I have ruled out a software problem, since the same software runs correctly on other boards. It seems that the kernel freezes and there is no way to restart the board; the only solution is to power it off. Do you think that removing the R17 resistor can improve the stability of the Maxi too?

Thanks
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: andersop on October 09, 2012, 11:48:41 PM
Quote from: cnoviello on October 09, 2012, 12:39:05 PM
Hi,
I'm experiencing stability issues with some Olinuxino-Maxi boards. I have ruled out a software problem, since the same software runs correctly on other boards. It seems that the kernel freezes and there is no way to restart the board; the only solution is to power it off. Do you think that removing the R17 resistor can improve the stability of the Maxi too?

Thanks

It can't hurt to try. I think at this point we (the community / olimex) are not completely sure about this fix so more real-world test data would always be helpful. Removing R17 is very simple with a hot-air station, and moderately easy with a decent soldering iron/wick, so give it a try and post back with your findings!
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: albertdh on October 10, 2012, 04:08:55 AM
I too have been seeing stability problems with my iMX233-OLinuXino-MICRO.  My problems manifested as crashes soon after the USB host driver would shut down the USB system.  Sometimes the kernel would manage to dump an OOPS and other times not.  The module would not run linux for more than 20 minutes before trouble showed up.

I got the same sort of behavior from the 2.6.35-x kernels and the 3.6 kernel on GitHub [https://github.com/koliqi/imx23-olinuxino].  Thank you to 'koliqi' on GitHub for the port! The same problems happened with the Debian, Arch Linux, and OpenEmbedded images provided by Olimex.

On Friday I removed R17 with a pair of soldering irons. I have been running gpsd against a USB GPS receiver with local and remote clients and an ssh client over a USB to Ethernet link [3]. Uptime says the combination has been running for 3.9 days [2] and the kernel logs are still clean. This is with currently up-to-date Arch Linux with gpsd and a few more networking packages than originally provided.

So, I think the removal of the resistor has made the MICRO reliable enough to develop into a kite aerial photography controller and wireless relay.  Thanks Olimex for the chance to try!

    Albert.

[1] https://github.com/koliqi/imx23-olinuxino

[2] 19:47:49 up 3 days, 22:59,  3 users,  load average: 0.98, 0.88, 0.87

[3] a cheap dm9601 USB 1.1 to 100BaseT unit. I had to configure support for it into the kernel myself, but there are good recipes on the net for that.

Edited to change MINI to MICRO. AdH
Title: Re: iMX233-OLinuXino-MAXI
Post by: petri on December 13, 2012, 07:04:02 PM
I also had problems with an early MAXI version of the board. I just removed R17 and it *seems* to have fixed the instability problems. I'll report back if it didn't, but for now, with an uptime of 22 min (a record), it really looks good!

:D :D
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: ritzBerl on June 10, 2013, 02:23:15 PM
Any updates from Olimex on whether removing R17 solves this issue (stably and reliably)?
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: Chax on June 24, 2013, 07:45:31 PM
I also had the strangest problems, gcc failing to compile a simple program on the Olinuxino Maxi.

I removed R17 from my Maxi board, but I *THINK* the main reason for the failures had something to do with u-boot. I used some vanilla 2013.1 version, but now the wiki by Robert Nelson (http://www.eewiki.net/display/linuxonarm/iMX233-OLinuXino) also includes some voltage fixes for the iMX23 and u-boot. A couple of days ago I recompiled u-boot and this seems to solve the strange problems for me.

I have not run any stress tests yet. I will do this in a couple of weeks with the micro and the maxi board I own.
Title: Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
Post by: olimex on June 24, 2013, 08:32:02 PM
to make it crystal clear:

1. the imx233 hardware now has no stability issues whatsoever

2. if you encounter stability issues -> go to the Wiki and download the official imx233 Linux Kernel 2.6 image, and these stability issues will disappear

I will lock this thread as this is becoming ridiculous; we have boards of all models running for weeks without hang-ups with Kernel 2.6.x and FSC elftosb

if I have to say it in Marex style, "do not blame the hardware for the crappy software": good u-boot people, please fix your memory initializations :)