Author Topic: iMX233-OLinuXino-MICRO stability issues (kernel oops)  (Read 31000 times)

Kean

  • Jr. Member
  • **
  • Posts: 91
  • Karma: +3/-0
iMX233-OLinuXino-MICRO stability issues (kernel oops)
« on: September 23, 2012, 02:06:28 PM »
Hi All,

Just wondering how many of you are working with the iMX233-OLinuXino-MICRO?

I've got a couple of iMX233-OLinuXino-MAXI boards that we've used for initial development, and after a few minor issues they are working well running our code.  I've even recompiled the kernel on the MAXI, which took a couple of days and plenty of swap space.  But we need a smaller, low-cost solution with just SPI, I2C, and a USB WiFi adapter, so we are now setting everything up to run on the MICRO.

The issue: We've found that the exact same SD card image that works great on the MAXI - running for days at a time with no issues - only runs our code for an hour or two before we get an Oops message on the console, and the program hangs (usually ends up a zombie, and can't be killed).

Just to confirm it isn't anything I've caused by modifying the board or SD image, I've taken a new MICRO out of the box and used a new Olimex microSD as shipped from Olimex via Mouser.  We had the same issue, and that wasn't even running any of our code - just the standard image with only power and serial console connected.  We have now seen this on 3 different MICRO boards, running the old Olimex distribution as well as ARCH.

I am using a good quality lab power supply, and custom made FTDI console cables with a Schottky diode to stop leakage current (killed an SD card on the first day and learnt my lesson there).  We have even tried enabling swap space on the SD card to ensure it isn't an out of memory condition.

A typical Oops looks like this (but every one is different):
Code: [Select]
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c385c000
[00000000] *pgd=43dc4031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT
last sysfs file: /sys/class/gpio/gpio65/value
Modules linked in:
CPU: 0    Not tainted  (2.6.35.3_OLinuXinoR4 #11)
PC is at kmem_cache_alloc_node+0x8/0x7c
LR is at prepare_creds+0x28/0x130
pc : [<c0090134>]    lr : [<c0058d9c>]    psr: 40000013
sp : c3dbff68  ip : 00000000  fp : bed1db44
r10: 4a3d7574  r9 : c3dbe000  r8 : c0026f04
r7 : ffffff9c  r6 : 00000004  r5 : c3d321c0  r4 : 00000000
r3 : c038834c  r2 : ffffffff  r1 : 000000d0  r0 : 00000000
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 4385c000  DAC: 00000015
Process sleep (pid: 1892, stack limit = 0xc3dbe270)
Stack: (0xc3dbff68 to 0xc3dc0000)
ff60:                   c038834c 00000000 c3d321c0 c0058d9c 00000000 4a3cc23c
ff80: 00000004 c009141c ffffff9c 4a3cc23c 00000000 4a3cc23c 4a3d7954 00000021
ffa0: c0026f04 c0026d80 00000000 4a3cc23c 4a3cc23c 00000004 00000000 4a3d6d70
ffc0: 00000000 4a3cc23c 4a3d7954 00000021 00000001 4a3d7968 4a3d7574 bed1db44
ffe0: 4a3d7954 bed1da84 4a3b27ec 4a3c813c 60000010 4a3cc23c 00000000 00000000
[<c0090134>] (kmem_cache_alloc_node+0x8/0x7c) from [<c0058d9c>] (prepare_creds+0x28/0x130)
[<c0058d9c>] (prepare_creds+0x28/0x130) from [<c009141c>] (sys_faccessat+0x1c/0x178)
[<c009141c>] (sys_faccessat+0x1c/0x178) from [<c0026d80>] (ret_fast_syscall+0x0/0x2c)
Code: c0389a58 c02f2abe e92d4038 e1a04000 (e5900000)
---[ end trace e714e7a61f35d43d ]---
Segmentation fault

Although there are a number of differences between the MICRO and MAXI, the primary one that I suspect is causing this is the routing of the EMI (DDR RAM) signals, as mentioned in this Olimex blog post: http://olimex.wordpress.com/2012/05/28/imx233-olinuxino-micro-doube-side-design-works-at-full-speed/

1) Is anyone else seeing these issues?
2) Is it possible to reduce the DDR RAM clock speed to improve reliability?

I've got a big demo planned pretty soon and really need something working...

Kean

davidjf2001

  • Newbie
  • *
  • Posts: 37
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #1 on: September 23, 2012, 05:24:15 PM »

If you suspect DRAM timing is on some critical edge, try heating and cooling the board. Does this make any difference? Otherwise it could be a power supply issue.  Despite your lab supply, solder some bulk capacitance on the board itself.


Fadil Berisha

  • Full Member
  • ***
  • Posts: 124
  • Karma: +7/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #2 on: September 23, 2012, 07:36:02 PM »
 
 
Quote
We had the same issue, and that wasn't even running any of our code - just the standard image with just power and serial console connected.

The problem is probably with the power supply. Nobody else has reported a similar problem. Also, did you use ksymoops, a utility to decode Linux kernel Oops messages?  It may help to identify the source of the failure.
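
Since every Oops in this thread looks different, it can help to reduce each report to a few fields and compare them side by side. The following is a minimal sketch, assuming the report layout shown in the first post; `summarize_oops` is a hypothetical helper written for illustration and is not part of ksymoops or any kernel tool.

```python
import re

def summarize_oops(text):
    """Pull the process, PC/LR symbols and call chain out of an ARM Oops report."""
    summary = {"process": None, "pc": None, "lr": None, "trace": []}
    for line in text.splitlines():
        # Drop an optional printk timestamp such as "[ 3500.480000] ".
        line = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line.strip())
        m = re.match(r"Process (\S+) \(pid: (\d+)", line)
        if m:
            summary["process"] = (m.group(1), int(m.group(2)))
            continue
        m = re.match(r"(PC|LR) is at (\S+)", line)
        if m:
            summary[m.group(1).lower()] = m.group(2)
            continue
        # Backtrace lines: [<addr>] (callee+off/len) from [<addr>] (caller+off/len)
        m = re.match(r"\[<[0-9a-f]+>\] \((\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+\) from", line)
        if m:
            summary["trace"].append(m.group(1))
    return summary

# Lines copied from the Oops in the first post of this thread.
sample = """\
PC is at kmem_cache_alloc_node+0x8/0x7c
LR is at prepare_creds+0x28/0x130
Process sleep (pid: 1892, stack limit = 0xc3dbe270)
[<c0090134>] (kmem_cache_alloc_node+0x8/0x7c) from [<c0058d9c>] (prepare_creds+0x28/0x130)
[<c0058d9c>] (prepare_creds+0x28/0x130) from [<c009141c>] (sys_faccessat+0x1c/0x178)
[<c009141c>] (sys_faccessat+0x1c/0x178) from [<c0026d80>] (ret_fast_syscall+0x0/0x2c)
"""

if __name__ == "__main__":
    print(summarize_oops(sample))
```

If the faulting function and process change from one report to the next, as they do here, that points away from a single software bug and toward something corrupting memory or instructions at random.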

Regards

Fadil Berisha

redfox74

  • Newbie
  • *
  • Posts: 7
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #3 on: September 24, 2012, 03:29:27 PM »
Hi, I had the same problem on the Olimex MICRO, and a lot of it, while the MAXI works quite well. As far as I can see, the main problem is USB devices that need a lot of power to work, for example a 3G dongle.
I think the circuit uses the same 5 V for the MICRO and for the dongle, so the main problem is that they share the same power source.
When a 3G dongle draws more power, the resulting glitch on the supply causes a reset in extreme situations; in others it could cause a kernel panic because the DRAM contents change.
For now I have solved the main problem by putting a powered USB hub on the USB port.
Best
Roberto

olimex

  • Administrator
  • Hero Member
  • *****
  • Posts: 810
  • Karma: +22/-3
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #4 on: September 24, 2012, 04:35:47 PM »
Note also that some GSM/3G modems radiate a lot of power, which causes inductive feedback on the tracks and may drive the processor crazy. We are running some tests with MOD-GSM now and found that, during a call, a MOD-GSM placed less than 10 cm from the iMX233 board causes a reboot.

redfox74

  • Newbie
  • *
  • Posts: 7
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #5 on: September 24, 2012, 07:13:30 PM »
@Kean,
I see the same kernel panic with a WiFi dongle. In my case the main reason is that the WiFi gateway is too far away.
The WiFi dongle then draws more power, and something goes wrong, possibly on the USB VCC. Or it could be RF too close to the MICRO that causes the problem.
If your WiFi gateway is near the board, the problem does not occur.
Check this; it was my problem.
Best
Roberto

andersop

  • Newbie
  • *
  • Posts: 8
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #6 on: September 25, 2012, 12:19:34 AM »
I used to get a lot of these, here are some samples:

This one actually happened during the boot process (after udev but before runlevel 5)
Code: [Select]
Internal error: Oops - undefined instruction: 0 [#1] PREEMPT
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in:
CPU: 0    Not tainted  (2.6.35.3_OLinuXino #1)
PC is at 0xc1b3b7c8
LR is at __find_get_block+0x258/0x27c
pc : [<c1b3b7c8>]    lr : [<c00b43b0>]    psr: 60000013
sp : c3c8fdd0  ip : 00000000  fp : c388c780
r10: c344e800  r9 : 00000280  r8 : 00000000
r7 : 00000000  r6 : 00000000  r5 : c344ea80  r4 : c3cc5490
r3 : c3cc546c  r2 : c344eabc  r1 : c3cc5408  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 43be4000  DAC: 00000015
Process rc (pid: 367, stack limit = 0xc3c8e270)
Stack: (0xc3c8fdd0 to 0xc3c90000)
fdc0:                                     000000d0 00000008 00000000 c387eeb8
fde0: 00000280 00000024 c3871b04 00011dfe c3871a90 c3d26140 c3871b04 c3888380
fe00: 00000000 c3871a90 c38e9000 c00e787c c3871a90 c3738418 c3c8fec8 c3c8fe60
fe20: c3d26140 c0097370 0000000f c3c8fec8 00000000 c3c8e000 c3bea00b 00000001
fe40: c3871a90 00000006 0000000b c0098018 000f47ec 8165e699 0000000f c3bea00b
fe60: c3888380 c38e9000 000f47ec c3c8fec8 c3c8fe90 c3c8e000 c3bea000 00000000
fe80: c3c8e000 000cf640 000e9a68 c00985a0 c3888380 c3879000 ffffff9c c3c8fec8
fea0: c3bea000 00000000 ffffff9c 00000001 000cf640 c00989f8 c3bea000 bebcc388
fec0: c3c8ff38 c0099408 c3888380 c38e9000 c3c8ffb0 c0026644 00000011 c3888380
fee0: c3879000 00000001 00000001 00000000 0000081f c0361ddc 000f47ec c3c8ffb0
ff00: ffffffff 00000000 ffffffff c002321c 00000007 00000008 c3c8ff60 c3c8e000
ff20: c3c8ff50 bebcc388 00000004 000000c3 c0023f04 c0092708 ffffff9c 000f7848
ff40: c3c8ff50 bebcc388 bebcc388 c00928e8 4eac2000 000000ae c0023f04 bebcbdac
ff60: 00000014 00000002 000f2e24 000000af c3c8e000 c0045de8 00000000 c0045f00
ff80: 00000000 c00486cc 00010000 00000000 00000000 00000000 00000008 ffffffff
ffa0: 000f7848 c0023d80 000f7848 bebcc388 000f7848 bebcc388 bebcc388 00000065
ffc0: 000f7848 bebcc388 00000004 000000c3 00000002 00000000 000cf640 000e9a68
ffe0: 000cb3f4 bebcc350 00086b68 4ea4fad4 20000010 000f7848 00000000 00000000
[<c00b43b0>] (__find_get_block+0x258/0x27c) from [<c3871a90>] (0xc3871a90)
Code: ffffffff ffffffff ffffffff ffffffff (ffffffff)
---[ end trace c4fd3b1bd08bfafd ]---

another one which is more typical of those I encounter while running my application:
Code: [Select]
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c3ce4000
[00000000] *pgd=43eae031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT
last sysfs file: /sys/class/gpio/gpio92/value
Modules linked in:
CPU: 0    Not tainted  (2.6.35.3_OLinuXino #1)
PC is at __remove_hrtimer+0x9c/0xa4
LR is at __hrtimer_start_range_ns+0x68/0x27c
pc : [<c0053208>]    lr : [<c0053774>]    psr: 40000013
sp : c3ea9ea8  ip : 0000c350  fp : 00016af8
r10: 001e8480  r9 : 00000000  r8 : 00000000
r7 : c3ea9f40  r6 : 00000001  r5 : c3ea9f40  r4 : 00000000
r3 : 00000000  r2 : 00000000  r1 : 00000001  r0 : c3ea9f40
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 43ce4000  DAC: 00000015
Process myproc.bin (pid: 773, stack limit = 0xc3ea8270)
Stack: (0xc3ea9ea8 to 0xc3eaa000)
9ea0:                   c0366fc0 c3ea9f40 001e8480 3b9aca00 001e8480 c0053774
9ec0: a0000093 c01757a4 fffffdfd c3caf1c0 bed92b54 c3ea9f40 c3ea8000 00000001
9ee0: 3b9aca00 001e8480 001e8480 c00539d0 0000c350 00000001 00000001 00000000
9f00: c3ea9f40 c0271b5c 0000c350 00000001 001f47d0 0000c350 00000000 00000000
9f20: 00000000 00000001 c3ea8000 c0054098 0000c350 00000000 001f47d0 00000000
9f40: 00000000 00000000 00000000 00000000 001f47d0 00000000 001e8480 00000000
9f60: c0052fe4 c0366fc0 00000000 00000000 c3caf740 00016af8 00000000 00016af8
9f80: 00016b08 000000a2 c0023f04 0000000f 00016af8 c00541a4 00000000 001e8480
9fa0: 00000000 c0023d80 00000000 00016af8 bed92b58 00000000 001e8480 00000000
9fc0: 00000000 00016af8 00016b08 000000a2 00000000 00000000 0000000f 00016af8
9fe0: 00000000 bed92b58 4ea59dcc 4ea2e31c 60000010 bed92b58 00000000 00000000
[<c0053208>] (__remove_hrtimer+0x9c/0xa4) from [<c0053774>] (__hrtimer_start_range_ns+0x68/0x27c)
[<c0053774>] (__hrtimer_start_range_ns+0x68/0x27c) from [<c00539d0>] (hrtimer_start_range_ns+0x20/0x28)
[<c00539d0>] (hrtimer_start_range_ns+0x20/0x28) from [<c0271b5c>] (do_nanosleep+0x7c/0xf4)
[<c0271b5c>] (do_nanosleep+0x7c/0xf4) from [<c0054098>] (hrtimer_nanosleep+0x94/0x118)
[<c0054098>] (hrtimer_nanosleep+0x94/0x118) from [<c00541a4>] (sys_nanosleep+0x88/0xa0)
[<c00541a4>] (sys_nanosleep+0x88/0xa0) from [<c0023d80>] (ret_fast_syscall+0x0/0x2c)
Code: e1a00007 e2861008 eb040377 e5878028 (e8bd81f0)
---[ end trace a3e7968b8036dc63 ]---
./run_app: line 11:   773 Segmentation fault      /usr/bin/myproc.bin -nowdt

I generally use a Belkin F2U047 USB-ethernet adapter for development (using kernel driver compiled with "CONFIG_USB_NET_AX8817X=y") and I thought that might be the problem, so I ran without it. But I still got the errors. I kept seeing the "last sysfs file" line relating to the GPIO file; at the time my user application was doing shell calls to query GPIO status (read and parse the /sys/class/gpio/* files), which required a number of child processes.

After I switched to the MMAP'd GPIO interface (posted on the mailing list a few weeks ago) I have seen this error happen much less often - it is NOT gone completely, but definitely less often.

I am running on a custom baseboard PCB that takes 9-24VAC/DC and drops it to 5VDC using a switching regulator. I have used the same regulator topology in other products (many with similar ARM chips) and never encountered this type of issue, so I am not certain it is power related. Also note that it will happen regardless of whether the USB ethernet adapter is connected or not.

Some other examples I recorded:

In this case my app did not seem to be affected, it was still responding after the error and I could shut it down cleanly:
Code: [Select]
Unable to handle kernel paging request at virtual address c383bc40
pgd = c3d2c000
[c383bc40] *pgd=4380041e(bad)
Internal error: Oops: 80d [#1] PREEMPT
last sysfs file: /sys/class/gpio/gpio5/value
Modules linked in:
CPU: 0    Not tainted  (2.6.35.3_OLinuXino #1)
PC is at fput+0x148/0x228
LR is at security_file_free+0x14/0x1c
pc : [<c008fdc4>]    lr : [<c012d204>]    psr: 80000013
sp : c3ce5f60  ip : 0000003f  fp : 00000000
r10: c3bd0960  r9 : c3ce4000  r8 : c383bc40
r7 : 00000000  r6 : 00000008  r5 : c3cb9618  r4 : c385abc0
r3 : 00002000  r2 : 00000000  r1 : c3838be0  r0 : c03953e0
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 43d2c000  DAC: 00000015
Process sh (pid: 1584, stack limit = 0xc3ce4270)
Stack: (0xc3ce5f60 to 0xc3ce6000)
5f60: 00000000 00000000 00000000 c385abc0 00000000 c385c4a0 00000006 c0023f04
5f80: 4e995000 c008d33c c385c4a0 c385abc0 000200f4 c008d40c 000d09e0 000cf634
5fa0: 00000000 c0023d80 000cf634 00000000 00000005 00000005 00000000 000d09e0
5fc0: 000cf634 00000000 000200f4 00000006 00000000 00000000 4e995000 00000000
5fe0: 00000000 bec3cbb8 0001e954 4ea50d3c 60000010 00000005 00000000 00000000
[<c008fdc4>] (fput+0x148/0x228) from [<c008d33c>] (filp_close+0x64/0x70)
[<c008d33c>] (filp_close+0x64/0x70) from [<c008d40c>] (sys_close+0xc4/0x11c)
[<c008d40c>] (sys_close+0xc4/0x11c) from [<c0023d80>] (ret_fast_syscall+0x0/0x2c)
Code: e3530a02 1a000003 e59500f0 e3500000 (0a000000)
---[ end trace d220d5c97d19c019 ]---

Will post any others that I am able to capture as well as the circumstances that caused them.

For the time being my suggestion would be to move to the MMIO implementation instead of the /sys/class/gpio/* files and see if that helps. Maybe it's just placebo effect but I really think that helped my situation.
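
For readers who have not seen the mailing-list code, the general idea of memory-mapped GPIO access can be sketched as follows. This is not that code. The PINCTRL base address, the DIN register offset and the per-bank stride below are assumptions recalled from the i.MX23 reference manual and should be verified before use; the function names are hypothetical.

```python
import mmap
import os
import struct

# ASSUMPTIONS: verify all three values against the i.MX23 reference manual.
PINCTRL_BASE = 0x80018000  # assumed physical base of the PINCTRL block
DIN_OFFSET = 0x600         # assumed offset of the bank 0 data-in register
BANK_STRIDE = 0x10         # assumed spacing between per-bank registers
PINS_PER_BANK = 32

def gpio_to_bank_bit(number):
    """Split a sysfs-style GPIO number (e.g. 65) into (bank, bit)."""
    return divmod(number, PINS_PER_BANK)

def din_register_offset(bank):
    """Offset of the data-in register for a bank, relative to PINCTRL_BASE."""
    return DIN_OFFSET + bank * BANK_STRIDE

def read_gpio_mmio(number):
    """Read one input pin through /dev/mem. Needs root and the real board."""
    bank, bit = gpio_to_bank_bit(number)
    fd = os.open("/dev/mem", os.O_RDONLY | os.O_SYNC)
    try:
        regs = mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ, offset=PINCTRL_BASE)
        try:
            # Python gives no guarantee about the bus access width here;
            # a C implementation has tighter control over that.
            (word,) = struct.unpack_from("<I", regs, din_register_offset(bank))
            return (word >> bit) & 1
        finally:
            regs.close()
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Pure arithmetic only, so this part runs anywhere.
    print(gpio_to_bank_bit(65), hex(din_register_offset(2)))
```

Under these assumptions, gpio65 from the first Oops maps to bank 2, bit 1. The point of the approach is that a pin read becomes a single memory load with no open/read/close syscalls and no child processes.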

Quote
If you suspect DRAM timing is on some critical edge, try heating and cooling the board. Does this make any difference? Otherwise it could be a power supply issue.  Despite your lab supply, solder some bulk capacitance on the board itself.

Any suggestions on where to place this? I'm having some other serious issues with noise on the audio output (will post a thread on here shortly) and I'm interested in trying to add some more capacitance to see if it would help.


davidjf2001

  • Newbie
  • *
  • Posts: 37
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #7 on: September 25, 2012, 02:57:34 AM »
Lead inductance is something to be concerned about with modern processors and high transient currents.  You may have a supply with high output current, but the inductance of wires or traces can still wreak havoc with high speed transients. I would try putting bulk capacitors as close as possible to the SDRAM bypass caps on the Olimex board.  Before doing this though, do you have issues with other software? Can you compile large files with GCC?  Did you try temperature tests? Add a fan, check with a heat gun?
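
A quick software check to run alongside the temperature tests is a userspace memory pattern test. The sketch below is a hypothetical stand-in for a dedicated tool such as memtester, not a replacement for one; `pattern_test` and its default sizes are illustrative choices.

```python
def pattern_test(size_bytes=8 << 20, passes=4):
    """Fill a buffer with alternating bit patterns and count bad bytes.

    The buffer should be much larger than the CPU data cache, otherwise
    the read-back may be served from cache and never reach the DRAM.
    """
    errors = 0
    for i in range(passes):
        pattern = 0x55 if i % 2 == 0 else 0xAA
        buf = bytearray([pattern]) * size_bytes
        # count() scans the whole buffer, so every byte is read back once.
        errors += size_bytes - buf.count(bytes([pattern]))
    return errors

if __name__ == "__main__":
    print("mismatched bytes:", pattern_test())
```

Running this in a loop while heating or cooling the board would show whether errors track temperature. A result of zero does not prove the memory is good, since a userspace test only touches the pages the kernel hands it.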


dpwhittaker

  • Newbie
  • *
  • Posts: 38
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #8 on: September 25, 2012, 05:03:42 PM »
I wanted to mention that I have the same problem.  9 times out of 10, the kernel recovers, and I can continue working without a reboot.  However, I have never been able to successfully compile the kernel on the micro.  I can't even get it to run under load long enough to "make scripts" in the kernel sources.

I have a 5V 2A RadioShack adjustable wall wart power supply, so I don't think it has anything to do with lack of power.

I've let the micro sit idle for the past 30 minutes.  During that time, several Oops have happened:

Code: [Select]
[ 3500.470000] Unable to handle kernel paging request at virtual address bec469f8
[ 3500.470000] pgd = c3a78000
[ 3500.480000] [bec469f8] *pgd=43ac3831, *pte=00000000, *ppte=00000000
[ 3500.480000] Internal error: Oops: 80000005 [#1] ARM
[ 3500.480000] Modules linked in:
[ 3500.480000] CPU: 0    Not tainted  (3.6.0-rc2-09647-gddee6b1-dirty #2)
[ 3500.480000] PC is at 0xbec469f8
[ 3500.480000] LR is at 0x0
[ 3500.480000] pc : [<bec469f8>]    lr : [<00000000>]    psr: 00000013
[ 3500.480000] sp : c3a2de9c  ip : 00000018  fp : 00000000
[ 3500.480000] r10: 00000008  r9 : ffc4ab77  r8 : 00000000
[ 3500.480000] r7 : 501ea721  r6 : 00000000  r5 : 50525977  r4 : 00000000
[ 3500.480000] r3 : 00000008  r2 : ffffff88  r1 : c3a2df20  r0 : bec4ab70
[ 3500.480000] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 3500.480000] Control: 0005317f  Table: 43a78000  DAC: 00000015
[ 3500.480000] Process crond (pid: 326, stack limit = 0xc3a2c270)
[ 3500.480000] Stack: (0xc3a2de9c to 0xc3a2e000)
[ 3500.480000] de80:                                                                bec4ab10
[ 3500.480000] dea0: 50525977 00000000 00001000 bec4ab10 00000068 00000000 c3a2df50 c00c9000
[ 3500.480000] dec0: 0000b304 00000000 00000000 00004372 000081a4 00000002 00000000 00000000
[ 3500.480000] dee0: 00000000 00000000 00000000 00000000 00000de7 00000000 00001000 00000000
[ 3500.480000] df00: 00000008 00000000 50525977 00000000 501ea721 00000000 5051ecec 00000000
[ 3500.480000] df20: 00004372 00000000 c3a2df50 bec4ab10 00000000 00000000 000000c3 c000e9c8
[ 3500.480000] df40: c3a2c000 00000000 bec4ab9c c00c9480 00004372 00000000 0b300004 c3a281a4
[ 3500.480000] df60: 00000002 00000000 00000000 00000000 00000de7 00000000 50525977 00000000
[ 3500.480000] df80: 501ea721 00000000 5051ecec 00000000 00001000 c000e918 00000008 00000000
[ 3500.480000] dfa0: b6f52228 c000e820 b6f52228 00000000 b6f52228 bec4ab10 bec4ab10 00000000
[ 3500.480000] dfc0: b6f52228 00000000 00000000 000000c3 00000001 b6f66000 ffffba92 bec4ab9c
[ 3500.480000] dfe0: 00000000 bec4aab0 b6ec2328 b6ef23b8 20000010 b6f52228 00000000 00000000
[ 3500.660000] Code: 00000000 00000000 00000000 00000000 (00000000)
[ 3500.660000] ---[ end trace 7f11451ac6811422 ]---
1969 Dec 31 19:03:47 micro [ 3500.480000] Internal error: Oops: 80000005 [#1] ARM
1969 Dec 31 19:03:47 micro [ 3500.480000] Process crond (pid: 326, stack limit = 0xc3a2c270)
1969 Dec 31 19:03:47 micro [ 3500.480000] Stack: (0xc3a2de9c to 0xc3a2e000)
1969 Dec 31 19:03:47 micro [ 3500.480000] de80:                                                                bec4ab10
1969 Dec 31 19:03:47 micro [ 3500.480000] dea0: 50525977 00000000 00001000 bec4ab10 00000068 00000000 c3a2df50 c00c9000
1969 Dec 31 19:03:47 micro [ 3500.480000] dec0: 0000b304 00000000 00000000 00004372 000081a4 00000002 00000000 00000000
1969 Dec 31 19:03:47 micro [ 3500.480000] dee0: 00000000 00000000 00000000 00000000 00000de7 00000000 00001000 00000000
1969 Dec 31 19:03:47 micro [ 3500.480000] df00: 00000008 00000000 50525977 00000000 501ea721 00000000 5051ecec 00000000
1969 Dec 31 19:03:47 micro [ 3500.480000] df20: 00004372 00000000 c3a2df50 bec4ab10 00000000 00000000 000000c3 c000e9c8
1969 Dec 31 19:03:47 micro [ 3500.480000] df40: c3a2c000 00000000 bec4ab9c c00c9480 00004372 00000000 0b300004 c3a281a4
1969 Dec 31 19:03:47 micro [ 3500.480000] df60: 00000002 00000000 00000000 00000000 00000de7 00000000 50525977 00000000
1969 Dec 31 19:03:47 micro [ 3500.480000] df80: 501ea721 00000000 5051ecec 00000000 00001000 c000e918 00000008 00000000
1969 Dec 31 19:03:47 micro [ 3500.480000] dfa0: b6f52228 c000e820 b6f52228 00000000 b6f52228 bec4ab10 bec4ab10 00000000
1969 Dec 31 19:03:47 micro [ 3500.480000] dfc0: b6f52228 00000000 00000000 000000c3 00000001 b6f66000 ffffba92 bec4ab9c
1969 Dec 31 19:03:47 micro [ 3500.480000] dfe0: 00000000 bec4aab0 b6ec2328 b6ef23b8 20000010 b6f52228 00000000 00000000
1969 Dec 31 19:03:47 micro [ 3500.660000] Code: 00000000 00000000 00000000 00000000 (00000000)
[ 4256.890000] Internal error: Oops - undefined instruction: 0 [#2] ARM
[ 4256.890000] Modules linked in:
[ 4256.890000] CPU: 0    Tainted: G      D       (3.6.0-rc2-09647-gddee6b1-dirty #2)
[ 4256.890000] PC is at 0xc19d9c88
[ 4256.890000] LR is at 0xb6f35a10
[ 4256.890000] pc : [<c19d9c88>]    lr : [<b6f35a10>]    psr: 20000093
[ 4256.890000] sp : c3bc3fb0  ip : b6f35a40  fp : b6f39814
[ 4256.890000] r10: b6f422a0  r9 : b6f39820  r8 : b6f422a0
[ 4256.890000] r7 : 00000000  r6 : ffffffff  r5 : 20000010  r4 : b6f35a10
[ 4256.890000] r3 : ffffffff  r2 : 00000010  r1 : 00000009  r0 : c05021ac
[ 4256.890000] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 4256.890000] Control: 0005317f  Table: 43b9c000  DAC: 00000015
[ 4256.890000] Process mandb (pid: 15501, stack limit = 0xc3bc2270)
[ 4256.890000] Stack: (0xc3bc3fb0 to 0xc3bc4000)
[ 4256.890000] 3fa0:                                     ffffffff 00000009 00000010 00b13188
[ 4256.890000] 3fc0: 00000000 00d1ccb0 00000f9e 00000000 b6f422a0 b6f39820 b6f422a0 b6f39814
[ 4256.890000] 3fe0: b6f35a40 beaab290 b6e16f98 b6f35a10 20000010 ffffffff 532e0a2e 74222053
[ 4256.890000] Code: 0004017b 00000000 00000017 00000035 (ffffffff)
[ 4256.890000] ---[ end trace 7f11451ac6811423 ]---
1969 Dec 31 19:16:23 micro [ 4256.890000] Internal error: Oops - undefined instruction: 0 [#2] ARM
1969 Dec 31 19:16:23 micro [ 4256.890000] Process mandb (pid: 15501, stack limit = 0xc3bc2270)
1969 Dec 31 19:16:23 micro [ 4256.890000] Stack: (0xc3bc3fb0 to 0xc3bc4000)
1969 Dec 31 19:16:23 micro [ 4256.890000] 3fa0:                                     ffffffff 00000009 00000010 00b13188
1969 Dec 31 19:16:23 micro [ 4256.890000] 3fc0: 00000000 00d1ccb0 00000f9e 00000000 b6f422a0 b6f39820 b6f422a0 b6f39814
1969 Dec 31 19:16:23 micro [ 4256.890000] 3fe0: b6f35a40 beaab290 b6e16f98 b6f35a10 20000010 ffffffff 532e0a2e 74222053
1969 Dec 31 19:16:23 micro [ 4256.890000] Code: 0004017b 00000000 00000017 00000035 (ffffffff)
[ 4431.640000] Unable to handle kernel paging request at virtual address 20000010
[ 4431.640000] pgd = c3bf8000
[ 4431.640000] [20000010] *pgd=00000000
[ 4431.640000] Internal error: Oops: 805 [#3] ARM
[ 4431.640000] Modules linked in:
[ 4431.640000] CPU: 0    Tainted: G      D       (3.6.0-rc2-09647-gddee6b1-dirty #2)
[ 4431.640000] PC is at 0xc339e02c
[ 4431.640000] LR is at 0xb6f35a10
[ 4431.640000] pc : [<c339e02c>]    lr : [<b6f35a10>]    psr: 20000093
[ 4431.640000] sp : c3bc3fb0  ip : 00000000  fp : b6f39814
[ 4431.640000] r10: b6f422a0  r9 : b6f39820  r8 : b6f422a0
[ 4431.640000] r7 : 3f0c2356  r6 : ffffffff  r5 : 20000010  r4 : b6f35a10
[ 4431.640000] r3 : ffffffff  r2 : 00000000  r1 : 00000001  r0 : c05021ac
[ 4431.640000] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 4431.640000] Control: 0005317f  Table: 43bf8000  DAC: 00000015
[ 4431.640000] Process mandb (pid: 19738, stack limit = 0xc3bc2270)
[ 4431.640000] Stack: (0xc3bc3fb0 to 0xc3bc4000)
[ 4431.640000] 3fa0:                                     ffffffff 00000001 00000000 00d285a8
[ 4431.640000] 3fc0: 00000000 00d89898 00002296 00000000 b6f422a0 b6f39820 b6f422a0 b6f39814
[ 4431.640000] 3fe0: 00000000 beaab5b8 b6f359a0 b6f35a10 20000010 ffffffff 65637865 20736465
[ 4431.640000] Code: 43093631 675f5f09 705f756e 3a736462 (7465643a)
[ 4431.640000] ---[ end trace 7f11451ac6811424 ]---
1969 Dec 31 19:19:18 micro [ 4431.640000] Internal error: Oops: 805 [#3] ARM
1969 Dec 31 19:19:18 micro [ 4431.640000] Process mandb (pid: 19738, stack limit = 0xc3bc2270)
1969 Dec 31 19:19:18 micro [ 4431.640000] Stack: (0xc3bc3fb0 to 0xc3bc4000)
1969 Dec 31 19:19:18 micro [ 4431.640000] 3fa0:                                     ffffffff 00000001 00000000 00d285a8
1969 Dec 31 19:19:18 micro [ 4431.640000] 3fc0: 00000000 00d89898 00002296 00000000 b6f422a0 b6f39820 b6f422a0 b6f39814
1969 Dec 31 19:19:18 micro [ 4431.640000] 3fe0: 00000000 beaab5b8 b6f359a0 b6f35a10 20000010 ffffffff 65637865 20736465
1969 Dec 31 19:19:18 micro [ 4431.640000] Code: 43093631 675f5f09 705f756e 3a736462 (7465643a)

These seem to be the usual Oops I get.  Since they happen both at idle and under load, they seem to be only marginally related to temperature (higher loads do seem to increase the frequency).  I am a little worried about pointing my 300 degree C hot air station at it to confirm or deny the impact of heat... maybe I'll try a hair dryer.

I did find that when I unplugged my RTL8187B wlan adapter, the frequency and severity of the Oops decreased greatly.  However, I don't see any correlation between /sys/class/gpio usage or my mmap gpio and the number of Oops.

After switching to only DUART for development, leaving nothing plugged in to USB, I still get messages similar to the above on a fairly consistent basis, both idle and under load, so I think there is still an issue.

I am running an up-to-date Arch Linux Arm distro with the usb ethernet inet scripts for the maxi disabled, and a 3.6rc2 mainline kernel with patches from koliqi (sp?).  I do have several additional features configured in the kernel, but almost all were built as modules and none are loaded at the time of the Oops.  I saw it on the old 2.6 kernel too.

I have 2 more micros that I haven't pulled out of the box yet.  When I get an opportunity (or when I get frustrated enough by the one I'm working on), I'll pull them out and see if they fare any better with the same SD card.

I also suspect a hardware issue somewhere, given the seemingly random nature of the Oops.  Is there an easy way to turn down the memory clock in Linux, or am I going to need to mmap the clock registers and backdoor it to test that theory?
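
Before touching any registers, it may help to work out what the memory clock is set to now. The boot log posted later in this thread prints `FRAC 0x92926192` and `init_ddr_mt46v32m16_133Mhz`. The sketch below decodes that value under several assumptions that should be checked against the i.MX23 reference manual: a 480 MHz PLL, the EMIFRAC field in bits 13:8 of the FRAC register, the formula ref_emi = PLL * 18 / EMIFRAC, and a further integer divider (DIV_EMI) assumed to be 2. The function names are hypothetical.

```python
PLL_MHZ = 480.0  # ASSUMPTION: PLL frequency

def emifrac(frac_reg):
    """EMIFRAC field, ASSUMED to occupy bits 13:8 of the FRAC register."""
    return (frac_reg >> 8) & 0x3F

def emi_clock_mhz(frac_reg, div_emi=2):
    """ASSUMED formula: PLL * 18 / EMIFRAC, then divided by DIV_EMI."""
    return PLL_MHZ * 18.0 / emifrac(frac_reg) / div_emi

def candidate_clocks(div_emi):
    """EMI clock for each EMIFRAC value in the assumed legal range 18..35."""
    return {f: PLL_MHZ * 18.0 / f / div_emi for f in range(18, 36)}

if __name__ == "__main__":
    frac = 0x92926192  # value printed in the boot log in this thread
    print("EMIFRAC =", emifrac(frac))
    print("EMI clock = %.1f MHz" % emi_clock_mhz(frac))
    print("slowest with DIV_EMI=3: %.1f MHz" % min(candidate_clocks(3).values()))
```

Under these assumptions EMIFRAC is 33, which gives roughly 131 MHz and agrees with the 133 MHz label in the log. Lowering the clock would also require matching DRAM controller timings, so changing the boot-time setup is likely a safer route than poking the dividers on a running system.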

I've got plenty of small ceramic and larger electrolytic capacitors here, but not sure where to solder them to test the capacitance theories... if anybody can give me a pointer, I'll try that one as well.  I'm more of a software guy.  If I understand "try putting bulk capacitors as close to the sdram bypass caps on the olimex board" correctly, I think I could solder one of my 10 uF electrolytic capacitors in parallel across C36 (making sure I get the negative lead on the VSS - have to look at the board layout to see which side that is).  Is that correct?

Any other ideas?  I should have some time to try in the next week or two, if olimex doesn't do it themselves before I do.  Has anyone at olimex seen this issue?

fred

  • Newbie
  • *
  • Posts: 2
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #9 on: September 25, 2012, 10:11:58 PM »
Same problems on 2 iMX233-OLinuXino-MICRO boards.

I am using the latest Arch Linux ARM distro.

Compiling the kernel on the iMX233-OLinuXino-MICRO is not possible (a swapfile was enabled).

Running "top -d 0" crashes after a few minutes.

The power supply and serial connection adapter work correctly on another system.

The iMX233-OLinuXino-MICRO board does not work correctly.
Is there any activity from Olimex toward solving this problem?


Code: [Select]

PowerPrep start initialize power...
Battery Voltage = 0.68V
No battery or bad battery                                       
detected!!!.Disabling battery   voltage measurements./r/nLLCAug 22 201215:25:39
EMI_CTRL 0x1C084040
FRAC 0x92926192
init_ddr_mt46v32m16_133Mhz
power 0x00820710
Frac 0x92926192
start change cpu freq
hbus 0x00000003
cpu 0x00010001
LLLLLLLFCLJUncompressing Linux... done, booting the kernel.
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.35-6-ARCH+ (kiril@plug) (gcc version 4.7.1 20120721 (prerelease) (GCC) ) #1 PREEMPT Fri Aug 31 14:22:01 EEST 2012
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
CPU: VIVT data cache, VIVT instruction cache
Machine: iMX233-OLinuXino low cost board
Memory policy: ECC disabled, Data cache writeback
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 16256
Kernel command line: console=ttyAMA0,115200 root=/dev/mmcblk0p2 rw rootwait ssp1=mmc lcd_panel=tvenc_pal no_console_suspend
PID hash table entries: 256 (order: -2, 1024 bytes)
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
allocated 327680 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Memory: 64MB = 64MB total
Memory: 56392k/56392k available, 9144k reserved, 0K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    DMA     : 0xfde00000 - 0xffe00000   (  32 MB)
    vmalloc : 0xc4800000 - 0xf0000000   ( 696 MB)
    lowmem  : 0xc0000000 - 0xc4000000   (  64 MB)
    modules : 0xbf000000 - 0xc0000000   (  16 MB)
      .init : 0xc0008000 - 0xc0028000   ( 128 kB)
      .text : 0xc0028000 - 0xc03b3000   (3628 kB)
      .data : 0xc03ce000 - 0xc03f9140   ( 173 kB)
Hierarchical RCU implementation.
        RCU-based detection of stalled CPUs is disabled.
        Verbose stalled-CPUs detection is disabled.
NR_IRQS:224
Console: colour dummy device 80x30
console [ttyAMA0] enabled
Calibrating delay loop... 226.09 BogoMIPS (lpj=1130496)
pid_max: default: 32768 minimum: 301
Security Framework initialized
Mount-cache hash table entries: 512
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
CPU: Testing write buffer coherency: ok
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
regulator: vddd: 800 <--> 1575 mV at 1550 mV fast normal
regulator: vdddbo: 800 <--> 1575 mV fast normal
regulator: vdda: 1500 <--> 2275 mV at 1750 mV fast normal
regulator: vddio: 2800 <--> 3575 mV at 3300 mV fast normal
regulator: overall_current: fast normal
regulator: mxs-duart-1: fast normal
regulator: mxs-bl-1: fast normal
regulator: mxs-i2c-1: fast normal
regulator: mmc_ssp-1: fast normal
regulator: mmc_ssp-2: fast normal
regulator: charger-1: fast normal
regulator: power-test-1: fast normal
regulator: cpufreq-1: fast normal
i.MX IRAM pool: 28 KB@0xc4808000
usb: DR gadget (utmi) registered
bio: create slab <bio-0> at 0
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Advanced Linux Sound Architecture Driver Version 1.0.23.
Switching to clocksource mxs clock source
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 2048 (order: 2, 16384 bytes)
TCP bind hash table entries: 2048 (order: 1, 8192 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
TCP reno registered
UDP hash table entries: 256 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
NET: Registered protocol family 1
Trying to unpack rootfs image as initramfs...
rootfs image is not initramfs (junk in compressed archive); looks like an initrd
Freeing initrd memory: 4096K
Bus freq driver module loaded
mxs_cpu_init: cpufreq init finished
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
msgmni has been set to 118
alg: No test for stdrng (krng)
cryptodev: driver loaded.
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
Console: switching to colour frame buffer device 90x36
mxs-duart.0: ttyAMA0 at MMIO 0x80070000 (irq = 0) is a DebugUART
mxs-auart.1: ttySP1 at MMIO 0x8006c000 (irq = 24) is a mxs-auart.1
Found APPUART 3.0.0
brd: module loaded
loop: module loaded
usbcore: registered new interface driver smsc95xx
usbmon: debugfs is not available
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
fsl-ehci fsl-ehci: Freescale On-Chip EHCI Host Controller
fsl-ehci fsl-ehci: new USB bus registered, assigned bus number 1
fsl-ehci fsl-ehci: irq 11, io base 0x80080000
fsl-ehci fsl-ehci: USB 2.0 started, EHCI 1.00
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: Freescale On-Chip EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.35-6-ARCH+ ehci_hcd
usb usb1: SerialNumber: fsl-ehci
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usbcore: registered new interface driver libusual
ARC USBOTG Device Controller driver (1 August 2005)
udc: request mem region for fsl-usb2-udc failed
fsl-usb2-udc: probe of fsl-usb2-udc failed with error -16
mice: PS/2 mouse device common for all mice
MXS RTC driver v1.0 hardware v2.0.0
mxs-rtc mxs-rtc.0: rtc core: registered mxs-rtc as rtc0
i2c /dev entries driver
WARNING : No battery connected !
Aborting power driver initialization
mxs-battery: probe of mxs-battery.0 failed with error 1
mxs watchdog: initialized, heartbeat 19 sec
mxs-mmc: MXS SSP Controller MMC Interface driver
ssp_set_rate: error -110
mxs-mmc mxs-mmc.0: mmc0: MXS SSP MMC DMAIRQ 14 ERRIRQ 15
dcp dcp.0: DCP crypto enabled.!
mxs-adc-audio mxs-adc-audio.0: MXS ADC/DAC Audio Codec
No device for DAI mxs adc/dac
No device for DAI mxs adc/dac
asoc: mxs adc/dac <-> mxs adc/dac mapping ok
ALSA device list:
  #0: MXS EVK (mxs adc/dac)
TCP cubic registered
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
registered taskstats version 1
mxs-rtc mxs-rtc.0: setting system clock to 1970-01-01 00:00:10 UTC (10)
RAMDISK: Couldn't find valid RAM disk image starting at 0.
Waiting for root device /dev/mmcblk0p2...
mmc0: new high speed SDHC card at address e624
mmcblk0: mmc0:e624 SU16G 14.8 GiB
 mmcblk0: p1 p2
EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext2 filesystem) on device 179:2.
devtmpfs: mounted
Freeing init memory: 128K
INIT: version 2.88 booting

 > Arch Linux ARM

 > http://www.archlinuxarm.org

   ------------------------------
:: Mounting Root Read-Only                                               [BUSY] EXT4-fs (mmcblk0p2): re-mounted. Opts: barrier=1,data=ordered
                                                                         [DONE]
:: Adjusting system time and setting kernel timezone                     [DONE]
:: Starting UDev Daemon                                                  [BUSY] <30>systemd-udevd[63]: starting version 186
                                                                         [DONE]
:: Triggering UDev uevents                                               [DONE]
:: Loading User-specified Modules                                        [DONE]
:: Waiting for UDev uevents to be processed                              [DONE]
:: Configuring Virtual Consoles                                          [DONE]
:: Bringing up loopback interface                                        [DONE]
:: Unlocking encrypted volumes                                           [DONE]
:: Checking Filesystems                                                  [DONE]
:: Remounting Root and API filesystems                                   [BUSY] EXT4-fs (mmcblk0p2): re-mounted. Opts: barrier=1,data=ordered
                                                                         [DONE]
:: Mounting Local Filesystems                                            [DONE]
:: Activating Swap                                                       [DONE]
:: Configuring Time Zone                                                 [DONE]
:: Initializing Random Seed                                              [DONE]
:: Removing Leftover Files                                               [DONE]
:: Setting Hostname: alarm                                               [DONE]
:: Saving dmesg Log                                                      [DONE]
INIT: Entering runlevel: 3
:: Starting Syslog-NG                                                    [DONE]
:: Starting Network                                                      [BUSY]
Error: unknown interface in /etc/rc.conf: `usb0'
                                                                         [DONE]
:: Mounting Network Filesystems                                          [DONE]
:: Starting crond daemon                                                 [DONE]
:: Starting Secure Shell Daemon                                          [DONE]

Arch Linux 2.6.35-6-ARCH+ (ttyAMA0)

alarm login: root
Password:



[root@alarm ~]# cat /proc/cpuinfo
Processor       : ARM926EJ-S rev 5 (v5l)
BogoMIPS        : 226.09
Features        : swp half thumb fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant     : 0x0
CPU part        : 0x926
CPU revision    : 5

Hardware        : iMX233-OLinuXino low cost board
Revision        : 0000
Serial          : 0000000000000000
[root@alarm ~]# cat /proc/version
Linux version 2.6.35-6-ARCH+ (kiril@plug) (gcc version 4.7.1 20120721 (prerelease) (GCC) ) #1 PREEMPT Fri Aug 31 14:22:01 EEST 2012
[root@alarm ~]# cat /proc/meminfo
MemTotal:          60616 kB
MemFree:           35924 kB
Buffers:            3996 kB
Cached:            12780 kB
SwapCached:            0 kB
Active:             7864 kB
Inactive:          11900 kB
Active(anon):       2996 kB
Inactive(anon):       48 kB
Active(file):       4868 kB
Inactive(file):    11852 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                52 kB
Writeback:             0 kB
AnonPages:          3000 kB
Mapped:             4084 kB
Shmem:                60 kB
Slab:                  0 kB
SReclaimable:          0 kB
SUnreclaim:            0 kB
KernelStack:         288 kB
PageTables:          204 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:       30308 kB
Committed_AS:       7224 kB
VmallocTotal:     712704 kB
VmallocUsed:          52 kB
VmallocChunk:     712260 kB



...


[root@alarm linux-2.6.35.3]# make clean
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
pgd = c0004000
[0000000c] *pgd=00000000
Internal error: Oops: 17 [#1] PREEMPT
last sysfs file: /sys/class/gpio/gpio65/value
Modules linked in:
CPU: 0    Not tainted  (2.6.35-6-ARCH+ #1)
PC is at unmap_vmas+0x290/0x6a8
LR is at unmap_vmas+0x240/0x6a8
pc : [<c0094104>]    lr : [<c00940b4>]    psr: 60000013
sp : c1615e90  ip : 00040495  fp : 405ad3cf
r10: c0b3bcc0  r9 : c1655ec4  r8 : 00000000
r7 : 00004000  r6 : c03d2260  r5 : c3bbcf48  r4 : 405b0000
r3 : 00000000  r2 : 405ad3cf  r1 : 405b0000  r0 : c041c5a0
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 4165c000  DAC: 00000015
Process as (pid: 1606, stack limit = 0xc1614270)
Stack: (0xc1615e90 to 0xc1616000)
5e80:                                     c041e780 00000000 405cb000 405cb000
5ea0: 405cb000 c1614000 c165d014 c1615f10 c165d014 c1655ec0 405ac3cf 00000000
5ec0: 00000000 ffffffff 00000001 405cafff 00100100 00000000 fffffffd 00000000
5ee0: c38922f8 c0b3bcc0 00000000 c0b3bcc0 c3cc1200 00000001 c1614000 03c1a13f
5f00: 0009030c c0098820 c1615f14 00000000 c03d2260 000003ba c0b3bcc0 00000000
5f20: 00000000 c0b3bcf4 00000001 c0040e7c 00000001 c0b3bcc0 c0b3b700 c004671c
5f40: c0b3b700 c03dd7d8 c0b3b700 c0b3b700 c1614000 00000000 000000f8 c00468c4
5f60: 00000000 00095a70 c0b3b864 c0b3ba80 00000000 c1614000 000000f8 c0028f44
5f80: c1614000 03c1a13f 0009030c c00470a8 00095a84 00000000 4025173c c00470f4
5fa0: 00000000 c0028dc0 00095a84 00000000 00000000 00095a70 00000008 00000000
5fc0: 00095a84 00000000 4025173c 000000f8 0008fc90 0008fcf0 03c1a13f 0009030c
5fe0: 00000001 bed1d9f0 40154f9c 401bd614 60000010 00000000 40c0a9a0 00000000
[<c0094104>] (unmap_vmas+0x290/0x6a8) from [<c0098820>] (exit_mmap+0xd0/0x204)
[<c0098820>] (exit_mmap+0xd0/0x204) from [<c0040e7c>] (mmput+0x38/0x108)
[<c0040e7c>] (mmput+0x38/0x108) from [<c004671c>] (exit_mm+0x13c/0x140)
[<c004671c>] (exit_mm+0x13c/0x140) from [<c00468c4>] (do_exit+0x1a4/0x64c)
[<c00468c4>] (do_exit+0x1a4/0x64c) from [<c00470a8>] (do_group_exit+0xac/0xe8)
[<c00470a8>] (do_group_exit+0xac/0xe8) from [<c00470f4>] (__wake_up_parent+0x0/0x18)
Code: e59b2014 e5981008 e1520001 3a000067 (e598100c)
---[ end trace 5fb667f282fd4036 ]---


olimex

  • Administrator
  • Hero Member
  • *****
  • Posts: 810
  • Karma: +22/-3
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #10 on: September 25, 2012, 11:03:28 PM »
Hi,
May I ask what SD card you use with the MICRO? The cards we sell, or your own?
There are many fake SD cards on the market; here is a blog post about this from the Chumby author, who also uses the iMX233: http://www.bunniestudios.com/blog/?p=918
Let me know what we have to do to reproduce your results, as we have never encountered such problems in our lab.
Tsvetan

dpwhittaker

  • Newbie
  • *
  • Posts: 38
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #11 on: September 26, 2012, 12:38:07 AM »
Admittedly, I am using some older microSD cards pulled from some old smart phones.  However, I have tried 2 different microSD cards with the same issues: one older 2GB card that was in service for about 2 years before being basically shelved for a year, and one newer 16GB card that was only used for a few months.  I'll add the brands when I get home tonight.  What brand are the cards you sell?  Would you recommend AmazonBasics?

I bought 3 of the very first olinuxino-micro's released, before the switch to hardware i2c default jumper positions.  Were there any other hardware changes in that batch?

I use the DUART for all my communication needs - even ZMODEM for file transfers.  Though again, the system was sitting idle without even any console messages when the Oops in my previous post occurred.

andersop

  • Newbie
  • *
  • Posts: 8
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #12 on: September 26, 2012, 02:12:34 AM »
Quote from: olimex
May I ask what SD card you use with the MICRO? The cards we sell, or your own?
There are many fake SD cards on the market; here is a blog post about this from the Chumby author, who also uses the iMX233: http://www.bunniestudios.com/blog/?p=918

Yes, I am familiar with the topic; I have been buying SD cards in bulk for several products for some time now. In the course of collecting the data to post, I got another interesting kernel panic, this time actually during the boot process (just after init):
Code: [Select]
INIT: version 2.88 booting
Unable to handle kernel NULL pointer dereference at virtual address 0000016a
pgd = c3be4000
[0000016a] *pgd=438cb031, *pte=00000000, *ppte=00000000
Internal error: Oops: 1 [#1] PREEMPT
last sysfs file:
Modules linked in:
CPU: 0    Not tainted  (2.6.35.3_OLinuXino #1)
PC is at cfq_should_idle+0x68/0xb4
LR is at cfq_dispatch_requests+0x550/0x7d4
pc : [<c014908c>]    lr : [<c014c0e0>]    psr: 00000093
sp : c38c3f68  ip : c38bb028  fp : 00000000
r10: 00000001  r9 : 00000000  r8 : 00000002
r7 : c3bec1b0  r6 : ffff8c8a  r5 : c3bec1b0  r4 : c3bec1b0
r3 : 000001a1  r2 : 00000001  r1 : 00000002  r0 : c3bec1b0
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 43be4000  DAC: 00000017
Process mmcqd (pid: 360, stack limit = 0xc38c2270)
Stack: (0xc38c3f68 to 0xc38c4000)
3f60:                   c3bec0c8 c014c0e0 c014bb90 c38c2000 c38b79a0 00000000
3f80: c38c2000 c38b79a0 00000000 00000000 00000000 c01419d4 c38b79a0 00000001
3fa0: c38c2000 c38bcc84 c38bcc8c c01419fc c38c2000 c01e198c 00000000 c3c85e30
3fc0: c01e1918 c38bcc84 00000013 00000000 00000000 c004f95c 00000000 00000000
3fe0: c38c3fe0 c38c3fe0 c3c85e30 c004f8e4 c00247c4 c00247c4 00000000 00000000
[<c014908c>] (cfq_should_idle+0x68/0xb4) from [<c014c0e0>] (cfq_dispatch_requests+0x550/0x7d4)
[<c014c0e0>] (cfq_dispatch_requests+0x550/0x7d4) from [<c01419d4>] (blk_peek_request+0x194/0x1b4)
[<c01419d4>] (blk_peek_request+0x194/0x1b4) from [<c01419fc>] (blk_fetch_request+0x8/0x1c)
[<c01419fc>] (blk_fetch_request+0x8/0x1c) from [<c01e198c>] (mmc_queue_thread+0x74/0x108)
[<c01e198c>] (mmc_queue_thread+0x74/0x108) from [<c004f95c>] (kthread+0x78/0x80)
[<c004f95c>] (kthread+0x78/0x80) from [<c00247c4>] (kernel_thread_exit+0x0/0x8)
Code: e5913004 e3130020 0a000006 e5901000 (e5911168)
---[ end trace 8b42fe0adb163e2c ]---
note: mmcqd[360] exited with preempt_count 1
Unable to handle kernel NULL pointer dereference at virtual address 00000234
pgd = c0004000
[00000234] *pgd=00000000
Internal error: Oops: 17 [#2] PREEMPT
last sysfs file:
Modules linked in:
CPU: 0    Tainted: G      D      (2.6.35.3_OLinuXino #1)
PC is at cfq_set_request+0x38/0x4e0
LR is at elv_set_request+0x1c/0x2c
pc : [<c014aed0>]    lr : [<c013f040>]    psr: 20000013
sp : c3c21df8  ip : 00000000  fp : 00000000
r10: 00000001  r9 : c3bec558  r8 : 010c0001
r7 : 00000001  r6 : c38bb020  r5 : c3bec558  r4 : c38b79a0
r3 : 00000000  r2 : 00000010  r1 : 010c0001  r0 : 00000010
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 43be4000  DAC: 00000017
Process sync_supers (pid: 138, stack limit = 0xc3c20270)
Stack: (0xc3c21df8 to 0xc3c22000)
1de0:                                                       00200200 00000001
1e00: 010c0001 c3c20000 00040001 c38b79a0 c3bec558 00000000 00000001 010c0001
1e20: 00000010 00000001 00000000 c013f040 00011200 c014120c c38b79a0 c38b79a0
1e40: 00040001 00000000 00040001 00000000 000f1f00 c01414c8 c3bec078 c3bec4e0
1e60: c3c21ea4 000085e2 00000000 c013e9b8 c38b79a0 c38b79a0 000004c1 00000400
1e80: 00000000 00000000 00000000 000f1f00 00000000 c0141e50 c38b79a0 c3bec4e0
1ea0: 000f1efe c0141b84 c3bec4e0 c38b79a0 00000002 c3c20000 00000002 c01403c0
1ec0: 000e98b0 00000000 00000002 00000000 00000002 c3c40dc0 c3c52b00 00011210
1ee0: c3c21f18 00000010 00000000 c0050038 00000000 00000000 00011200 c3bec4e0
1f00: c3bec4e0 00000001 00000001 00000000 00000000 00000000 00000000 c0140528
1f20: c03655b0 c3886330 c3c40da0 000004c1 00000001 00000010 00000000 c00b6e68
1f40: c3bec4e0 c00b7690 00000010 0000000f c3886330 c3886330 c3bec4e0 000004c1
1f60: 00000001 00000000 00000000 c00b1a74 c3886330 00000002 c3400400 c00b5b88
1f80: 00000000 c3886330 c3c8c640 c00eb520 00000001 c3c8c640 c3c20000 c3c8c680
1fa0: c036c8a4 c00eb5b4 c3c8c640 c009122c c3c7d4c0 c3c20000 00000001 00000000
1fc0: 00000013 c007c9d8 00000000 c3c0ff78 c007c9a4 c004f95c 00000000 00000000
1fe0: c3c21fe0 c3c21fe0 c3c0ff78 c004f8e4 c00247c4 c00247c4 00000000 00000000
[<c014aed0>] (cfq_set_request+0x38/0x4e0) from [<c013f040>] (elv_set_request+0x1c/0x2c)
[<c013f040>] (elv_set_request+0x1c/0x2c) from [<c014120c>] (get_request.isra.35+0x1cc/0x2a8)
[<c014120c>] (get_request.isra.35+0x1cc/0x2a8) from [<c01414c8>] (get_request_wait.isra.36+0x20/0x120)
[<c01414c8>] (get_request_wait.isra.36+0x20/0x120) from [<c0141e50>] (__make_request+0x2cc/0x434)
[<c0141e50>] (__make_request+0x2cc/0x434) from [<c01403c0>] (generic_make_request+0x22c/0x298)
[<c01403c0>] (generic_make_request+0x22c/0x298) from [<c0140528>] (submit_bio+0xfc/0x118)
[<c0140528>] (submit_bio+0xfc/0x118) from [<c00b1a74>] (submit_bh+0x17c/0x1b4)
[<c00b1a74>] (submit_bh+0x17c/0x1b4) from [<c00b5b88>] (sync_dirty_buffer+0xa0/0x12c)
[<c00b5b88>] (sync_dirty_buffer+0xa0/0x12c) from [<c00eb520>] (ext4_commit_super+0x100/0x17c)
[<c00eb520>] (ext4_commit_super+0x100/0x17c) from [<c00eb5b4>] (ext4_write_super+0x18/0x24)
[<c00eb5b4>] (ext4_write_super+0x18/0x24) from [<c009122c>] (sync_supers+0xb4/0x114)
[<c009122c>] (sync_supers+0xb4/0x114) from [<c007c9d8>] (bdi_sync_supers+0x34/0x48)
[<c007c9d8>] (bdi_sync_supers+0x34/0x48) from [<c004f95c>] (kthread+0x78/0x80)
[<c004f95c>] (kthread+0x78/0x80) from [<c00247c4>] (kernel_thread_exit+0x0/0x8)
Code: e58d3004 e5963000 e58d1008 e1a00002 (e5931234)
---[ end trace 8b42fe0adb163e2d ]---
------------[ cut here ]------------
WARNING: at kernel/exit.c:896 do_exit+0x30/0x674()
Modules linked in:
[<c00280fc>] (unwind_backtrace+0x0/0xe0) from [<c0039f00>] (warn_slowpath_common+0x4c/0x64)
[<c0039f00>] (warn_slowpath_common+0x4c/0x64) from [<c0039f30>] (warn_slowpath_null+0x18/0x1c)
[<c0039f30>] (warn_slowpath_null+0x18/0x1c) from [<c003d164>] (do_exit+0x30/0x674)
[<c003d164>] (do_exit+0x30/0x674) from [<c0026f2c>] (die+0x2c4/0x304)
[<c0026f2c>] (die+0x2c4/0x304) from [<c026ec04>] (__do_kernel_fault.part.4+0x54/0x74)
[<c026ec04>] (__do_kernel_fault.part.4+0x54/0x74) from [<c0028fc4>] (do_page_fault+0x1ec/0x204)
[<c0028fc4>] (do_page_fault+0x1ec/0x204) from [<c002321c>] (do_DataAbort+0x34/0x98)
[<c002321c>] (do_DataAbort+0x34/0x98) from [<c002392c>] (__dabt_svc+0x4c/0x60)
Exception stack(0xc3c21db0 to 0xc3c21df8)
1da0:                                     00000010 010c0001 00000010 00000000
1dc0: c38b79a0 c3bec558 c38bb020 00000001 010c0001 c3bec558 00000001 00000000
1de0: 00000000 c3c21df8 c013f040 c014aed0 20000013 ffffffff
[<c002392c>] (__dabt_svc+0x4c/0x60) from [<c014aed0>] (cfq_set_request+0x38/0x4e0)
[<c014aed0>] (cfq_set_request+0x38/0x4e0) from [<c013f040>] (elv_set_request+0x1c/0x2c)
[<c013f040>] (elv_set_request+0x1c/0x2c) from [<c014120c>] (get_request.isra.35+0x1cc/0x2a8)
[<c014120c>] (get_request.isra.35+0x1cc/0x2a8) from [<c01414c8>] (get_request_wait.isra.36+0x20/0x120)
[<c01414c8>] (get_request_wait.isra.36+0x20/0x120) from [<c0141e50>] (__make_request+0x2cc/0x434)
[<c0141e50>] (__make_request+0x2cc/0x434) from [<c01403c0>] (generic_make_request+0x22c/0x298)
[<c01403c0>] (generic_make_request+0x22c/0x298) from [<c0140528>] (submit_bio+0xfc/0x118)
[<c0140528>] (submit_bio+0xfc/0x118) from [<c00b1a74>] (submit_bh+0x17c/0x1b4)
[<c00b1a74>] (submit_bh+0x17c/0x1b4) from [<c00b5b88>] (sync_dirty_buffer+0xa0/0x12c)
[<c00b5b88>] (sync_dirty_buffer+0xa0/0x12c) from [<c00eb520>] (ext4_commit_super+0x100/0x17c)
[<c00eb520>] (ext4_commit_super+0x100/0x17c) from [<c00eb5b4>] (ext4_write_super+0x18/0x24)
[<c00eb5b4>] (ext4_write_super+0x18/0x24) from [<c009122c>] (sync_supers+0xb4/0x114)
[<c009122c>] (sync_supers+0xb4/0x114) from [<c007c9d8>] (bdi_sync_supers+0x34/0x48)
[<c007c9d8>] (bdi_sync_supers+0x34/0x48) from [<c004f95c>] (kthread+0x78/0x80)
[<c004f95c>] (kthread+0x78/0x80) from [<c00247c4>] (kernel_thread_exit+0x0/0x8)
---[ end trace 8b42fe0adb163e2e ]---

Anyway, a quick and easy way to access this data while the card is running in the OLinuXino is to look under /sys/block/mmcblk0/device, where you will see a number of files that you can cat to read back the data. The two cards I am using are labelled as SanDisk, 512MB. Here's what I get from reading the card data:

Card #1: (primary development card, sporadic crashes, frequency varies)
date: 11/2007
fwrev: 0x0
hwrev: 0x8
manfid: 0x000003
name: SU512
oemid: 0x5344 ("SD" in ASCII)
scr: 0125000000000000
serial: 0x2088ebae

Card #2: backup card - similar markings, the back side looks a bit different, both say "Made in Taiwan". Haven't used this card much, but it did crash on the first load when I put it in to read the data...
date: 06/2008
fwrev: 0x0
hwrev: 0x8
manfid: 0x000003
name: SU512
oemid: 0x5344
scr: 0225000000000000
serial: 0x501681c4
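
For anyone who would rather script this than cat each file by hand, here is a minimal Python sketch. It assumes Python 3 is available on the board and that the card shows up as mmcblk0, as in the boot log above. The function names (read_card_info, oemid_to_ascii) are made up for illustration, and the field list is simply the set of files quoted in this post, not a complete list of what the kernel exposes.

```python
from pathlib import Path

# Attribute file names taken from the listings in this post.
CARD_FIELDS = ("date", "fwrev", "hwrev", "manfid", "name", "oemid", "scr", "serial")


def read_card_info(device_dir="/sys/block/mmcblk0/device"):
    """Return {field: text} for each attribute file that exists and is readable."""
    info = {}
    for field in CARD_FIELDS:
        path = Path(device_dir) / field
        try:
            info[field] = path.read_text().strip()
        except OSError:
            # Missing or unreadable attribute: skip it rather than guess.
            continue
    return info


def oemid_to_ascii(oemid):
    """Decode a 16-bit OEM ID such as '0x5344' into its two ASCII characters."""
    value = int(oemid, 16)
    raw = bytes([(value >> 8) & 0xFF, value & 0xFF])
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    info = read_card_info()
    for field, text in info.items():
        print(f"{field}: {text}")
    if "oemid" in info:
        print("oemid as ASCII:", oemid_to_ascii(info["oemid"]))
```

Dumping the fields the same way on every card makes it easier to compare results between boards and between posters in this thread.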

Comparing to the info in the BunnieStudios post, the OEMID and names seem to suggest more or less genuine SanDisk cards (compare to his "Sample #6"), but then again even he's not sure on that count...

Tsvetan (or anyone else who has one), what is the manufacturer data on the cards Olimex sells?

Quote from: dpwhittaker
I'll add the brands when I get home tonight.  What brand are the cards you sell?  Would you recommend AmazonBasics?

As the linked article at BunnieStudios demonstrates, the brand printed on the card is not a reliable indicator of the card's origin. I myself have ordered trays of several hundred of the "same" cards, and received a batch containing a handful of slightly different logos, back-of-card adhesives, "made in Taiwan" vs "made in China" vs no label, etc. In this case I'm sure that "AmazonBasics" does not actually manufacture cards; they are simply rebranding one of the other OEMs (Samsung, SanDisk, or Kingston - or maybe none of the above?). In any case there's no real way to tell - reading the card ID data can help (but even so there's no guarantee that data isn't also faked...)


davidjf2001

  • Newbie
  • *
  • Posts: 37
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #13 on: September 26, 2012, 04:03:38 AM »
A possible argument against the SD card contributing to the problem in this thread: unless you have a swap file on it or are doing disk I/O, there should not be much activity on the card interface at the time of failure.  The other odd thing is that I would expect more severe failures if it were a power supply issue.  Perhaps there are floating pins, spurious interrupts, etc.  Maybe take Fadil's advice and look into ksymoops to determine exactly what the messages are indicating.  Arch shows the memtest86+ package as available; I have not tried it with Arch ARM, but maybe it would help.
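
As a rough illustration of what decoding an oops can show (a sketch only, not a replacement for ksymoops or objdump): the word in parentheses on each "Code:" line is the instruction that faulted. The Python below decodes just one ARM encoding, the load/store with an immediate offset, and adds the offset to the base register value printed in the register dump. The function name decode_ldr_str_imm is made up for this example, and the sample values are copied by hand from the three oopses posted above.

```python
def decode_ldr_str_imm(word):
    """Decode an ARM LDR/STR with a 12-bit immediate offset.

    Returns None for any other encoding; conditions and the
    unconditional instruction space are deliberately ignored.
    """
    if (word >> 26) & 0b11 != 0b01 or (word >> 25) & 1:
        return None  # not a single data transfer, or register offset form
    offset = word & 0xFFF
    if not (word >> 23) & 1:  # U bit clear: offset is subtracted
        offset = -offset
    load = (word >> 20) & 1
    byte = (word >> 22) & 1
    return {
        "op": ("ldr" if load else "str") + ("b" if byte else ""),
        "rd": (word >> 12) & 0xF,
        "rn": (word >> 16) & 0xF,
        "offset": offset,
        "pre_indexed": bool((word >> 24) & 1),
    }


def fault_address(word, base):
    """Address the instruction would access, given its base register value."""
    d = decode_ldr_str_imm(word)
    if d is None:
        return None
    # Post-indexed forms access the base address and apply the offset afterwards.
    return base + d["offset"] if d["pre_indexed"] else base


if __name__ == "__main__":
    # (faulting word from "Code:", base register value from the register dump)
    samples = [
        (0xE598100C, 0x00000000),  # unmap_vmas oops, r8 = 0
        (0xE5911168, 0x00000002),  # cfq_should_idle oops, r1 = 2
        (0xE5931234, 0x00000000),  # cfq_set_request oops, r3 = 0
    ]
    for word, base in samples:
        d = decode_ldr_str_imm(word)
        addr = fault_address(word, base)
        print(f"{d['op']} r{d['rd']}, [r{d['rn']}, #{d['offset']:#x}] -> {addr:#010x}")
```

In all three cases the computed address matches the "virtual address" reported in the oops, and the base register holds 0 or 2 where a pointer would be expected. That does not say whether the bad value came from memory corruption, a kernel bug, or something else, only which register was wrong at the moment of the fault.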

dpwhittaker

  • Newbie
  • *
  • Posts: 38
  • Karma: +0/-0
Re: iMX233-OLinuXino-MICRO stability issues (kernel oops)
« Reply #14 on: September 26, 2012, 06:17:46 AM »
Alright, I've plugged in my second olinuxino-micro.  While the first failed at "top -d 0" within 5-30 seconds every single time with a signal 4 (ILL) or segmentation fault, the second went for over 5 minutes, and still hasn't failed.  I've upgraded my Arch Linux distro with my RTL8187 dongle plugged in (it fails after 5 or 10 minutes, but I know that's a driver issue - the dongle itself is obviously overheating).

I'll run top -d 0 overnight tonight as a full stress test.

So, this does seem to affect some boards and not others.

---

Well, I was continuing to test before I posted this - after 30 minutes of basic activity (mostly pacman plugging away at the upgrade), my second board finally did throw a kernel oops.  So I plugged in my third board, and it acted much like the first - oops after 5-30 seconds of "top -d 0".  Illegal instruction once and unable to handle paging request once. So it looks like this issue affects some boards worse than others.  However, that kernel oops on the second board was shortly after an out of memory exception, so maybe that one was justified.

Still testing... will post my findings as I find more.

David Whittaker