Hi,
Some time ago I reported stability issues with the Olinuxino-maxi. After a long period of testing, I have come to the conclusion that my problems are related to the Ethernet adapter.
My application is a Python application running the CherryPy web server. After several hours of execution, this trace appears in the kernel ring buffer:
0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 18, btch: 3 usd: 2
active_anon:2466 inactive_anon:2207 isolated_anon:0
active_file:2101 inactive_file:5738 isolated_file:0
unevictable:0 dirty:8 writeback:0 unstable:0
free:792 slab_reclaimable:523 slab_unreclaimable:441
mapped:1163 shmem:62 pagetables:90 bounce:0
DMA free:2140kB min:188kB low:232kB high:280kB active_anon:2152kB inactive_anon:336kB active_file:1172kB inactive_file:580kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:12192kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:4kB slab_reclaimable:360kB slab_unreclaimable:48kB kernel_stack:16kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 51 51
Normal free:1028kB min:824kB low:1028kB high:1236kB active_anon:7712kB inactive_anon:8492kB active_file:7232kB inactive_file:22372kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52832kB mlocked:0kB dirty:32kB writeback:0kB mapped:4652kB shmem:244kB slab_reclaimable:1732kB slab_unreclaimable:1716kB kernel_stack:488kB pagetables:344kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 151*4kB 124*8kB 32*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2140kB
Normal: 257*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1028kB
7902 total pagecache pages
16384 pages of RAM
911 free pages
1268 reserved pages
814 slab pages
7792 pages shared
0 pages swap cached
smsc95xx 1-1.1:1.0: usb0: kevent 2 may have been dropped
python: page allocation failure. order:3, mode:0x4020
[<c002f100>] (unwind_backtrace+0x0/0xe0) from [<c00797d0>] (__alloc_pages_nodemask+0x4c0/0x528)
[<c00797d0>] (__alloc_pages_nodemask+0x4c0/0x528) from [<c0079848>] (__get_free_pages+0x10/0x3c)
[<c0079848>] (__get_free_pages+0x10/0x3c) from [<c025965c>] (__alloc_skb+0x50/0xe4)
[<c025965c>] (__alloc_skb+0x50/0xe4) from [<c01df3fc>] (rx_submit+0x24/0x1d4)
[<c01df3fc>] (rx_submit+0x24/0x1d4) from [<c01e0014>] (usbnet_bh+0x178/0x22c)
[<c01e0014>] (usbnet_bh+0x178/0x22c) from [<c0048024>] (tasklet_action+0x80/0xd4)
[<c0048024>] (tasklet_action+0x80/0xd4) from [<c0048518>] (__do_softirq+0x90/0x124)
[<c0048518>] (__do_softirq+0x90/0x124) from [<c004896c>] (irq_exit+0x44/0xa0)
[<c004896c>] (irq_exit+0x44/0xa0) from [<c002a06c>] (asm_do_IRQ+0x6c/0x88)
[<c002a06c>] (asm_do_IRQ+0x6c/0x88) from [<c002ab88>] (__irq_usr+0x48/0xc0)
Exception stack(0xc347bfb0 to 0xc347bff8)
bfa0: 0000000c 004401e8 0000000b 401851e8
bfc0: 403d1440 00440338 0044033c 0044033c 00000000 40165200 004401f8 006ba698
bfe0: 401851e8 431b80f0 4008f484 400eb618 20000010 ffffffff
Mem-info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 18, btch: 3 usd: 2
After two or three of these messages, the main Python process crashes, causing my application to stop.
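For anyone reading the trace above: "order:3" means the request was for 2^3 = 8 physically contiguous pages, i.e. 32kB with 4kB pages. The two buddy-list lines in the log ("DMA: ..." and "Normal: ...") show how many free blocks of each size each zone had at that moment. Below is a minimal sketch in Python that parses those two lines and reports the largest free block per zone. The function names are made up for illustration, and the script only looks at the free lists; it does not model watermarks or lowmem_reserve, so it cannot say on its own whether a given allocation would succeed.

```python
import re

PAGE_KB = 4  # base page size in the log above (a 4kB block is order 0)

# The two buddy-list lines, copied from the log above.
DMA_LINE = ("DMA: 151*4kB 124*8kB 32*16kB 1*32kB 0*64kB 0*128kB 0*256kB "
            "0*512kB 0*1024kB 0*2048kB 0*4096kB = 2140kB")
NORMAL_LINE = ("Normal: 257*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
               "0*512kB 0*1024kB 0*2048kB 0*4096kB = 1028kB")


def parse_buddy_line(line):
    """Return (zone_name, {order: free_block_count}) for one buddy-list line."""
    zone, rest = line.split(":", 1)
    counts = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", rest):
        # size = PAGE_KB * 2**order, so order = log2(size / PAGE_KB)
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    return zone.strip(), counts


def largest_free_order(counts):
    """Highest order that has at least one free block, or None if none."""
    nonzero = [order for order, n in counts.items() if n > 0]
    return max(nonzero) if nonzero else None


def total_free_kb(counts):
    """Sum of all free blocks in kB; should match the '= NNNNkB' total."""
    return sum(n * (PAGE_KB << order) for order, n in counts.items())


if __name__ == "__main__":
    for line in (DMA_LINE, NORMAL_LINE):
        zone, counts = parse_buddy_line(line)
        print(zone, "largest free order:", largest_free_order(counts),
              "total free kB:", total_free_kb(counts))
```

On these two lines the Normal zone has only order-0 (4kB) blocks free, and the DMA zone has a single order-3 (32kB) block. So free memory was heavily fragmented when the smsc95xx receive path asked for an order-3 buffer, which fits the failure reported in the trace.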
This is the kernel release I'm using:
Linux version 2.6.35.3-10.12.01+yocto+g0ea8cb9 (admininstrator@igloo.airqnetworks.com) (gcc version 4.6.4 20120303 (prerelease) (GCC) ) #3 PREEMPT Sat Nov 3 15:45:01 CET 2012
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
Any hints?
Thanks ;)
Do you see the same issue with a 3.7 or 3.8-rc4 kernel?
Regards,
Fabio Estevam
Hi Fabio,
As I said, I'm using an old kernel (2.6.35). Do you suggest that I try compiling a 3.x kernel?
Exactly, please try the latest kernel.
OK, I'll try, even though it's not a simple task, since I'm adapting an existing sysfs to Olinuxino. We currently have hardware solutions based on a different ARM and I would like to move to Olinuxino, but it's not simple to adapt an existing OS configuration to archlinux & co.
I'll try and let you know.
Thanks a lot for your support.
Carmine Noviello
It was simpler than I expected. Just a couple of changes. I'm starting tests right now.
With the latest kernel I still get BUG and Oops messages related to the smsc95xx USB Ethernet adapter.
With a different USB Ethernet adapter, the Maxi works fine.
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 3.8.0-rc4+ (earny@olofi) (gcc version 4.7.2 (Gentoo 4.7.2 p1.3, pie-0.5.5) ) #9 Sun Jan 20 12:40:46 UTC 2013
[ 0.000000] CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
[ 0.000000] CPU: VIVT data cache, VIVT instruction cache
[ 0.000000] Machine: Freescale i.MX23 (Device Tree), model: i.MX23 Olinuxino Low Cost Board
[ 0.000000] Memory policy: ECC disabled, Data cache writeback
[ 0.000000] On node 0 totalpages: 16384
[ 0.000000] free_area_init_node: node 0, pgdat c04b2854, node_mem_map c0503000
[....]
[ 0.930000] Freeing init memory: 112K
[ 1.090000] usb 1-1: New USB device found, idVendor=0424, idProduct=9512
[ 1.090000] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 1.110000] hub 1-1:1.0: USB hub found
[ 1.120000] hub 1-1:1.0: 3 ports detected
[ 1.410000] usb 1-1.1: new high-speed USB device number 3 using ci_hdrc
[ 1.530000] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
[ 1.530000] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 1.560000] smsc95xx v1.0.4
[ 1.650000] smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-ci_hdrc.0-1.1, smsc95xx USB 2.0 Ethernet, 9a:71:92:da:2b:82
[ 6.080000] udevd[150]: starting version 171
[ 29.490000] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
[ 30.320000] BUG: scheduling while atomic: swapper/0/0x40000100
[ 30.320000] Modules linked in:
[ 30.320000] [<c001359c>] (unwind_backtrace+0x0/0xe0) from [<c034b340>] (__schedule_bug+0x48/0x5c)
[ 30.320000] [<c034b340>] (__schedule_bug+0x48/0x5c) from [<c03500e4>] (__schedule+0x44/0x430)
[ 30.320000] [<c03500e4>] (__schedule+0x44/0x430) from [<c003ed64>] (__cond_resched+0x24/0x34)
[ 30.320000] [<c003ed64>] (__cond_resched+0x24/0x34) from [<c0350564>] (_cond_resched+0x3c/0x44)
[ 30.320000] [<c0350564>] (_cond_resched+0x3c/0x44) from [<c00162c4>] (do_alignment+0x5c/0x39c)
[ 30.320000] [<c00162c4>] (do_alignment+0x5c/0x39c) from [<c0008594>] (do_DataAbort+0x34/0x98)
[ 30.320000] [<c0008594>] (do_DataAbort+0x34/0x98) from [<c000e1b8>] (__dabt_svc+0x38/0x60)
[ 30.320000] Exception stack(0xc048be08 to 0xc048be50)
[ 30.320000] be00: c2d40036 00000178 00000005 c2d40036 c04af4c4 00000000
[ 30.320000] be20: c2cef6c0 c2cef6c0 c3a7c000 00000008 00000000 c04927cc c04af4c4 c048be50
[ 30.320000] be40: c02e913c c030790c 80000013 ffffffff
[ 30.320000] [<c000e1b8>] (__dabt_svc+0x38/0x60) from [<c030790c>] (ip_rcv+0x12c/0x52c)
[ 30.320000] [<c030790c>] (ip_rcv+0x12c/0x52c) from [<c02e913c>] (__netif_receive_skb+0x4dc/0x5d0)
[ 30.320000] [<c02e913c>] (__netif_receive_skb+0x4dc/0x5d0) from [<c02e92ac>] (process_backlog+0x7c/0x140)
[ 30.320000] [<c02e92ac>] (process_backlog+0x7c/0x140) from [<c02eb3a8>] (net_rx_action+0x64/0x1ec)
[ 30.320000] [<c02eb3a8>] (net_rx_action+0x64/0x1ec) from [<c0020e68>] (__do_softirq+0xac/0x20c)
[ 30.320000] [<c0020e68>] (__do_softirq+0xac/0x20c) from [<c002127c>] (irq_exit+0x40/0x8c)
[ 30.320000] [<c002127c>] (irq_exit+0x40/0x8c) from [<c000f48c>] (handle_IRQ+0x64/0x84)
[ 30.320000] [<c000f48c>] (handle_IRQ+0x64/0x84) from [<c00086c0>] (icoll_handle_irq+0x30/0x38)
[ 30.320000] [<c00086c0>] (icoll_handle_irq+0x30/0x38) from [<c000e220>] (__irq_svc+0x40/0x4c)
[ 30.320000] Exception stack(0xc048bf68 to 0xc048bfb0)
[ 30.320000] bf60: 00000000 0005317f 0005217f 60000013 c048a000 c04b49e8
[ 30.320000] bf80: c0494820 c05871c0 40004000 41069265 40482d24 00000000 600000d3 c048bfb0
[ 30.320000] bfa0: c000f514 c000f520 60000013 ffffffff
[ 30.320000] [<c000e220>] (__irq_svc+0x40/0x4c) from [<c000f520>] (default_idle+0x2c/0x34)
[ 30.320000] [<c000f520>] (default_idle+0x2c/0x34) from [<c000f65c>] (cpu_idle+0x6c/0xbc)
[ 30.320000] [<c000f65c>] (cpu_idle+0x6c/0xbc) from [<c046c6f4>] (start_kernel+0x244/0x284)
[ 42.900000] BUG: scheduling while atomic: swapper/0/0x40000100
[ 42.900000] Modules linked in:
[ 42.900000] [<c001359c>] (unwind_backtrace+0x0/0xe0) from [<c034b340>] (__schedule_bug+0x48/0x5c)
[ 42.900000] [<c034b340>] (__schedule_bug+0x48/0x5c) from [<c03500e4>] (__schedule+0x44/0x430)
[ 42.900000] [<c03500e4>] (__schedule+0x44/0x430) from [<c003ed64>] (__cond_resched+0x24/0x34)
[ ... repeats every now and then until ... ]
[76526.130000] Internal error: Oops: 817 [#1] ARM
[76526.130000] Modules linked in:
[76526.130000] CPU: 0 Tainted: G W (3.8.0-rc4+ #9)
[76526.130000] PC is at process_backlog+0x100/0x140
[76526.130000] LR is at tcp_rcv_state_process+0xb94/0xc20
[76526.130000] pc : [<c02e9330>] lr : [<c031dde0>] psr: 80000093
[76526.130000] sp : c048beb0 ip : c0589930 fp : c0496930
[76526.130000] r10: 00000000 r9 : 00744fd7 r8 : c04b2e94
[76526.130000] r7 : c04b2e80 r6 : 00000003 r5 : 00000002 r4 : c04b2ec0
[76526.130000] r3 : 00000000 r2 : 00200200 r1 : 00100100 r0 : 00100100
[76526.130000] Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[76526.130000] Control: 0005317f Table: 42114000 DAC: 00000017
[76526.130000] Process swapper (pid: 0, stack limit = 0xc048a1b8)
[76526.130000] Stack: (0xc048beb0 to 0xc048c000)
[76526.130000] bea0: c02e9230 c04b2ec0 c04b2e80 c048a000
[76526.130000] bec0: 0000012c 00000040 c04b2e88 c02eb3a8 c04d5ecc 00744fd7 c04d5ecc 00000001
[76526.130000] bee0: 0000000c c048a000 00000003 00000100 c04d5ea0 00000009 c04d5ec0 c0020e68
[76526.130000] bf00: c386ad80 c3979700 001195fb c386ad80 00000000 00200000 c048bf9c 00000082
[76526.130000] bf20: 00000000 c04b4c4c c048bf9c 40004000 41069265 40482d24 00000000 c002127c
[76526.130000] bf40: 00000000 c000f48c c000f514 f5000000 c048bf68 c00086c0 c000f520 60000013
[76526.130000] bf60: ffffffff c000e220 00000000 0005317f 0005217f 60000013 c048a000 c04b49e8
[76526.130000] bf80: c0494820 c05871c0 40004000 41069265 40482d24 00000000 600000d3 c048bfb0
[76526.130000] bfa0: c000f514 c000f520 60000013 ffffffff c000f4f4 c000f65c c0492378 ffffffff
[76526.130000] bfc0: c0484230 c046c6f4 ffffffff ffffffff c046c284 00000000 00000000 c0484230
[76526.130000] bfe0: 00000000 00053175 c049201c c048422c c0494814 40008040 00000000 00000000
[76526.130000] [<c02e9330>] (process_backlog+0x100/0x140) from [<c02eb3a8>] (net_rx_action+0x64/0x1ec)
[76526.130000] [<c02eb3a8>] (net_rx_action+0x64/0x1ec) from [<c0020e68>] (__do_softirq+0xac/0x20c)
[76526.130000] [<c0020e68>] (__do_softirq+0xac/0x20c) from [<c002127c>] (irq_exit+0x40/0x8c)
[76526.130000] [<c002127c>] (irq_exit+0x40/0x8c) from [<c000f48c>] (handle_IRQ+0x64/0x84)
[76526.130000] [<c000f48c>] (handle_IRQ+0x64/0x84) from [<c00086c0>] (icoll_handle_irq+0x30/0x38)
[76526.130000] [<c00086c0>] (icoll_handle_irq+0x30/0x38) from [<c000e220>] (__irq_svc+0x40/0x4c)
[76526.130000] Exception stack(0xc048bf68 to 0xc048bfb0)
[76526.130000] bf60: 00000000 0005317f 0005217f 60000013 c048a000 c04b49e8
[76526.130000] bf80: c0494820 c05871c0 40004000 41069265 40482d24 00000000 600000d3 c048bfb0
[76526.130000] bfa0: c000f514 c000f520 60000013 ffffffff
[76526.130000] [<c000e220>] (__irq_svc+0x40/0x4c) from [<c000f520>] (default_idle+0x2c/0x34)
[76526.130000] [<c000f520>] (default_idle+0x2c/0x34) from [<c000f65c>] (cpu_idle+0x6c/0xbc)
[76526.130000] [<c000f65c>] (cpu_idle+0x6c/0xbc) from [<c046c6f4>] (start_kernel+0x244/0x284)
[76526.130000] Code: e1530002 2a000007 e8940006 e59f0034 (e5812004)
[76526.130000] ---[ end trace 56ec990f0236b239 ]---
[76526.130000] Kernel panic - not syncing: Fatal exception in interrupt
[78791.350000] smsc95xx 1-1.1:1.0 eth0: kevent 2 may have been dropped
[78792.290000] smsc95xx 1-1.1:1.0 eth0: kevent 2 may have been dropped
[78792.700000] smsc95xx 1-1.1:1.0 eth0: kevent 2 may have been dropped
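One detail in the Oops register dump above may be worth a look: r0 and r1 hold 00100100 and r2 holds 00200200. These match the kernel's LIST_POISON1 and LIST_POISON2 values (include/linux/poison.h), which list_del() writes into an entry it has just removed. That may suggest process_backlog touched a list entry that had already been unlinked, but this is only a reading of the dump, not a confirmed diagnosis. The sketch below, in Python, flags such values in a pasted register dump. The function name is hypothetical, and the poison values assume POISON_POINTER_DELTA is 0 on this build.

```python
import re

# LIST_POISON1 / LIST_POISON2 as defined in include/linux/poison.h,
# assuming POISON_POINTER_DELTA is 0 (an assumption for this ARM build).
LIST_POISON = {0x00100100: "LIST_POISON1", 0x00200200: "LIST_POISON2"}

# General-purpose register lines copied from the Oops above
# (timestamps stripped).
REGISTER_DUMP = """\
r10: 00000000 r9 : 00744fd7 r8 : c04b2e94
r7 : c04b2e80 r6 : 00000003 r5 : 00000002 r4 : c04b2ec0
r3 : 00000000 r2 : 00200200 r1 : 00100100 r0 : 00100100
"""


def find_poisoned_registers(dump):
    """Map register name -> poison label for registers holding a list poison value."""
    hits = {}
    for name, value in re.findall(r"\b(r\d+)\s*:\s*([0-9a-f]{8})\b", dump):
        label = LIST_POISON.get(int(value, 16))
        if label:
            hits[name] = label
    return hits


if __name__ == "__main__":
    for reg, label in sorted(find_poisoned_registers(REGISTER_DUMP).items()):
        print(reg, "holds", label)
```

Only r0 to r10 are scanned here; sp, ip, fp, pc and lr are left out to keep the pattern simple.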
After two days of testing with kernel 3.7.1, the problem seems to be resolved. I'll keep you updated over the next few days ;)
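For longer soak tests like this one, it can help to scan a saved kernel log for the messages seen earlier in this thread, rather than reading it by eye. The following is a minimal sketch in Python. The signature list is taken from the logs posted above, while the function name and the idea of saving the log to a file first (for example with "dmesg > dmesg.txt") are assumptions for illustration.

```python
import re
from collections import Counter

# Message patterns taken from the logs posted earlier in this thread.
SIGNATURES = {
    "page_alloc_failure": re.compile(r"page allocation failure"),
    "sched_while_atomic": re.compile(r"BUG: scheduling while atomic"),
    "oops": re.compile(r"Internal error: Oops"),
    "kevent_dropped": re.compile(r"kevent \d+ may have been dropped"),
}

# A few lines copied from the thread, used as sample input.
SAMPLE = """\
[   30.320000] BUG: scheduling while atomic: swapper/0/0x40000100
[   42.900000] BUG: scheduling while atomic: swapper/0/0x40000100
[76526.130000] Internal error: Oops: 817 [#1] ARM
[78791.350000] smsc95xx 1-1.1:1.0 eth0: kevent 2 may have been dropped
python: page allocation failure. order:3, mode:0x4020
"""


def count_signatures(log_text):
    """Count how many log lines match each known problem signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, n in sorted(count_signatures(SAMPLE).items()):
        print(name, n)
```

To check a real capture, read the saved file and pass its contents to count_signatures(). An empty result after a couple of days of load would support the "resolved" conclusion, at least for these specific messages.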