Kernel oops related to ethernet

Started by cnoviello, January 23, 2013, 07:16:27 PM


cnoviello

Hi,
Some time ago I reported stability issues with the Olinuxino-maxi. After a long period of testing, I came to the conclusion that my problems are related to the Ethernet adapter.
My application is a Python application running the cherrypy web server. After several hours of execution, this oops appears in the kernel ring buffer:

  0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   18, btch:   3 usd:   2
active_anon:2466 inactive_anon:2207 isolated_anon:0
active_file:2101 inactive_file:5738 isolated_file:0
unevictable:0 dirty:8 writeback:0 unstable:0
free:792 slab_reclaimable:523 slab_unreclaimable:441
mapped:1163 shmem:62 pagetables:90 bounce:0
DMA free:2140kB min:188kB low:232kB high:280kB active_anon:2152kB inactive_anon:336kB active_file:1172kB inactive_file:580kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:12192kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:4kB slab_reclaimable:360kB slab_unreclaimable:48kB kernel_stack:16kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 51 51
Normal free:1028kB min:824kB low:1028kB high:1236kB active_anon:7712kB inactive_anon:8492kB active_file:7232kB inactive_file:22372kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52832kB mlocked:0kB dirty:32kB writeback:0kB mapped:4652kB shmem:244kB slab_reclaimable:1732kB slab_unreclaimable:1716kB kernel_stack:488kB pagetables:344kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 151*4kB 124*8kB 32*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2140kB
Normal: 257*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1028kB
7902 total pagecache pages
16384 pages of RAM
911 free pages
1268 reserved pages
814 slab pages
7792 pages shared
0 pages swap cached
smsc95xx 1-1.1:1.0: usb0: kevent 2 may have been dropped
python: page allocation failure. order:3, mode:0x4020
[<c002f100>] (unwind_backtrace+0x0/0xe0) from [<c00797d0>] (__alloc_pages_nodemask+0x4c0/0x528)
[<c00797d0>] (__alloc_pages_nodemask+0x4c0/0x528) from [<c0079848>] (__get_free_pages+0x10/0x3c)
[<c0079848>] (__get_free_pages+0x10/0x3c) from [<c025965c>] (__alloc_skb+0x50/0xe4)
[<c025965c>] (__alloc_skb+0x50/0xe4) from [<c01df3fc>] (rx_submit+0x24/0x1d4)
[<c01df3fc>] (rx_submit+0x24/0x1d4) from [<c01e0014>] (usbnet_bh+0x178/0x22c)
[<c01e0014>] (usbnet_bh+0x178/0x22c) from [<c0048024>] (tasklet_action+0x80/0xd4)
[<c0048024>] (tasklet_action+0x80/0xd4) from [<c0048518>] (__do_softirq+0x90/0x124)
[<c0048518>] (__do_softirq+0x90/0x124) from [<c004896c>] (irq_exit+0x44/0xa0)
[<c004896c>] (irq_exit+0x44/0xa0) from [<c002a06c>] (asm_do_IRQ+0x6c/0x88)
[<c002a06c>] (asm_do_IRQ+0x6c/0x88) from [<c002ab88>] (__irq_usr+0x48/0xc0)
Exception stack(0xc347bfb0 to 0xc347bff8)
bfa0:                                     0000000c 004401e8 0000000b 401851e8
bfc0: 403d1440 00440338 0044033c 0044033c 00000000 40165200 004401f8 006ba698
bfe0: 401851e8 431b80f0 4008f484 400eb618 20000010 ffffffff
Mem-info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   18, btch:   3 usd:   2


After two or three of these messages, the main python process crashes, causing my application to stop.
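
As a rough way to read the failure line above: order:3 means the kernel wanted 2^3 = 8 physically contiguous pages, which is 32 kB with the 4 kB pages this board uses (16384 pages of RAM = 64 MB). The per-zone lines such as "Normal: 257*4kB 0*8kB ..." list how many free blocks of each size exist. The sketch below checks those lines against the requested order. The function names are only illustrative, and it deliberately ignores the watermark and lowmem_reserve checks that the real allocator also applies, so a True result does not guarantee that an allocation would succeed.

```python
import re

def parse_buddy_line(line):
    """Map free block size in kB -> number of free blocks, for one zone line."""
    # Matches pairs like "257*4kB"; the trailing "= 1028kB" total has no '*'
    # and is therefore skipped.
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}

def has_block_for_order(free_blocks, order, page_kb=4):
    """True if at least one free block holds 2**order contiguous pages.

    Simplification (assumption): watermarks and lowmem_reserve are ignored.
    """
    needed_kb = page_kb * (2 ** order)
    return any(count > 0 and size >= needed_kb
               for size, count in free_blocks.items())

# Zone lines copied from the log above.
dma = parse_buddy_line(
    "DMA: 151*4kB 124*8kB 32*16kB 1*32kB 0*64kB 0*128kB 0*256kB "
    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 2140kB")
normal = parse_buddy_line(
    "Normal: 257*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 1028kB")

print("Normal has a 32 kB block:", has_block_for_order(normal, 3))
print("DMA has a 32 kB block:", has_block_for_order(dma, 3))
```

For the Normal zone this reports False: all 1028 kB of free memory there is in single 4 kB pages, so the memory is fragmented rather than exhausted. The DMA zone still shows one 32 kB block, but the allocation failed anyway, which is where the checks omitted from this sketch come in.
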
This is the kernel release I'm using:

Linux version 2.6.35.3-10.12.01+yocto+g0ea8cb9 (admininstrator@igloo.airqnetworks.com) (gcc version 4.6.4 20120303 (prerelease) (GCC) ) #3 PREEMPT Sat Nov 3 15:45:01 CET 2012
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177


Any hints?

Thanks  ;)

Fabio Estevam

Do you see the same issue with a 3.7 or 3.8-rc4 kernel?

Regards,

Fabio Estevam

cnoviello

Hi Fabio,
As I said, I'm using an old kernel (2.6.35). Are you suggesting that I try compiling a 3.x kernel?

Fabio Estevam

Exactly, please try the latest kernel.

cnoviello

OK, I'll try, even though it's not a simple task, since I'm adapting an existing sysfs to the Olinuxino. We currently have hardware solutions based on a different ARM and I would like to move to the Olinuxino, but it's not simple to adapt an existing OS configuration to archlinux & co.

I'll try and let you know.

Thanks a lot for your support.

Carmine Noviello

cnoviello

It was simpler than I expected: just a couple of changes. I'm starting the tests right now.


earny

With the latest kernel I still get BUG and Oops messages related to the smsc95xx USB Ethernet.
With a different USB Ethernet adapter, the Maxi works fine.


[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 3.8.0-rc4+ (earny@olofi) (gcc version 4.7.2 (Gentoo 4.7.2 p1.3, pie-0.5.5) ) #9 Sun Jan 20 12:40:46 UTC 2013
[    0.000000] CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
[    0.000000] CPU: VIVT data cache, VIVT instruction cache
[    0.000000] Machine: Freescale i.MX23 (Device Tree), model: i.MX23 Olinuxino Low Cost Board
[    0.000000] Memory policy: ECC disabled, Data cache writeback
[    0.000000] On node 0 totalpages: 16384
[    0.000000] free_area_init_node: node 0, pgdat c04b2854, node_mem_map c0503000

[....]

[    0.930000] Freeing init memory: 112K
[    1.090000] usb 1-1: New USB device found, idVendor=0424, idProduct=9512
[    1.090000] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    1.110000] hub 1-1:1.0: USB hub found
[    1.120000] hub 1-1:1.0: 3 ports detected
[    1.410000] usb 1-1.1: new high-speed USB device number 3 using ci_hdrc
[    1.530000] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
[    1.530000] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    1.560000] smsc95xx v1.0.4
[    1.650000] smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-ci_hdrc.0-1.1, smsc95xx USB 2.0 Ethernet, 9a:71:92:da:2b:82
[    6.080000] udevd[150]: starting version 171
[   29.490000] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
[   30.320000] BUG: scheduling while atomic: swapper/0/0x40000100
[   30.320000] Modules linked in:
[   30.320000] [<c001359c>] (unwind_backtrace+0x0/0xe0) from [<c034b340>] (__schedule_bug+0x48/0x5c)
[   30.320000] [<c034b340>] (__schedule_bug+0x48/0x5c) from [<c03500e4>] (__schedule+0x44/0x430)
[   30.320000] [<c03500e4>] (__schedule+0x44/0x430) from [<c003ed64>] (__cond_resched+0x24/0x34)
[   30.320000] [<c003ed64>] (__cond_resched+0x24/0x34) from [<c0350564>] (_cond_resched+0x3c/0x44)
[   30.320000] [<c0350564>] (_cond_resched+0x3c/0x44) from [<c00162c4>] (do_alignment+0x5c/0x39c)
[   30.320000] [<c00162c4>] (do_alignment+0x5c/0x39c) from [<c0008594>] (do_DataAbort+0x34/0x98)
[   30.320000] [<c0008594>] (do_DataAbort+0x34/0x98) from [<c000e1b8>] (__dabt_svc+0x38/0x60)
[   30.320000] Exception stack(0xc048be08 to 0xc048be50)
[   30.320000] be00:                   c2d40036 00000178 00000005 c2d40036 c04af4c4 00000000
[   30.320000] be20: c2cef6c0 c2cef6c0 c3a7c000 00000008 00000000 c04927cc c04af4c4 c048be50
[   30.320000] be40: c02e913c c030790c 80000013 ffffffff
[   30.320000] [<c000e1b8>] (__dabt_svc+0x38/0x60) from [<c030790c>] (ip_rcv+0x12c/0x52c)
[   30.320000] [<c030790c>] (ip_rcv+0x12c/0x52c) from [<c02e913c>] (__netif_receive_skb+0x4dc/0x5d0)
[   30.320000] [<c02e913c>] (__netif_receive_skb+0x4dc/0x5d0) from [<c02e92ac>] (process_backlog+0x7c/0x140)
[   30.320000] [<c02e92ac>] (process_backlog+0x7c/0x140) from [<c02eb3a8>] (net_rx_action+0x64/0x1ec)
[   30.320000] [<c02eb3a8>] (net_rx_action+0x64/0x1ec) from [<c0020e68>] (__do_softirq+0xac/0x20c)
[   30.320000] [<c0020e68>] (__do_softirq+0xac/0x20c) from [<c002127c>] (irq_exit+0x40/0x8c)
[   30.320000] [<c002127c>] (irq_exit+0x40/0x8c) from [<c000f48c>] (handle_IRQ+0x64/0x84)
[   30.320000] [<c000f48c>] (handle_IRQ+0x64/0x84) from [<c00086c0>] (icoll_handle_irq+0x30/0x38)
[   30.320000] [<c00086c0>] (icoll_handle_irq+0x30/0x38) from [<c000e220>] (__irq_svc+0x40/0x4c)
[   30.320000] Exception stack(0xc048bf68 to 0xc048bfb0)
[   30.320000] bf60:                   00000000 0005317f 0005217f 60000013 c048a000 c04b49e8
[   30.320000] bf80: c0494820 c05871c0 40004000 41069265 40482d24 00000000 600000d3 c048bfb0
[   30.320000] bfa0: c000f514 c000f520 60000013 ffffffff
[   30.320000] [<c000e220>] (__irq_svc+0x40/0x4c) from [<c000f520>] (default_idle+0x2c/0x34)
[   30.320000] [<c000f520>] (default_idle+0x2c/0x34) from [<c000f65c>] (cpu_idle+0x6c/0xbc)
[   30.320000] [<c000f65c>] (cpu_idle+0x6c/0xbc) from [<c046c6f4>] (start_kernel+0x244/0x284)
[   42.900000] BUG: scheduling while atomic: swapper/0/0x40000100
[   42.900000] Modules linked in:
[   42.900000] [<c001359c>] (unwind_backtrace+0x0/0xe0) from [<c034b340>] (__schedule_bug+0x48/0x5c)
[   42.900000] [<c034b340>] (__schedule_bug+0x48/0x5c) from [<c03500e4>] (__schedule+0x44/0x430)
[   42.900000] [<c03500e4>] (__schedule+0x44/0x430) from [<c003ed64>] (__cond_resched+0x24/0x34)

[ ... repeats every now and then until  ... ]

[76526.130000] Internal error: Oops: 817 [#1] ARM
[76526.130000] Modules linked in:
[76526.130000] CPU: 0    Tainted: G        W     (3.8.0-rc4+ #9)
[76526.130000] PC is at process_backlog+0x100/0x140
[76526.130000] LR is at tcp_rcv_state_process+0xb94/0xc20
[76526.130000] pc : [<c02e9330>]    lr : [<c031dde0>]    psr: 80000093
[76526.130000] sp : c048beb0  ip : c0589930  fp : c0496930
[76526.130000] r10: 00000000  r9 : 00744fd7  r8 : c04b2e94
[76526.130000] r7 : c04b2e80  r6 : 00000003  r5 : 00000002  r4 : c04b2ec0
[76526.130000] r3 : 00000000  r2 : 00200200  r1 : 00100100  r0 : 00100100
[76526.130000] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[76526.130000] Control: 0005317f  Table: 42114000  DAC: 00000017
[76526.130000] Process swapper (pid: 0, stack limit = 0xc048a1b8)
[76526.130000] Stack: (0xc048beb0 to 0xc048c000)
[76526.130000] bea0:                                     c02e9230 c04b2ec0 c04b2e80 c048a000
[76526.130000] bec0: 0000012c 00000040 c04b2e88 c02eb3a8 c04d5ecc 00744fd7 c04d5ecc 00000001
[76526.130000] bee0: 0000000c c048a000 00000003 00000100 c04d5ea0 00000009 c04d5ec0 c0020e68
[76526.130000] bf00: c386ad80 c3979700 001195fb c386ad80 00000000 00200000 c048bf9c 00000082
[76526.130000] bf20: 00000000 c04b4c4c c048bf9c 40004000 41069265 40482d24 00000000 c002127c
[76526.130000] bf40: 00000000 c000f48c c000f514 f5000000 c048bf68 c00086c0 c000f520 60000013
[76526.130000] bf60: ffffffff c000e220 00000000 0005317f 0005217f 60000013 c048a000 c04b49e8
[76526.130000] bf80: c0494820 c05871c0 40004000 41069265 40482d24 00000000 600000d3 c048bfb0
[76526.130000] bfa0: c000f514 c000f520 60000013 ffffffff c000f4f4 c000f65c c0492378 ffffffff
[76526.130000] bfc0: c0484230 c046c6f4 ffffffff ffffffff c046c284 00000000 00000000 c0484230
[76526.130000] bfe0: 00000000 00053175 c049201c c048422c c0494814 40008040 00000000 00000000
[76526.130000] [<c02e9330>] (process_backlog+0x100/0x140) from [<c02eb3a8>] (net_rx_action+0x64/0x1ec)
[76526.130000] [<c02eb3a8>] (net_rx_action+0x64/0x1ec) from [<c0020e68>] (__do_softirq+0xac/0x20c)
[76526.130000] [<c0020e68>] (__do_softirq+0xac/0x20c) from [<c002127c>] (irq_exit+0x40/0x8c)
[76526.130000] [<c002127c>] (irq_exit+0x40/0x8c) from [<c000f48c>] (handle_IRQ+0x64/0x84)
[76526.130000] [<c000f48c>] (handle_IRQ+0x64/0x84) from [<c00086c0>] (icoll_handle_irq+0x30/0x38)
[76526.130000] [<c00086c0>] (icoll_handle_irq+0x30/0x38) from [<c000e220>] (__irq_svc+0x40/0x4c)
[76526.130000] Exception stack(0xc048bf68 to 0xc048bfb0)
[76526.130000] bf60:                   00000000 0005317f 0005217f 60000013 c048a000 c04b49e8
[76526.130000] bf80: c0494820 c05871c0 40004000 41069265 40482d24 00000000 600000d3 c048bfb0
[76526.130000] bfa0: c000f514 c000f520 60000013 ffffffff
[76526.130000] [<c000e220>] (__irq_svc+0x40/0x4c) from [<c000f520>] (default_idle+0x2c/0x34)
[76526.130000] [<c000f520>] (default_idle+0x2c/0x34) from [<c000f65c>] (cpu_idle+0x6c/0xbc)
[76526.130000] [<c000f65c>] (cpu_idle+0x6c/0xbc) from [<c046c6f4>] (start_kernel+0x244/0x284)
[76526.130000] Code: e1530002 2a000007 e8940006 e59f0034 (e5812004)
[76526.130000] ---[ end trace 56ec990f0236b239 ]---
[76526.130000] Kernel panic - not syncing: Fatal exception in interrupt
[78791.350000] smsc95xx 1-1.1:1.0 eth0: kevent 2 may have been dropped
[78792.290000] smsc95xx 1-1.1:1.0 eth0: kevent 2 may have been dropped
[78792.700000] smsc95xx 1-1.1:1.0 eth0: kevent 2 may have been dropped

cnoviello

After two days of testing with kernel 3.7.1, it seems that the problem is resolved. I'll keep you updated over the next few days  ;)