Olimex Support Forum

OLinuXino Android / Linux boards and System On Modules => A20 => Topic started by: bart on October 26, 2016, 02:32:20 PM

Title: A20 Lime CAN: controller-problem{rx-overflow}
Post by: bart on October 26, 2016, 02:32:20 PM
Hi everyone,

after some trial and error, I got the sun7i_can Kernel module working with an A20 Lime1 Board. I am using the sunxi Kernel, version 3.4.104+.

But the driver seems very flaky. There are two issues:

1. I connect everything correctly to a CAN bus where another device is sending data. I run the following commands:
ip link set can0 down
ip link set can0 type can bitrate 500000 loopback off
ip link set can0 up
candump -cae can0,0:0,#FFFFFFFF

Nothing happens, until I run
cansend can0 5A1#A5 (just a dummy value)
in another terminal. Then the data starts pouring out.

2. This is the more serious issue:
After I start receiving data as described above, the connection works for a while. Then there are more and more errors like this:
can0  20000004   [8]  00 01 00 00 00 00 00 00   ERRORFRAME
        controller-problem{rx-overflow}
This seems to be the same problem described here:
https://sourceforge.net/p/can4linux/discussion/1013310/thread/2eeb9098/#de53

In particular, the issue occurs more often if I dump the output to the console instead of a file, but it occurs in both cases. Once the issue arises, no data is received anymore, only overflow errors.

500 kbit/s does not seem like a lot of data for a 1 GHz CPU. How can I avoid these overflow issues?


Thank you very much,
Bart
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: KeesZagers on October 26, 2016, 10:15:23 PM
1. Everything connected correctly: how many nodes are on the bus? Are all CAN Hi's connected together? The same for the CAN Lo's and the CAN Gnd's? Are both ends of the bus terminated by 120 ohm resistors?

2. The bitrate of 500 kbit/s is not relevant in this case. The CAN controller handles the individual bits. What is relevant is how many messages are sent on the bus per second, because the CPU only handles a message once it has been received completely and correctly. It is always nice to test a new node on a network that is already working. If you have only one other node, it cannot send its message until your test node is connected; it will keep retrying until it receives an ACK from your node. In the thread you referred to, the user sent 2000 messages per second. This is quite a high bus load. Are you also testing with that many messages per second? In that case I can imagine that you get overflow errors. If the FIFO is full and messages keep arriving at this rate, it never gets the chance to empty again, so I can imagine that only overflow errors keep coming.

3. I think that one of the developers of CAN4LINUX (Heinz) is also watching this forum. Maybe he has more information.
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: bart on October 27, 2016, 05:00:48 PM
Thank you very much for your reply, I appreciate the input.

1. There are a handful of nodes on the bus, no more than 10 though. The physical connection should be ok, since I can receive messages, shouldn't it?

2. I would assume the bitrate is relevant, since it limits the message rate. If 2000 messages per second can be sent without saturating the bus, then the CAN receiver should be able to process them, shouldn't it? In either case, the target application is connecting to a machine to read telemetry data and we have no influence on the message rate.

3. I would very much appreciate any help.
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: KeesZagers on October 27, 2016, 08:19:56 PM
1. OK

2. What I meant to say is: you can have a 500 kbit/s network with only one message per second. The driver should not have any problem with that, because it will be activated only once per second. The CAN controller hardware receives a burst of about 100 bits for the message, and after about 200 µs it makes the message available in a FIFO. In this case the driver has one second to read it.

3. Unfortunately I don't know the details of the CAN4Linux driver, but I will kick Heinz through my hotline :)
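
To put rough numbers on the "about 100 bits" and "200 µs" above, here is a small sketch of the frame arithmetic. It assumes classic CAN with standard 11-bit identifiers and counts the 3-bit interframe space; the function names are made up for illustration. An 8-byte frame comes out at 111 bits without stuff bits and 135 bits with worst-case stuffing, so at 500 kbit/s the bus itself can carry somewhere between roughly 3700 and 4500 such frames per second.

```python
def frame_bits(data_bytes, worst_case_stuffing=False):
    """Bits on the wire for one standard (11-bit ID) classic CAN data frame."""
    # 44 fixed bits (SOF through EOF) + 3 bits interframe space + payload
    bits = 47 + 8 * data_bytes
    if worst_case_stuffing:
        # Bit stuffing covers SOF through CRC (34 + 8*n bits);
        # the worst case adds one stuff bit per 4 bits after the first.
        bits += (34 + 8 * data_bytes - 1) // 4
    return bits


def max_frames_per_second(bitrate, data_bytes, worst_case_stuffing=False):
    """Upper bound on the frame rate of a fully loaded bus."""
    return bitrate / frame_bits(data_bytes, worst_case_stuffing)


BITRATE = 500_000  # bit/s, as configured with "ip link ... bitrate 500000"

for stuffed in (False, True):
    bits = frame_bits(8, stuffed)
    frame_time_us = bits / BITRATE * 1e6
    rate = max_frames_per_second(BITRATE, 8, stuffed)
    print(f"{bits} bits, {frame_time_us:.0f} us per frame, {rate:.0f} frames/s max")
```

So 2000 messages per second is well within what the bus can carry, which is why the question becomes whether the driver can empty the FIFO that often.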
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: JohnS on October 27, 2016, 09:33:32 PM
bart - which CAN transceiver are you using?

Also, how many nodes?

(Maybe I'm not understanding the setup?)

John
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: bart on October 28, 2016, 09:51:07 AM
The transceiver is the Olimex board https://www.olimex.com/wiki/A20-CAN (MCP2551).

Currently we connect that to a rack that simulates a car with a few controllers on it, so I'm not 100% sure of the number of nodes on the bus, but it's definitely less than 10. The bus works fine otherwise, the existing nodes show no errors, it's only reading that fails.

Thank you very much,
Bart
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: JohnS on October 28, 2016, 11:55:41 AM
Thanks.

Yes, that ought to work, so you may have found a (software) bug.  I would have suggested contacting the driver maintainer (Heinz) directly, but that seems to be in progress already.

If there are any non-default values you've used then I guess Heinz would want to know them.

EDIT: er, I'm not sure it's Heinz.  He does can4linux but I see you're using the mainline driver.

Are you able to at least add some debug printk's etc to the driver?  Or try a changed version if Heinz (or anyone) had one to try?
(Essentially, this means rebuilding the kernel which can sound scary but isn't really.)

John
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: bart on October 28, 2016, 11:59:07 AM
Yes, I've been compiling the kernel anyway. We are still on the sunxi branch kernel, so the CAN driver is not included, I'm using the one from https://github.com/btolfa/sunxi-can-driver (as suggested by the Olimex documentation). I suspect that driver may be broken?

I'm trying to build the mainline kernel right now, to see if that works.
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: JohnS on October 28, 2016, 12:22:54 PM
You might try an email to the person shown in the source (Peter Chen).

BTW there's a fair chance the driver gets an interrupt about the problem but is perhaps not handling it properly (or at all?).

John
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: bart on October 28, 2016, 12:39:44 PM
I'm not very familiar with drivers, but from what I understand the driver does handle the error interrupt:

    if (isrc & DATA_ORUNI) {
            /* data overrun interrupt */
            netdev_dbg(dev, "data overrun interrupt\n");
            cf->can_id |= CAN_ERR_CRTL;
            cf->data[1] = CAN_ERR_CRTL_RX_OVERFLOW;
            stats->rx_over_errors++;
            stats->rx_errors++;
            sun7i_can_write_cmdreg(priv, CLEAR_DOVERRUN);        /* clear bit */
    }


It looks like maybe clearing the flag doesn't always work though?

In any case, I'm away on a holiday for a while, so my colleagues will take it from here.

Again, thank you very much!
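
For reference, a minimal sketch that decodes the error frame from the first post against the constants in the Linux header linux/can/error.h. The helper name decode_error_frame is made up for illustration, and only the controller class is handled. As far as the frame fields go, ID 20000004 with 01 in data[1] is what the branch above would produce; this says nothing about why the overrun happens.

```python
# Constants as defined in linux/can/error.h
CAN_ERR_FLAG = 0x20000000          # set in can_id for error frames
CAN_ERR_MASK = 0x1FFFFFFF          # error class bits
CAN_ERR_CRTL = 0x00000004          # controller problem, details in data[1]
CAN_ERR_CRTL_RX_OVERFLOW = 0x01    # RX buffer overflow


def decode_error_frame(can_id, data):
    """Return a small dict describing a SocketCAN error frame (hypothetical helper)."""
    is_error = bool(can_id & CAN_ERR_FLAG)
    error_class = can_id & CAN_ERR_MASK
    return {
        "is_error_frame": is_error,
        "controller_problem": is_error and bool(error_class & CAN_ERR_CRTL),
        "rx_overflow": is_error
        and bool(error_class & CAN_ERR_CRTL)
        and bool(data[1] & CAN_ERR_CRTL_RX_OVERFLOW),
    }


# Frame as printed by candump in the first post:
# can0  20000004   [8]  00 01 00 00 00 00 00 00   ERRORFRAME
frame = decode_error_frame(0x20000004, bytes.fromhex("0001000000000000"))
print(frame)
```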
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: JohnS on October 28, 2016, 01:16:32 PM
hmm, your reported error does not appear to match that code fragment.

Have a good hol!

John
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: KeesZagers on October 28, 2016, 03:06:27 PM
Heinz is informed and will probably react soon.

I'm a bit confused after looking at the sunxi driver. I don't see the link to CAN4Linux, although in your first message you referred to the Sourceforge discussion about CAN4Linux with the same problem.

In the meantime it would be good to know how many messages per second are coming over the bus. E.g. if this is more than 1000 messages per second, I can imagine that the driver will be overloaded. And if you get an overflow and the bus remains at that high rate, all the following messages will overflow as well.

Have a nice holiday.

Kees
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: Heinz on October 29, 2016, 01:22:08 PM
Hello,

thanks to Kees I'm now following your discussion.

William, who contacted me as can4linux maintainer, is using the A20 and tried both can4linux and SocketCAN, but both failed on receiving. After a shorter or longer time the receiver stops, because the CAN controller no longer gets any interrupts.
We did exchange a lot of messages and did change the can4linux code, but without success.
This is what he wrote in his last email:

"After spending about many hours on this problem, I've basically all but
confirmed this is a hardware problem."

I will point William to this forum; maybe he will share more information with you.

Heinz
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: JohnS on October 29, 2016, 01:42:41 PM
You don't think an interrupt came in but was not handled fully, so that no further interrupts come in after that?

John
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: Heinz on November 01, 2016, 07:41:33 PM
@JohnS
If you were asking me:
No, that may be too trivial an error. As far as I know the last test used heavy traffic, and the "hanging" does not happen that often.
If the receive interrupt were not reset correctly, it should happen more often, maybe already at the second message on the bus.
Title: Re: A20 Lime CAN: controller-problem{rx-overflow}
Post by: KeesZagers on November 02, 2016, 10:32:19 AM
I read back the whole discussion and what was interesting is the statement in the first message of Bart:

In particular, the issue occurs more often if I dump the output to the console instead of a file, but it occurs in both cases.

So it may not be influenced by the CAN load; rather, it seems to be influenced by the system load. Terminal and file I/O are also interrupt-driven. So is there some kind of conflict in handling interrupts, probably with different priorities? I'm not familiar with the A20 structure, but I know it is a hell of a job to handle the multiple interrupts of a PIC32 controller correctly.
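
To illustrate why system load could matter, here is a rough latency budget. The receive FIFO depth below (fifo_frames) is a hypothetical number, not taken from the A20 documentation, and the model simply assumes the FIFO starts empty and is not drained at all while the CPU is busy elsewhere.

```python
def overflow_deadline_us(fifo_frames, frames_per_second):
    """Microseconds until an initially empty FIFO is full if nothing drains it."""
    return fifo_frames * 1e6 / frames_per_second


FIFO_FRAMES = 4  # hypothetical depth, for illustration only

for rate in (100, 1000, 2000, 4000):
    budget = overflow_deadline_us(FIFO_FRAMES, rate)
    print(f"{rate:5d} frames/s -> driver must run within {budget:8.0f} us")
```

Under these assumptions the budget shrinks in proportion to the message rate, so anything that delays the CAN interrupt handler (such as console output) eats directly into it.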