Using MPLAB X IDE to program PIC32-HMZ144, but produces very slow code.

Started by tonybarry, March 31, 2018, 04:01:12 AM


tonybarry

Greetings all,

I have a PIC32-HMZ144 board from Olimex, and can program the board using MPLAB X IDE and PICKIT-3.  Minimal pain to get the demo "Blink" program running. Nice.

However, the code is very slow. With the delays removed, running Blink produces just a 155kHz output, as measured on an external oscilloscope.

I would have thought that a 200MHz clock would produce faster code. This is 16MHz Arduino speed.

[The usual header for the Blink program is here, but not shown for conciseness]

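#include <xc.h>       // device register definitions (TRISH, LATH, ...)
#include <stdlib.h>   // EXIT_SUCCESS

#define SCOPE_TRIS  TRISHbits.TRISH1   // direction bit for pin RH1
#define SCOPE_LAT   LATHbits.LATH1     // output latch bit for pin RH1

int main(void)
{
    SCOPE_TRIS = 0;                    // make RH1 an output

    while (1)
    {
        // toggle RH1 twice per iteration (each is a one-bit field write)
        SCOPE_LAT = !SCOPE_LAT;
        SCOPE_LAT = !SCOPE_LAT;
    }
    return (EXIT_SUCCESS);             // never reached
}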

KeesZagers

Are you using the free version of the compiler? As far as I understand, "built-in" delays are used in the free version, and the optimizer only removes them if you pay for it. I never tried the standard I/O ports, but I got much better timing results by writing to and reading from the CAN registers directly. This was on the PIC32MX, but it will probably also be the case on the PIC32MZ.

JohnS

You can probably list the ASM code to see what it does.

If the code runs from flash, I think it will have quite a number of wait states. Do check how many, and if there are many, maybe try running from RAM?

(The equivalent loop on the RPi is about 22MHz...)

John

tonybarry

Thank you, gentlemen. I shall see what the asm looks like.

Regards,
Tony Barry

tonybarry

Some further updates on making the HMZ work a bit faster than an Arduino Uno.

1. The headers for the Blink code appear to be less than optimal.
    The original headers appear to be for the ECG device instead of the EFG.
1A. The timing headers also appear to be less than optimal. The datasheet for the EFG provides the following:

#include <xc.h>

// DEVCFG2 (values from datasheet table A2, page 702)
//#pragma config FPLLIDIV = DIV_8         // System PLL Input Divider (8x Divider)
#pragma config FPLLIDIV = DIV_3           // System PLL Input Divider (3x Divider)
//#pragma config FPLLRNG = RANGE_34_68_MHZ  // System PLL Input Range (34-68 MHz Input)
#pragma config FPLLRNG = RANGE_5_10_MHZ   // System PLL Input Range (5-10 MHz Input)
//#pragma config FPLLICLK = PLL_FRC       // System PLL Input Clock Selection (FRC is input to the System PLL)
#pragma config FPLLICLK = PLL_POSC        // System PLL Input Clock Selection (POSC is input to the System PLL)
//#pragma config FPLLMULT = MUL_128       // System PLL Multiplier (PLL Multiply by 128)
#pragma config FPLLMULT = MUL_50          // System PLL Multiplier (PLL Multiply by 50)
//#pragma config FPLLODIV = DIV_32        // System PLL Output Clock Divider (32x Divider)
#pragma config FPLLODIV = DIV_2           // System PLL Output Clock Divider (2x Divider)
#pragma config UPLLFSEL = FREQ_24MHZ      // USB PLL Input Frequency Selection (USB PLL input is 24 MHz)

#define SYSFREQ (200000000L)              // 200 MHz, from datasheet

The commented-out lines show the original header values; the active pragmas are from the datasheet, table A2, p. 702.

The #define SYSFREQ was not in the original headers for Blink.  This appears to be the most important improvement.
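A #define such as SYSFREQ only names a constant for code that reads it; the hardware clock itself comes from the FPLL pragmas. The PLL arithmetic behind the two pragma sets can be checked on a PC. The sketch below is a hypothetical helper (`pll_cfg`, `sysclk_hz` and `pll_range_ok` are illustrative names, not XC32 or Harmony APIs), assuming an 8 MHz nominal FRC and a 24 MHz primary crystal, the latter inferred from the `UPLLFSEL = FREQ_24MHZ` line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the FPLL chain:
 * SYSCLK = (input / FPLLIDIV) * FPLLMULT / FPLLODIV, all in Hz. */
typedef struct {
    uint32_t input_hz;  /* PLL input clock: POSC crystal or FRC (assumed values) */
    uint32_t idiv;      /* FPLLIDIV divisor */
    uint32_t mult;      /* FPLLMULT multiplier */
    uint32_t odiv;      /* FPLLODIV divisor */
} pll_cfg;

/* Frequency after the input divider; FPLLRNG must describe this value. */
static uint32_t pll_input_hz(pll_cfg c)
{
    assert(c.idiv != 0);
    return c.input_hz / c.idiv;
}

/* Resulting system clock; 64-bit intermediate avoids overflowing the
 * multiplied (VCO) frequency. */
static uint32_t sysclk_hz(pll_cfg c)
{
    assert(c.odiv != 0);
    uint64_t vco = (uint64_t)pll_input_hz(c) * c.mult;
    return (uint32_t)(vco / c.odiv);
}

/* True if the divided input lies inside the FPLLRNG range [lo_hz, hi_hz]. */
static bool pll_range_ok(pll_cfg c, uint32_t lo_hz, uint32_t hi_hz)
{
    uint32_t f = pll_input_hz(c);
    return f >= lo_hz && f <= hi_hz;
}
```

With these assumptions, the original pragmas give 8 MHz / 8 x 128 / 32 = 4 MHz, and the divided 1 MHz input lies outside the declared 34-68 MHz range, while the datasheet values give 24 MHz / 3 x 50 / 2 = 200 MHz with an 8 MHz PLL input inside 5-10 MHz.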

The relevant code (arranged as an infinite loop) is the following two C statements (SCOPE_LAT is the macro defined above for LATHbits.LATH1).

        SCOPE_LAT = !SCOPE_LAT;
        SCOPE_LAT = !SCOPE_LAT;

With these mods, the operation improved from 155kHz to 3.8MHz ... nicer.
The asm for these statements was eight instructions per C line. No obvious NOPs, but some testing and ANDing and ORing to exclude the other bits from the write.
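The testing, ANDing and ORing described above is the usual cost of assigning to a one-bit field: the whole register must be read, the bit extracted and inverted, the other bits preserved, and the word written back. A host-side C model makes the steps visible; here an ordinary variable stands in for LATH, and `toggle_bit_rmw` and `write_word` are hypothetical names. It illustrates the idea and is not the actual XC32 output.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SCOPE_LAT = !SCOPE_LAT on a one-bit field. 'reg' stands in
 * for LATH; on the chip it is a volatile memory-mapped register. */
static void toggle_bit_rmw(volatile uint32_t *reg, unsigned bit)
{
    assert(bit < 32u);
    uint32_t word = *reg;                 /* read the whole register     */
    uint32_t val  = (word >> bit) & 1u;   /* extract the field           */
    val = !val;                           /* logical NOT: 0->1, 1->0     */
    word &= ~(1u << bit);                 /* clear the field (AND)       */
    word |= (val << bit);                 /* insert the new value (OR)   */
    *reg = word;                          /* write the whole word back   */
}

/* Model of LATH = pOn: a single store, no read and no masking. */
static void write_word(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}
```

Writing the whole word removes the read and the masking, which is why a direct LATH assignment compiles to fewer instructions.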

OK, how fast can we go here? LATH can be written as a whole word, with no bit testing.

    int pOn = 0xFFFF;     // lower 16 bits of port H high
    int pOff = 0x0;       // all bits low

    while (1)
    {
        LATH = pOn;       // whole-word store: no read, no masking
        LATH = pOff;
    }

This produces three asm instructions per line of C code, and runs at 25.6MHz. Somewhat better than 155kHz ...
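A quick way to see how much headroom is left is to convert the measured frequency into core cycles per output period. The helper below is a hypothetical sketch; it assumes the scope reading is the square-wave frequency, with one high and one low phase per loop iteration.

```c
#include <assert.h>

/* Core clock cycles spent per output period, given the system clock
 * and the measured square-wave frequency (both in Hz). */
static double cycles_per_period(double sysclk_hz, double f_out_hz)
{
    assert(sysclk_hz > 0.0 && f_out_hz > 0.0);
    return sysclk_hz / f_out_hz;
}
```

At 200 MHz, 25.6 MHz works out to about 7.8 cycles per period (two stores plus the loop branch), and 3.8 MHz to about 53. A 100 MHz output would leave only 2 cycles per period, one per store, with nothing left for the branch, so a plain software loop is unlikely to get there.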

But still not fast enough for me - I need about 60 to 100 MHz to get the data moved around.

If anyone has any ideas, please advise. Perhaps an FPGA is in order ...

Regards,
Tony Barry


JohnS

You can use the INV register, along the lines of

while (1)
    LATHINV = 0xFFFF;   // inverts every LATH bit that is set in the mask

But the looping (jumping in ASM) will probably be quite slow.
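On PIC32, each port latch has SET, CLR and INV companion registers (for port H: LATHSET, LATHCLR, LATHINV). Writing a mask to one of them makes the hardware set, clear or invert those bits, so the CPU issues a single store instead of a read-modify-write. The host-side model below only illustrates those semantics; `port_model` and the `lat_*` functions are hypothetical names, not a Microchip API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one port latch. On the chip, LATx, LATxSET,
 * LATxCLR and LATxINV are separate addresses acting on the same latch. */
typedef struct {
    uint32_t lat;
} port_model;

/* Write of 'mask' to LATxSET: bits in mask become 1. */
static void lat_set(port_model *p, uint32_t mask) { p->lat |= mask; }

/* Write of 'mask' to LATxCLR: bits in mask become 0. */
static void lat_clr(port_model *p, uint32_t mask) { p->lat &= ~mask; }

/* Write of 'mask' to LATxINV: bits in mask are inverted. */
static void lat_inv(port_model *p, uint32_t mask) { p->lat ^= mask; }
```

Two successive INV writes with the same mask return the latch to its starting value, which is why one INV store per half-period is enough to produce a square wave.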

I mentioned WS (wait states) & RAM before - as you go faster they'll matter more.

You could potentially unroll the loop, but you may well get a square wave with a longer pulse at each loop end.
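The longer pulse at the loop end comes from the backward branch being added only after the last store of each iteration. A simple timing model shows the effect; the cycle costs are hypothetical parameters, not measured PIC32MZ figures, and `unrolled_segments` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timing model of an unrolled toggle loop. Each INV store
 * takes store_cycles; the loop branch adds branch_cycles after the last
 * store of each iteration. out[i] receives the duration, in cycles, of
 * the output level set by store i. Returns the number of segments. */
static size_t unrolled_segments(size_t n_stores, unsigned store_cycles,
                                unsigned branch_cycles, unsigned *out)
{
    assert(n_stores > 0 && out != NULL);
    for (size_t i = 0; i < n_stores; i++) {
        out[i] = store_cycles;            /* level lasts until the next store */
    }
    out[n_stores - 1] += branch_cycles;   /* the loop-end level is stretched */
    return n_stores;
}
```

Unrolling more stores per iteration makes the stretched level rarer but does not remove it. Keeping the store count even keeps the stretched level on the same polarity every iteration.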

Some chips allow DMA to I/O ports if that would be suitable for what you really have in mind...

John

tonybarry

Thank you for the ideas, JohnS.

DMA from a dedicated port would be ideal, except that I need to sync with an external clock to read in the actual data.

I shall continue to report on the situation as I learn more.

Regards,
Tony Barry

kyrk.5

while (1)
    {
        SCOPE_LAT = !SCOPE_LAT;
        SCOPE_LAT = !SCOPE_LAT;
    }

This is the problem. Accessing the port is a read-modify-write operation, and the port is much slower than the CPU, so it can take extra cycles to read the register and write it back. Check the clock settings for the peripherals; there are also some limitations there.
I am not sure what the maximum frequency for port access is, but I once compared an SD card driven through a bit-banged SDIO interface against the simple hardware SPI interface. The speed was the same: SDIO is 4 bits wide, but it was held back by the slow port access. I do not remember the exact frequencies, though.

kyrk.5

I think (but I am still not sure) that you should be able to toggle the pin at around 20MHz. But you have to check that the core is clocked at 200MHz and that the peripherals are also clocked at their maximum. I think it is not possible to clock the peripherals at 200MHz, only slower. That is why, I guess, the I/Os are limited to somewhere around 20 or 40 MHz. Because of this, I think DMA won't be much faster either.

If you need faster I/O, you need to use a dedicated hardware peripheral. I am not sure whether the device has QSPI.
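The point about a slower peripheral clock limiting I/O can be turned into a rough upper bound. The sketch assumes, hypothetically, that every store to a port register costs at least `bus_cycles_per_store` peripheral-bus clocks, that the bus runs at SYSCLK divided by `pb_divider`, and that a square wave needs two stores per period. The real divider and bus timing must be taken from the PBCLK settings and the datasheet.

```c
#include <assert.h>

/* Rough upper bound on a software-toggled pin frequency (Hz), under the
 * hypothetical assumptions stated above. */
static double max_toggle_hz(double sysclk_hz, unsigned pb_divider,
                            unsigned bus_cycles_per_store)
{
    assert(sysclk_hz > 0.0 && pb_divider > 0u && bus_cycles_per_store > 0u);
    double pbclk_hz = sysclk_hz / pb_divider;          /* peripheral bus clock */
    return pbclk_hz / (2.0 * bus_cycles_per_store);    /* two stores per period */
}
```

For example, a 200 MHz core with a divide-by-2 bus and one bus clock per store gives at most 50 MHz; two bus clocks per store halve that to 25 MHz. Either way, the bound sits below the 60 to 100 MHz needed, which supports using a hardware peripheral rather than bit-banging.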