OLinuXino-MICRO running way slower than I expected

Started by mremmers, March 30, 2016, 01:26:50 AM

mremmers

I am running an essentially bare-metal program (my own RTOS) on an OLinuXino-MICRO. I have two parts in my bd file: first boot_prep, to set up DRAM, CPU speed, etc.; second, my program at 0x40000000. Everything seems fine, EXCEPT that it looks like it is running way too slow. A complex computation that takes about one second on a 700 MHz Raspberry Pi takes about 45 seconds here. I thought that if this CPU was running at 454 MHz, it should take around 1.5 seconds. Any ideas?
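
The 1.5 second estimate is a plain clock-ratio scaling. A minimal sketch of that arithmetic in C, assuming run time scales inversely with clock frequency and ignoring every architectural difference between the two boards (the function names are made up for illustration):

```c
#include <stdio.h>

/* Estimate run time on a target CPU from a reference measurement,
 * assuming time scales inversely with clock frequency (a rough model). */
double scaled_seconds(double ref_seconds, double ref_mhz, double target_mhz)
{
    return ref_seconds * ref_mhz / target_mhz;
}

/* Print the estimate for the figures quoted in the post above. */
void print_estimate(void)
{
    double expected = scaled_seconds(1.0, 700.0, 454.0);
    printf("expected %.2f s, observed 45 s, ratio %.0fx\n",
           expected, 45.0 / expected);
}
```

With the numbers from the post this gives roughly 1.54 s, so the observed 45 s is close to 30 times slower than a pure clock difference would explain.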

Marty Remmers

JohnS

Depends what it does (there's always a degenerate case) but start by avoiding bare metal.

Run some suitable ordinary benchmarks: Dhrystone? Whetstone? Etc.

John

mremmers

The computation is a 4096-bit modular exponentiation (mod-exponent) with a large exponent: specifically, public-key decryption using the private key. I will boot the Linux SD card (version 2.6.35-8-ARCH+) and copy my code over to compare. It may take a little while though.
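
For readers unfamiliar with the operation: modular exponentiation is commonly done by square-and-multiply, with one squaring per exponent bit plus a multiply for each set bit. The sketch below is a toy version on 32-bit operands, not the code from this thread. A 4096-bit version runs the same loop over multi-word integers, so it spends most of its time in tight loops over arrays in memory. The name `modpow_u32` is hypothetical, and `m` must be nonzero.

```c
#include <stdint.h>

/* Toy square-and-multiply: computes (base ^ exp) mod m for 32-bit operands.
 * 64-bit intermediates keep the products from overflowing. m must be nonzero. */
uint32_t modpow_u32(uint32_t base, uint32_t exp, uint32_t m)
{
    uint64_t result = 1u % m;           /* yields 0 when m == 1 */
    uint64_t b = base % m;

    while (exp != 0) {
        if (exp & 1u)
            result = (result * b) % m;  /* multiply step for a set bit */
        b = (b * b) % m;                /* square step for every bit */
        exp >>= 1;
    }
    return (uint32_t)result;
}
```

For example, `modpow_u32(4, 13, 497)` evaluates to 445.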

If that also takes 45 seconds, then there is nothing to do, but if it takes 1.5 seconds, then I need to investigate further.

By the way, I did not include power_prep in my bd file, because I had problems generating it with both my Windows cross compiler and my Raspberry Pi cross compiler.

Is there anywhere I can find compiled versions of both power_prep and boot_prep? I might want to try known working versions of both in my bd file, just to rule mine out.

Thanks for the reply.

Marty Remmers

swahren

Quote from: mremmers on March 30, 2016, 04:55:38 AM
Is there anywhere I can find compiled versions of both power_prep and boot_prep? I might want to try known working versions of both in my bd file, just to rule mine out.

Hi Marty,

maybe this is helpful:

https://community.freescale.com/thread/326136

BR Stefan

mremmers

Well, I booted my Linux SD (version 2.6.35-8-ARCH+), and copied my code there, compiled it, and it runs fast, just as I thought it would.  About 1.7 seconds.

Now I really want working copies of power_prep and boot_prep binaries to test.

Has anyone compiled the Linux OS for the OLinuXino-MICRO and still has these two files?

Or

Where could I locate the source and generate them myself (right on the OLinuXino-MINI, I assume)?


mremmers

Well, that was the bootlets source I had already downloaded. But now that I have booted the board in Linux, I am able to compile the bootlets (sort of).

Question 1: What is the difference between elftosb (I am using version 2.6.1 on my Windows system) and elftosb2, which the makefile is trying to use on the Linux board? elftosb2 is not present there.

Anyway, I copied power_prep and boot_prep onto my Windows system, and my elftosb complained that they were not single segments. I forced them into single segments by modifying their link.lds files, and then it accepted them. Unfortunately it is still running slow. When it boots Linux, versus my image, there is a slight difference in what it displays. For Linux, there is a series of L's etc. just before executing power_prep and boot_prep, and even before the Linux image.

Question 2: What are those L's, etc?

Displayed when my image starts:

PowerPrep start initialize power...
Battery Voltage = 0.72V
No battery or bad battery                                       detected!!!.Disa
bling battery                                   voltage measurements./r/nMar 30
201618:59:40
EMI_CTRL 0x1C084040
FRAC 0x92926192
init_ddr_mt46v32m16_133Mhz
power 0x00820710
Frac 0x92926192
start change cpu freq
hbus 0x00000003
cpu 0x00010001
ModPow2 version 1.00.00
Enter Priv or Pub:


Displayed when Linux starts:
HTLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLFC
PowerPrep start initialize power...
Battery Voltage = 0.76V
No battery or bad battery                                       detected!!!.Disa
bling battery                                   voltage measurements./r/nLLCSep
19 201201:24:06
EMI_CTRL 0x1C084040
FRAC 0x92926192
init_ddr_mt46v32m16_133Mhz
power 0x00820710
Frac 0x92926192
start change cpu freq
hbus 0x00000003
cpu 0x00010001
LLLLLLLFCLJUncompressing Linux... done, booting the kernel.
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.35-8-ARCH+ (nobody@fermium) (gcc version 4.7.1 20120721 (prere
lease) (GCC) ) #1 PREEMPT Fri Sep 21 17:02:25 UTC 2012
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
CPU: VIVT data cache, VIVT instruction cache
Machine: iMX233-OLinuXino low cost board
Memory policy: ECC disabled, Data cache writeback
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 16256
Kernel command line: console=ttyAMA0,115200 root=/dev/mmcblk0p2 rw rootwait ssp1
=mmc lcd_panel=tvenc_pal no_console_suspend
PID hash table entries: 256 (order: -2, 1024 bytes)
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
allocated 327680 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Memory: 64MB = 64MB total
Memory: 56404k/56404k available, 9132k reserved, 0K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    DMA     : 0xfde00000 - 0xffe00000   (  32 MB)
    vmalloc : 0xc4800000 - 0xf0000000   ( 696 MB)
    lowmem  : 0xc0000000 - 0xc4000000   (  64 MB)
    modules : 0xbf000000 - 0xc0000000   (  16 MB)
      .init : 0xc0008000 - 0xc0028000   ( 128 kB)
      .text : 0xc0028000 - 0xc03af000   (3612 kB)
      .data : 0xc03ca000 - 0xc03f6000   ( 176 kB)
SLUB: Genslabs=11, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Hierarchical RCU implementation.
        RCU-based detection of stalled CPUs is disabled.
        Verbose stalled-CPUs detection is disabled.
NR_IRQS:224
Console: colour dummy device 80x30
console [ttyAMA0] enabled
Calibrating delay loop... 226.09 BogoMIPS (lpj=1130496)
pid_max: default: 32768 minimum: 301
Security Framework initialized
Mount-cache hash table entries: 512
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
CPU: Testing write buffer coherency: ok
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
regulator: vddd: 800 <--> 1575 mV at 1550 mV fast normal
regulator: vdddbo: 800 <--> 1575 mV fast normal
regulator: vdda: 1500 <--> 2275 mV at 1750 mV fast normal
regulator: vddio: 2800 <--> 3575 mV at 3300 mV fast normal
regulator: overall_current: fast normal
regulator: mxs-duart-1: fast normal
regulator: mxs-bl-1: fast normal
regulator: mxs-i2c-1: fast normal
regulator: mmc_ssp-1: fast normal
regulator: mmc_ssp-2: fast normal
regulator: charger-1: fast normal
regulator: power-test-1: fast normal
regulator: cpufreq-1: fast normal
i.MX IRAM pool: 28 KB@0xc4808000
usb: DR gadget (utmi) registered
bio: create slab <bio-0> at 0
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Advanced Linux Sound Architecture Driver Version 1.0.23.
Switching to clocksource mxs clock source
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 2048 (order: 2, 16384 bytes)
TCP bind hash table entries: 2048 (order: 1, 8192 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
TCP reno registered
UDP hash table entries: 256 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
NET: Registered protocol family 1
Trying to unpack rootfs image as initramfs...
rootfs image is not initramfs (junk in compressed archive); looks like an initrd
Freeing initrd memory: 4096K
Bus freq driver module loaded
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
msgmni has been set to 118
alg: No test for stdrng (krng)
cryptodev: driver loaded.
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
Console: switching to colour frame buffer device 90x36
mxs-duart.0: ttyAMA0 at MMIO 0x80070000 (irq = 0) is a DebugUART
mxs-auart.1: ttySP1 at MMIO 0x8006c000 (irq = 24) is a mxs-auart.1
Found APPUART 3.0.0
brd: module loaded
loop: module loaded
usbmon: debugfs is not available
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
fsl-ehci fsl-ehci: Freescale On-Chip EHCI Host Controller
fsl-ehci fsl-ehci: new USB bus registered, assigned bus number 1
fsl-ehci fsl-ehci: irq 11, io base 0x80080000
fsl-ehci fsl-ehci: USB 2.0 started, EHCI 1.00
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: Freescale On-Chip EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.35-8-ARCH+ ehci_hcd
usb usb1: SerialNumber: fsl-ehci
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usbcore: registered new interface driver libusual
ARC USBOTG Device Controller driver (1 August 2005)
udc: request mem region for fsl-usb2-udc failed
fsl-usb2-udc: probe of fsl-usb2-udc failed with error -16
mice: PS/2 mouse device common for all mice
MXS RTC driver v1.0 hardware v2.0.0
mxs-rtc mxs-rtc.0: rtc core: registered mxs-rtc as rtc0
i2c /dev entries driver
WARNING : No battery connected !
Aborting power driver initialization
mxs-battery: probe of mxs-battery.0 failed with error 1
mxs watchdog: initialized, heartbeat 19 sec
mxs-mmc: MXS SSP Controller MMC Interface driver
ssp_set_rate: error -110
mxs-mmc mxs-mmc.0: mmc0: MXS SSP MMC DMAIRQ 14 ERRIRQ 15
dcp dcp.0: DCP crypto enabled.!
mxs-adc-audio mxs-adc-audio.0: MXS ADC/DAC Audio Codec
No device for DAI mxs adc/dac
No device for DAI mxs adc/dac
asoc: mxs adc/dac <-> mxs adc/dac mapping ok
ALSA device list:
  #0: MXS EVK (mxs adc/dac)
TCP cubic registered
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
registered taskstats version 1
mxs-rtc mxs-rtc.0: setting system clock to 1970-01-01 00:00:10 UTC (10)
RAMDISK: Couldn't find valid RAM disk image starting at 0.
Waiting for root device /dev/mmcblk0p2...
mmc0: new high speed SD card at address b368
mmcblk0: mmc0:b368 00000 1.86 GiB
mmcblk0: p1 p2 p3
EXT4-fs (mmcblk0p2): recovery complete
EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext2 filesystem) on device 179:2.
devtmpfs: mounted
Freeing init memory: 128K
INIT: version 2.88 booting

> Arch Linux ARM

> http://www.archlinuxarm.org

   ------------------------------
:: Mounting Root Read-Only    [BUSY] EXT4-fs (mmcblk0p2): re-mounted. Opts: barr
ier=1,data=ordered
   [DONE]
:: Adjusting system time and setting kernel timezone    [BUSY]    [DONE]
:: Starting UDev Daemon    [BUSY] <30>systemd-udevd[63]: starting version 186
   [DONE]
:: Triggering UDev uevents    [BUSY]    [DONE]
:: Loading User-specified Modules    [BUSY]    [DONE]
:: Waiting for UDev uevents to be processed    [BUSY]    [DONE]
:: Configuring Virtual Consoles    [BUSY]    [DONE]
:: Bringing up loopback interface    [BUSY]    [DONE]
:: Unlocking encrypted volumes    [BUSY]    [DONE]
:: Checking Filesystems    [BUSY]    [DONE]
:: Remounting Root and API filesystems    [BUSY] EXT4-fs (mmcblk0p2): re-mounted
. Opts: barrier=1,data=ordered
   [DONE]
:: Mounting Local Filesystems    [BUSY]    [DONE]
:: Activating Swap    [BUSY]    [DONE]
:: Configuring Time Zone    [BUSY]    [DONE]
:: Initializing Random Seed    [BUSY]    [DONE]
:: Removing Leftover Files    [BUSY]    [DONE]
:: Setting Hostname: alarm    [BUSY]    [DONE]
:: Saving dmesg Log    [BUSY]    [DONE]
INIT: Entering runlevel: 3
:: Starting Syslog-NG    [BUSY]    [DONE]
:: Starting Network    [BUSY]
Error: unknown interface in /etc/rc.conf: `usb0'
   [DONE]
:: Mounting Network Filesystems    [BUSY]    [DONE]
:: Starting crond daemon    [BUSY]    [DONE]
:: Starting Secure Shell Daemon    [BUSY]    [DONE]

Arch Linux 2.6.35-8-ARCH+ (ttyAMA0)

alarm login:

JohnS

It's usually not too tough to cross-compile so I'd grab your choice of kernel & uboot, build the lot and then work from there.  You can then look at source code / change it if needed.

I suggest Linux on a PC as host.

John

mremmers

Do you know where I can locate the source for Arch Linux 2.6.35-8-ARCH+?

Also

Does elftosb2 handle the elf with multiple segments (.text/.bss/.rodata etc.)?
I have been forcing them into one segment with ld, but it looks like no one else does.
The version of elftosb (without the 2) that I use complains about that.



mremmers

Progress.

It seems that the Instruction and Data caches were disabled.

I started by enabling the instruction cache (MMU is still off).
This is controlled through the system control coprocessor (CP15), register c1.

That got me down to 16 seconds.
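
A minimal sketch of the instruction-cache step, assuming GCC-style inline assembly and the ARM926EJ-S CP15 layout (control register c1 with the I bit at position 12, and `c7, c5, 0` to invalidate the instruction cache). The function names are made up for illustration. The hardware access is compiled only for 32-bit ARM targets, so the bit arithmetic can still be checked on a PC.

```c
#include <stdint.h>

#define SCTLR_I (1u << 12)   /* instruction cache enable bit in CP15 c1 */

/* Pure helper: control register value with the I-cache bit set. */
uint32_t sctlr_with_icache(uint32_t sctlr)
{
    return sctlr | SCTLR_I;
}

#if defined(__arm__)
/* Invalidate the I-cache, then set the I bit.
 * Must run in a privileged mode; the MMU can stay off for this step. */
void enable_icache(void)
{
    uint32_t v;

    __asm__ volatile("mcr p15, 0, %0, c7, c5, 0" : : "r"(0u) : "memory"); /* invalidate I-cache */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));               /* read control reg */
    v = sctlr_with_icache(v);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(v) : "memory");   /* write it back */
}
#endif
```

As a cross-check, the Linux boot log earlier in the thread reports `cr=00053177`, a value that has bit 12 set.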

Enabling the Data cache will take a little more code.
I will need to set it up to not cache addresses in the 0x80000000 range,
to prevent writes to ports from being cached.

That will almost certainly get me down to the 1.7 seconds.

-Marty

mremmers

Success!

I set up the MMU with fixed tables that describe the actual memory locations.

The main translation table has 4096 entries, one for each MB.

I filled in the first entry with a pointer to my Coarse Page Table,
64 entries for the 64 MB at 0x40000000 (as cacheable and bufferable),
and 2 entries for the ports at 0x80000000 (as not cacheable and not bufferable).

The Coarse Page Table has 256 entries, one for each 4 KB.

I filled in the first 8 entries for the 32 KB at 0x00000000 (as cacheable and bufferable).
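
The table layout described above can be sketched roughly as follows. This is not Marty's code: it assumes the ARMv5 descriptor formats as used on the ARM926EJ-S (section and coarse-table descriptors with bit 4 set, C at bit 3, B at bit 2), and it picks domain 0 and full read/write access permissions, neither of which is stated in the thread. All names are hypothetical. Every entry that is not filled in stays zero, which is a fault descriptor.

```c
#include <stdint.h>
#include <string.h>

#define L1_ENTRIES   4096u          /* one first-level entry per 1 MB */
#define L2_ENTRIES   256u           /* one coarse-table entry per 4 KB */
#define DESC_B       (1u << 2)      /* bufferable */
#define DESC_C       (1u << 3)      /* cacheable */
#define L1_BIT4      (1u << 4)      /* set in first-level descriptors on this core */
#define L1_AP_RW     (3u << 10)     /* assumption: read/write access */
#define L2_AP_RW_ALL 0xFF0u         /* assumption: AP0..AP3 all read/write */

/* First-level descriptor: 1 MB section, domain 0. */
static uint32_t section_desc(uint32_t phys, uint32_t cb)
{
    return (phys & 0xFFF00000u) | L1_AP_RW | L1_BIT4 | cb | 0x2u;
}

/* First-level descriptor: pointer to a 1 KB-aligned coarse page table. */
static uint32_t coarse_desc(uint32_t l2_phys)
{
    return (l2_phys & 0xFFFFFC00u) | L1_BIT4 | 0x1u;
}

/* Second-level descriptor: 4 KB small page. */
static uint32_t small_page_desc(uint32_t phys, uint32_t cb)
{
    return (phys & 0xFFFFF000u) | L2_AP_RW_ALL | cb | 0x2u;
}

/* Fill identity-mapped tables. l1 must be 16 KB aligned and l2 1 KB aligned
 * in target memory; l2_phys is the physical address where l2 will live. */
void build_identity_tables(uint32_t *l1, uint32_t *l2, uint32_t l2_phys)
{
    uint32_t i;

    memset(l1, 0, L1_ENTRIES * sizeof(uint32_t));   /* everything faults by default */
    memset(l2, 0, L2_ENTRIES * sizeof(uint32_t));

    l1[0] = coarse_desc(l2_phys);                   /* first MB goes through the coarse table */
    for (i = 0; i < 8; i++)                         /* 32 KB at 0x00000000, cached */
        l2[i] = small_page_desc(i << 12, DESC_C | DESC_B);
    for (i = 0; i < 64; i++)                        /* 64 MB of DRAM at 0x40000000, cached */
        l1[0x400u + i] = section_desc(0x40000000u + (i << 20), DESC_C | DESC_B);
    for (i = 0; i < 2; i++)                         /* 2 MB of ports at 0x80000000, uncached */
        l1[0x800u + i] = section_desc(0x80000000u + (i << 20), 0u);
}
```

With these assumptions a cached DRAM section comes out as `0x40000C1E` and an uncached port section as `0x80000C12`.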

Then I turned on the MMU, Instruction Cache and Data Cache.
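
A rough sketch of that final step, assuming GCC-style inline assembly, the ARM926EJ-S CP15 register numbering, and a first-level table already built at a 16 KB-aligned physical address. The function names and the choice of domain 0 as "client" are assumptions, not taken from the thread. The hardware part is compiled only for 32-bit ARM targets.

```c
#include <stdint.h>

#define SCTLR_M (1u << 0)            /* MMU enable */
#define SCTLR_C (1u << 2)            /* data cache enable */
#define SCTLR_I (1u << 12)           /* instruction cache enable */
#define DACR_DOMAIN0_CLIENT 0x1u     /* domain 0 = client: AP bits are checked */

/* Pure helper: control register value with MMU and both caches enabled. */
uint32_t sctlr_with_mmu_and_caches(uint32_t sctlr)
{
    return sctlr | SCTLR_M | SCTLR_C | SCTLR_I;
}

#if defined(__arm__)
/* Point the MMU at the first-level table and switch everything on.
 * Assumes the D-cache was off until now, so invalidating it loses no data. */
void mmu_and_caches_on(uint32_t l1_table_phys)
{
    uint32_t v;

    __asm__ volatile("mcr p15, 0, %0, c7, c7, 0" : : "r"(0u) : "memory");   /* invalidate I+D caches */
    __asm__ volatile("mcr p15, 0, %0, c8, c7, 0" : : "r"(0u) : "memory");   /* invalidate TLBs */
    __asm__ volatile("mcr p15, 0, %0, c2, c0, 0" : : "r"(l1_table_phys) : "memory"); /* table base */
    __asm__ volatile("mcr p15, 0, %0, c3, c0, 0" : : "r"(DACR_DOMAIN0_CLIENT) : "memory"); /* domains */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));                /* read control reg */
    v = sctlr_with_mmu_and_caches(v);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(v) : "memory");    /* MMU and caches on */
}
#endif
```

Because the mapping is an identity mapping, the code that executes this sequence keeps running at the same address once the MMU is enabled.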

My test math program now runs at full speed: about 1.65 seconds, maybe just a bit faster than within Linux.

The memory tables are set up so that all the memory (VM) appears exactly where it physically is. There
is an added bonus though. The 32 KB at location zero was physically shadowed multiple times just
beyond that. Now it only appears at zero. Trying to access the other places generates a Data Abort.
The same goes for the 64 MB. It looked like it was shadowed just beyond as well, but now the same
protection applies.

I was playing with this board to learn, and I am learning a lot.

Next up, I want to learn all I can about USB, and the USB controller in the iMX233.

Thanks for the support.
-Marty

JohnS

Excellent!  Sounds like about 25x faster than the original.

Consider sharing the code if you're up for it. Hardly anyone does this stuff, and more people would benefit from at least reading how it's done.

John