Hi everyone
I have a question about LIME vs LIME2. I see one has a single RAM chip and the other has two. From what I understand, the two DDR3 chips share the address lines and have separate data lines. So my guess is that in effect, with a single chip the CPU's RAM bus works in 16-bit mode, and with two chips in 32-bit mode. Theoretically that means you have half the RAM bandwidth with a single chip.
Is that correct? Has anyone done any actual measurements / speed tests on the boards to compare? Or am I wrong and the two chips just give more space rather than speed?
Cheers
OK, apparently no one has a clue about (or interest in) this topic, so I did a bit of testing myself - and got some extremely weird results which I would like to share with whoever lands here.
Hardware - Cubieboard2 (my LIME2 got fried and I haven't replaced it just yet; I will test with it too when I get a new one)
Kernel - 4.10.0, latest mainline
U-boot - 2017.03-rc2
The board has two 512MB DDR3 chips. To test how it performs with a single one, I went to the u-boot source (board/sunxi/dram_sun5i_auto.c) and changed the values for density, io_width and bus_width, setting them to 4096, 16 and 16 respectively. This makes the DRAM initialization skip detection and use these values instead. Both u-boot and Linux then report 512MB of RAM, so I suppose I have managed to disable the second DDR3 chip. Whether that is actually what happens - I cannot be entirely sure.
Testing memory - single chip vs two chips
1. Using dd with a tmpfs (RAM-backed filesystem) mounted at /mnt, reading from /dev/zero and writing to a test file
(command: sudo mount -t tmpfs tmpfs /mnt)
** Single chip tests:
dd if=/dev/zero of=/mnt/test bs=16 count=100000 -> 1.4MB/s
dd if=/dev/zero of=/mnt/test bs=64 count=100000 -> 5.3MB/s
dd if=/dev/zero of=/mnt/test bs=256 count=100000 -> 18.4MB/s
dd if=/dev/zero of=/mnt/test bs=1024 count=100000 -> 53.6MB/s
dd if=/dev/zero of=/mnt/test bs=2048 count=100000 -> 79.9MB/s
** Two chips tests:
dd if=/dev/zero of=/mnt/test bs=16 count=100000 -> 1.3MB/s
dd if=/dev/zero of=/mnt/test bs=64 count=100000 -> 4.8MB/s
dd if=/dev/zero of=/mnt/test bs=256 count=100000 -> 17.4MB/s
dd if=/dev/zero of=/mnt/test bs=1024 count=100000 -> 48.8MB/s
dd if=/dev/zero of=/mnt/test bs=2048 count=100000 -> 73.1MB/s
2. Testing with the mbw tool (installed via apt-get)
** Single chip tests:
mbw 16 | grep AVG
AVG Method: MEMCPY Elapsed: 0.05149 MiB: 16.00000 Copy: 310.751 MiB/s
AVG Method: DUMB Elapsed: 0.03077 MiB: 16.00000 Copy: 519.987 MiB/s
AVG Method: MCBLOCK Elapsed: 0.02800 MiB: 16.00000 Copy: 571.353 MiB/s
mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.10302 MiB: 32.00000 Copy: 310.631 MiB/s
AVG Method: DUMB Elapsed: 0.06150 MiB: 32.00000 Copy: 520.344 MiB/s
AVG Method: MCBLOCK Elapsed: 0.05690 MiB: 32.00000 Copy: 562.425 MiB/s
mbw 128 | grep AVG
AVG Method: MEMCPY Elapsed: 0.41162 MiB: 128.00000 Copy: 310.966 MiB/s
AVG Method: DUMB Elapsed: 0.24556 MiB: 128.00000 Copy: 521.268 MiB/s
AVG Method: MCBLOCK Elapsed: 0.22529 MiB: 128.00000 Copy: 568.169 MiB/s
** Two chips tests:
mbw 16 | grep AVG
AVG Method: MEMCPY Elapsed: 0.05742 MiB: 16.00000 Copy: 278.655 MiB/s
AVG Method: DUMB Elapsed: 0.02418 MiB: 16.00000 Copy: 661.739 MiB/s
AVG Method: MCBLOCK Elapsed: 0.02767 MiB: 16.00000 Copy: 578.336 MiB/s
mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.11365 MiB: 32.00000 Copy: 281.565 MiB/s
AVG Method: DUMB Elapsed: 0.04685 MiB: 32.00000 Copy: 682.983 MiB/s
AVG Method: MCBLOCK Elapsed: 0.05429 MiB: 32.00000 Copy: 589.418 MiB/s
mbw 128 | grep AVG
AVG Method: MEMCPY Elapsed: 0.45452 MiB: 128.00000 Copy: 281.618 MiB/s
AVG Method: DUMB Elapsed: 0.18673 MiB: 128.00000 Copy: 685.473 MiB/s
AVG Method: MCBLOCK Elapsed: 0.21711 MiB: 128.00000 Copy: 589.570 MiB/s
Just a note - these are the controller settings according to the a10-meminfo tool:
** Single chip
dram_clk = 480
mbus_clk = 300
dram_type = 3
dram_rank_num = 1
dram_chip_density = 4096
dram_io_width = 16
dram_bus_width = 16
dram_cas = 9
dram_zq = 0x7b (0x5294a00)
dram_odt_en = 0
dram_tpr0 = 0x42d899b7
dram_tpr1 = 0xa090
dram_tpr2 = 0x22a00
dram_tpr3 = 0x0
dram_emr1 = 0x4
dram_emr2 = 0x10
dram_emr3 = 0x0
dqs_gating_delay = 0x00000606
active_windowing = 0
** Two chips
dram_clk = 480
mbus_clk = 300
dram_type = 3
dram_rank_num = 1
dram_chip_density = 4096
dram_io_width = 16
dram_bus_width = 32
dram_cas = 9
dram_zq = 0x7b (0x5294a00)
dram_odt_en = 0
dram_tpr0 = 0x42d899b7
dram_tpr1 = 0xa090
dram_tpr2 = 0x22a00
dram_tpr3 = 0x0
dram_emr1 = 0x4
dram_emr2 = 0x10
dram_emr3 = 0x0
dqs_gating_delay = 0x06060606
active_windowing = 0
So, if anyone bothers reading the numbers, they are quite strange. In some tests the board with a single DDR3 chip actually performs up to 10% faster. My expectation was that a 16-bit bus would roughly halve the performance compared to a 32-bit one. Instead I get this. One might notice that in the DUMB test of mbw there is a significant difference between the results - roughly 25-30% improvement using two chips. That test simply loops over a memory array and does a = b copying. Dumb indeed :) However, that is still far from double...
There are a few possible explanations (I am not very familiar with how these things work, so I am guessing here):
- Incorrect test setup - maybe this way I am not actually disabling the second chip. I am planning on physically removing it from the board to check how that goes.
- Incorrect test - there aren't many memory benchmark tools for ARM, so this is what I've got. Could they be skewing the results?
- Bad board design - is it possible that with a single chip the controller skips some synchronization between the two banks, resulting in fewer delays somewhere?
- Bad understanding of how these things work in the first place. For instance: this is DDR3 memory, so it has a double data rate - even with a 16-bit bus width a single read delivers 32 bits per clock cycle. So unless I operate on large amounts of RAM with continuous reads and the like, the change from 32-bit to 16-bit width may not really show.
- All of the above and something else :)
If anyone has a clue - please feel free to drop a suggestion about what is going on here.
Cheers
Very interesting! Has anybody tested it with the GPU loaded?