Answer by OldCurmudgeon for Why are elementwise additions much faster in separate loops than in a combined loop?

Imagine you are working on a machine where n was just the right value for it only to be possible to hold two of your arrays in memory at one time, but the total memory available, via disk caching, was still sufficient to hold all four.

Assuming a simple LIFO caching policy, this code:

for(int j=0;j<n;j++){
    a[j] += b[j];
}
for(int j=0;j<n;j++){
    c[j] += d[j];
}

would first cause a and b to be loaded into RAM and then be worked on entirely in RAM. When the second loop starts, c and d would then be loaded from disk into RAM and operated on.

the other loop

for(int j=0;j<n;j++){
    a[j] += b[j];
    c[j] += d[j];
}

will page out two arrays and page in the other two every time around the loop. This would obviously be much slower.

You are probably not seeing disk caching in your tests but you are probably seeing the side effects of some other form of caching.

There seems to be a little confusion/misunderstanding here so I will try to elaborate a little using an example.

Say n = 2 and we are working with bytes. In my scenario we thus have just 4 bytes of RAM and the rest of our memory is significantly slower (say 100 times longer access).

Assuming a fairly dumb caching policy of if the byte is not in the cache, put it there and get the following byte too while we are at it you will get a scenario something like this:

With

for(int j=0;j<n;j++){
 a[j] += b[j];
}
for(int j=0;j<n;j++){
 c[j] += d[j];
}

cache a[0] and a[1] then b[0] and b[1] and set a[0] = a[0] + b[0] in cache - there are now four bytes in cache, a[0], a[1] and b[0], b[1]. Cost = 100 + 100.
set a[1] = a[1] + b[1] in cache. Cost = 1 + 1.
Repeat for c and d.
Total cost = (100 + 100 + 1 + 1) * 2 = 404

With

for(int j=0;j<n;j++){
 a[j] += b[j];
 c[j] += d[j];
}

cache a[0] and a[1] then b[0] and b[1] and set a[0] = a[0] + b[0] in cache - there are now four bytes in cache, a[0], a[1] and b[0], b[1]. Cost = 100 + 100.
eject a[0], a[1], b[0], b[1] from cache and cache c[0] and c[1] then d[0] and d[1] and set c[0] = c[0] + d[0] in cache. Cost = 100 + 100.
I suspect you are beginning to see where I am going.
Total cost = (100 + 100 + 100 + 100) * 2 = 800

This is a classic cache thrash scenario.

Answer by OldCurmudgeon for Why are elementwise additions much faster in separate loops than in a combined loop?

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112