Channel: Why are elementwise additions much faster in separate loops than in a combined loop? - Stack Overflow
Answer by gnasher729 for Why are elementwise additions much faster in separate loops than in a combined loop?

To make this code run fast, the CPU needs to prefetch data into the cache. Basically, the CPU detects that you are accessing memory sequentially and reads data from RAM before it is actually needed.

The combined loop has two input and two output streams, so it needs four separate prefetch streams to be fast. The separate loops each need only two. If you run this code on a CPU that can automatically prefetch two streams but not four, the combined version will be slower.
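For reference, here is a minimal sketch of the two variants being compared. The function names `add_combined` and `add_separate` and the use of `std::vector<double>` are my own assumptions for illustration; the question's actual code may allocate and name things differently. Both functions compute the same result; they differ only in how many arrays are walked at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Combined loop: each iteration touches four arrays (a, b, c, d),
// so four sequential streams are active at once.
void add_combined(std::vector<double>& a, const std::vector<double>& b,
                  std::vector<double>& c, const std::vector<double>& d) {
    const std::size_t n = a.size();
    // Assumption for this sketch: all four arrays have the same length.
    assert(b.size() == n && c.size() == n && d.size() == n);
    for (std::size_t j = 0; j < n; ++j) {
        a[j] += b[j];
        c[j] += d[j];
    }
}

// Separate loops: each loop touches only two arrays,
// so only two sequential streams are active at a time.
void add_separate(std::vector<double>& a, const std::vector<double>& b,
                  std::vector<double>& c, const std::vector<double>& d) {
    const std::size_t n = a.size();
    assert(b.size() == n && c.size() == n && d.size() == n);
    for (std::size_t j = 0; j < n; ++j) {
        a[j] += b[j];
    }
    for (std::size_t j = 0; j < n; ++j) {
        c[j] += d[j];
    }
}
```

To see any difference you would have to time both functions on arrays much larger than the cache, and the outcome will depend on the particular CPU.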

On a CPU with a better prefetcher the problem would go away. In that case you could change the code to add three arrays to a fourth one; the better CPU can probably prefetch four but not eight streams, and it would show exactly the same effect.

