STREAM Benchmark for Rambus memory on P4
STREAM benchmark, gcc build (Don Holmgren, Fermilab, et al.)
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:         1324.0370     0.0492     0.0483     0.0556
Scale:        1336.4782     0.0487     0.0479     0.0552
Add:          1556.6983     0.0621     0.0617     0.0623
Triad:        1541.3021     0.0627     0.0623     0.0628
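For readers unfamiliar with the four rows, here is a minimal C sketch of the STREAM kernels and of how a rate in MB/s follows from the minimum time. The function names are made up for illustration and this is not the benchmark source itself. The array length n is not stated in these notes; the figures above are consistent with n = 4,000,000 doubles (Copy rate times min time is about 64e6 bytes), but that is an inference.

```c
#include <assert.h>
#include <stddef.h>

/* Copy: c[i] = a[i]  (reads 1 array, writes 1 array) */
void stream_copy(double *c, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];
}

/* Scale: b[i] = q * c[i]  (reads 1 array, writes 1 array) */
void stream_scale(double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = q * c[i];
}

/* Add: c[i] = a[i] + b[i]  (reads 2 arrays, writes 1 array) */
void stream_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Triad: a[i] = b[i] + q * c[i]  (reads 2 arrays, writes 1 array) */
void stream_triad(double *a, const double *b, const double *c,
                  double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Rate in MB/s (1 MB = 1e6 bytes) from the best (minimum) time.
 * arrays_touched is 2 for Copy and Scale, 3 for Add and Triad. */
double stream_rate_mb_s(int arrays_touched, size_t n, double min_time_s)
{
    assert(min_time_s > 0.0);
    double bytes = (double)arrays_touched * (double)sizeof(double) * (double)n;
    return 1.0e-6 * bytes / min_time_s;
}
```

Under the assumed n = 4,000,000, stream_rate_mb_s(2, 4000000, 0.0483) comes to roughly 1325 MB/s, close to the 1324.04 reported for Copy; the small gap is expected because the printed time is rounded to four decimals.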
Portland Group compiler (pgcc) build (-Mvect=sse)
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:         2072.0057     0.0309     0.0309     0.0311
Scale:        1395.3079     0.0463     0.0459     0.0464
Add:          1907.2235     0.0505     0.0503     0.0509
Triad:        1889.2441     0.0509     0.0508     0.0513
(Most of this improvement over the gcc build comes from pgcc's use of the SSE prefetch
instructions; some of it comes from moving data through the 128-bit-wide SSE registers.)
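To make the two effects concrete, here is a hand-written sketch of a Copy loop that issues a software prefetch and moves data two doubles at a time through a 128-bit register. It is not the code pgcc generates, which these notes do not show. The function name copy_sse_prefetch and the prefetch distance PREFETCH_AHEAD are hypothetical, and the distance is an untuned guess. On targets without SSE2 the sketch falls back to a plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */
#include <emmintrin.h>  /* __m128d, _mm_loadu_pd, _mm_storeu_pd */
#endif

/* Hypothetical prefetch distance in elements (64 doubles = 512 bytes). */
#define PREFETCH_AHEAD 64

void copy_sse_prefetch(double *dst, const double *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t i = 0;

#if defined(__SSE2__)
    /* Work in blocks of 8 doubles (64 bytes). */
    for (; i + 8 <= n; i += 8) {
        /* Ask the hardware to start fetching data needed a few
         * iterations from now; skip the hint near the end of the array. */
        if (i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(src + i + PREFETCH_AHEAD),
                         _MM_HINT_NTA);

        /* Each load/store pair moves 16 bytes via one 128-bit register.
         * Unaligned variants are used so callers need not align buffers. */
        for (size_t j = 0; j < 8; j += 2) {
            __m128d v = _mm_loadu_pd(src + i + j);
            _mm_storeu_pd(dst + i + j, v);
        }
    }
#endif

    /* Scalar tail (and the whole loop when SSE2 is unavailable). */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Whether a hand-placed prefetch helps, and at what distance, depends on the processor and memory system, so any such loop should be timed against the plain version before drawing conclusions.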