c++ - High interprocess timing variations, but low intraprocess timing variations for the same task


I am running code (full code here: http://codepad.org/5ojblqia) to time repeated daxpy function calls, with and without flushing the operands from cache beforehand:

#include <cstdio>
#include <cmath>
#include <boost/math/distributions/students_t.hpp>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/mean.hpp>
#include <boost/accumulators/statistics/variance.hpp>
#include <boost/accumulators/statistics/count.hpp>

using namespace boost::accumulators;
using boost::math::students_t;
using boost::math::complement;

//fill, wall_time, daxpy and flush_array are defined in the full code (see link above)

#define kb 1024

int main()
{
    int cache_size = 32*kb;
    double alpha = 42.5;

    int operand_size = cache_size/(sizeof(double)*2);
    double* x = new double[operand_size];
    double* y = new double[operand_size];

    //95% confidence interval
    double max_risk = 0.05;
    //interval half width
    double w;
    int n_iterations = 50000;
    students_t dist(n_iterations-1);
    double t = boost::math::quantile(complement(dist,max_risk/2));
    accumulator_set<double, stats<tag::mean,tag::variance> > unflushed_acc;

    for(int i = 0; i < n_iterations; ++i)
    {
        fill(x,operand_size);
        fill(y,operand_size);
        double seconds = wall_time();
        daxpy(alpha,x,y,operand_size);
        seconds = wall_time() - seconds;
        unflushed_acc(seconds);
    }

    w = t*sqrt(variance(unflushed_acc))/sqrt(count(unflushed_acc));
    printf("without flush: time=%g +/- %g ns\n",mean(unflushed_acc)*1e9,w*1e9);

    //when using the clflush instruction,
    //we need to put the operands in cache first
    accumulator_set<double, stats<tag::mean,tag::variance> > clflush_acc;
    for(int i = 0; i < n_iterations; ++i)
    {
        fill(x,operand_size);
        fill(y,operand_size);

        flush_array(x,operand_size);
        flush_array(y,operand_size);
        double seconds = wall_time();
        daxpy(alpha,x,y,operand_size);
        seconds = wall_time() - seconds;
        clflush_acc(seconds);
    }

    w = t*sqrt(variance(clflush_acc))/sqrt(count(clflush_acc));
    printf("with clflush: time=%g +/- %g ns\n",mean(clflush_acc)*1e9,w*1e9);

    delete[] x;
    delete[] y;
    return 0;
}
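
The helper functions fill, wall_time, daxpy and flush_array are defined in the linked full code. In case the link is unavailable, here is a minimal sketch of plausible implementations, assuming SSE2 and POSIX (the bodies below are my reconstruction, not the original definitions):

#include <emmintrin.h>   // _mm_clflush, _mm_mfence
#include <sys/time.h>    // gettimeofday

//fill an array with nonzero data so the pages are mapped and the lines are cached
void fill(double* a, int n)
{
    for(int i = 0; i < n; ++i)
        a[i] = double(i);
}

//y <- alpha*x + y, the BLAS level-1 operation being timed
void daxpy(double alpha, const double* x, double* y, int n)
{
    for(int i = 0; i < n; ++i)
        y[i] += alpha*x[i];
}

//wall-clock time in seconds, microsecond resolution
double wall_time()
{
    timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + tv.tv_usec*1e-6;
}

//evict every 64-byte cache line covering the array, then fence so the
//evictions complete before the timer starts
void flush_array(const double* a, int n)
{
    for(int i = 0; i < n; i += 64/sizeof(double))
        _mm_clflush(&a[i]);
    _mm_mfence();
}

flush_array steps through the array one 64-byte cache line at a time and finishes with a fence so that all the flushes have taken effect before timing begins.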

This code measures the mean time per daxpy call and its uncertainty, averaged over a given number of iterations. Averaging over many iterations minimizes the variance caused by contention for memory access from other cores (discussed in a previous question here), but the average value obtained varies by a huge amount between separate invocations of the same executable:

$ ./variance
without flush: time=3107.76 +/- 0.268198 ns
with clflush: time=5862.33 +/- 9.84313 ns
$ ./variance
without flush: time=3105.71 +/- 0.237823 ns
with clflush: time=7802.66 +/- 12.3163 ns

These runs were made one after another. Why do the timings in the flushed case (but not the unflushed case) vary so much between processes, yet so little within a given process?

Appendix

The code was run on Mac OS X 10.8 on an Intel Xeon 5650.
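
As a point of reference (my observation, not part of the original question): the 32 KB cache_size in the code matches the per-core L1 data cache of the Xeon 5650, so both operands together exactly fill L1. On OS X this can be checked with sysctl hw.l1dcachesize from the shell, or programmatically:

#include <cstdio>
#include <cstdint>
#include <sys/types.h>
#include <sys/sysctl.h>

int main()
{
    //hw.l1dcachesize reports the L1 data cache size in bytes
    int64_t l1d = 0;
    size_t len = sizeof(l1d);
    if(sysctlbyname("hw.l1dcachesize", &l1d, &len, 0, 0) == 0)
        printf("L1d cache: %lld bytes\n", (long long)l1d);
    return 0;
}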

