What is an efficient algorithm to compare two large lists of data in C++?
I have two lists of times in nanoseconds. Each list can have 10^12 elements or more. The current implementation takes a subset of both lists, compares the times in the subsets using loops, outputs the correlated times, and then takes the next subset. Each subset comparison runs in approximately O(m*n), where m is the size of the list 1 subset and n is the size of the list 2 subset, which is a bad algorithm.
I also have a clock whose range is smaller than the total time span of the data sets, so there are rollovers in the data that I am concerned about at times.
List 1 has primary events, and list 2 has secondary events. I want to know if the secondary events happen within a certain time of the primary events. There is a lot of noise, so I need to create a histogram of the correlated times and find the time at which there is a statistically significant signal.
I would like to know if there is a known efficient algorithm that can be used in C++ from an open source library, or an efficient algorithm that I can implement, to search the times of both lists and output the items that fall within a window.
Here is an example of the brute force function:
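#include <cstddef>
#include <cstdio>

// list1 and list2 are the two containers described above.
// Every pair is tested, so this runs in O(m*n).
// Returns the number of correlated pairs printed.
int correlate_lists( int window )
{
    int matches = 0;
    for( std::size_t i = 0 ; i < list1.size() ; i++ )
    {
        for( std::size_t j = 0 ; j < list2.size() ; j++ )
        {
            if( list2[j].time() > list1[i].time() && (list2[j].time() - list1[i].time()) < window )
            {
                printf( "time: %lld\n", static_cast<long long>( list2[j].time() - list1[i].time() ) );
                ++matches;
            }
        }
    }
    return matches;
}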
If the two lists are sorted by time, you can walk through both lists efficiently:
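// j only moves forward across iterations of i, so list2 is never rescanned from the start.
for( std::size_t i = 0, j = 0 ; i < list1.size() ; ++i )
{
    // skip list2 entries at or before the current list1 time
    while( j < list2.size() && list2[j].time() <= list1[i].time() )
    {
        ++j;
    }
    // report every list2 entry that falls inside the window
    std::size_t k = j;
    while( k < list2.size() && list2[k].time() < list1[i].time() + window )
    {
        printf( "time: %lld\n", static_cast<long long>( list2[k].time() - list1[i].time() ) );
        ++k;
    }
}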
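Since the question asks for a histogram of the correlated times rather than printed output, the same sorted walk can fill bins directly. The following is only a sketch under stated assumptions: the function name correlation_histogram and the bin_width parameter are hypothetical and not from the original post, timestamps are plain 64-bit nanosecond values, both inputs are already sorted ascending, and any clock rollover has already been unwrapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not from the original post.
// Counts, per bin, how many secondary events occur strictly after a primary
// event and less than `window` nanoseconds later.
// Assumes both vectors are sorted ascending and contain no rollovers.
std::vector<std::uint64_t> correlation_histogram(
    const std::vector<std::int64_t>& primary,
    const std::vector<std::int64_t>& secondary,
    std::int64_t window,
    std::int64_t bin_width)
{
    assert(window > 0 && bin_width > 0);

    // Enough bins to cover delays in (0, window): ceil(window / bin_width).
    const std::size_t bins =
        static_cast<std::size_t>((window + bin_width - 1) / bin_width);
    std::vector<std::uint64_t> histogram(bins, 0);

    std::size_t j = 0;
    for (std::size_t i = 0; i < primary.size(); ++i) {
        // Skip secondary events at or before this primary event.
        // j never moves backward, so this loop is O(m + n) in total.
        while (j < secondary.size() && secondary[j] <= primary[i]) {
            ++j;
        }
        // Scan forward from j while the delay stays inside the window.
        for (std::size_t k = j; k < secondary.size(); ++k) {
            const std::int64_t delay = secondary[k] - primary[i];
            if (delay >= window) {
                break;
            }
            ++histogram[static_cast<std::size_t>(delay / bin_width)];
        }
    }
    return histogram;
}
```

The skip loop costs O(m + n) overall, and the inner scan adds work proportional to the number of correlated pairs, so the total is far below O(m*n) when the window is small compared with the spacing of the events. With 10^12 elements the data will not fit in memory, so in practice this would presumably be run on sorted chunks, with each chunk of the second list extended by one window so pairs that straddle a chunk boundary are not lost.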