algorithm - what to choose when dealing with large sequence of input -


i stumbled on question

how find intersection of 2 sequence when each sequence can have duplicate numbers , size big (close 1 million) , data type dealt long.

i thought sorting , finding intersection not viable solution thought hash table not work space consideration has optimum

can suggest better way handle it?

thanks reading post

the question claims “sorting , finding intersection ... not viable solution”. however, standpoint of ease , clarity of coding, sorting among best solutions. one-off problem, spending 10 minutes writing sorting solution more reasonable spending 15 minutes writing hashing solution, or half hour writing special tree program.

sorting million doubles python code shown below takes 1.3 seconds on old pc (amd athlon 5000, 2ghz) , can done 4 5 times faster on current processors. sorting 2 arrays in time o(n lg n) , looking matches in time o(n), required question, might take second or 2 on modern pc.

in [237]: import random  in [238]: v = [random.random() in range(1000000)]  in [239]: %time u = sorted(v) cpu times: user 1.32 s, sys: 0.00 s, total: 1.32 s wall time: 1.33 s 

note, question #8630965 refers sorting million floating point values in 1.168 seconds.


Comments

Popular posts from this blog

ios - iPhone/iPad different view orientations in different views , and apple approval process -

php - HTTP_REFERER woes: How can I allow access to a specific page, only when a visitor has visited another specific page beforehand? -

java Extracting Zip file -