algorithm - what to choose when dealing with large sequence of input -
i stumbled on question
how find intersection of 2 sequence when each sequence can have duplicate numbers , size big (close 1 million) , data type dealt long.
i thought sorting , finding intersection not viable solution thought hash table not work space consideration has optimum
can suggest better way handle it?
thanks reading post
the question claims “sorting , finding intersection ... not viable solution”. however, standpoint of ease , clarity of coding, sorting among best solutions. one-off problem, spending 10 minutes writing sorting solution more reasonable spending 15 minutes writing hashing solution, or half hour writing special tree program.
sorting million doubles python code shown below takes 1.3 seconds on old pc (amd athlon 5000, 2ghz) , can done 4 5 times faster on current processors. sorting 2 arrays in time o(n lg n) , looking matches in time o(n), required question, might take second or 2 on modern pc.
in [237]: import random in [238]: v = [random.random() in range(1000000)] in [239]: %time u = sorted(v) cpu times: user 1.32 s, sys: 0.00 s, total: 1.32 s wall time: 1.33 s note, question #8630965 refers sorting million floating point values in 1.168 seconds.
Comments
Post a Comment