algorithm - Matching the closest file among given ASCII text files
Problem:
I have around 20 ASCII text files, each less than 10^9 bytes in size. Another ASCII text file (say foo) is given. The program should strategically match the contents of foo against the given 20 files and print the name of the closest matching file. The contents of foo might match only partially.
Since the files are large, I'm wondering:
1. How can I use information retrieval here (I don't know IR)?
2. Which data structure should I use to store such information?
3. What is the best algorithm to implement it?
I know I'm asking a lot, but I'm stuck on this problem and not able to find out how to approach it. Any help is appreciated. Thanks!
So I assume the files contain text. You can treat each one of the files as a big string. Make 20 vectors or arrays, go through each file, and put each word as an element in that file's vector. Create a word vector for the given file (foo) as well, plus one more vector of size 20 to store the match count for each file. Then create a loop that runs through these vectors: if at a given index a word in one of the 20 vectors matches the word in the given file's vector, increase the value corresponding to that file in the match-storing vector. At the end, the highest value in the match-storing vector indicates which file is the best match.
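The steps above can be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: the helper names (`words_of`, `positional_matches`, `closest_file`) and the sample file names are made up for illustration, words are split on whitespace and lowercased (the answer does not say how to tokenize), and short in-memory strings stand in for the real files.

```python
from typing import Dict, List


def words_of(text: str) -> List[str]:
    """Split text into lowercase, whitespace-separated words (assumed tokenization)."""
    return text.lower().split()


def positional_matches(query: List[str], candidate: List[str]) -> int:
    """Count the indices at which both word vectors hold the same word."""
    return sum(1 for q, c in zip(query, candidate) if q == c)


def closest_file(query_text: str, corpus: Dict[str, str]) -> str:
    """Return the name of the candidate with the highest match count."""
    query = words_of(query_text)
    # One score per candidate file: the "match storing vector" from the answer.
    scores = {
        name: positional_matches(query, words_of(text))
        for name, text in corpus.items()
    }
    return max(scores, key=scores.get)


# Hypothetical stand-ins for the 20 files and for foo.
corpus = {
    "a.txt": "the quick brown fox jumps",
    "b.txt": "the slow brown dog sleeps",
    "c.txt": "an entirely different sentence here",
}
foo = "the quick brown cat jumps"
best = closest_file(foo, corpus)
print(best)
```

Two caveats with this approach: files close to 10^9 bytes may not fit comfortably in memory as word lists, so the words would probably have to be read in a streaming fashion; and comparing word by word at the same index means a single inserted or deleted word shifts everything after it, so counting shared words regardless of position may be more robust for partial matches.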