algorithm - Set sizes for large number of sets


I have a use case where I need to store a large number of entries and the unique set size of each one. If I simplify it to contacts (which the problem is not), I have this problem:

Given a user, know how many friends they have:

joe - mary, john, bob, tom

mary - carol, suzy, mike, fred, robert

Thus friends(joe) = 4, and the one mutating operation supported is addfriend(joe, sam). While mary might be friends with joe, there is no need to store any of that related information.

What I'd rather not do is store all of the entries in each set, yet a Bloom filter doesn't quite feel right. Are there other alternatives?

Update: the challenge is that I've got 20M joe/mary/... entries in the top-level sets, with 4M semi-distinct members in each set. A quick code example is below (Python for simplicity) - at that scale, plus persistent storage, the universe comes to an end.

    from collections import defaultdict

    class world:
        def __init__(self):
            self._friends = defaultdict(set)   # id -> set of members

        def addfriend(self, id, member):
            self._friends[id].add(member)

        def friends(self, id):
            # backing dict renamed to _friends so this method isn't
            # shadowed by an instance attribute of the same name
            return len(self._friends[id])
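To make "the universe comes to an end" concrete (a back-of-envelope estimate of mine, not from the original post): storing the sets explicitly means up to 20M x 4M = 80 trillion membership entries, and at even 8 bytes per entry that is roughly 640 TB before any hash-table or persistence overhead.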

Since you're considering a Bloom filter, it sounds as though approximate answers are OK. Use a small-space cardinality estimator such as HyperLogLog in place of the sets in self._friends.
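A minimal sketch of that substitution, assuming the third-party datasketch package for the HyperLogLog implementation (my choice for illustration; the answer doesn't name a library):

    from collections import defaultdict
    from datasketch import HyperLogLog   # pip install datasketch

    class world:
        def __init__(self):
            # one small fixed-size sketch per id instead of a full member set
            self._friends = defaultdict(lambda: HyperLogLog(p=12))

        def addfriend(self, id, member):
            # HLL consumes bytes; adding the same member again has no effect
            # on the estimate, which preserves the set semantics we need
            self._friends[id].update(member.encode('utf8'))

        def friends(self, id):
            # approximate distinct count; p=12 gives roughly 1.6% typical error
            return int(self._friends[id].count())

Each sketch occupies 2^12 = 4096 registers (a few KB) no matter how many members are added, so 20M ids cost on the order of 80 GB rather than hundreds of terabytes, and accuracy is tunable via p.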

