Optimizing php/mysql translation lookup with huge database and hash indexes -
i'm using utf8 mysql database. checks if translation in database , if not, translation , stores in database.
select * `translations` `input_text`=? , `input_lang`=? , `output_lang`=?;
(the other field "output_text".) basic database, first compare, letter letter, input text "input_text" "text" field. long characters matching keep comparing them. if stop matching, go onto next row.
i don't know how databases work @ low level assume basic database, search @ least 1 character every row in database before decides input text isn't in database.
ideally input text converted hash code (e.g. using sha1) , each "input_text" hash. if database sorted rapidly find of rows match hash , check actual text. if there no matching hashes return no results though each row wasn't manually checked.
is there type of mysql storage engine can or there additional php can optimize things? should "input_text" set kind of "index"? (primary/unique/index/fulltext)
is there alternative type of database compatible php far superior mysql?
edit: talks b-tree vs hash indexes mysql:
http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
none of limitations hash indexes problem me. says
they used equality comparisons use = or <=> operators (but very fast)
["very" italicized them]
new question:
how set "input_text" text hash index? btw multiple rows contain same "input_text"... alright hash index?
http://dev.mysql.com/doc/refman/5.5/en/column-indexes.html
says "the memory storage engine uses hash indexes default" - mean i've got change storage engine , set column index index?
a normal index
clause should enough (be sure index fields, it'll big on disk, faster). fulltext
indexes when you're using like
clauses ;-)
anyway, kind of lookups, should use nosql store redis
, it's blazingly fast , has in-memory store , data persistence through snapshots.
there extension php here : https://github.com/nicolasff/phpredis
and you'll have redis keys in following form: your_project:input_lang:word:output_lang
better data management, replace each value values , you're go ;)
Comments
Post a Comment