PHP/MySQL - Hashing To BIGINT To Ensure No Duplicates
I'm scraping virtual currency transaction data off a webpage. The transactions consist of a time/date, a description, a price, and the new balance.
The results are paginated, and I can only fetch 20 at a time. My goal is to have an accurate record of all entries in a separate database. There is a large number of transactions occurring, and transactions can occur at any time, including between fetching different pages.
The time/date is only measured to the minute, so multiple transactions can occur in the same minute. Descriptions can also be the same (for example, the same item can be sold in the same quantity to the same person multiple times). Both price and balance can also overlap.
I am storing the timestamp, price, balance, and the data parsed from the description in multiple fields. I need to be able to tell quickly whether an entry is already in the database. For maximum effect, I should ensure that each entry has a unique combination of time/date, description, price, and balance. The issue with composite keys is that I don't want to store the full description in the database. (This would double the database size.)
The solution I came up with is to create a BIGINT hash based on those fields, which is used as a unique field in the database. I found that the probability of a collision (based on the birthday attack formula) is less than 1% for 61 million entries, which is a satisfactory probability, since the number of entries I'm planning to track is in the neighbourhood of 40k-2m.
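For reference, the birthday estimate can be checked in a few lines. This is only a sketch: it assumes the hash behaves like a uniformly random 64-bit value and uses the usual approximation p ≈ 1 - exp(-n² / (2 · 2^64)). The function name collisionProbability is made up for illustration.

```php
<?php
// Hypothetical helper: approximate probability of at least one collision
// among $n uniformly random values drawn from a space of 2^$bits values.
function collisionProbability(float $n, int $bits = 64): float
{
    $space = pow(2.0, $bits);
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -expm1(-($n * $n) / (2.0 * $space));
}

// Upper end of the planned range, and the 61 million figure quoted above.
printf("%.3g\n", collisionProbability(2e6));
printf("%.3g\n", collisionProbability(61e6));
```

Both values come out well below 1% under this approximation.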
My question is: based on my application and goals, what hashing algorithm would you recommend, and how can I get the values down to BIGINT size without losing any of the properties of the algorithm? The most important thing is to avoid collisions, as each one would affect the integrity of my data. Unless you have a better idea, I plan to concatenate the data as a string (with separators between fields) and feed that to the function. Short code snippets are appreciated!
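To make the concatenation step concrete, here is a minimal sketch. The function name transactionKey and its parameters are hypothetical, and it assumes the ASCII unit separator byte (0x1F) never appears in the scraped text; if it can, a length-prefixed encoding would be safer.

```php
<?php
// Hypothetical helper: joins the fields of one scraped transaction into a
// single string that can be fed to the hash function.
function transactionKey(string $timestamp, string $description, string $price, string $balance): string
{
    // "\x1F" is the ASCII unit separator; it is ASSUMED never to occur in
    // the scraped fields. Without a separator, ("ab", "c") and ("a", "bc")
    // would concatenate to the same string and therefore hash identically.
    return implode("\x1F", [$timestamp, $description, $price, $balance]);
}
```

The separator matters because it keeps field boundaries in the hashed string, so shifting characters between adjacent fields produces a different key.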
Because I don't care about security, I used SHA-1. It generates a 20-byte digest, which PHP returns as a 40-character hexadecimal string. A BIGINT is 8 bytes in size. Therefore, I need to truncate the string to 16 characters (since each character is half a byte in hex) using substr, and then use base_convert to convert it to base 10 for database storage.
function hashtobigint($string)
{
    return base_convert(substr(sha1($string), 0, 16), 16, 10);
}
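One caveat with the function above: the PHP manual warns that base_convert may lose precision on large numbers because it goes through a float internally, and 16 hex digits (64 bits) is beyond what a float represents exactly. A possible alternative, sketched below, takes the first 8 raw bytes of the digest and unpacks them as an integer. It assumes 64-bit PHP 7 or later; the name hashToSignedBigInt is hypothetical, and the result is meant for a signed BIGINT column, since values with the top bit set come back negative.

```php
<?php
// Sketch, assuming 64-bit PHP 7+: interpret the first 8 bytes of the
// SHA-1 digest as a 64-bit big-endian integer without going through a float.
function hashToSignedBigInt(string $string): int
{
    // sha1(..., true) returns the raw 20-byte digest instead of hex text.
    $first8 = substr(sha1($string, true), 0, 8);

    // 'J' = unsigned 64-bit, big endian. PHP integers are signed, so a
    // value with the top bit set is returned as a negative number; all
    // 64 bits are still preserved.
    return unpack('J', $first8)[1];
}

var_dump(hashToSignedBigInt('abc'));
```

Because every bit survives, packing the integer back with pack('J', ...) reproduces the same 8 bytes, so no extra collisions are introduced beyond the truncation to 64 bits.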
Thanks for the help!