Fuzzy Data Matching, Record Linkage & Data Deduplication
0 reviews · Maximum 5 orders · Posted one year ago
I can help you consolidate your business master data, which is often messy: it contains duplicate entries for the same entities, missing information, misspellings and various other errors. In many cases there is no unique identifier that can be used to identify an entity unambiguously. When such data sources need to be linked or de-duplicated based solely on string similarity, the task is called fuzzy data matching.
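To make "string similarity" concrete, here is a minimal sketch using Python's standard `difflib.SequenceMatcher`. This is only one possible similarity measure, chosen for illustration; it is not necessarily the measure used for this service. The helper names `normalize` and `similarity` are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace so formatting noise is ignored
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    # SequenceMatcher.ratio() returns 2*M/T in [0, 1], where M is the number of
    # matched characters and T is the total length of both strings
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


print(round(similarity("Acme Corporation", "ACME  Corporaton"), 2))  # 0.97
print(round(similarity("Acme Corporation", "Globex Inc."), 2))
```

The misspelled, differently capitalized name still scores close to 1, while an unrelated name scores low; no shared identifier is needed.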
I will find duplicate items, or link together records from product lists, material lists, customer lists or any other database lacking unique identifiers, based on approximate string similarity (i.e., fuzzy data matching).
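For the de-duplication case, a minimal sketch (assuming `difflib` similarity and a hypothetical threshold of 0.85) compares every pair in one list and merges near-identical entries into groups with a small union-find. The function name `find_duplicate_groups` and the sample customer names are illustrative only.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicate_groups(names: list[str], threshold: float = 0.85) -> list[list[str]]:
    # Union-find: every pair scoring at or above the threshold ends up in one group
    parent = list(range(len(names)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare every pair once; O(n^2), so only suitable for small lists
    for i, j in combinations(range(len(names)), 2):
        if similarity(names[i], names[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Only groups with more than one member are duplicates
    return [g for g in groups.values() if len(g) > 1]


customers = ["Jon Smith", "John Smith", "Mary Jones", "Marry Jones", "Peter Brown"]
print(find_duplicate_groups(customers))
# [['Jon Smith', 'John Smith'], ['Mary Jones', 'Marry Jones']]
```

Grouping transitively means that if A matches B and B matches C, all three land in one cluster even when A and C alone would not match. For large lists, the all-pairs loop would normally be replaced by some form of blocking to limit comparisons.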
Please provide the input datasets to be linked or de-duplicated as Excel or CSV (comma-separated) files, encoded in UTF-8, and briefly explain the meaning of each column.
Thoroughly explain the data matching task. Do you want record linkage or data de-duplication? Do you want all possible matching combinations returned, or all records from one dataset plus only the matching records from the other? In the latter case, do you want all matches or only the best matches? Which is the master dataset and which is the slave dataset?
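The "all master records, only matching slave records" option behaves like a left join on a fuzzy key. A minimal sketch, assuming `difflib` similarity and a hypothetical threshold of 0.6, shows the difference between returning only the best match and returning all matches. The function `link`, its parameters and the sample data are illustrative, not an existing API.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(master: list[str], slave: list[str], threshold: float = 0.6, best_only: bool = True):
    """Keep every master record; attach matching slave records (left-join style)."""
    rows = []
    for m in master:
        # Score every slave record against this master record
        scored = [(s, similarity(m, s)) for s in slave]
        matches = sorted(
            (pair for pair in scored if pair[1] >= threshold),
            key=lambda pair: pair[1],
            reverse=True,
        )
        if not matches:
            rows.append((m, None, None))  # unmatched master record is still returned
        elif best_only:
            rows.append((m, *matches[0]))  # only the highest-scoring candidate
        else:
            rows.extend((m, s, score) for s, score in matches)  # every candidate
    return rows


master = ["Acme Corporation", "Globex Inc"]
slave = ["ACME Corp.", "Acme Corporaton", "Initech LLC"]
for row in link(master, slave):
    print(row)
for row in link(master, slave, best_only=False):
    print(row)
```

With `best_only=True` each master record appears exactly once; with `best_only=False` a master record can appear several times, once per candidate above the threshold, sorted from best to worst.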
Would you prefer (or tolerate) more false positives or more false negatives in the result?
Would you prefer strict, normal or loose fuzzy data matching? "Strict" means that only certain matches are retrieved; "normal" means a balanced ratio between false positives and false negatives; "loose" means that all possible matches are retrieved, at the cost of more false positives.
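One common way to express strict, normal and loose matching is as different cut-offs on a similarity score. The sketch below assumes `difflib` similarity and uses hypothetical threshold values (0.95, 0.85, 0.65) purely for illustration; real cut-offs would be tuned to the data.

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs chosen for illustration; in practice they are tuned per dataset
PRESETS = {"strict": 0.95, "normal": 0.85, "loose": 0.65}


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(a: str, b: str, mode: str = "normal") -> bool:
    # A lower threshold accepts more pairs: fewer false negatives, more false positives
    return similarity(a, b) >= PRESETS[mode]


pairs = [
    ("Acme Corporation", "Acme Corporaton"),  # score about 0.97
    ("Jon Smith", "John Smith"),              # score about 0.947
    ("Acme Corporation", "ACME Corp."),       # score about 0.69
]
for mode in PRESETS:
    print(mode, [is_match(a, b, mode) for a, b in pairs])
# strict [True, False, False]
# normal [True, True, False]
# loose [True, True, True]
```

Moving from strict to loose only ever adds matches, which is why loose matching recovers more true pairs but also lets through more false positives.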