FUZZY HASH ALGORITHMS TO CALCULATE FILE SIMILARITY

Methods, apparatus, systems and articles of manufacture to classify a first file are disclosed herein. Example apparatus include a feature hash generator to generate respective sets of one or more feature hashes for respective features of the first file. The number of the one or more feature hashes to be generated is based on an ability of the feature to distinguish the first file from a second file. The apparatus also includes a bit setter to set respective bits of a first fuzzy hash value based on respective ones of the one or more feature hashes, a classifier to assign the first file to a class associated with a second file based on a similarity between the first fuzzy hash value and a second fuzzy hash value for a second file.