Hash Large Files with .NET SHA1/MD5

I've been working on an application that checks for duplicate files.  One of the better ways to test whether files are identical is to hash them and compare the hashes.  MD5 hashing is common, but practical collision attacks against it are known.  I elected to use the SHA1 algorithm instead.

In my testing, I've found that I can hash a ~1GB file in about 7 seconds without loading the whole file into memory.

You will need the following namespaces:

  • System (for BitConverter)
  • System.Security.Cryptography
  • System.IO

SHA1 Hash Example Code:

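public string SHA1HashFile(string sPath)
{
    // Open the file as a raw byte stream. The using blocks release the
    // file handle and the hash object even if an exception is thrown.
    using (FileStream fs = File.OpenRead(sPath))
    using (SHA1 sha1h = SHA1.Create())
    {
        // ComputeHash reads the stream in chunks, so memory use stays small.
        // The result is formatted as hyphen-separated hex ("DA-39-A3-...").
        return BitConverter.ToString(sha1h.ComputeHash(fs));
    }
}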

Usage: SHA1HashFile("C:\\Path\\File.iso");
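
BitConverter.ToString returns uppercase hex pairs separated by hyphens, while tools such as sha1sum print lowercase hex with no separators. If you want to compare against that kind of output, a small variation like the following should do it. This is only a sketch; the class and method names (HashFormat, SHA1HexLower) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashFormat
{
    // Hypothetical helper: SHA-1 of a file as lowercase hex without
    // separators ("da39a3ee...") instead of "DA-39-A3-EE-...".
    public static string SHA1HexLower(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(fs);
            // Strip the hyphens BitConverter inserts, then lower-case.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Usage: HashFormat.SHA1HexLower("C:\\Path\\File.iso");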

MD5 Hash Example Code:

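public string MD5HashFile(string sPath)
{
    // Open the file as a raw byte stream. The using blocks release the
    // file handle and the hash object even if an exception is thrown.
    using (FileStream fs = File.OpenRead(sPath))
    using (MD5 md5h = MD5.Create())
    {
        // ComputeHash reads the stream in chunks, so memory use stays small.
        // The result is formatted as hyphen-separated hex.
        return BitConverter.ToString(md5h.ComputeHash(fs));
    }
}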

Usage: MD5HashFile("C:\\Path\\File.iso");
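
To tie this back to the duplicate check mentioned at the top, here is one way the hashes might be used: hash every file and group the paths by hash. This is a minimal sketch, not the code from my application; DuplicateFinder and FindDuplicates are names invented for the example, and a real tool would probably compare file sizes first to avoid hashing files that cannot match.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DuplicateFinder
{
    // Same streaming SHA-1 approach as in the article.
    static string HashFile(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        using (SHA1 sha1 = SHA1.Create())
        {
            return BitConverter.ToString(sha1.ComputeHash(fs));
        }
    }

    // Hypothetical helper: returns one list of paths for every group of
    // files whose contents hash to the same value.
    public static List<List<string>> FindDuplicates(IEnumerable<string> paths)
    {
        return paths
            .GroupBy(p => HashFile(p))
            .Where(g => g.Count() > 1)   // keep only hashes seen more than once
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Usage: DuplicateFinder.FindDuplicates(Directory.EnumerateFiles("C:\\Path"));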

Hashing Large Files

Justin,

This is very interesting, I would like to hear more about this and maybe ask you some questions. How would you hash the files on a computer and do you have to hash all the files or just the directory with the files you are interested in? Could it be done using a bash file or by running a program by command line?

Jim

Jim,
Either method above will work for hashing the actual files. Passing a stream to ComputeHash keeps memory use low, since the file is read in 4K chunks instead of being loaded all at once.
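
To make the chunked reading concrete, here is roughly what a streaming hash looks like when the loop is written by hand with TransformBlock and TransformFinalBlock. Treat it as a sketch: ChunkedHash and SHA1HashFileChunked are made-up names, and the 4096-byte default simply follows the 4K figure above.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChunkedHash
{
    // Hypothetical helper: hashes a file one buffer at a time.
    public static string SHA1HashFileChunked(string path, int bufferSize = 4096)
    {
        using (FileStream fs = File.OpenRead(path))
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed only the bytes actually read into the running hash.
                sha1.TransformBlock(buffer, 0, read, null, 0);
            }
            // Finish with an empty final block; the digest is then in Hash.
            sha1.TransformFinalBlock(buffer, 0, 0);
            return BitConverter.ToString(sha1.Hash);
        }
    }
}
```

Only one buffer is held in memory at a time, whatever the file size. The result should match what ComputeHash returns for the same file.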

Larger files will obviously still take a considerable amount of time. A project I'm working on actually uses the above hashing method with the file dates instead of the file content.

You could certainly run this via command line. I've done that without issues. I'm not sure how you would use this in bash since that's not Windows based. I've not done much work with Mono (if that's what you're referring to).
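
For the command-line case, something along these lines might work as a small console program that hashes every file under one directory. It is a sketch under assumptions: the program name HashDir and its single directory argument are invented for the example, and it does nothing about files it cannot open.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashDir
{
    // Hypothetical entry point: HashDir.exe <directory>
    public static int Main(string[] args)
    {
        if (args.Length != 1 || !Directory.Exists(args[0]))
        {
            Console.Error.WriteLine("Usage: HashDir <directory>");
            return 1;
        }

        // Walk the directory and all of its subdirectories.
        foreach (string path in Directory.EnumerateFiles(args[0], "*", SearchOption.AllDirectories))
        {
            using (FileStream fs = File.OpenRead(path))
            using (SHA1 sha1 = SHA1.Create())
            {
                // One line per file: hash, two spaces, path.
                Console.WriteLine("{0}  {1}", BitConverter.ToString(sha1.ComputeHash(fs)), path);
            }
        }
        return 0;
    }
}
```

Usage: HashDir.exe C:\Path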

Justin