March 22, 2022

Hash Functions: Their Utility for Both Clients and Lawyers

Holland & Knight IP/Decode Blog
Jacob W. S. Schneider

"You're storing the passwords in plaintext?" My college professor looked at me puzzled. I had to immediately fix this. It was a huge security problem in my senior project, a web-based e-commerce platform. If any hacker tapped into my database, they would have every username and password in their hands. Fast forward a few years, and I am at a software engineering job interview. I am asked to write pseudocode on a whiteboard to make a searching algorithm faster. Both problems had the same solution: use a hash function.

What Are Hash Functions?

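Hash functions transform input text into unique strings of output text. For example, the popular MD5 hash function would take the previous sentence as input and generate the following as output: 0e3f5c52b0251009db79d7332c50ed9d. Strictly speaking, the output is not guaranteed to be unique: two different inputs can produce the same hash value (a "collision"), but a well-designed hash function makes that extremely unlikely to happen by chance.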
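For readers who want to try this themselves, here is a minimal sketch using Python's standard hashlib module. The helper name md5_hex and the choice of UTF-8 encoding are assumptions made for illustration. The exact output depends on the precise characters and encoding of the input, so it may not match the value printed above.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` as 32 hexadecimal characters."""
    # Hash functions operate on bytes, so the text is encoded first
    # (UTF-8 is an assumption; a different encoding gives a different hash).
    return hashlib.md5(text.encode("utf-8")).hexdigest()


sentence = "Hash functions transform input text into unique strings of output text."
digest = md5_hex(sentence)

print(digest)       # a string of hexadecimal characters
print(len(digest))  # always 32 for MD5, no matter how long the input is
```

Running the same function on the same input always returns the same value, which is what makes a hash usable as a fingerprint.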

Hash functions have a whole host of useful attributes. First, and this is critical, hash functions are one-way streets. No practical method exists to take the string "0e3f5c52b0251009db79d7332c50ed9d" and compute my original sentence from it. This explains why they are well-suited for storing passwords. The best systems never store your passwords in plaintext, but instead as hash values. If a hacker steals your hash value, they cannot work backward from it to your original password. (They can still guess common passwords and hash each guess, which is why strong passwords matter and why modern systems use salted, deliberately slow password-hashing functions rather than a fast hash such as MD5.) This also explains why most websites cannot email you your password when you lose it, but instead ask you to create a new password: even they do not know your password.
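As a rough sketch of what storing a hash instead of a password can look like, the example below uses PBKDF2 from Python's standard library with a random salt. That is a different technique from the plain MD5 hash shown earlier, chosen here because it is built for passwords. The function names and the iteration count are illustrative assumptions, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration only; real systems tune this value.
ITERATIONS = 200_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_hash); only these are stored, never the password."""
    salt = os.urandom(16)  # random salt so identical passwords hash differently
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare the results."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # compare_digest compares in constant time to avoid leaking timing information
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The site checks a login by hashing what the user typed and comparing hashes, so it never needs to keep the password itself.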

Second, hash functions typically shorten long input strings, because the output has a fixed length no matter how long the input is. For example, "0e3f5c52b0251009db79d7332c50ed9d" is shorter than "Hash functions transform input text into unique strings of output text." This explains how you can usually speed up a search algorithm with hashes. Instead of searching a long string of text, you can instead search shorter hash values that are associated with the original text. Consider this table of long sentences (taken from the opinion in KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398 (2007)) and their associated hash values:

 

| No. | Original Text | MD5 Hash Value |
| --- | --- | --- |
| 0 | After petitioner KSR developed an adjustable pedal system for cars with cable-actuated throttles and obtained its '986 patent for the design, General Motors Corporation (GMC) chose KSR to supply adjustable pedal systems for trucks using computer-controlled throttles. | 538079a395e9ed867ce19d18bcc37317 |
| 1 | The Federal Circuit addressed the obviousness question in a narrow, rigid manner that is inconsistent with § 103 and this Court's precedents. | c365e2d0d4cdf00a0972bd12a8b0295c |
| 2 | In Graham v. John Deere Co. of Kansas City, 383 U.S. 1, 86 S.Ct. 684, 15 L.Ed.2d 545 (1966), the Court set out a framework for applying the statutory language of § 103, language itself based on the logic of the earlier decision in Hotchkiss v. Greenwood, 11 How. 248, 13 L.Ed. 683 (1851), and its progeny. See 383 U.S., at 15-17, 86 S.Ct. 684. | ca41bdd040a90fe4efb80cff1525724f |
| 3 | In the years since the Court of Customs and Patent Appeals set forth the essence of the TSM test, the Court of Appeals no doubt has applied the test in accord with these principles in many cases. | 421341c96275543986bd746b1a47fe67 |

In a search algorithm, we could simply search for the original sentence, but comparing long strings of text generally takes longer than comparing short ones. The smarter algorithm is to instead:

  1. convert the search string to a hash value
  2. search for a matching hash value in the table
  3. if a match is found, reference the corresponding sentence
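The three steps above can be sketched in a few lines of Python. The names md5_hex, index and find are hypothetical, and the two sentences are copied from the KSR table for illustration. The hash values computed here depend on the exact characters and encoding, so they may differ from those in the table.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hash text to a 32-character hexadecimal string (UTF-8 assumed)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


sentences = [
    "The Federal Circuit addressed the obviousness question in a narrow, "
    "rigid manner that is inconsistent with § 103 and this Court's precedents.",
    "In the years since the Court of Customs and Patent Appeals set forth the "
    "essence of the TSM test, the Court of Appeals no doubt has applied the "
    "test in accord with these principles in many cases.",
]

# Build the table once: hash value -> original sentence.
index = {md5_hex(s): s for s in sentences}


def find(query: str):
    """Return the matching sentence, or None if the query is not in the table."""
    key = md5_hex(query)   # step 1: convert the search string to a hash value
    return index.get(key)  # steps 2 and 3: look up the hash, return the sentence


print(find(sentences[0]) == sentences[0])  # True
print(find("a sentence that is not in the table"))  # None
```

Python's dict already hashes its keys internally, so the explicit MD5 step is there only to make the idea visible.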

Third, hash functions are designed to distribute their output strings evenly. This means that even very similar inputs are unlikely to produce the same, or even similar, output strings. For example, this is the hash value when we just drop the period from the end of our example sentence ("Hash functions transform input text into unique strings of output text"): b6269f171d9284283072053e2bd1181c. It is a very different value than "0e3f5c52b0251009db79d7332c50ed9d," despite only a slight change in the input.
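A short sketch makes this sensitivity easy to see. It hashes the example sentence with and without its final period and counts how many of the 32 hexadecimal characters differ. The helper name md5_hex is an assumption, and the printed values depend on the exact input bytes.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hash text to a 32-character hexadecimal string (UTF-8 assumed)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


with_period = "Hash functions transform input text into unique strings of output text."
without_period = with_period[:-1]  # drop only the final period

hash_a = md5_hex(with_period)
hash_b = md5_hex(without_period)

# Count the positions at which the two hash strings disagree.
differing = sum(1 for x, y in zip(hash_a, hash_b) if x != y)

print(hash_a)
print(hash_b)
print(f"{differing} of {len(hash_a)} hex characters differ")
```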

Hash Functions in Legal Practice

Stumbling across hash functions as I did when building software makes sense, but hash functions surprisingly appear in legal practice as well. Below are a few examples of how lawyers interact with these functions.

First, hash functions are used to maintain the integrity of data. Instead of hashing a single sentence, we can hash an entire file to create a fingerprint that is, for practical purposes, unique. If the file's contents (including any metadata stored inside the file) change even slightly, then the hash value will change, and we will know that the file has been altered. Note that a hash of a file's contents does not by itself capture the file's name or file-system timestamps, so a manifest should record those separately if they matter. This is why, when collecting files from clients, document vendors will (or should) first take a snapshot of all collected files' hash values. The resulting catalog is sometimes called a "manifest" of files, and it can be used to ensure that the file later produced matches the file earlier extracted.
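A manifest of this kind could be sketched as follows. This is an illustration under stated assumptions, not a description of any vendor's tooling: the function names are hypothetical, and SHA-256 is used in place of MD5 because it is the more common choice for integrity checks today. The sketch hashes file contents only.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content hash."""
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List files from the earlier manifest whose hash is now different or missing."""
    return sorted(name for name, digest in before.items() if after.get(name) != digest)
```

Taking one manifest at collection and another at production, then calling changed_files(before, after), would list every collected file whose contents no longer match.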

Hash values are also used to ensure the integrity of data transmitted over the internet. For example, open source software Cygwin provides hash values for its software. When a user downloads the Cygwin software, they can hash the downloaded file, compare the resulting hash value to the one Cygwin published and confirm that what they downloaded is what Cygwin provided. If the hash values do not match, then something changed along the way, and the software may be something dangerous. Lawyers can likewise provide hash values along with files they send to others, so that recipients can confirm that what they received is what was sent. For example, hash values could be used when sending a document production to opposing counsel: "Here is the encrypted ZIP file, along with its hash." If opposing counsel hashes the ZIP file and receives a different value, then something has become corrupted along the way.
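On the receiving end, the check amounts to hashing the file and comparing the result with the value the sender supplied. The sketch below assumes the sender published a SHA-256 hash as a hexadecimal string. The function names are hypothetical, and the file name in the usage note is a placeholder.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file at `path`."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """True if the file's hash equals the hash the sender provided."""
    # Normalize whitespace and letter case, which often vary in emails or web pages.
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

A recipient might call matches_published_hash("production.zip", hash_from_cover_letter) and treat a False result as a reason to request the file again.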

Second, we should take the advice from my college professor and share it with any client who has password-protected systems: hash all stored passwords. Storing passwords in plaintext is a software design mistake that continues to repeat itself, and any client who suffers a data breach that compromises plaintext passwords could lose the trust of its users and be subject to enhanced liability.
