An Illustrated Introduction to Risk-Limiting Audits

Posted by Karen McKim · December 19, 2018 2:21 PM

December 19, 2018 — “As the secret ballot transformed elections in the last century,” said Joseph Hall, Chief Technologist for the Center for Democracy and Technology, “risk-limiting audits are going to transform elections in this century.”

In a few years, Americans will look back, aghast, at our current election management. We will be amazed that we ever trusted vote-tabulating computers so much that we routinely declared winners without checking results for evidence of fraud, glitches, or human error. We will consider routine verification to be an indispensable part of managing elections, just as cash-register reconciliation is now for managing the corner convenience store.

In preparation for that day, it’s time to understand: What is a “risk-limiting audit”?

First, a risk-limiting audit is not a specific set of steps or statistical calculations. Like “home-heating system,” the term describes a function, not a method. If a system heats your home, it’s a home-heating system. If it doesn’t, it’s not. A wood-burning stove is a home-heating system. Electrified tile floors are a home-heating system. Triple-pane windows and attic insulation are not.

A risk-limiting audit is any review that imposes a precise limit, such as 10%, on the risk of certifying incorrect results in the event that Election-Night results identified the wrong winner. Any review that accomplishes that is a risk-limiting audit. If it doesn’t, it’s not. For example, pre-election voting-machine tests and hand-counting to verify a few voting machines’ results are good. But even when completed correctly, they do not precisely limit the risk that officials will fail to detect and correct any outcome-altering miscounts.

(Though it’s not part of the official definition of risk-limiting audit, I’ll mention the other side of the coin. In the event that Election-Night results identified the right winner, a risk-limiting audit does not create any risk that officials will certify the wrong one. That risk stays at zero.)

You might be surprised that statistical analysis is not a required feature of risk-limiting audits. A full recount uses no statistical methods and, if done correctly, limits the risk of declaring the wrong winner to zero. Therefore, it’s a risk-limiting audit. But full recounts require too much effort to be used routinely. So to reduce the time and effort needed to confirm elections, most types of risk-limiting audits use random sampling for selection and statistical processes for analysis.

Another feature of risk-limiting audits is manual inspection of voter-verified paper ballots. Until some as-yet-uninvented technology comes along, we can verify the computers’ verdicts only by comparing them against something that didn’t come from a computer. That is, we need direct human observation of the paper ballots that were marked, or at least verified, by the voters themselves.

Finally, the word ‘audit’ doesn’t mean what you probably think it means. Outside elections, independent auditors perform audits after auditees have completed the work. In contrast, a risk-limiting audit is completed by the responsible officials as part of their work of certifying election results. A post-certification review might provide useful information for the next election, but it cannot limit the risk that the wrong winner was certified in this one. Risk-limiting audits probably should have been called ‘risk-limiting reconciliation’ or ‘risk-limiting verification,’ but it’s too late to change that now.

Try it yourself…

In December 2018, national election-security leaders came together at an Election Audit Summit in Boston, organized by the MIT/Caltech Voting Technology Project. Dr. Philip Stark of the University of California-Berkeley, who originated the concept of risk-limiting audits, led participants through a demonstration. The instructions below are adapted from Dr. Stark’s handout, Audit Simulation for Auditing Summit.

In brief, the exercise uses playing cards to represent ballots containing votes for Black or Red. The cards are sorted into piles representing precincts; a sample is randomly drawn from across all participating precincts. Each card either builds or reduces confidence in the Election-Night results, until a pre-determined acceptable level of confidence is achieved—or is not. To do this exercise, you’ll need:

two decks of playing cards;
scratch paper;
a pencil; and
a random-number generator.

Note: the statistics in this exercise are realistic but not precise; they are for illustration purposes only. A real election audit would use a sample size and confidence level calculated for the results being audited.

Example #1: When Election-Night results are correct

The first illustration will show how a risk-limiting audit plays out when Election-Night results identified the correct winner.

1. Get two decks of playing cards. They don’t need to be the same design, but the same shape and size will make them easier to work with. From one deck, remove the jokers and set the hearts aside. This leaves 39 cards in this deck—26 black ‘votes’ and 13 red.

2. From the other deck, remove the jokers and set aside both the hearts and diamonds. This leaves 26 cards in this deck, all black.

3. Shuffle all 65 cards together, but not thoroughly. Actual ballots will not be thoroughly shuffled; your cards don’t need to be, either.

4. Separate the cards into six stacks to represent precincts. You don’t need to make them the same size. In the real world, some precincts have more voters than others. Label the stacks “Precinct A,” “Precinct B,” and so on up to “Precinct F.”

5. Count the cards in each precinct consecutively and write the numbers on the labels. For example, if Precinct A has 12 cards, you will write “1-12” on that label. Then start Precinct B’s count with 13, so that Precinct B’s label will be something like “13-23.” Precinct C will be something like “24-35,” and so forth. If you’ve counted correctly, the last card in Precinct F will be 65.
Some vocabulary: A list of precincts indicating the number of ballots in each (e.g., Precinct D has 13 ballots) is a “ballot manifest.” When you assign a unique number to each ballot (e.g., Precinct D contains the 36th through the 48th ballot), it becomes a “ballot look-up table.”
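
For readers who would rather let a computer keep the books, here is a minimal Python sketch of turning a ballot manifest into a ballot look-up table. The precinct sizes are hypothetical (any six sizes that total 65 would do), and the helper names build_lookup_table and locate are made up for this illustration.

```python
from bisect import bisect_left
from itertools import accumulate

# Hypothetical ballot manifest: precinct label -> number of cards in the stack.
# These sizes are invented for illustration; they only need to total 65.
manifest = {"A": 12, "B": 11, "C": 12, "D": 13, "E": 9, "F": 8}

def build_lookup_table(manifest):
    """Return (precinct, first_ballot, last_ballot) rows, numbered from 1."""
    ends = list(accumulate(manifest.values()))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return list(zip(manifest.keys(), starts, ends))

def locate(ballot_number, table):
    """Map a ballot number to (precinct, position within that stack)."""
    ends = [last for _, _, last in table]
    if not 1 <= ballot_number <= ends[-1]:
        raise ValueError("ballot number is outside the manifest")
    # Find the first precinct whose last ballot number is >= ballot_number.
    row = bisect_left(ends, ballot_number)
    precinct, first, _ = table[row]
    return precinct, ballot_number - first + 1

table = build_lookup_table(manifest)
for precinct, first, last in table:
    print(f"Precinct {precinct}: {first}-{last}")
print(locate(38, table))
```

With these made-up sizes, Precinct D holds ballots 36 through 48, so ballot #38 is the third card in Precinct D.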

Now, imagine you’re the official who is responsible for certifying this election. You know the possibility of an outcome-altering Election-Night miscount is remote. Nevertheless, you want to: 1) deter fraud in future elections; 2) demonstrate to the voters that local elections are secure against even remote risks; and 3) make sure you don’t miss even a remote possibility of declaring the wrong winner.

So you’ve decided to give yourself at least a 90% chance of detecting any outcome-altering miscount before you declare the official winner. In other words, you have decided to impose a 10% limit on the risk that you will fail to notice and correct any such miscount. (You could choose a different level of risk, if you wish.) If the machines identified the correct winner, there is a 0% chance an audit will reverse that.

Initial Election-Night results indicated that Black got 80% of the vote and Red got 20%. While you haven’t looked at any ballots yet, you know how many were cast. Using that information, you consult with a statistician or the risk-limiting audit website to find out how big your manually counted sample needs to be to confirm the right winner or to detect the wrong one.

In a real risk-limiting audit, you would be told a specific number of ballots to draw for the first sample—up to a few hundred, depending on the margin of victory and the number of ballots cast in the race. Your statistician or the RLA website could also generate random numbers for you, to determine which ballots to draw for the sample. For this demonstration, let’s imagine you were told to select 10 ballots.

6. Create a score sheet with columns for the random number, the color of the card, a confidence score for each card, and a running confidence total.

7. Generate ten random numbers between 1 and 65. Write them in the first column. These are the ballots you need to inspect.

8. Then find each card. For example, if the first randomly selected ballot was #38, you would check the ballot look-up table and see that card #38 is the third card in precinct D. Look at that card, record its color in the second column, and replace it.

If the sampled card was black, it builds confidence that the preliminary results were correct when they identified Black as the winner. Note +5 confidence points for that card in the third column, and add 5 points to the confidence total in the fourth column.
If the sampled card was red, it reduces confidence in the preliminary results. Note –10 confidence points in the third column, and subtract 10 points from the confidence total in the fourth column.

9. When you’ve recorded the color of each card in the sample, look at your total confidence score. If it is 25 or higher, you have statistically confirmed, with 90% confidence or more, that no Election-Night miscount identified the wrong winner. You can end the audit and certify the results.

The photo below shows that this audit could have stopped after the eighth card, because the confidence score reached 25 at that point. On average, an audit like this would need to inspect 14 cards to reach a confidence level of 25, assuming the Election-Night result was correct.

If your confidence score is less than 25, select another random ballot, and another, until the total confidence level reaches 25. (Or until you realize that you messed up the first two steps of this exercise and are working with a deck in which there are actually more red than black cards.)
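
If you don’t have two decks handy, the card exercise can be sketched in a few lines of Python. This is only an illustration of the scoring rules in steps 6 through 9, not Dr. Stark’s own code. The function name run_audit, the max_draws safety cap, and the fixed random seed are all assumptions added here.

```python
import random

BLACK_POINTS = 5    # a black card builds confidence
RED_POINTS = -10    # a red card reduces confidence
TARGET = 25         # confidence total needed to stop the audit

def run_audit(deck, rng, initial_sample=10, max_draws=1000):
    """Simulate one audit; return (cards_drawn, final_score, confirmed)."""
    score = 0
    for drawn in range(1, max_draws + 1):
        card = rng.choice(deck)  # each card is replaced after inspection
        score += BLACK_POINTS if card == "black" else RED_POINTS
        # Check the total only once the initial sample is complete,
        # then again after every additional card.
        if drawn >= initial_sample and score >= TARGET:
            return drawn, score, True
    return max_draws, score, False  # hypothetical cap reached, not confirmed

deck = ["black"] * 52 + ["red"] * 13   # Example #1: 80% Black, 20% Red
rng = random.Random(2018)              # fixed seed so the run is repeatable

trials = [run_audit(deck, rng) for _ in range(10_000)]
average = sum(drawn for drawn, _, _ in trials) / len(trials)
print(f"average cards inspected: {average:.1f}")
```

The simulated average depends on the stopping rule. Setting initial_sample=1 checks the score after every single card, which lets the audit stop earlier than checking only after a first batch of 10, so the number printed need not match the figure quoted above exactly.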

Example #2: When Election-Night results are incorrect

The second part of this exercise shows what happens when Election-Night results were wrong.

1. Retrieve the red cards you set aside, so that you are using all 104 cards from both decks. Shuffle them together. Again, this does not need to be a thorough shuffle. Some ‘precincts’ can be mostly red and some mostly black.

2. Separate the cards into 9 stacks of differing sizes to represent precincts. As in step 5 above, count the cards in each stack to create a ballot look-up table.

3. In this exercise, we know the election was a tie. But in a real election, we would not know that, because the computers told us that Black won, and we have not yet inspected any ballots. So we would give the same information to the statistician or RLA website that we did in the previous example—that is, an 80% victory for Black. Given that situation, the instructions we receive would be the same: Select a random sample of 10 ballots by generating 10 random numbers—this time, between 1 and 104, because we have more ‘ballots.’

4. As in the first exercise, use the ballot look-up table to find each card selected for the sample. Note the color of each and confidence points on the tally sheet. Check to see whether the total confidence score reached 25. In this case, it did not.
It is possible that your first sample reached a confidence score of 25 or more. If so, your audit incorrectly confirmed a miscounted election. This can happen: statistical confidence is not the same as rock-solid certainty. A 90% confidence target means that, when the Election-Night outcome is wrong, the audit can still fail to catch it as often as 10% of the time. To calm your nerves, think of this from the point of view of election thieves who see a 90% chance that their handiwork will be noticed and corrected.

5. If the total confidence score is less than 25, you have not yet confirmed the Election-Night results, so you will need to expand the sample by inspecting more random ballots.

When the Election-Night results are wrong, the more ballots you sample, the lower your confidence level will tend to sink. As shown in the photo, as more and more ballots are inspected, it becomes more and more apparent that the preliminary results are just not right.
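
To see how often the card rules would wrongly confirm a tied election, the second exercise can be repeated many times in software. The sketch below is an illustration only. The function name wrongly_confirmed, the cap of 200 cards, and the seed are assumptions. For simplicity it checks the score after every card rather than after a first batch of 10, which gives the audit more chances to stop by mistake.

```python
import random

def wrongly_confirmed(rng, black_share=0.5, target=25, max_draws=200):
    """Return True if the running score reaches `target` within `max_draws` cards."""
    score = 0
    for _ in range(max_draws):
        # Drawing with replacement from a half-black, half-red deck.
        score += 5 if rng.random() < black_share else -10
        if score >= target:
            return True
    return False  # the score never reached the target; keep counting by hand

rng = random.Random(104)   # fixed seed so the run is repeatable
trials = 20_000
mistakes = sum(wrongly_confirmed(rng) for _ in range(trials))
print(f"tied election wrongly confirmed in {mistakes / trials:.1%} of audits")
```

Under these assumptions the simulated rate should come out a little under the 10% limit, which is the point of the exercise: the scoring rules are designed to cap that risk, not to eliminate it.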

Election officials who conduct risk-limiting audits of real elections adopt written policies that address this possibility. A sensible policy, for example, might dictate that the audit will stop if it fails to confirm the outcome with two additional samples, and the effort will instead be put into a manual count of 100% of the ballots.

About sample size, statistical confidence, and emotional confidence

One concept should now be clear: A random sample of ballots is a miniature version of the entire election. The same winner will emerge from both, if both are accurately counted.

In the first exercise above, Black had more votes in the whole set of ballots from which the sample was drawn. As a result, more of the randomly selected ballots contained votes for Black than for Red. In the second exercise, Black did NOT have more votes than Red, and so we could not confirm a Black victory no matter how many ballots we randomly drew.

In other words, when preliminary election results have identified the correct winner, inspection of a relatively small number of randomly selected ballots provides strong evidence of accuracy. Conversely, if preliminary election results identified the wrong winner, inspecting a random sample of ballots will very likely reveal the problem before officials certify the election.

Once we see that random samples reflect the true results, the next question is what size sample works best. Smaller samples reduce work, but increase uncertainty. Larger samples provide more confidence, but are more work.

This demonstration started with samples of 10 cards, which in the 65-card ‘election’ was a little more than 15% of the ballots. In a real election audit, the initial sample size would not need to be anything close to 15% of total ballots, particularly if the preliminary results indicated an outcome as decisive as this one.

To work through a real-life example, let’s look at the 2018 race for US Congress in Wisconsin’s First Congressional District. This reasonably close and hotly contested race filled the seat being vacated by former Speaker Paul Ryan. The actual results were: 325,003 total ballots; 177,490 votes for Steil; 137,507 for Bryce; and 10,006 for Yorgan.

When you plug these results along with a 10% risk limit into the online tools for ballot-polling RLAs, statistical operations built into those tools predict that you will likely be able to confirm Steil’s victory, if Steil actually won, with an initial sample of only 301 ballots. That’s less than one-tenth of one percent of the 325,003 total ballots. If Steil did not truly win, auditing 301 ballots would produce results more like the second example above, forcing officials to keep expanding the sample until it was obvious that the Election-Night results were incorrect. (Instead of using +5, -10, and 25 total points as indicators of confidence, officials would have counted the votes in the sample and could have used the online tools to assess the results. See the section titled “Should more ballots be audited?” at the bottom of this page.)
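
For the curious, here is a rough sketch of the kind of check such a tool might run on a counted sample. It assumes the tool applies a sequential likelihood-ratio test of the sort commonly used in ballot-polling audits; that is an assumption, not something the tool’s documentation is quoted as saying here. The function name ballot_polling_check and the sample counts are hypothetical.

```python
import math

def ballot_polling_check(winner_votes, loser_votes, reported_share, risk_limit):
    """Sketch of a sequential likelihood-ratio check for one winner/loser pair.

    reported_share is the winner's reported share of the votes cast for these
    two candidates only. Returns (log_ratio, log_threshold, confirmed).
    """
    if not 0.5 < reported_share < 1:
        raise ValueError("reported_share must be between 0.5 and 1")
    # Each winner ballot raises the score; each loser ballot lowers it.
    log_ratio = (winner_votes * math.log(2 * reported_share)
                 + loser_votes * math.log(2 * (1 - reported_share)))
    log_threshold = math.log(1 / risk_limit)
    return log_ratio, log_threshold, log_ratio >= log_threshold

# Reported two-candidate share for Steil versus Bryce.
share = 177_490 / (177_490 + 137_507)

# Hypothetical sample of 301 ballots: 170 Steil, 121 Bryce, 10 other.
print(ballot_polling_check(170, 121, share, risk_limit=0.10))

# The card game, seen the same way: an 80% reported share for Black.
print(math.log(2 * 0.8), math.log(2 * 0.2), math.log(1 / 0.10))
```

The last line hints at where card scores like +5, -10, and 25 could come from. With an 80% reported share, each Black ballot is worth about 0.47, each Red ballot about -0.92, and the 10% threshold is about 2.30, which are roughly the same proportions.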

If election officials did not believe that 301 ballots would provide voters with enough subjective confidence in the result (as opposed to statistical confidence), they could have selected a smaller risk limit. In this congressional election, a 5% risk limit would have needed an initial random sample of 389 ballots; a 1% risk limit, a random sample of 594 ballots. Or, election officials could adopt an audit policy that every initial sample will contain either enough ballots to support a 5% risk limit, or 1,000 ballots, whichever is greater.
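
The sample sizes quoted above can be roughly reproduced with a standard approximation for the expected number of ballots a ballot-polling audit needs when the reported winner really won. The sketch below assumes the online tool uses something close to this approximation; the function name expected_sample_size is made up for this illustration.

```python
import math

def expected_sample_size(winner, loser, total_ballots, risk_limit):
    """Approximate expected ballots to confirm a correct outcome (a sketch)."""
    share = winner / (winner + loser)       # winner's two-candidate share
    z_w = math.log(2 * share)               # score added by a winner ballot
    z_l = math.log(2 * (1 - share))         # score added by a loser ballot
    p_w = winner / total_ballots            # chance a random ballot is for the winner
    p_l = loser / total_ballots             # chance a random ballot is for the loser
    return (math.log(1 / risk_limit) + z_w / 2) / (p_w * z_w + p_l * z_l)

# Wisconsin's First Congressional District, 2018: Steil versus Bryce.
for risk_limit in (0.10, 0.05, 0.01):
    size = expected_sample_size(177_490, 137_507, 325_003, risk_limit)
    print(f"risk limit {risk_limit:.0%}: about {size:.0f} ballots")
```

The results should land within a few ballots of the 301, 389, and 594 quoted above. Any small differences would come from rounding or from details of the tool that this sketch does not try to copy.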

Other lessons from this exercise

This exercise highlighted risk-limiting audits’ tightly focused purpose—to detect and correct any outcome-altering miscounts. This purpose is critical for election security and for voter confidence, but does not solve all problems. Election officials must perform other types of reviews to determine the cause of any miscounts and to monitor other important issues, including:

The presence of any miscounts that may have disenfranchised some voters without affecting the outcome;
The accuracy of any single precinct’s or voting machine’s tabulation;
The quality of any of the processes that determine which ballots were cast and accepted, such as issues with voter registration or ID, or whether all and only valid absentee ballots were accepted and counted.

In addition, this exercise simulated only one type of risk-limiting audit, known as a “ballot-polling audit.” Depending upon the size of the election, the type of records created by the voting equipment, and other factors, election officials might decide to use a different type of RLA, such as a “comparison risk-limiting audit” or a Bayesian audit. Election officials do not need to read and digest the scholarly articles. Federal officials, staff in jurisdictions that have experience with auditing, and other experts are willing to help local election clerks understand the options well enough to make the right choices and get started with election auditing. A local election official can likely find useful help as close as the nearest local government official who has expertise related to auditing or quality assurance.

This exercise also probably raised some implementation questions: How can election officials draw a random sample in an election for which ballots are stored in sealed bags across hundreds of municipalities? How do you know how big your sample needs to be if you want to verify the outcomes in two or more races? Election officials who have started with risk-limiting audits have tackled these questions, and more solutions are being worked out with each new election. The solutions are not always easy or obvious, but local election officials who want to try their hand at risk-limiting audits need only to ask those with experience.

Finally, this exercise demonstrated that even risk-limiting audits might, on occasion, fail to detect miscounted election results. A 90% chance that serious fraud will be detected and corrected is the same thing as a 10% chance it will not be.

That highlights the need to keep two other facts in mind. First, a 90% chance of detecting fraud is better than the 0% chance that non-auditing election officials and their voters now tolerate. Second, the audits’ greatest value is, arguably, deterrence. When would-be election thieves contemplate a 90% risk of getting caught, there’s a good chance that election officials will have no electronic election fraud to detect.