Photo: Michigan election officials assess the results of a manual count of a sample of ballots for a risk-limiting audit in 2018. Photo credit: Berkeley Institute for Data Science, UC-Berkeley
Think of “risk-limiting audits” as low-effort exit polls.
Exit polls determine who won by asking randomly selected voters “Who did you vote for?” Risk-limiting audits work on the same principle to confirm the correct winners, but they skip the sidewalk conversations and phone calls.
Instead, they pose the question directly to randomly selected paper ballots.
Either way, a small sample can provide statistical proof of who really won the election, independently of the vote-counting computers.
No one in Wisconsin now does risk-limiting audits. Sometimes local officials spot-check a few randomly selected voting machines, but those efforts do not ensure that any outcome-altering miscounts will be detected and corrected before preliminary election results are made final. Risk-limiting audits do.
There’s no one correct way to do a risk-limiting audit. Our election officials could sample individual ballots (less work) or entire polling places (more work). They could do nothing more than confirm the correct winner in one race (less work) or they could answer other questions at the same time (more work).
A risk-limiting audit of a statewide election in Wisconsin could be this easy and cheap:
1) After they close the polls on Election Night, poll workers would record how many ballots they seal into each bag. Using this information, the municipal clerk would create a “ballot manifest” (e.g., City of Abbotsford: Bag #1 – 234 ballots; Bag #2 – 122 ballots).
It’s unlikely anyone has ever counted, but a fair guess is that a big election produces around 4,750-5,000 sealed ballot bags statewide. One bag can contain a maximum of around 300 ballots but might contain fewer than 10.
2) The day after the election, every municipal clerk would send their ballot manifest to the Wisconsin Elections Commission. The WEC could create an online reporting form to make this task easy and quick. It wouldn’t need bullet-proof security if the municipal clerk also mailed a hard copy of the manifest to WEC, and WEC staff later verified them against each other.
3) The WEC would then assign a number to every ballot in the state. For example, ballot numbers 1-234 would be assigned to the first bag from the City of Abbotsford; numbers 235-356 to the second bag, all the way up to the last bag from the Town of Yuba, which might be assigned the numbers 2,673,149 – 2,673,308.
4) The WEC would determine how many ballots to sample. The size of the sample depends upon the Election-Night margin of victory. If the margin is large or normal, the sample size will be small. For example, the 2018 contest for the US Senate was neither close nor a landslide: 55.4% to 44.5%. A risk-limiting audit of that race would have needed an initial sample size of only 401 ballots across the entire state. However, officials could choose to select a larger sample to provide voters with ‘emotional’ confidence in addition to statistical confidence.
An extremely close election such as the 2018 Governor’s race (49.5% to 48.4%) would have needed an initial sample of 37,841 ballots (out of almost 2.7 million cast). But it’s these races that officials legitimately need to be most careful about, and it’s the very close races that, when left unaudited, provoke the most candidate resentment and voter suspicion.
Wisconsin election officials have already demonstrated they can handle larger sample sizes. For comparison, the voting-machine spot-checks conducted after the November 2018 election required officials to count votes from 135,712 ballots — more than 3.5 times the number of ballots they would have needed for a risk-limiting audit. But because of the way WEC selected that sample and their instructions that auditors ignore voter intent, that effort did not confirm the correct winner in any race.
And because races from the same ballot (as those two races were) do not need separate samples, a risk-limiting audit could have verified all the statewide contests on the ballot in that election–an accomplishment of enormous value to election security and voter confidence.
5) WEC would randomly select ballot numbers and then use the statewide ballot manifest to identify the bag in which each of the selected ballots is stored. For example, if ballot #284 turned up in the random sample, the WEC would know it is in the second bag from the City of Abbotsford. If the random selection turned up ballot #2,673,193, they would know it is in the last bag from the Town of Yuba.
6) At this point, WEC could ignore the hypothetical numbers they assigned to each ballot and tell the municipal clerks only the number of ballots to be randomly selected from each bag. For example, the WEC would tell the City of Abbotsford clerk to randomly select one ballot from the second bag. The instructions for random selection could be something like: “In the presence of observers, pull the ballots out of the bag, set them in a stack on the table, let an observer from each political party cut the stack several times like a deck of cards, cut the stack two more times, and select the ballot at the bottom of the last cut.” Other methods could be prescribed for jurisdictions that use machines that print flimsier forms of paper ballots.
7) The municipal clerk would display the selected ballot to the observers; fax it to the WEC; mark it with red ink indicating it was the ballot selected for the audit; put it on the top of the stack of ballots; and reseal the bag.
8) The WEC would conduct a publicly observed manual count of the faxed ballots and enter the results of that count into the standard risk-limiting audit formulas. If the proportion of votes for the Election-Night winner in the manual count is close enough to the proportion reported on Election Night, the result is confirmed. The audit would be concluded and the county canvasses could conduct their certification process as normal.
If the proportion of votes for the Election-Night winner differed too much, an additional sample would be drawn and counted. That process would be repeated until statistical confidence in a winner was established. The WEC would need to adopt policies to govern what will happen in the rare event that the sample has to be expanded more than twice, or if the confidence level declines as the sample is enlarged. Likely, WEC would stop the audit, declare a lack of confidence in the preliminary Election-Night results, and order a full recount on its own initiative.
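For readers who think in code, here is a minimal Python sketch of steps 3, 5, and 6. It is an illustration only, not WEC software. The three-bag manifest is hypothetical (the first two bags reuse the Abbotsford numbers above; the third is invented), and the names `bag_for` and `draws_per_bag` are made up for this example. It assumes ballots are drawn with replacement, so the same ballot number can come up more than once.

```python
import bisect
import random
from collections import Counter
from itertools import accumulate

# Hypothetical manifest: (bag label, ballots in bag).
manifest = [
    ("City of Abbotsford, Bag #1", 234),
    ("City of Abbotsford, Bag #2", 122),
    ("Town of Yuba, Bag #1", 160),  # invented entry for illustration
]

# Step 3: running totals give the last ballot number assigned to each bag.
last_numbers = list(accumulate(count for _, count in manifest))
total_ballots = last_numbers[-1]


def bag_for(ballot_number):
    """Return the label of the bag that holds a 1-based ballot number."""
    if not 1 <= ballot_number <= total_ballots:
        raise ValueError("ballot number is outside the manifest")
    return manifest[bisect.bisect_left(last_numbers, ballot_number)][0]


def draws_per_bag(sample_size, seed):
    """Steps 5 and 6: draw ballot numbers, then count the draws per bag."""
    rng = random.Random(seed)  # a real audit would use a publicly generated seed
    numbers = [rng.randint(1, total_ballots) for _ in range(sample_size)]
    return Counter(bag_for(n) for n in numbers)


print(bag_for(284))  # City of Abbotsford, Bag #2
print(draws_per_bag(sample_size=10, seed=2018))
```

The per-bag counts are all a municipal clerk would need to see; the statewide ballot numbers never have to leave the WEC.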
Other states’ election officials think their voters’ right to self-government through secure elections is worth at least that much time and effort.
If you think Wisconsin elections are worth the effort it takes to conduct a genuine risk-limiting audit, contact your county clerk and the Wisconsin Elections Commission to tell them so.
December 19, 2018 — “As the secret ballot transformed elections in the last century,” said Joseph Hall, Chief Technologist for the Center for Democracy and Technology, “risk-limiting audits are going to transform elections in this century.”
In a few years, Americans will look back, aghast, at our current election management. We will be amazed that we ever trusted vote-tabulating computers so much that we routinely declared winners without checking results for evidence of fraud, glitches, or human error. We will consider routine verification to be an indispensable part of managing elections, just as cash-register reconciliation is now for managing the corner convenience store.
In preparation for that day, it’s time to understand: What is a “risk-limiting audit”?
First, a risk-limiting audit is not a specific set of steps or statistical calculations. Like “home-heating system,” the term describes a function, not a method. If a system heats your home, it’s a home-heating system. If it doesn’t, it’s not. A wood-burning stove is a home-heating system. Electrified tile floors are a home-heating system. Triple-pane windows and attic insulation are not.
A risk-limiting audit is any review that imposes a precise limit, such as 10%, on the risk of certifying incorrect results in the event that Election-Night results identified the wrong winner. Any review that accomplishes that is a risk-limiting audit. If it doesn’t, it’s not. For example, pre-election voting-machine tests and hand-counting to verify a few voting machines’ results are good. But even when completed correctly, they do not precisely limit the risk that officials will fail to detect and correct any outcome-altering miscounts.
(Though it’s not part of the official definition of risk-limiting audit, I’ll mention the other side of the coin. In the event that Election-Night results identified the right winner, a risk-limiting audit creates no risk that officials will certify the wrong one. That risk stays at zero.)
You might be surprised that statistical analysis is not a required feature of risk-limiting audits. A full recount uses no statistical methods and if done correctly, limits the risk of declaring the wrong winner to zero. Therefore, it’s a risk-limiting audit. But full recounts require too much effort to be used routinely. So to reduce the time and effort needed to confirm elections, most types of risk-limiting audits use random sampling for selection and statistical processes for analysis.
Another feature of risk-limiting audits is manual inspection of voter-verified paper ballots. Until some as-yet-uninvented technology comes along, we can verify the computers’ verdicts only by comparing them against something that didn’t come from a computer. That is, we need direct human observation of the paper ballots that were marked, or at least verified, by the voters themselves.
Finally, the word ‘audit’ doesn’t mean what you probably think it means. Outside elections, independent auditors perform audits after auditees have completed the work. In contrast, a risk-limiting audit is completed by the responsible officials as part of their work of certifying election results. A post-certification review might provide useful information for the next election, but it cannot limit the risk that the wrong winner was certified in this one. Risk-limiting audits probably should have been called ‘risk-limiting reconciliation’ or ‘risk-limiting verification,’ but it’s too late to change that now.
In brief, the exercise uses playing cards to represent ballots containing votes for Black or Red. The cards are sorted into piles representing precincts; a sample is randomly drawn from across all participating precincts. Each card either builds or reduces confidence in the Election-Night results, until a pre-determined acceptable level of confidence is achieved—or is not. To do this exercise, you’ll need:
two decks of playing cards;
a pencil; and
a random-number generator.
Note: the statistics in this exercise are realistic but not precise; they are for illustration purposes only. A real election audit would use a sample size and confidence level calculated for the results being audited.
Example #1: When Election-Night results are correct
The first illustration will show how a risk-limiting audit plays out when Election-Night results identified the correct winner.
1. Get two decks of playing cards. They don’t need to be the same design, but the same shape and size will make them easier to work with. From one deck, remove the jokers and set the hearts aside. This leaves 39 cards in this deck—26 black ‘votes’ and 13 red.
2. From the other deck, remove the jokers and set aside both the hearts and diamonds. This leaves 26 cards in this deck, all black.
3. Shuffle all 65 cards together, but not thoroughly. Actual ballots will not be thoroughly shuffled; your cards don’t need to be, either.
4. Separate the cards into six stacks to represent precincts. You don’t need to make them the same size. In the real world, some precincts have more voters than others. Label the stacks “Precinct A,” “Precinct B,” and so on up to “Precinct F.”
5. Count the cards in each precinct consecutively and write the numbers on the labels. For example, if Precinct A has 12 cards, you will write “1-12” on that label. Then start Precinct B’s count with 13, so that Precinct B’s label will be something like “13-23.” Precinct C will be something like “24-35,” and so forth. If you’ve counted correctly, the last card in Precinct F will be 65. Some vocabulary: A list of precincts indicating the number of ballots in each (e.g., Precinct D has 13 ballots) is a “ballot manifest.” When you assign a unique number to each ballot (e.g., Precinct D contains the 36th through the 48th ballot), it becomes a “ballot look-up table.”
Now, imagine you’re the official who is responsible for certifying this election. You know the possibility of an outcome-altering Election-Night miscount is remote. Nevertheless, you want to: 1) deter fraud in future elections; 2) demonstrate to the voters that local elections are secure against even remote risks; and 3) make sure you don’t miss even a remote possibility of declaring the wrong winner.
So you’ve decided to give yourself at least a 90% chance of detecting any outcome-altering miscount before you declare the official winner. In other words, you have decided to impose a 10% limit on the risk that you will fail to notice and correct any such miscount. (You could choose a different level of risk, if you wish.) If the machines identified the correct winner, there is a 0% chance an audit will reverse that.
Initial Election-Night results indicated that Black got 80% of the vote and Red got 20%. While you haven’t looked at any ballots yet, you know how many were cast. Using that information, you consult with a statistician or the risk-limiting audit website to find out how big your manually counted sample needs to be to confirm the right winner or to detect the wrong one.
In a real risk-limiting audit, you would be told a specific number of ballots to draw for the first sample—up to a few hundred, depending on the margin of victory and the number of ballots cast in the race. Your statistician or the RLA website could also generate random numbers for you, to determine which ballots to draw for the sample. For this demonstration, let’s imagine you were told to select 10 ballots.
6. Create a score sheet with columns for the random number, the color of the card, a confidence score for each card, and a running confidence total.
7. Generate 10 random numbers between 1 and 65 and write them in the first column of the score sheet.
8. Then find each card. For example, if the first randomly selected ballot was #38, you would check the ballot look-up table and see that card #38 is the third card in precinct D. Look at that card, record its color in the second column, and replace it.
If the sampled card was black, it builds confidence that the preliminary results were correct when they identified Black as the winner. Note +5 confidence points for that card in the third column, and add 5 points to the confidence total in the fourth column.
If the sampled card was red, it reduces confidence in the preliminary results. Note –10 confidence points in the third column, and subtract 10 points from the confidence total in the fourth column.
9. When you’ve recorded the color of each card in the sample, look at your total confidence score. If it is 25 or higher, you have statistically confirmed, with 90% confidence or more, that no Election-Night miscount identified the wrong winner. You can end the audit and certify the results.
The photo below shows that this audit could have stopped after the eighth card, because the confidence score reached 25 at that point. On average, an audit like this would need to inspect 14 cards to reach a confidence level of 25—if, that is, the Election Night result was correct.
If your confidence score is less than 25, select another random ballot, and another, until the total confidence level reaches 25. (Or until you realize that you messed up the first two steps of this exercise and are working with a deck in which there are actually more red than black cards.)
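If you have no cards handy, the same exercise can be run in a few lines of Python. This is only a sketch of the card game above, not real audit software. The function names `build_deck` and `run_audit` are invented for this example, the seed is arbitrary, and the code checks the score after every card rather than after a batch of 10. Each card is put back after it is inspected, as in step 8, so a card can be drawn twice.

```python
import random

BLACK_POINTS, RED_POINTS, TARGET = 5, -10, 25  # values used in the exercise


def build_deck(black, red, rng):
    """Return a shuffled list of 'B'/'R' cards; index i holds card number i + 1."""
    deck = ["B"] * black + ["R"] * red
    rng.shuffle(deck)
    return deck


def run_audit(deck, rng, max_cards=1000):
    """Draw random cards (with replacement) until the score reaches TARGET.

    Returns the score sheet as rows of
    (card_number, color, points, running_total).
    max_cards is a safety cap so the loop always ends.
    """
    sheet, total = [], 0
    while total < TARGET and len(sheet) < max_cards:
        number = rng.randint(1, len(deck))
        color = deck[number - 1]
        points = BLACK_POINTS if color == "B" else RED_POINTS
        total += points
        sheet.append((number, color, points, total))
    return sheet


rng = random.Random(2018)  # arbitrary seed so the run can be repeated
deck = build_deck(black=52, red=13, rng=rng)  # the 65-card 'election'
for row in run_audit(deck, rng):
    print(row)
```

Change the seed to see how much the number of cards needed varies from one run to the next.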
Example #2: When Election-Night results are incorrect
The second part of this exercise shows what happens when Election-Night results were wrong.
1. Retrieve the red cards you set aside, so that you are using all 104 cards from both decks. Shuffle them together. Again, this does not need to be a thorough shuffle. Some ‘precincts’ can be mostly red and some mostly black.
2. Separate the cards into 9 stacks of differing sizes to represent precincts. As in step 5 above, count the cards in each stack to create a ballot look-up table.
3. In this exercise, we know the election was a tie. But in a real election, we would not know that, because the computers told us that Black won, and we have not yet inspected any ballots. So we would give the same information to the statistician or RLA website that we did in the previous example—that is, an 80% victory for Black. Given that situation, the instructions we receive would be the same: Select a random sample of 10 ballots by generating 10 random numbers—this time, between 1 and 104, because we have more ‘ballots.’
4. As in the first exercise, use the ballot look-up table to find each card selected for the sample. Note the color of each card and its confidence points on the score sheet. Check to see whether the total confidence score reached 25. In this case, it did not. Your own draw may differ, though: it is possible that your first sample reached a confidence score of 25 or more. If so, your audit incorrectly confirmed a miscounted election. This can happen: statistical confidence is not the same as rock-solid certainty. A 90% confidence target means that when the Election-Night results are wrong, the audit can still confirm them up to 10% of the time. To calm your nerves, think of this from the point of view of election thieves who see a 90% chance that their handiwork will be noticed and corrected.
5. If the total confidence score is less than 25, you have not yet confirmed the Election-Night results, so you will need to expand the sample by inspecting more random ballots.
When the Election Night results are wrong, the more ballots you sample, the lower your confidence level will sink. As shown in the photo, as more and more ballots are inspected, it becomes more and more apparent that the preliminary results are just not right.
Election officials who conduct risk-limiting audits of real elections adopt written policies that address this possibility. A sensible policy, for example, might dictate that the audit will stop if it fails to confirm the outcome with two additional samples, and the effort will instead be put into a manual count of 100% of the ballots.
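To see how often each outcome occurs, you can let a computer replay the card exercise thousands of times. The sketch below is a simplified model and not part of any real audit tool. The names `confirms` and `confirmation_rate` are invented, each card is treated as an independent draw with replacement, and the 100-card cap is a hypothetical stand-in for a "stop and do a full hand count" policy.

```python
import random


def confirms(black_share, rng, max_cards=100):
    """Replay one card audit; return True if the score ever reaches 25.

    Each card is black with probability black_share and scores +5;
    otherwise it is red and scores -10. Giving up after max_cards
    stands in for ordering a full manual count.
    """
    total = 0
    for _ in range(max_cards):
        total += 5 if rng.random() < black_share else -10
        if total >= 25:
            return True
    return False


def confirmation_rate(black_share, trials=10000, seed=1):
    """Fraction of simulated audits that confirm a Black victory."""
    rng = random.Random(seed)
    return sum(confirms(black_share, rng) for _ in range(trials)) / trials


print("Black really has 80%:", confirmation_rate(0.80))
print("Election is a tie:   ", confirmation_rate(0.50))
```

Under these assumptions, nearly every audit of the correct result ends in confirmation, while the tied election is wrongly confirmed in roughly 9% of runs, just inside the 10% risk limit chosen for the exercise.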
About sample size, statistical confidence, and emotional confidence
One concept should now be clear: A random sample of ballots is a miniature version of the entire election. The same winner will emerge from both–if both are accurately counted.
In the first exercise above, Black had more votes in the whole set of ballots from which the sample was drawn. As a result, more of the randomly selected ballots contained votes for Black than for Red. In the second exercise, Black did NOT have more votes than Red, and so we could not confirm a Black victory no matter how many ballots we randomly drew.
In other words, when preliminary election results have identified the correct winner, inspection of a relatively small number of randomly selected ballots provides strong evidence of accuracy. Conversely, if preliminary election results identified the wrong winner, inspecting a random sample of ballots will reveal the problem before officials certify the election.
Once we see that random samples reflect the true results, the next question is what size sample works best. Smaller samples reduce work, but increase uncertainty. Larger samples provide more confidence, but are more work.
This demonstration started with samples of 10 cards, which in the 65-card ‘election’ was a little more than 15% of the ballots. In a real election audit, the initial sample size would not need to be anything close to 15% of total ballots, particularly if the preliminary results indicated an outcome as decisive as this one.
To work through a real-life example, let’s look at the 2018 race for US Congress in Wisconsin’s First Congressional District. This reasonably close and hotly contested race filled the seat being vacated by former Speaker Paul Ryan. The actual results were: 325,003 total ballots; 177,490 votes for Steil; 137,507 for Bryce; and 10,006 for Yorgan.
When you plug these results along with a 10% risk limit into the online tools for ballot-polling RLAs, statistical operations built into that tool predict that you will likely be able to confirm Steil’s victory, if Steil actually won, with an initial sample of only 301 ballots. That’s only one-tenth of one percent of the 325,003 total ballots. If Steil did not truly win, auditing 301 ballots would produce results more like the second example above–forcing officials to keep expanding the sample until it was obvious that the Election-Night results were incorrect. (Instead of using +5, -10, and 25 total points as indicators of confidence, officials would have counted the votes in the sample and could have used the online tools to assess the results. See the section titled “Should more ballots be audited?” at the bottom of this page.)
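For a sense of what such a tool does with the counted sample, here is a rough Python sketch of one published ballot-polling test, commonly called BRAVO. It is not the code behind the online tools, and whether those tools use exactly this calculation is an assumption. The function name `ballot_polling_check` and the sample tallies are hypothetical.

```python
import math


def ballot_polling_check(reported_winner, reported_loser,
                         sample_winner, sample_loser, risk_limit=0.10):
    """Return (statistic, threshold, confirmed) for one winner/loser pair.

    The statistic starts at 1. It is multiplied by 2*s for every sampled
    ballot showing the reported winner and by 2*(1 - s) for every sampled
    ballot showing the reported loser, where s is the winner's reported
    share of the two candidates' combined votes. Ballots for anyone else
    are ignored. The result is confirmed once the statistic reaches
    1 / risk_limit.
    """
    s = reported_winner / (reported_winner + reported_loser)
    log_stat = (sample_winner * math.log(2 * s)
                + sample_loser * math.log(2 * (1 - s)))
    statistic = math.exp(log_stat)
    threshold = 1 / risk_limit
    return statistic, threshold, statistic >= threshold


# Hypothetical sample of 301 ballots: 166 Steil, 125 Bryce, 10 Yorgan.
print(ballot_polling_check(177_490, 137_507, 166, 125))
# Hypothetical sample split evenly between the top two candidates.
print(ballot_polling_check(177_490, 137_507, 146, 146))
```

The first hypothetical sample clears the threshold of 10; the second falls far short, which is the signal to expand the sample.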
If election officials did not believe that 301 ballots would provide voters with enough subjective confidence in the result (as opposed to statistical confidence), they could have selected a smaller risk limit. In this congressional election, a 5% risk limit would have needed an initial random sample of 389 ballots; a 1% risk limit, a random sample of 594 ballots. Or, election officials could adopt an audit policy that every initial sample will contain either enough ballots to support a 5% risk limit, or 1,000 ballots, whichever is greater.
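The initial sample sizes quoted above come from the online tools. As a rough cross-check, the sketch below uses a published approximation for the expected sample size of a BRAVO ballot-polling audit when the reported result is correct. Whether the online tools use exactly this formula is an assumption, and the function name `expected_sample_size` is invented, so expect figures close to 301, 389, and 594 rather than identical to them.

```python
import math


def expected_sample_size(winner_votes, loser_votes, total_ballots, risk_limit):
    """Approximate ballots needed to confirm a correctly reported winner.

    s   : winner's share of the winner-plus-loser votes
    p_w : winner's share of all ballots cast
    p_l : loser's share of all ballots cast
    """
    s = winner_votes / (winner_votes + loser_votes)
    p_w = winner_votes / total_ballots
    p_l = loser_votes / total_ballots
    z_w = math.log(2 * s)        # evidence gained per ballot for the winner
    z_l = math.log(2 * (1 - s))  # evidence lost per ballot for the loser
    estimate = (math.log(1 / risk_limit) + z_w / 2) / (p_w * z_w + p_l * z_l)
    return math.ceil(estimate)


# 2018 First Congressional District results from the example above.
for risk in (0.10, 0.05, 0.01):
    print(f"risk limit {risk:.0%}:",
          expected_sample_size(177_490, 137_507, 325_003, risk))
```

Halving the risk limit adds only a modest number of ballots; it is the margin of victory that drives the sample size.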
Other lessons from this exercise
This exercise highlighted risk-limiting audits’ tightly focused purpose—to detect and correct any outcome-altering miscounts. This purpose is critical for election security and for voter confidence, but does not solve all problems. Election officials must perform other types of reviews to determine the cause of any miscounts and to monitor other important issues, including:
The presence of any miscounts that may have disenfranchised some voters without affecting the outcome;
The accuracy of any single precinct’s or voting machine’s tabulation;
The quality of any of the processes that determine which ballots were cast and accepted, such as issues with voter registration or ID, or whether all and only valid absentee ballots were accepted and counted.
In addition, this exercise simulated only one type of risk-limiting audit, known as a “ballot-polling audit.” Depending upon the size of the election, the type of records created by the voting equipment, and other factors, election officials might decide to use a different type of RLA, such as a “comparison risk-limiting audit” or a Bayesian audit. Election officials do not need to read and digest the scholarly articles. Federal officials, staff in jurisdictions that have experience with auditing, and other experts are willing to help local election clerks understand the options well enough to make the right choices and get started with election auditing. A local election official can likely find useful help as close as the nearest local government official who has expertise related to auditing or quality assurance.
This exercise also probably raised some implementation questions: How can election officials draw a random sample in an election for which ballots are stored in sealed bags across hundreds of municipalities? How do you know how big your sample needs to be if you want to verify the outcomes in two or more races? Election officials who have started with risk-limiting audits have tackled these questions, and more solutions are being worked out with each new election. The solutions are not always easy or obvious, but local election officials who want to try their hand at risk-limiting audits need only to ask those with experience.
Finally, this exercise demonstrated that even risk-limiting audits might, on occasion, fail to detect miscounted election results. A 90% chance that serious fraud will be detected and corrected is the same thing as a 10% chance it will not be.
That highlights the need to keep two other facts in mind. First, a 90% chance of detecting fraud is better than the 0% chance that non-auditing election officials and their voters now tolerate. Second, the audits’ greatest value is, arguably, deterrence. When would-be election thieves contemplate a 90% risk of getting caught, there’s a good chance that election officials will have no electronic election fraud to detect.