DUI & Criminal Defense

2501 North 7th Street
Phoenix, AZ 85006

Field Sobriety Tests in Arizona DUI Cases

Field tests performed in DUI cases do not measure impairment.


Three of the common field sobriety tests used in Arizona DUI cases have gone through “validation studies.”  These three standardized tests are:

  1. Horizontal Gaze Nystagmus
  2. Walk and Turn
  3. One Leg Stand

There are also several non-standardized tests used in DUI investigations:

  1. Romberg Modified Test
  2. Finger-to-Nose Test
  3. Vertical Gaze Nystagmus

Is there an alleged accuracy rate associated with HGN?

88% Accuracy – San Diego Study, 1998 (Stuster & Burns)

77% Accuracy – NHTSA .100 Study, 1981 (Moskowitz, Burns & Tharp)

77% Accuracy – Nystagmus Testing in Intoxicated Individuals (Citek) Journal of Optometry 2003.


Size: 290 people tested for HGN

Mean Alcohol Concentration: 0.122

Subjects Below .08 Tested: 81 (28%)

51 / 81 – had less than 4 cues

30 / 81 – had 4 or more cues (37%).

24 of the false positives had an alcohol concentration as low as 0.028 with 6 cues.
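
The HGN counts above can be checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the counts listed above. The variable names are illustrative and are not taken from the study.

```python
# Counts taken from the HGN figures listed above.
below_limit = 81          # subjects tested with an alcohol concentration below .08
fewer_than_4_cues = 51    # below .08 and not flagged by HGN (true negatives)
four_or_more_cues = 30    # below .08 but flagged by HGN (false positives)

# Specificity: share of below-limit subjects the test correctly leaves unflagged.
hgn_specificity = fewer_than_4_cues / below_limit

# False positive rate: share of below-limit subjects the test wrongly flags.
hgn_false_positive_rate = four_or_more_cues / below_limit

print(f"HGN specificity below .08: {hgn_specificity:.0%}")
print(f"HGN false positive rate below .08: {hgn_false_positive_rate:.0%}")
```

The two rates always add up to 100%, so a false positive rate of 30 / 81 leaves a specificity of 51 / 81.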

However, analyzing accuracy is not done using a single percentage.  There are many factors that must be evaluated.  For example, compare the claim that this study shows a 91% overall accuracy rate to what the data shows for different alcohol concentration ranges.

0.07 – 0.090 = 72% total accuracy (for all 3 tests combined)

0.06 – 0.100 = 75% accuracy rate

0.05 – 0.110 = 79% accuracy rate

0.04 – 0.120 = 82% accuracy rate

These numbers, although significantly lower, still look pretty good. That is, until you do a true statistical analysis.

Specificity measures a test’s rate of “True Negatives”: the share of people below the limit whom the test correctly classifies as below it. Let’s look at this for the San Diego Study:

0.07 – 0.090 Specificity of .36 (or 64% of those below 0.08 would be classified as over 0.08)

0.06 – 0.100 Specificity of .44 (or 56% of those below 0.08 would be classified as over 0.08)

0.05 – 0.110 Specificity of .55 (or 45% of those below 0.08 would be classified as over 0.08)

0.04 – 0.120 Specificity of .63 (or 37% of those below 0.08 would be classified as over 0.08)
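
The percentages in parentheses are simply one minus the specificity. A short Python sketch of that conversion, using the specificity values listed above (the function and variable names are illustrative):

```python
# Specificity by alcohol concentration range, as listed above.
specificity_by_range = {
    "0.07 - 0.09": 0.36,
    "0.06 - 0.10": 0.44,
    "0.05 - 0.11": 0.55,
    "0.04 - 0.12": 0.63,
}

def false_positive_rate(specificity: float) -> float:
    """Share of below-limit subjects wrongly classified as over the limit."""
    return 1.0 - specificity

for bac_range, spec in specificity_by_range.items():
    fpr = false_positive_rate(spec)
    print(f"{bac_range}: specificity {spec:.2f}, false positive rate {fpr:.0%}")
```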

This data shows the field tests are better indicators of an alcohol concentration above 0.05 than of one above 0.08. In this study, if a driver’s alcohol concentration was below 0.08, there was roughly a 29% chance of false arrest (24 / (24 + 59)) and a 37% chance of false arrest using only HGN.
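
A single overall accuracy figure can hide a high false positive rate, because overall accuracy depends on how many of the people tested were over the limit to begin with. The Python sketch below uses the 24 and 59 counts cited above. The sensitivity value and the shares of drivers above 0.08 are hypothetical numbers chosen only for illustration and do not come from the study.

```python
# Counts from the San Diego figures cited above.
false_positives = 24   # below 0.08 but classified as over
true_negatives = 59    # below 0.08 and classified as below

below_limit = false_positives + true_negatives
specificity = true_negatives / below_limit
false_positive_rate = false_positives / below_limit

def overall_accuracy(sensitivity, specificity, share_above_limit):
    """Overall accuracy as a weighted mix of sensitivity and specificity."""
    return (sensitivity * share_above_limit
            + specificity * (1 - share_above_limit))

ASSUMED_SENSITIVITY = 0.98  # hypothetical value, for illustration only

print(f"False positive rate below 0.08: {false_positive_rate:.1%}")
for share in (0.25, 0.50, 0.75):  # hypothetical shares of drivers above 0.08
    acc = overall_accuracy(ASSUMED_SENSITIVITY, specificity, share)
    print(f"{share:.0%} of subjects above 0.08 -> overall accuracy {acc:.1%}")
```

With the specificity held fixed, the overall accuracy climbs as the share of over-the-limit subjects grows, even though the chance of a false arrest for a driver below 0.08 never changes.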

Because the study included so many people with high alcohol concentrations (the mean was 0.122), the tests appear more accurate the lower the detection limit is set. The data shows the tests being 93% accurate at 0.05 (same as Colorado) and 99% accurate at 0.01.
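
To see why a lower detection limit inflates accuracy when the sample is weighted toward high alcohol concentrations, consider the following Python sketch. The sample values are hypothetical and are not data from the study. The "test" here simply flags every subject as over the limit, so it has no ability to tell drivers apart at all.

```python
# Hypothetical alcohol concentrations, skewed high. NOT data from the study.
sample = [0.00, 0.03, 0.05, 0.06, 0.07, 0.09, 0.11, 0.12,
          0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20]

def accuracy_if_everyone_flagged(values, limit):
    """Accuracy of a 'test' that calls every subject over the limit."""
    correct = sum(1 for v in values if v >= limit)
    return correct / len(values)

for limit in (0.08, 0.05, 0.01):
    acc = accuracy_if_everyone_flagged(sample, limit)
    print(f"Limit {limit:.2f}: {acc:.0%} 'accurate'")
```

Even this useless test looks more accurate as the limit drops, simply because more of the sample falls above the lower limit.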

More Problems with the San Diego Study:

  • Subject drivers were not pulled over randomly.  They were chosen because officers had already suspected them to be impaired.
  • All of the officers had portable breath test devices and were not supervised.
  • The scoring system was not standardized (some used 12 cues for One Leg Stand and some used 11 cues for the Walk and Turn).
  • The mean alcohol concentration was too high to make a reliable correlation at .08.
  • There were not enough subjects in the study with an alcohol concentration under .08.
  • The officers knew the results (were not blinded).
  • No blind controls or true negative controls (.00) were used.
  • According to the authors, only the most willing officers were chosen to participate.
  • Police officers were asked to provide data on how accurate they were without any supervision.
  • The officers had the incentive to make the tests appear accurate (they knew the results of the study, and their accuracy, were going to be used and examined by their peers).
  • Several officers’ failure to follow instructions was a significant problem cited by the author of the study.

The study was weighted to get the results that law enforcement desired.

Compare the San Diego Study to the study by Cole and Nowaczyk, “Field Sobriety Tests: Are They Designed for Failure?” (Perceptual and Motor Skills, 1994), a review of the 1981 laboratory study by Moskowitz, Tharp and Burns. In that study, officers watched a video of subjects performing SFSTs: 32% judged subjects to be above 0.100 and 46% said the individuals had too much to drink. However, all 21 participants had nothing to drink (0.000).

The Study concedes – HGN is not linked to driving:

Many individuals, including some judges, believe that the purpose of a field sobriety test is to measure driving impairment. For this reason, they tend to expect tests to possess “face validity,” that is, tests that appear to be related to actual driving tasks. Tests of physical and cognitive abilities, such as balance, reaction time, and information processing, have face validity, to varying degrees, based on the involvement of these abilities in driving tasks; that is, the tests seem to be relevant “on the face of it.”

Horizontal gaze nystagmus lacks face validity because it does not appear to be linked to the requirements of driving a motor vehicle. The reasoning is correct, but it is based on the incorrect assumption that field sobriety tests are designed to measure driving impairment.

Driving a motor vehicle is a very complex activity that involves a wide variety of tasks and operator capabilities. It is unlikely that complex human performance, such as that required to safely drive an automobile, can be measured at roadside. The constraints imposed by roadside testing conditions were recognized by the developers of NHTSA’s SFST battery. As a consequence, they pursued the development of tests that would provide statistically valid and reliable indications of a driver’s BAC, rather than indications of driving impairment. The link between BAC and driving impairment is a separate issue, involving entirely different research methods.

~Dr. Marceline Burns, 1998 San Diego Study. NHTSA 2015 DWI Instructor’s Manual.

More Coming Soon on…

How to analyze an officer’s personal HGN log?

The Minimum Requirements of a Validation Study.

On September 20, 2016, the President’s Council of Advisors on Science and Technology (PCAST) released a Report to the President on Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods.

Scientific validation studies, intended to assess the validity and reliability of a metrological method for a particular forensic feature-comparison application, must satisfy a number of criteria.

(1) The studies must involve a sufficiently large number of examiners and must be based on sufficiently large collections of known and representative samples from relevant populations to reflect the range of features or combinations of features that will occur in the application. In particular, the sample collections should be:

(a) representative of the quality of evidentiary samples seen in real cases. (For example, if a method is to be used on distorted, partial, latent fingerprints, one must determine the random match probability, that is, the probability that the match occurred by chance, for distorted, partial, latent fingerprints; the random match probability for full scanned fingerprints, or even very high quality latent prints would not be relevant.)

(b) chosen from populations relevant to real cases. For example, for features in biological samples, the false positive rate should be determined for the overall US population and for major ethnic groups, as is done with DNA analysis.

(c) large enough to provide appropriate estimates of the error rates.

(2) The empirical studies should be conducted so that neither the examiner nor those with whom the examiner interacts have any information about the correct answer.

(3) The study design and analysis framework should be specified in advance. In validation studies, it is inappropriate to modify the protocol afterwards based on the results.

(4) The empirical studies should be conducted or overseen by individuals or organizations that have no stake in the outcome of the studies.

(5) Data, software and results from validation studies should be available to allow other scientists to review the conclusions.

(6) To ensure that conclusions are reproducible and robust, there should be multiple studies by separate groups reaching similar conclusions.

Can you refuse FSTS in Arizona?

Important FSTS Cases

  • The Ultimate Guide for Defeating an Arizona DUI
Ready to fix this?
(602) 494-3444