No. CIV.A.3:99CV359LN.
August 7, 2003.
Edward P. Lobrano, Jr., Lobrano, Butler & Kirk, Ridgeland, MS, for Plaintiffs.
Terrell S. Williamson, Phelps Dunbar, Michael Jeffrey Wolf, Page, Kruger & Holland, P.A., Jackson, MS, for Defendant.
MEMORANDUM OPINION AND ORDER
Plaintiffs, forty-two black police officers employed by the City of Jackson Police Department, brought this action against the City under Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq., and 42 U.S.C. §§ 1981 and 1983, complaining that the procedure used by the City in May 1998 for the selection of individuals for the rank of sergeant in the police department discriminated against black applicants. Plaintiffs assert a disparate impact claim, alleging, in particular, that the written test which was used to qualify applicants to proceed to the second and third rounds of the selection process (simulation exercises and a structured interview, respectively) adversely impacted black officers, whose pass rate was significantly below that of the white applicants. They further charge the City with intentional discrimination in the use of the test results, i.e., disparate treatment, reasoning that an inference of intentional race discrimination arises from the City's use of the test results for its promotion decisions in the face of knowledge of the test's discriminatory impact on black applicants. The case was tried to the court, and the court, having considered the evidence and arguments presented by the parties, finds and concludes that plaintiffs have failed to establish their claims and that their complaint is therefore due to be dismissed.
In the spring of 1998, the City conducted a three-part test for the purpose of promoting qualified candidates to the rank of sergeant in the Jackson Police Department. The first stage, a written test, was administered in May 1998 to 147 applicants, of whom 106 were black (72%) and 46 were white (28%). Forty-seven of those tested received a passing score, of whom 26 were white and 21 were black. Those 47 candidates progressed to the second stage, which consisted of simulation exercises, and, finally, to the third stage, which consisted of a structured interview. Upon completion of all three parts of the test, the test and its results were submitted for approval to the United States Department of Justice pursuant to consent decrees the City had previously entered in 1974 and 1991. By letter dated December 17, 1998, and faxed to the City on that same date, the Justice Department purported to approve the test and its results, but requested that the City "reconsider the procedure for determining the pass point on the police sergeant written examination [since] [t]he City's 1998 police sergeant written examination had an adverse impact on African-American candidates." The promotional list was posted the following day and promotion exercises were held in January 1999, at which time fifteen persons, including eight whites and seven blacks, were promoted to sergeant.
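Though the opinion does not walk through the arithmetic, the City's stipulation of adverse impact, noted below, tracks the EEOC's four-fifths rule, 29 C.F.R. § 1607.4(D), under which a selection rate for any group that is less than four-fifths (80%) of the rate for the group with the highest rate is generally regarded as evidence of adverse impact. The following is a minimal sketch of that computation using the counts reported above; the opinion's reported denominators vary slightly between passages (compare the Justice Department letter quoted below), so the figures are approximate.

```python
# Four-fifths (80%) rule sketch, 29 C.F.R. § 1607.4(D). Counts are taken
# from the opinion; its reported denominators vary slightly between
# passages, so treat the result as approximate.
black_passed, black_tested = 21, 106
white_passed, white_tested = 26, 46

black_rate = black_passed / black_tested   # about 19.8%
white_rate = white_passed / white_tested   # about 56.5%
impact_ratio = black_rate / white_rate     # about 0.35

print(f"black pass rate: {black_rate:.1%}")
print(f"white pass rate: {white_rate:.1%}")
verdict = "adverse impact indicated" if impact_ratio < 0.8 else "no adverse impact indicated"
print(f"impact ratio: {impact_ratio:.2f} ({verdict})")
```

On any of the count combinations the opinion reports, the ratio falls well below 0.80, which is consistent with the City's stipulation discussed in the disparate impact analysis below.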
In March 1974, the City entered into a consent decree requiring that the City's use of any employment testing and its consideration of education standards in promotion decisions be supported by (a) a validation study showing that the tests proposed to be used accurately predict job performance, and that the education standards relate to the ability of applicants to do the job; or (b) a showing that the test has no adverse impact on blacks. According to the terms of the consent decree, a test will be considered to meet these standards when it has been approved, in writing, by the Justice Department as so doing, or when the court has so ruled.
On August 21, 1991, the City entered into a supplemental consent decree by which it agreed to formulate non-discriminatory hiring and promotion procedures to be approved by the Justice Department. This decree also provided for creation of the position of a city-wide Equal Employment Opportunity Officer, whose job it would be to ensure compliance with the consent decrees and Title VII, and to coordinate the City's efforts to develop lawful selection procedures.
The letter recited:
The United States does not object to the City making promotions to police sergeant and police lieutenant based on the City's 1998 promotional procedures.
The letter continued:
Although the United States does not object to the City's sergeant promotional procedure, we would like the City to reconsider the procedure for determining the pass point on the police sergeant written examination . . . The pass rates for African-American and white candidates were 23.3% (21 out of 94) and 56.5% (26 out of 47), respectively. The mean scores for white and African-American candidates were 70.8% and 63.3%, respectively. Only 33.6% of all of the sergeant candidates passed the written examination. The passing rate seems unusually low. We would request that the City collect additional data as to what may be an appropriate passing score on the written examination. We also suggest that the City collect performance data on police sergeants and lieutenants.
On May 22, 1998, shortly after the test was administered, the plaintiffs herein filed an EEOC charge alleging disparate impact. On February 17, 1999, not long after the promotions were made, the EEOC issued its notice of right to sue, and on May 21, 1999, within ninety days of receipt of their right-to-sue notice, plaintiffs filed this lawsuit, which initially included only their disparate impact claim but was subsequently amended to include their claim that the City engaged in intentional discrimination by using a test that was known to have a disparate impact on black applicants for the position of sergeant.
DISPARATE IMPACT
The disparate impact theory is used to challenge a facially neutral employment policy that falls more harshly on a protected class of employees and cannot be justified by business necessity. Allison v. Citgo Petroleum Corp., 151 F.3d 402, 409 (5th Cir. 1998). In a disparate impact case, the plaintiff must demonstrate that the "respondent uses a particular employment practice that causes a disparate impact on the basis of race . . . and the respondent [must fail] to demonstrate that the challenged practice is job related for the position in question and consistent with business necessity." 42 U.S.C. § 2000e-2(k)(1)(A)(i). Thus, the plaintiff bears the initial burden of establishing a prima facie case by showing that the promotion method in question had a disparate impact on minorities, and if he sustains that burden, the burden then shifts to the employer to show that the method of selection is valid by establishing that it is "job related" and "consistent with business necessity." Id.; see also Frazier v. Garrison I.S.D., 980 F.2d 1514, 1525 (5th Cir. 1993). If the employer carries its burden by successfully validating the selection method, the burden shifts back to the plaintiff to prove that there was another available method of evaluation which was equally valid and less discriminatory, but which the employer refused to use. 42 U.S.C. § 2000e-2(k)(1)(A)(ii); see also Frazier, 980 F.2d at 1525.
In this case, based on the statistical disparity between the passing rates of blacks and whites, the City has stipulated that the May 1998 written sergeant's exam had an adverse impact on black applicants. Thus, the City must show that the test is job related for the position in question and consistent with business necessity. Plaintiffs contend that the City has failed to sustain its burden of showing that the examination was job related, and further claim that even if the City has shown through an appropriate validation study that the test was valid, plaintiffs have shown that there were equally valid, less discriminatory alternative selection methods which the City refused to use.
At this, what has been termed "the justification stage" of a disparate-impact case, "the dispositive issue is whether a challenged practice serves, in a significant way, the legitimate employment goals of the employer." Wards Cove Packing Co., Inc. v. Atonio, 490 U.S. 642, 659, 109 S.Ct. 2115, 2126, 104 L.Ed.2d 733 (1989). "The touchstone of this inquiry is a reasoned review of the employer's justification for his use of the challenged practice," id., which requires proof that the challenged practice was "job related, in the sense that it measures traits that are significantly related to the applicant's ability to perform the job." Gillespie v. State of Wisconsin, 771 F.2d 1035, 1040 (7th Cir. 1985) (citing Griggs v. Duke Power Co., 401 U.S. 424, 436, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971)); see Griggs, 401 U.S. at 436, 91 S.Ct. 849, 28 L.Ed.2d 158 ("[A]ny given requirement must have a manifest relationship to the employment in question"); Albemarle Paper Co. v. Moody, 422 U.S. 405, 431, 95 S.Ct. 2362, 2378, 45 L.Ed.2d 280, 304 (1975) (quoting the EEOC's Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. § 1607.4(c) (1974)) ("The message of these Guidelines is the same as that of the Griggs case — that discriminatory tests are impermissible unless shown, by professionally acceptable methods, to be 'predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job'").
A test or other selection method may be "validated," or shown to be sufficiently job-related to comply with the requirements of Title VII, under any one of three validation methods: criterion-related validity, content validity, or construct validity. See Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. § 1607.5(B). These methods have been concisely described as follows:
See Ensley Branch of the N.A.A.C.P. v. Seibels, 616 F.2d 812, 816 n. 11 (5th Cir. 1980) (defining "validation").
A criterion-related validation study determines whether the test is adequately correlated with the applicant's future job performance. Criterion-related tests are constructed to measure certain traits or characteristics thought to be relevant to future job performance. An example of an employment test that would be validated by the criterion-related validation method is an intelligence test. The content validation strategy is utilized when a test purports to measure existing job skills, knowledge or behaviors. "The purpose of content validity is to show that the test measures the job or adequately reflects the skills or knowledge required by the job." For example, a typing test given to prospective typists would be validated by the content validation method. Construct validity is used to determine the extent to which a test may be said to measure a theoretical construct or trait. For example, if a psychologist gave vocabulary, analogies, opposites and sentence completion tests to a group of subjects and found that the tests have a high correlation with one another, he might infer the presence of a construct — a verbal comprehension factor.

Gillespie, 771 F.2d at 1040 n.3 (citations omitted). See also Ensley Branch of the N.A.A.C.P. v. Seibels, 616 F.2d 812, 816 n.11 (5th Cir. 1980) (explaining that "criterion" validity is demonstrated "by identifying criteria that indicate successful job performance and then correlating test scores and the criteria so identified"; "construct" validity is "demonstrated by examinations structured to measure the degree to which job applicants have identifiable characteristics that have been determined to be important in successful job performance"; and "content" validity is demonstrated by "tests whose content closely approximate tasks to be performed on the job by the applicant") (quoting Washington v. Davis, 426 U.S. 229, 247 n.13, 96 S.Ct. 2040, 2051, 48 L.Ed.2d 597 (1976)). The EEOC's Uniform Guidelines provide that a selection procedure may be used "if it represents a critical work behavior (i.e., a behavior which is necessary for performance of the job) or work behaviors which constitute most of the important parts of the job." 29 C.F.R. § 1607.14(C)(8). The Guidelines further state that "[a]ny validity study should be based upon a review of information about the job for which the selection procedure is to be used." 29 C.F.R. § 1607.14(A).
In the case at bar, the City has sought to establish the validity of the 1998 sergeant's examination by showing that the test was content valid. "A test will have content validity if there is a direct relationship between the test contents and the job contents." Williams v. Ford Motor Co., 187 F.3d 533, 540 (6th Cir. 1999) (citing Police Officers for Equal Rights v. City of Columbus, 644 F.Supp. 393, 414 (S.D.Ohio 1985)). See also Uniform Guidelines, 29 C.F.R. § 1607.5 ("Evidence of the validity of a test or other selection procedure by a content validity study should consist of data showing that the content of the selection procedure is representative of important aspects of performance on the job for which the candidates are to be evaluated."); 29 C.F.R. § 1607.14(C)(1) ("A selection procedure can be supported by a content validity strategy to the extent that it is a representative sample of the content of the job"). A content validity study should measure knowledge, skills, or abilities that are "necessary prerequisite[s]" for the "performance of critical or important work behavior(s)" for the job. Uniform Guidelines 29 C.F.R. § 1607.14(C)(4). Thus, in order to construct a content valid examination under the Uniform Guidelines, a detailed job analysis must be performed for the job in question that focuses on the work behaviors necessary for successful performance of the job and also on the tasks associated with those behaviors. Id. § 1607.14(C)(2).
Section 1607.14(C)(4) of the EEOC Guidelines provides:
[T]o be content valid, a selection procedure measuring a skill or ability should either closely approximate an observable work behavior, or its product should closely approximate an observable work product. If a test purports to sample a work behavior or to provide a sample of a work product, the manner and setting of the selection procedure and its level and complexity should closely approximate the work situation. The closer the content and the context of the selection procedure are to work samples or work behaviors, the stronger is the basis for showing content validity. As the content of the selection procedure less resembles a work behavior, or the setting and manner of the administration of the selection procedure less resemble the work situation, or the result less resembles the work product, the less likely the selection procedure is to be content valid, and the greater the need for other evidence of validity.
In this case, the record establishes that when first hired by the City in 1993 to develop a promotional process for the rank of sergeant in the police department, Semko and Associates (Semko), an employment testing firm, conducted a job analysis of the sergeant position for the City. Semko, whose team consisted of Dr. Elizabeth Semko, Dr. John Wade and William Cooley, investigated the job systematically to identify the tasks, duties and responsibilities that comprise the position and to ascertain the knowledge, skills and abilities (KSAs) that one would need in order to perform those tasks satisfactorily. In that job analysis, Semko initially interviewed the vast majority of persons then occupying the rank of sergeant, as well as a number of lieutenants (to whom sergeants report in the chain of command), and based on those interviews, the Semko team developed lists of tasks and KSAs. Those lists were then provided to a panel of "subject matter experts" (the panel being comprised of the seven sergeants who had not been interviewed initially) so that the tasks could be rated on the frequency with which a sergeant would perform them, and the tasks and KSAs could be rated as to their relative importance to the job of sergeant. The tasks were then linked to KSAs, which were grouped into twelve job "dimensions," each defined by the KSAs falling under it, and the dimensions themselves were weighted, or rated, according to their relative importance to the job by evaluating the number and relative importance of the KSAs comprising each dimension. This job analysis, which was first prepared and used by Semko to develop a sergeant's promotional procedure in 1994, was used again in connection with Semko's development of promotional procedures in 1996 and the 1998 procedure that is at issue, after Semko determined, upon an updated evaluation, that the job analysis still accurately represented the position since the job had not changed task-wise or KSA-wise from 1993 to 1998.
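The opinion does not disclose the formula Semko used to weight the dimensions. The following is a minimal sketch of one plausible scheme, under which a dimension's weight is the normalized sum of the importance ratings of the KSAs grouped under it; the dimension names, KSAs, and ratings below are hypothetical, not taken from the record.

```python
# Hypothetical sketch of weighting job dimensions from KSA importance
# ratings. The opinion does not give Semko's actual formula; this assumes
# a dimension's weight is the normalized sum of its KSAs' ratings.
ksa_importance = {                      # importance on a 1-5 scale (invented)
    "knowledge of General Orders": 5,
    "knowledge of state law": 5,
    "report writing": 4,
    "oral communication": 3,
}
dimensions = {                          # KSAs grouped into dimensions
    "legal knowledge": ["knowledge of General Orders", "knowledge of state law"],
    "written communication": ["report writing"],
    "oral communication": ["oral communication"],
}

raw = {d: sum(ksa_importance[k] for k in ksas) for d, ksas in dimensions.items()}
total = sum(raw.values())
weights = {d: round(v / total, 2) for d, v in raw.items()}
print(weights)   # {'legal knowledge': 0.59, 'written communication': 0.24, ...}
```

Under any such scheme, a dimension backed by more KSAs, or by more important ones, carries more weight, which is consistent with the opinion's description of how the twelve dimensions were rated.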
In 1998, as in the two previous promotional procedures, Semko developed a three-part selection process, consisting, first, of a written screening test to qualify a limited number of candidates who could go on to the next two stages of the promotion procedure, which were an assessment center and a panel interview. As reflected in Semko's proposal to the City, and as confirmed at trial by both Semko witnesses and City officials, the determination was made "to utilize the written test as a screening device, as well as incorporating it in the final scores that determined the promotion list" based largely on cost considerations, since the assessment center and panel interview were lengthy procedures requiring considerable work and expense, and City officials did not consider it financially feasible for all candidates to go through the full three-step process. However, in conjunction with these cost considerations, Semko also determined that the written test would best evaluate two of the twelve dimensions identified in the job analysis, technical knowledge and legal knowledge, both of which were considered relatively important job dimensions. Semko thus set about the task of developing a suitable written test that would cover the knowledge of laws, regulations, policies and procedures that sergeants in the Jackson Police Department should possess. In addition to relying on the job analysis, the Semko team identified and reviewed source materials that a sergeant must know to do the job, which included the Jackson Police Department General Orders, portions of the Mississippi Code, City ordinances, and the City's Civil Service Orders and Regulations. Then, members of the team, working individually and together, devised a total of 99 test questions, which were written to sample the job knowledge in the source materials and the materials identified in the job analysis as important for a sergeant to know. Semko relied on the job analysis information to determine the number of questions to be included from the various sources, and designed the test items to measure those KSAs that were determined by the job analysis to be important to the successful performance of the sergeant's job.
The test items written by the Semko team were presented to a panel of three subject matter experts — two black sergeants and one white sergeant — which rated each test item on relevance and discussed the quality of the test items. Based on the panel's comments, eight items were excluded. Four of the questions had relevancy ratings that were not sufficiently high to warrant inclusion on the test; as to two more questions, the subject matter experts reported that common practice differed from the General Orders, so that the questions could be confusing; and the subject matter experts were of the opinion that on two questions, none of the answers was entirely accurate. In addition, based on input from the subject matter experts, credit was given on another four of the test questions for either of two answers, each of which might be thought correct.
Candidates were given three hours to complete the written job test, though Dr. Semko estimated that only 90 minutes would be needed. Initially, the Jackson Civil Service Commission, which was in charge of approving the promotional procedure, determined that the cut-off score, i.e., the score applicants would have to attain to move on to the other phases of the process, would be 80. After hearing objections, and with the advice of Semko, however, the Commission determined that the pass score would remain 80 only if at least forty-five candidates scored 80 or above, since a determination had been made that the promotion list should contain three times the number of initial openings (15). The Commission determined that in the event fewer than forty-five candidates scored 80 or higher, the cut-off score would be lowered to 70, but in no event would a candidate who scored below 70 advance to the assessment center and panel interview. As it turned out, each of the forty-seven candidates who scored 70 or higher on the written test moved on to the assessment center and panel interview.
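The Commission's two-tier rule reduces to a short decision procedure. The sketch below restates it with hypothetical scores; the forty-five-candidate target is three times the fifteen anticipated openings, as the opinion describes.

```python
# The Commission's cut-score rule as the opinion describes it: the cut is
# 80 unless fewer than 45 candidates reach 80, in which case it drops to
# 70; no one scoring below 70 advances. All scores here are hypothetical.
def advancing(scores, target=45):
    cutoff = 80 if sum(s >= 80 for s in scores) >= target else 70
    return [s for s in scores if s >= cutoff]

scores = [85, 82, 78, 74, 71, 69, 65] * 7   # 49 hypothetical candidates
passed = advancing(scores)
print(len(passed))   # 35: only 14 reached 80, so the cut dropped to 70
```

This mirrors what actually happened in 1998: fewer than forty-five candidates reached 80, the cut dropped to 70, and the forty-seven candidates at or above 70 advanced.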
As Dr. Wade testified, "[T]he reason we had a minimum score at all was they (the Civil Service Commission) said that if the sergeants can't get 70 percent of the questions like this right, they shouldn't . . . be sergeants."
Interestingly, one of the plaintiffs herein, Deric Hearn, argued to the Civil Service Commission that it should allow candidates with 70 or higher to move on through the process, which is exactly what happened.
Plaintiffs' Challenge

In connection with their disparate impact claim, plaintiffs submit that the City has not demonstrated the validity of the test, and challenge all facets of the development and utilization of the written test in question, including the job analysis, the test questions, the establishment of the cut-score and the scoring of the test. As an initial matter, plaintiffs argue that the record is devoid of any data relating written test scores to actual job performance, and insist, in fact, that there is no evidence to indicate that the City sought to measure the job performance of candidates who were promoted to sergeant such that some correlation between the test and job performance could be made. Plaintiffs insinuate, in other words, that the test was not properly validated inasmuch as Semko used content validity as the sole method of validation of the test at issue, when, according to plaintiffs, Semko should have used criterion validity or a combination of content and criterion validity in an effort to correlate performance on the test to actual job performance. The court rejects this position.
"Neither the case law nor the Uniform Guidelines purports to require that an employer must demonstrate validity using more than one method." Williams v. Ford Motor Co., 187 F.3d at 544-45 (citing 29 C.F.R. § 1607.5(A) ("For the purposes of satisfying these guidelines, users may rely upon criterion-related validity studies, content validity studies or construct validity studies"), and § 1607.14(C)(1)) ("Users choosing to validate a selection procedure by a content validity strategy should determine whether it is appropriate to conduct such a study in the particular employment context."); see also Washington v. Davis, 426 U.S. 229, 248 n. 13, 96 S.Ct. 2040, 2051 n. 13, 48 L.Ed.2d 597 (1976) (stating that "[i]t appears beyond doubt by now that there is no single method for appropriately validating employment tests for their relationship to job performance," and that any of the three recognized basic methods of validation may be used). Plaintiffs do not dispute that content-validation can be an appropriate tool, but insist that here, something more or different was needed. On this point, the court recognizes that the nature of the inquiry at issue may tend to recommend the use of one type validation study over another, and that there are circumstances in which criterion-related validation might be preferable to content-validation. However, plaintiffs have not persuaded the court that content validation was not an appropriate or sufficient tool to for use in this specific situation. Drs. Semko, Wade and Landy, whose testimony the court found consistent and credible, testified that content validation alone was adequate and proper, and that while criterion-related validation might also have been useful, it was neither required nor, in fact, reasonably available as a validation tool under the particular circumstances. In this vein, Dr. Semko cogently explained that although the plan from the outset had been to do a content validity study, Semko was interested in a criterion-related study, as well, to determine whether test performance was a valid predictor of job performance based on an analysis of the job performance of earlier Semko test takers who had been promoted to sergeant and performed in the job for a period of time. It was determined, though, that there was no reliable, objective and valid performance criterion that would have supported such a study since although the City had in place a performance evaluation system of sorts, that system was thought to be too subjective to provide a valid measure of job performance. Dr. Semko made clear in her testimony that had there been a reliable, objective measure of job performance, she and her associates would have looked at the relationship between the predictor (the test) and that measure of job performance; but because that was no such measure and criterion validation was thus not a viable option, Semko determined to use a content validity study exclusively. The law, in the court's opinion, required no more.
For example, the Uniform Guidelines suggest that content validity is not a proper vehicle to test for skills that can readily be learned on the job. Uniform Guidelines 29 C.F.R. § 1607.5(F), entitled, "Caution against selection on basis of knowledges, skills, or abilities learned in brief orientation period," states:
In general, users should avoid making employment decisions on the basis of measures of knowledges, skills, or abilities which are normally learned in a brief orientation period, and which have an adverse impact.
Similarly, § 1607.14(C)(1), which sets forth standards for content validity studies, provides, in relevant part:
Content validity is also not an appropriate strategy when the selection procedure involves knowledges, skills, or abilities which an employee will be expected to learn on the job.
Here, plaintiffs argue, among other things, that content validation was not appropriate since the written test was designed to evaluate applicants' knowledge of materials to which ready reference could have been made on the job, such as the General Orders. However, while it is true that police sergeants need not rely solely on memory to do their job, and have at their disposal materials to find answers to questions to which they may not know the answer, it is undeniable that there is a basic fund of knowledge that persons occupying the position of sergeant should know, without the need to refer to source materials, and it does appear from the evidence that the test was designed to gauge applicants' knowledge of these more important matters. The test was not designed to evaluate matters that could readily be learned on the job, but rather to evaluate the officers' knowledge of matters that they ought to readily know in order to be effective in the position of sergeant.
The court notes that plaintiffs have moved to disallow Dr. Landy's testimony. The court finds this request is not well taken, and rejects it.
As stated, the purpose of a content validity study is to demonstrate that the content of an exam matches, or correlates to, the content of the job to be performed, and in that way is a measure of job performance. Content validation of a test used as a screening tool should demonstrate that the minimum passing score corresponds to the minimum amount of knowledge necessary for effective performance of the job. Thus, it is clear that content validity is an appropriate tool for validation of a test used solely as a measure of minimal competence to perform the job. However, the cases indicate that where a test is used not to determine minimum competence to perform the job, but for ranking purposes unrelated to minimum competence, there must be proof that a higher test score correlates to better job performance. See, e.g., Ensley Branch of N.A.A.C.P. v. Seibels, 616 F.2d 812, 822 (5th Cir. 1980) ("Use of a test for such ranking purposes, rather than as a . . . device to screen out candidates without minimum skills, is justified only if there is evidence showing that those with a higher test score do better on the job than those with a lower test score."). Thus, in such cases, to be sufficient, the validation studies must show that higher test scores predict better job performance. In the court's opinion, the absence of such evidence in this case does not render the City's content validation study inadequate, for the test here was used as a screening device, i.e., to identify those thought to possess the minimum level of technical and legal knowledge required for the job, and not to rank applicants for selection for promotion.
Plaintiffs submit that the City has not established the test's validity in any event. With respect to the job analysis conducted by Semko, plaintiffs object (1) that it was not developed contemporaneously with the 1998 promotional process but rather was prepared several years earlier; (2) that an insufficient number of subject matter experts were consulted in its preparation; (3) that the rating system used by the subject matter experts was not sufficiently defined; and (4) that Semko made no timely effort to link KSAs and tasks. As to the test itself, plaintiffs object that the relevance and accuracy of test questions and answers were not determined before the test was administered, and complain, as well, of what they contend was poor "readability" of numerous test items, all of which resulted in the inclusion of test items which should have been reworked or excluded altogether. Plaintiffs further complain that the test was not evaluated after-the-fact for internal consistency, i.e., positive item correlation, and that consequently, no effort was made after-the-fact to adjust the scoring, or to revise the cut-score. The court considers these contentions in turn.
In the court's opinion, plaintiffs' various objections to the job analysis are not well founded. The fact that Semko relied on a job analysis that had been developed several years earlier does not detract from the propriety of Semko's reliance on that job analysis in devising the 1998 promotion procedure. As is apparent not only from the testimony of witnesses in this case, including most notably Drs. Semko and Landy, but from numerous cases, there is no requirement in the industry or in the law that a new job analysis be prepared for each successive selection procedure, and an earlier-developed job analysis may appropriately be used so long as it is established that the job analysis remains relevant and accurate. See, e.g., Rudder v. District of Columbia, 890 F.Supp. 23, 42 (D.D.C. 1995) (fact that job analysis was begun four years earlier and may have borrowed from an even earlier job analysis did not discredit its validity where proof was presented that officials were interviewed to make sure that the job analysis was still relevant and a determination was made that the jobs had not changed in any way related to the testing procedure). The evidence here demonstrates that before proceeding in reliance on the 1993 job analysis, Semko confirmed that it remained valid for the 1998 promotion process by presenting it for review by the police chief and existing sergeants.
Dr. Landy testified that a new job analysis is needed only if one of two things is present: if the department has changed structurally, so that positions have been added or eliminated in a way that changes the duties and responsibilities of the position in question, or if the department has fundamentally changed the way it does its work. With respect to the job of police sergeant in particular, Dr. Landy testified that he has been analyzing this position for thirty years, during which time the essential duties and responsibilities have tended to remain the same. And according to Dr. Landy, conventional wisdom places the shelf life of a job analysis for the sergeant position at "five plus years," and up to ten years or more.
Further, although plaintiffs object that the number of subject matter experts used in the development of the job analysis was inadequate, the court concludes otherwise. In fact, the evidence shows that in conducting the job analysis initially, Semko drew upon the knowledge and experience of all the sergeants, as well as some lieutenants, in the Jackson Police Department; and although the actual panel of subject matter experts was more limited, consisting of seven sergeants, that number was clearly sufficient for Semko's purposes.
In a related vein, the court finds no merit in plaintiffs' implicit challenge to Semko's efforts to update the job analysis, which simply required confirmation by a few knowledgeable people in the department that the job analysis remained accurate, which was done; nor, in the court's view, is there merit to plaintiffs' objection that the rating system used by the subject matter experts was not sufficiently defined, for in the court's opinion, it was.
Plaintiffs charge, additionally, that the job analysis was flawed because Semko made no effort to link tasks and KSAs, or to demonstrate the linkages, prior to preparation of the test questions. The court, however, finds from the evidence that this was done at or before the time the test was constructed and thus rejects plaintiffs' position on this point. In summary, then, based on the evidence adduced at trial, the court concludes that the job analysis conformed to the standards of industrial psychology, as well as to the EEOC Guidelines.
As with the job analysis, plaintiffs' objections to the test and test procedure are many, though the court concludes, as it has with respect to the job analysis, that they are not well taken. Among other things, plaintiffs fault Semko, and hence the City, for administering the test prior to validating the test through analysis and evaluation by the subject matter experts, and further, for failing to evaluate the test after-the-fact for reliability and to make necessary adjustments. As to the former contention, Dr. Semko agreed in her testimony that, although there is nothing in the law or in the applicable Guidelines that mandates it, tests are ordinarily evaluated by subject matter experts in advance of being administered to applicants so that potential problems can be identified and rectified before the test is given to the applicants. She acknowledged that this was not done here, and that instead, the test was submitted to the three subject matter experts for their evaluation and comments at the same time it was administered to applicants. She reasonably and credibly explained, though, that a determination was made to present the test to the subject matter experts contemporaneously with administration of the test to applicants based on the desire expressed by some applicants in the orientation session that the content of the test not be disclosed to anyone in the department prior to administration of the exam.
Dr. Semko also described the manner in which the problems identified by the subject matter experts were addressed, by excluding test items that were deemed unacceptable for one reason or another — resulting in a 91-item rather than the original 99-item test — and giving credit for more than one answer where appropriate. Plaintiffs submit that while these steps were certainly necessary under the circumstances in light of Semko's decision to proceed with the test without a prior assessment of the test's adequacy by the subject matter experts, additional steps should have been taken by Semko to validate the test, including, inter alia, assessment of the test results for reliability, or internal consistency. On this issue, plaintiffs' expert, Dr. Wayne Burroughs, opined that the test is not valid because there are too many negative correlations between questions. Specifically, Dr. Burroughs testified that whereas an acceptable reliability coefficient for the exam would have been .95, he determined that the reliability coefficient for this exam was .79. He maintained, therefore, that as many as nineteen questions should have been eliminated, or the cut score should have been lowered, to account for this. Dr. Landy explained, however, that while a high reliability coefficient, or correlation coefficient, in the range of .9 or above, would be expected where an examination is intended to measure a skill or ability that is homogeneous, such as arithmetic, a lower value would be expected when measuring something that is heterogeneous, as in the case of a knowledge exam that draws material for test items from multiple sources. According to Dr. Landy, in the latter case, which is the category into which the exam at issue falls, a reliability coefficient of .79 is a common and acceptable value. The court credits Dr. Landy's testimony in this regard. Additionally, Dr. Landy testified that upon analysis, he determined that eliminating the questions identified by Dr. Burroughs would actually have dropped the reliability coefficient of the test from .79 to .74 and the percentage of blacks passing from 22% to 16%, and would have marginally increased the adverse impact. His testimony in this regard was not rebutted or otherwise challenged by plaintiffs.
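The opinion does not identify the reliability index the experts computed. For a test scored right/wrong, the standard internal-consistency statistic is KR-20 (equivalent to Cronbach's alpha for dichotomous items); the sketch below computes it on simulated responses and is offered only to show what a figure like .79 measures, not to reproduce either expert's analysis.

```python
# A sketch of a KR-20 internal-consistency computation on simulated 0/1
# item responses. The opinion does not name the index the experts used;
# KR-20 is the standard choice for right/wrong items, and all data here
# are simulated, not the actual 1998 exam results.
import math, random

random.seed(1)
n_candidates, n_items = 147, 91
abilities = [random.gauss(0.0, 1.0) for _ in range(n_candidates)]
difficulties = [random.gauss(0.0, 1.0) for _ in range(n_items)]

# Simple one-parameter model: stronger candidates answer more items right.
responses = [[1 if random.random() < 1 / (1 + math.exp(-(a - d))) else 0
              for d in difficulties] for a in abilities]

totals = [sum(row) for row in responses]
mean = sum(totals) / n_candidates
var_total = sum((t - mean) ** 2 for t in totals) / n_candidates

sum_pq = 0.0
for i in range(n_items):
    p = sum(row[i] for row in responses) / n_candidates   # item pass rate
    sum_pq += p * (1 - p)

kr20 = (n_items / (n_items - 1)) * (1 - sum_pq / var_total)
print(f"KR-20 reliability: {kr20:.2f}")
```

On this kind of model, drawing items from many unrelated content areas lowers the inter-item correlations and with them the coefficient, which is the substance of Dr. Landy's point about heterogeneous knowledge exams.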
Plaintiffs suggest in their post-trial memorandum that the test is subject to challenge on the basis that Semko failed to perform a differential item functioning (DIF) analysis to determine whether, and if so on which items, blacks performed more poorly than whites, so that an effort could have been made to reduce adverse impact by eliminating those items on which blacks performed more poorly. However, at trial, their expert, Dr. Burroughs, did not suggest that this was required. In any event, Dr. Landy, whose testimony the court found, on the whole, to be credible, testified that the consensus of professional opinion is that differential item functioning modification of tests is not a good idea because it reduces the validity of the examination. Dr. Landy persuasively explained:
[T]he problem with that is suppose one of those items is knowledge item and has to do with an issue like a Miranda issue or an issue in the preservation of evidence or the item that we just saw about a hostage situation. You're going to take that item out only because whites answer it more correctly than blacks do, in spite of the fact that you'd really want a sergeant to know this because the sergeant is going to supervise. A police officer is going to count on that officer to tell him or her what to do. So you're reducing the validity of the exam just for the sake of making sure that there are no items in which whites and blacks do differentially, or DIF, and he's assuming that the reason that 65 percent of the blacks got it right and 70 percent of the whites got it right was that it's an unfair item rather than, hey, maybe two or three whites or two or three blacks studied more or less that section of general orders.
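To make the dispute concrete, the crudest form of the analysis plaintiffs describe simply compares per-item pass rates by group, as sketched below. Accepted DIF methods (Mantel-Haenszel, for example) also condition on total score so that an overall ability difference is not mistaken for item bias, which is the confusion Dr. Landy's testimony identifies; the flagging threshold and data here are illustrative assumptions, not anything in the record.

```python
# Crude per-item screen for group differences in pass rates. Illustrative
# only: accepted DIF methods (e.g., Mantel-Haenszel) condition on total
# score, and the 0.15 flagging threshold here is an arbitrary assumption.
def flag_items(responses, groups, threshold=0.15):
    """responses[c][i] is 1 if candidate c answered item i correctly;
    groups[c] is candidate c's group label."""
    flagged = []
    for i in range(len(responses[0])):
        by_group = {}
        for row, g in zip(responses, groups):
            by_group.setdefault(g, []).append(row[i])
        rates = {g: sum(v) / len(v) for g, v in by_group.items()}
        gap = max(rates.values()) - min(rates.values())
        if gap > threshold:
            flagged.append((i, {g: round(r, 2) for g, r in rates.items()}))
    return flagged

# Tiny worked example: item 1 shows a large gap between groups and is flagged.
responses = [[1, 1], [1, 1], [1, 0], [1, 0]]
groups = ["A", "A", "B", "B"]
print(flag_items(responses, groups))   # [(1, {'A': 1.0, 'B': 0.0})]
```

As the testimony points out, a screen like this cannot distinguish an unfair item from an item some candidates simply studied; dropping every flagged item trades validity for the appearance of parity.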
Plaintiffs have insinuated in their memorandum that the City's methodology in setting the cut-score in the first place was improper, yet every witness, including Dr. Burroughs, testified that it is a common and acceptable practice to establish a pass point by reference to some multiple of the number of anticipated vacancies. Dr. Burroughs agreed that this was an appropriate way to set the cut-score in advance of the test, but believed that after the test results were in, and Semko had information that would have suggested, upon proper analysis, that the level of reliability of the test was in doubt, the cut-score should have been lowered.
Dr. Landy cited the bar exam as an example of a heterogeneous knowledge exam, and testified:
[I]f I was going to calculate the reliability, the consistency of the bar examination, it would be considerably lower than the consistency or the reliability of numerical ability. . . . [T]he knowledge exam in Jackson, like the bar exam or like many other kinds of licensing examinations, draw their material for test items from many different heterogeneous sources. It's not a single unitary ability —
. . . Knowledge is more diverse, and that's why you would expect the reliability coefficient not to be .95, to be .79 as it is in this case.
Dr. Landy pointed out, in fact, that the reliability coefficient of the 1991 multi-state bar exam was "virtually identical" to the reliability of the written examination portion of Jackson's 1998 sergeant's exam.
The court notes that Dr. Wade similarly testified that the reliability index used by Dr. Burroughs "gives you probably spuriously low coefficients on a test like this where the difficulty of items varies considerably."
Thus, the court rejects plaintiffs' argument that the City could have eliminated these questions or lowered the cut-score to account for the low reliability coefficient as an alternative with less adverse impact.
On another point, Dr. Burroughs asserted in his report and trial testimony that the test functioned poorly as a measure of job knowledge based on the poor readability of numerous test items. Although not entirely specific in his objection on this ground, Dr. Burroughs testified generally that he believed the reading level of the test exceeded the reading level required for the sergeant's job by several grade levels, making the test more a test of applicants' reading ability and comprehension than a test of job knowledge. However, the court credits Dr. Landy's opinion that readability was not an issue in the performance of black and white candidates. According to Dr. Landy, in looking at those test items that might appear on a computer-based analysis to have been less readable, there were no differences in the pass rates between blacks and whites. Dr. Landy further opined that the reading level of the test was equal to or lower than — in fact, much lower than — the reading level required by the job. In this regard, Dr. Landy opined that the reading level required by the job of sergeant — not just of police officer, but of sergeant — was 13th to 14th grade, not 9th to 10th grade as testified to by Dr. Burroughs, and that the reading level of the test was "much lower" than the 11th to 12th grade level testified to by Dr. Burroughs.
Dr. Wade also addressed the readability issue, and took the position, as did Dr. Landy, that once allowance was made for some of the more difficult or complex words that were derived directly from the source materials and which would (or should) be readily known to a police officer, e.g., perpetrator, contraband, confiscate, apprehend, the readability level was relatively low.
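Neither expert's method is identified in the opinion beyond "computer-based analysis." One common grade-level index is the Flesch-Kincaid formula, sketched below with a rough syllable heuristic; it is offered only to show how such estimates are produced, and the sample sentence is invented.

```python
# A sketch of a Flesch-Kincaid grade-level estimate. The opinion does not
# say which readability index the experts used; this is one common choice,
# the syllable counter is a rough heuristic rather than dictionary-grade,
# and the sample sentence is invented.
import re

def syllables(word):
    # Count groups of consecutive vowels, with a floor of one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_syllables = sum(syllables(w) for w in words)
    return (0.39 * len(words) / n_sentences
            + 11.8 * n_syllables / len(words) - 15.59)

item = ("The perpetrator attempted to conceal the contraband before "
        "officers could confiscate it and apprehend him.")
print(f"estimated grade level: {fk_grade(item):.1f}")
```

The sketch also illustrates Dr. Wade's point: multi-syllable source-material terms like "perpetrator" and "confiscate" drive the syllables-per-word term that dominates such formulas, even when those words are ones an officer should readily know.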
Based on all of the foregoing, the court concludes that the City has sustained its burden to establish the validity of the test. That is, the 1998 test was content valid and sufficiently reliable. As explained supra, once an employer establishes job relatedness, as the City has done, the burden shifts to the plaintiffs to demonstrate that an equally valid selection procedure with less adverse impact exists that the employer refused to use. In this vein, plaintiffs submit that alternatives were available to the City which would have had less adverse impact, including (1) using the panel interview, rather than the written test, as the screening device, since the literature suggests that blacks perform better in interviews than on written exams; or (2) allowing all applicants to complete all three phases of the process, as was done by the City of Jackson in its 2002 promotion procedure for the sergeant position.
Notably, plaintiffs have not suggested that the assessment center should have been used as the screening device, perhaps because as noted by Semko, the black candidates who went through the assessment center did relatively poorly on "technical knowledge, written communication, decisiveness/judgment, legal knowledge and training skills." In fact, according to Semko's report, "with regard to legal knowledge and technical knowledge, the black candidates in this group fared much better on the written test score than on the assessment center exercise which measured the same KSAs."
As to the use of the panel interview as the screening device, all the witnesses who testified at trial, including Dr. Burroughs, agreed that the most effective (and cost-efficient) way to measure technical and legal knowledge, both of which are essential to effective performance of the job in question, is an objective, standardized written test. In her testimony, Dr. Semko explained that other dimensions, like communication skills and interpersonal skills, which can be measured and evaluated in the interview setting, are important after the foundation skills of technical and legal knowledge, since "[i]t doesn't matter how good a communicator someone is or how good their interpersonal skills are in the job of sergeant if they don't have an adequate level of technical and legal knowledge." Viewed from this perspective, if a screening device was to be used, then it should logically have been the written test, since in the court's opinion, weaknesses in technical and legal knowledge are legitimate dimensions on which to screen out candidates.
Moreover, as Dr. Wade noted, had the City chosen to use a panel interview as a screening device, there would no doubt have been vigorous objection to what many would have perceived as a "good ole boy system" that prevented them from vying for the positions.
Plaintiffs have suggested, though, as an alternative with less discriminatory impact, that the City could have allowed all the applicants to go through the entire selection procedure, which is precisely what the City did in 2002. The testimony was clear, however, that coupled with the City's determination that technical and legal knowledge were essential for the position was the fact of budgetary constraints facing the City, which gave rise to a need to conduct the promotion selection process as efficiently and economically as could reasonably be done. All the witnesses agreed that it would have required much more time and money for all 150 candidates to have gone through all three phases of the process, with the assessment center and panel interviews requiring the most time and money. They agreed, additionally, that allowing all candidates to go through the process could have contributed to a perception among candidates that the process was not fair and unbiased. For example, for all applicants to have gone through the panel interview, multiple panels would have been required, which could have contributed to a feeling among candidates that a disparity in interviewers' scoring adversely affected them. And to have allowed all candidates to proceed through the assessment centers would have necessitated a lengthy process, with the resultant risk that those who went through the centers later might be privy to information not available to those who went through the process earlier.
Additionally, as Dr. Landy observed, many of those who go through the process have no realistic chance for promotion, particularly where there are limited numbers of openings, so to have them go through the entire process could be seen as setting a false expectation in the individuals, and asking them to invest a lot of time and energy into something for which they have no realistic chance of success.
Given the considerations identified, the court concludes that plaintiffs have not shown that there existed equally valid selection procedures that the City refused to follow, and thus that plaintiffs have failed to sustain their burden to prove disparate impact.

DISPARATE TREATMENT
In addition to their disparate impact claim, plaintiffs have alleged that the City engaged in intentional race discrimination, or disparate treatment, by proceeding to use the results of a test which it knew had a discriminatory impact. In support of their contention in this regard, plaintiffs rely on the fact that the City used the test results, without making any adjustment to the results or cut-score, after the Justice Department had specifically informed the City that the test had a disparate impact. In the court's opinion, however, the City officials involved in the decision to so proceed, all of whom, as it happens, were black, testified credibly that they had no intent to discriminate. All of the City's witnesses explained that while they were aware of the Justice Department's comments regarding the test and test results, they believed those comments related to any future exams the City might use, and they interpreted the Justice Department's letter as expressly approving the use of the test results for this particular round of promotions. Their interpretation in this regard was reasonable, in the court's opinion, given the language of the Justice Department's letter; and in the absence of further proof to suggest a basis for inferring an intent on the part of City officials to discriminate, the court concludes that plaintiffs' disparate treatment claim is without merit and should be dismissed.
Conclusion
Based on the foregoing, it is ordered that plaintiffs' complaint be dismissed.
A separate judgment will be entered in accordance with Rule 58 of the Federal Rules of Civil Procedure.