Designing an Assessment Strategy


Overview

When designing an assessment strategy and selecting and evaluating assessment tools, it is important to consider a number of factors, such as:

Reliability

The term reliability refers to consistency. Assessment reliability is demonstrated by the consistency of scores obtained when the same applicants are reexamined with the same or equivalent form of an assessment (e.g., a test of keyboarding skills). No assessment procedure is perfectly consistent. If an applicant's keyboarding skills are measured on two separate occasions, the two scores (e.g., net words per minute) are likely to differ.

Reliability reflects the extent to which these individual score differences are due to "true" differences in the competency being assessed and the extent to which they are due to chance, or random, errors. Common sources of such error include variations in:

  • Applicant's mental or physical state (e.g., the applicant's level of motivation, alertness, or anxiety at the time of testing)
  • Assessment administration (e.g., instructions to applicants, time limits, use of calculators or other resources)
  • Measurement conditions (e.g., lighting, temperature, noise level, visual distractions)
  • Scoring procedures (e.g., raters who evaluate applicant performance in interviews, assessment center exercises, writing tests)

A goal of good assessment is to minimize random sources of error. As a general rule, the smaller the amount of error, the higher the reliability.

Reliability is expressed as a positive decimal number ranging from 0 to 1.00, where 0 means the scores consist entirely of error. A reliability of 1.00 would mean the scores are free of any random error. In practice, scores always contain some amount of error and their reliabilities are less than 1.00. For most assessment applications, reliabilities above .70 are likely to be regarded as acceptable.
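
To make the idea concrete, the following sketch (illustrative only, not part of the guidance above) estimates test-retest reliability as the correlation between scores from two administrations of the same keyboarding test. The score values and the use of Python with numpy are assumptions for demonstration.

```python
import numpy as np

# Hypothetical net-words-per-minute scores for the same eight applicants,
# measured on two separate occasions with the same keyboarding test.
first_administration = np.array([52, 61, 47, 70, 58, 66, 49, 63])
second_administration = np.array([55, 59, 50, 68, 60, 64, 47, 65])

# Test-retest reliability: the correlation between the two sets of scores.
# Values near 1.00 indicate little random error; values near 0 indicate
# the scores consist mostly of error.
reliability = np.corrcoef(first_administration, second_administration)[0, 1]

print(f"Estimated test-retest reliability: {reliability:.2f}")
print("Acceptable for most uses" if reliability > 0.70
      else "Below the .70 rule of thumb")
```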

Consistency in assessment scores matters in practice because those scores are used to make important decisions about people. As an example, assume two agencies use similar versions of a writing skills test to hire entry-level technical writers. Imagine the consequences if the test scores were so inconsistent (unreliable) that applicants who applied at both agencies received low scores on one test but much higher scores on the other. The decision to hire an applicant might then depend more on the reliability of the assessments than on his or her actual writing skills.

Reliability is also important when deciding which assessment to use for a given purpose. The test manual or other documentation supporting the use of an assessment should report details of reliability and how it was computed. The potential user should review the reliability information available for each prospective assessment before deciding which to implement. Reliability is also a key factor in evaluating the validity of an assessment. An assessment that fails to produce consistent scores for the same individuals examined under near-identical conditions cannot be expected to make useful predictions of other measures (e.g., job performance). Reliability is critically important because it places a limit on validity.

Validity

Validity refers to the relationship between performance on an assessment and performance on the job. Validity is the most important issue to consider when deciding whether to use a particular assessment tool because an assessment that does not provide useful information about how an individual will perform on the job is of no value to the organization.

There are different types of validity evidence. Which type is most appropriate will depend on how the assessment method is used in making an employment decision. For example, if a work sample test is designed to mimic the actual tasks performed on the job, then a content validity approach may be needed to establish that the content of the test convincingly matches the content of the job, as identified by a job analysis. If a personality test is intended to forecast the job success of applicants for a customer service position, then evidence of predictive validity may be needed to show that scores on the personality test are related to subsequent performance on the job.

The most commonly used measure of predictive validity is a correlation (or validity) coefficient. Correlation coefficients range in absolute value from 0 to 1.00. A correlation of 1.00 (or -1.00) indicates two measures (e.g., test scores and job performance ratings) are perfectly related. In such a case, you could perfectly predict the actual job performance of each applicant based on a single assessment score. A correlation of 0 indicates two measures are unrelated. In practice, validity coefficients for a single assessment rarely exceed .50. A validity coefficient of .30 or higher is generally considered useful for most circumstances (Biddle, 2005). 1
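
As an illustration with hypothetical data (not drawn from the sources cited here), the sketch below computes a validity coefficient as the correlation between assessment scores and later job performance ratings for the same applicants.

```python
import numpy as np

# Hypothetical paired data: each applicant's assessment score and a later
# job performance rating for that same person (values are illustrative only).
assessment_scores = np.array([72, 85, 60, 90, 78, 66, 81, 58])
performance_ratings = np.array([3.1, 4.2, 2.8, 4.5, 3.6, 3.0, 3.9, 2.9])

# The validity coefficient is the correlation between the two measures.
validity = np.corrcoef(assessment_scores, performance_ratings)[0, 1]

print(f"Validity coefficient: {validity:.2f}")
# By the rule of thumb noted above, values of .30 or higher are generally
# considered useful for most circumstances.
```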

When multiple selection tools are used, you can consider the combined validity of the tools. To the extent the assessment tools measure different job-related factors (e.g., reasoning ability and honesty) each tool will provide unique information about the applicant's ability to perform the job. Used together, the tools can more accurately predict the applicant's job performance than either tool used alone. The amount of predictive validity one tool adds relative to another is often referred to as the incremental validity of the tool. The incremental validity of an assessment is important to know because even if an assessment has low validity by itself, it has the potential to add significantly to the prediction of job performance when joined with another measure.

Just as assessment tools differ with respect to reliability, they also differ with respect to validity. The following table provides the estimated validities of various assessment methods for predicting job performance (represented by the validity coefficient), as well as the incremental validity gained from combining each with a test of general cognitive ability. Cognitive ability tests are used as the baseline because they are among the least expensive measures to administer and the most valid for the greatest variety of jobs. The second column is the correlation of the combined tools with job performance, or how well they collectively relate to job performance. The last column shows the percent increase in validity from combining the tool with a measure of general cognitive ability. For example, cognitive ability tests have an estimated validity of .51 and work sample tests have an estimated validity of .54. When combined, the two methods have an estimated validity of .63, an increase of 24% above and beyond what a cognitive ability test used alone could provide.


Table 1: Validity of Various Assessment Tools Alone and in Combination
Assessment method                     Validity of method   Incremental (combined)   % increase in validity from combining
                                      used alone           validity                 tool with cognitive ability
Tests of general cognitive ability    .51                  --                       --
Work sample tests                     .54                  .63                      24%
Structured interviews                 .51                  .63                      24%
Job knowledge tests                   .48                  .58                      14%
Accomplishment record*                .45                  .58                      14%
Integrity/honesty tests               .41                  .65                      27%
Unstructured interviews               .38                  .55                      8%
Assessment centers                    .37                  .53                      4%
Biodata measures                      .35                  .52                      2%
Conscientiousness tests               .31                  .60                      18%
Reference checking                    .26                  .57                      12%
Years of job experience               .18                  .54                      6%
Training & experience point method    .11                  .52                      2%
Years of education                    .10                  .52                      2%
Interests                             .10                  .52                      2%

Note:

Table adapted from Schmidt & Hunter (1998). Copyright © 1998 by the American Psychological Association. Adapted with permission. 2

* Referred to as the training & experience behavioral consistency method in Schmidt & Hunter (1998).
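
The combined validities in Table 1 can be approximated with the standard two-predictor multiple correlation formula. The sketch below is illustrative only: the .38 intercorrelation between work sample tests and cognitive ability tests is an assumed value chosen to roughly reproduce the .63 figure in the table, not a number reported on this page.

```python
import math

def combined_validity(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of two predictors with job performance.

    r_y1: validity of predictor 1 (e.g., cognitive ability test)
    r_y2: validity of predictor 2 (e.g., work sample test)
    r_12: correlation between the two predictors (assumed here)
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)

# Validities taken from Table 1; the .38 intercorrelation is an assumption.
cognitive = 0.51
work_sample = 0.54
assumed_intercorrelation = 0.38

combined = combined_validity(cognitive, work_sample, assumed_intercorrelation)
increase = (combined - cognitive) / cognitive * 100

print(f"Combined validity: {combined:.2f}")                        # roughly .63
print(f"Increase over cognitive ability alone: {increase:.0f}%")   # roughly 24%
```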

Technology

The technology available is another factor in determining the appropriate assessment tool. Agencies that receive a large volume of applicants for position announcements may benefit from using technology to narrow down the applicant pool, such as online screening of resumes or online biographical data (biodata) tests. Technology can also overcome distance challenges and enable agencies to reach and interview a larger population of applicants.

However, because technology removes the human element from the assessment process, it may be perceived as "cold" by applicants, and is probably best used in situations that do not rely heavily on human intervention, such as collecting applications or conducting applicant screening. Technology should not be used for final selection decisions, as these traditionally require a more individualized and in-depth evaluation of the candidate (Chapman and Webster, 2003). 3

Legal Context of Assessment

Any assessment procedure used to make an employment decision (e.g., selection, promotion, pay increase) can be open to claims of adverse impact based on subgroup differences. Adverse impact is a legal concept used to determine whether there is a "substantially different" passing rate (or selection rate) between two groups on an assessment procedure (see www.uniformguidelines.com for a more detailed discussion). Groups are typically defined on the basis of race (e.g., Blacks compared to Whites), gender (i.e., males compared to females), or ethnicity (e.g., Hispanics compared to Non-Hispanics). Assessment procedures having an adverse impact on any group must be shown to be job-related (i.e., valid).

What is a "substantially different" passing rate? The Uniform Guidelines provide a variety of statistical approaches for evaluating adverse impact. The most widely used method is referred to as the 80% (or four-fifths) rule-of-thumb. The following is an example where the passing rate for females is 40% and the passing rate for males is 50%. The Uniform Guidelines lay out the following steps for computing adverse impact:

  • Divide the group with the lowest rate (females at 40%) by the group with the highest rate (males at 50%)
  • In this case, divide 40% by 50% (which equals 80%)
  • Note whether the result is 80% or higher

According to the 80% rule, adverse impact is not indicated as long as the ratio is 80% or higher. In this case, the ratio of the two passing rates is 80%, so evidence of adverse impact is not found and the passing rate of females is not considered substantially different from males.
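
The calculation above can also be expressed directly in code. This is a minimal sketch of the four-fifths (80%) rule-of-thumb using the passing rates from the example (40% for females, 50% for males); the function name and structure are illustrative, not prescribed by the Uniform Guidelines.

```python
def adverse_impact_indicated(rate_a: float, rate_b: float) -> bool:
    """Apply the 80% (four-fifths) rule-of-thumb to two passing rates.

    Returns True when the ratio of the lower rate to the higher rate
    falls below 80%, i.e., when adverse impact is indicated.
    """
    ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
    return ratio < 0.80

# Example from the text: females pass at 40%, males at 50%.
female_rate, male_rate = 0.40, 0.50
ratio = min(female_rate, male_rate) / max(female_rate, male_rate)

print(f"Ratio of passing rates: {ratio:.0%}")  # 80%
print("Adverse impact indicated"
      if adverse_impact_indicated(female_rate, male_rate)
      else "No adverse impact indicated under the 80% rule")
```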

Agencies are encouraged to consider assessment strategies to minimize adverse impact. When adverse impact is discovered, the assessment procedure must be shown to be job-related and valid for its intended purpose.


Face Validity/Applicant Reactions

When applicants participate in an assessment process, they are not the only ones being evaluated; the agency is being evaluated as well. Applicants who complete an assessment process leave with impressions about the face validity and overall fairness of the assessment procedure. Their impressions can also be impacted by whether they believe they had a sufficient opportunity to display their job-related competencies. The quality of the interactions between the applicant and agency representatives can also affect applicant reactions. Agencies using grueling assessment procedures may end up alienating applicants. It is important to recognize applicants use the assessment process as one means to gather information about the agency. Failure to act on this fact can be very costly to agencies, particularly if top candidates are driven to look elsewhere for employment opportunities.

Designing a Selection Process

The design of an assessment strategy should begin with a review of the critical competencies identified from the job analysis results. Once you decide what to assess, you must then determine how to structure the personnel assessment process. In designing a selection process, a number of practical questions must be addressed, such as:

  • How much money is available?
  • What assessment tool(s) will be selected?
  • If using multiple tools, in what order should they be introduced?
  • Are trained raters needed, and if so, how many (e.g., for conducting interviews)?
  • How many individuals are expected to apply?
  • What is the timeframe for filling vacancies?

For example, if your budget is tight, you will need to rule out some of the more expensive methods such as assessment centers or work simulation tests. If you are expecting to receive thousands of applications (based on projections from similar postings), you will need to develop an effective screening mechanism ahead of time. If you need to fill a vacancy and only have a few weeks to do so, then a multi-stage process will probably not be feasible. In working out answers to these questions, it is usually helpful to think in terms of the entire selection process, from beginning to end.

One key consideration is the number of assessment tools to include in the process. Using a variety of assessments tends to improve the validity of the process and will provide information on different aspects of an applicant's likely job performance. Using a single measure will tend to identify applicants who have strengths in a specific area but may overlook applicants who have high potential in other areas. Assessing applicants using multiple methods will reduce errors because people may respond differently to different methods of assessment. For example, some applicants who excel at written tests may be too nervous to do well in interviews, while others who suffer from test anxiety may give impressive interviews. Another advantage of using a variety of assessment methods is that a multiple hurdle approach can be taken: the least expensive assessments can be used first to pare down the applicant pool, and more labor-intensive and time-consuming procedures can be introduced at a later stage when there are fewer candidates to evaluate.
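
As a rough illustration of the multiple hurdle idea, the sketch below applies an inexpensive screen to the whole pool first and a costlier structured interview only to those who remain. All names, scores, and cutoffs are hypothetical.

```python
# Hypothetical applicant records: a cheap online screen score and, for
# those who pass, a later structured interview rating.
applicants = [
    {"name": "A", "screen": 62, "interview": None},
    {"name": "B", "screen": 88, "interview": 4.1},
    {"name": "C", "screen": 75, "interview": 3.2},
    {"name": "D", "screen": 91, "interview": 4.6},
]

SCREEN_CUTOFF = 70      # hurdle 1: inexpensive online screening
INTERVIEW_CUTOFF = 3.5  # hurdle 2: labor-intensive structured interview

# Hurdle 1: pare down the pool with the least expensive assessment.
passed_screen = [a for a in applicants if a["screen"] >= SCREEN_CUTOFF]

# Hurdle 2: apply the costlier assessment only to the smaller pool.
finalists = [a for a in passed_screen
             if a["interview"] is not None and a["interview"] >= INTERVIEW_CUTOFF]

print([a["name"] for a in finalists])  # ['B', 'D']
```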

Considering which assessment methods best measure which competencies at which stage in the process should help you develop a process well suited to your agency's hiring needs.

Ensuring an Effective Assessment Process

Agencies are encouraged to standardize and document the assessment process through the following steps:

  • Treat all individuals consistently. This is most easily accomplished by adopting a standardized assessment and decision-making process. "Standardizing" means making a process uniform to ensure the same information is collected on each individual and is used in a consistent manner in employment decisions.
  • Ensure the selection tool is based on an up-to-date job analysis and is supported by strong validity evidence. A validation study can verify that applicants who score well on the selection device are more likely to do well on the job and contribute to organizational success. Agencies not familiar with validation research methodology are encouraged to consult a measurement expert.
  • To ensure applicants perceive the process as fair, agencies are encouraged to:
    1. Offer applicants a realistic job preview before the assessment process
    2. Discuss with applicants the rationale for using the selection device, as well as what it assesses and why these competencies are important to the job
    3. Provide applicants the opportunity to ask questions about the job and the selection process
    4. Treat individuals with respect, sensitivity, and impartiality during the process
    5. Provide feedback about all hiring decisions in a timely and courteous manner
    6. Elicit feedback from applicants (those selected and those not selected) on the selection process
  • Ensure all persons involved in the selection process (e.g., administrators, interviewers, assessors) understand their roles and responsibilities

(Information adapted from Gilliland, S.W., & Cherry, B., 2000). 4


Sources of Additional Information

For a more in-depth introduction to personnel assessment practices, including measurement techniques and related considerations (e.g., reliability, validity, job analysis, and legal requirements), refer to Essentials of Personnel Assessment and Selection by Guion and Highhouse (2006). 5

For a non-technical summary of the research literature on the value of commonly used assessment methods, see Selection Methods: A Guide to Implementing Formal Assessments to Build a High Quality Workforce (Pulakos, 2005). 6

More information about designing and implementing a selection process can be found in Competency-based Recruitment and Selection: A Practical Guide by Wood and Payne (1998).7


1 Biddle, D. (2005). Adverse Impact and Test Validation: A Practitioner's Guide to Valid and Defensible Employment Testing. Burlington, VT: Gower Publishing.

2 Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.

3 Chapman, D. S., & Webster, J. (2003). The use of technologies in the recruiting, screening, and selection processes for job candidates. International Journal of Selection and Assessment, 11, 113-120.

4 Gilliland, S. W., & Cherry, B. (2000). Managing customers of selection. In J. K. Kehoe (Ed.), Managing Selection in Changing Organizations (pp. 158-196). San Francisco: Jossey-Bass.

5 Guion, R. M., & Highhouse, S. (2006). Essentials of Personnel Assessment and Selection. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

6 Pulakos, E. D. (2005). Selection Methods: A Guide to Implementing Formal Assessments to Build a High Quality Workforce. Alexandria, VA: SHRM Foundation.

7 Wood, R., & Payne, T. (1998). Competency-based Recruitment and Selection: A Practical Guide. Hoboken, NJ: Wiley.


Assessment Decision Tool

The Assessment Decision Tool (ADT) is designed to help human resources professionals and hiring supervisors/managers develop assessment strategies for their specific hiring situation (e.g., volume of applicants, level of available resources).

The basic steps are:

  1. Indicate whether you are looking for general assessment information or developing an assessment strategy.
  2. Provide additional information, such as the specific type of assessment method you're interested in, the type of position you are filling, the competencies required for the job, and your hiring situation.
  3. Review and print your summary report.

That's all there is to it! Get started now.

Please Note:

The ADT is located on a different section of OPM's website and will open in a new window.

What Can You Do If You Have Questions or Problems?

If you experience any problems while using the ADT or have questions about what to do, please submit a query at the technical support page. If you have questions or comments on the content of the ADT, please send an e-mail to Assessment_Information@opm.gov.
