What is the main purpose of licensure?

Licensure Testing: Purposes, Procedures, and Practices, ed. James C. Impara (Lincoln, NE: Buros Institute of Mental Measurements, University of Nebraska-Lincoln, 1995).

When most individuals hear the terms license and licensure, their first reaction is that these are easily understood and relatively simple words. Everyone knows what these terms mean. Or do they?

What is licensure? It is a multifaceted, complex governmental system of regulation whose stated purpose is public protection. According to Webster's dictionary (Guralnik, 1976), a license is "a formal permission to do something: esp., authorization by law to do some specified thing (license to marry, practice medicine, hunt, etc.)." Licensure is then defined as "the act or practice of granting licenses, as to practice a profession." Unfortunately, these dictionary definitions cover a myriad of activities to which the terms license and licensure may apply, and so they only further complicate what these related terms mean.

Licensure confers upon a licensee the legal authority to practice an occupation or profession. In 1952, the Council of State Governments defined licensing as:
the granting by some competent authority of a right or permission to carry on a business or do an act which would otherwise be illegal. The essential elements of licensing involve the stipulation of circumstances under which permission to perform an otherwise prohibited activity may be granted, largely a legislative function; and the actual granting of the permission in specific cases, generally an administrative responsibility. (p. 5)

Shimberg and Roederer (1994) later restated this definition in terms of professional licensure:

Licensing is a process by which an agency of government grants permission to an individual to engage in a given occupation upon finding that the applicant has attained the minimal degree of competency required to ensure that the public health, safety, and welfare will be reasonably well protected. (p. 1)

Occupational and professional licensure is an activity reserved to each state under the federal Constitution, as an exercise of the state's inherent police power. Licensure is designed to protect citizens from mental, physical, or economic harm that could be caused by practitioners who may not be sufficiently competent to enter the profession.

Whether licensure is viewed as a privilege or a right, it is to be granted only to individuals who demonstrate to the satisfaction of a state that they possess, at the time of initial licensure, the requisite minimal level of knowledge, skills, and abilities determined necessary to practice competently. Malcolm Parsons (1952) emphasized that permission is the essential element of licensure and that such permission "may be granted or denied, renewed or refused to be renewed, withdrawn temporarily through suspension, or withdrawn altogether through revocation" (p. 4). A license is not granted unconditionally. It is usually issued for only a finite period of time, and a state can limit or withdraw it for a number of reasons.

Paradoxically, although freedom is a cornerstone of the Constitution of the United States, licensure imposes considerable restrictions upon an individual's freedom to pursue certain career choices. Once a profession has been legislatively mandated to be licensed, it is illegal for an individual to practice that profession or use a specific title without first obtaining the necessary license. Additionally, to obtain a license, an individual must first meet a variety of requirements.

INTRODUCTION

Testing programs nearly always need examinations that measure the same thing but are composed of different questions (i.e., alternate forms of the same test). When different questions are used, however, there is no assurance that scores on the forms are equivalent. Different sets of items might be easier or harder and therefore produce higher or lower scores. Equating is used to overcome this problem. Simply stated, equating is the design and statistical procedure that permits scores on one form of a test to be comparable to scores on an alternate form.

A hypothetical example will help explain why equating is needed. Suppose Fred takes a certifying examination for aspiring baseball umpires. The examination has 100 questions sampled from the domain of questions about baseball rules and regulations. Fred gets 50 questions right and receives a score of 50. Ethel also takes an examination about baseball rules and regulations, but her test is composed of 100 different items. Ethel gets 70 questions right. Does Ethel know more about baseball than Fred? Or, might it be that Fred's test was much more difficult than Ethel's test, and contrary to appearances, Fred knows more about baseball than Ethel? The answers to these questions lie in equating, the process of ensuring that scores from multiple forms of the same test are comparable.
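The basic idea can be previewed with one simple method, linear equating. This method treats a score on one form as comparable to the score on another form that lies the same number of standard deviations from that form's mean. The sketch below assumes the two forms were given to randomly equivalent groups of candidates. The means and standard deviations attributed to Fred's and Ethel's forms are invented for illustration, and the names `FormStats` and `linear_equate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FormStats:
    """Summary statistics for raw scores on one test form (hypothetical values below)."""
    mean: float
    sd: float


def linear_equate(x: float, from_form: FormStats, to_form: FormStats) -> float:
    """Map raw score x on `from_form` onto the raw-score scale of `to_form`.

    Linear equating: scores with the same z-score on both forms are treated as
    comparable, i.e. y = mean_to + sd_to * (x - mean_from) / sd_from.
    Assumes the two forms were taken by randomly equivalent groups.
    """
    z = (x - from_form.mean) / from_form.sd
    return to_form.mean + z * to_form.sd


# Invented statistics: in this illustration Fred's form turned out harder than Ethel's.
fred_form = FormStats(mean=45.0, sd=10.0)
ethel_form = FormStats(mean=72.0, sd=8.0)

print(linear_equate(50, fred_form, ethel_form))  # 76.0
```

With these made-up numbers, Fred's raw score of 50 corresponds to 76 on Ethel's form, which is above her 70. With other form statistics the comparison could come out the other way, which is why raw scores from different forms cannot simply be compared.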

Equating is a technical topic that generally requires a considerable background in statistics. The goal of this chapter is to provide a helpful and readable introduction to the issues and concepts, while highlighting useful references that provide technical details. The chapter begins with some general background and then presents common equating designs and an overview of methods and statistical techniques. For the most often used design, the common-item design, the discussion is expanded and examples are provided. This is followed by a consideration of factors that affect the precision of equating and an outline of some basic research questions. Finally, examples of currently available software are inventoried.

BACKGROUND

At the outset it should be noted that the term "equating" implies that scores from different forms of a test will be rendered interchangeable. In fact, few data sets ever meet all of the strict assumptions that lead to interchangeable or equated scores. A more technically correct term would be scaled or comparable scores (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1985). In keeping with this notion, an attempt has been made to use the terms "scaled" or "comparable" scores throughout the chapter.

Reasons for Multiple Forms

There are at least three reasons to have multiple forms of a test. The first is security. Many testing programs administer high-stakes examinations in which performance has an important impact upon the examinee and the public: conferring a license or certificate to practice a profession, permitting admittance to a college or other training program, or granting credit for an educational experience. For a test score to have validity in any of these circumstances, it is crucial that it reflect the uncontaminated knowledge and ability of the examinees. Therefore, security is a concern and it is often desirable to give different forms to examinees seated beside each other, those who take the examination on different days, or those who take the examination on more than one occasion (Petersen, Kolen, & Hoover, 1989).

A second and related reason for different test forms is the current movement to open testing. Many programs find it necessary or desirable to release test items to the public (Holland & Rubin, 1982a). When this occurs, it is not possible to use the released items on future forms of a test without providing examinees an unfair advantage.

A third reason for different forms is that test content, and therefore test questions, must by necessity change gradually over time. Knowledge in virtually all occupations and professions evolves, and it is crucial for the test to reflect the current state of practice. For example, today's medical licensure and certification examinations should clearly include questions on HIV and AIDS, whereas these topics were not relevant several years ago. Even when the knowledge does not change so obviously, the context within which test items are presented is at risk of becoming dated. In medicine, descriptions of a patient's condition in a clinical scenario might need to be rewritten to include current drugs. In law, one might want to include references to timely cases and rulings, especially if they lead to different interpretations of the law. It also sometimes happens that the correct answer to a previously used question simply changes. When this occurs, it is necessary to rewrite or replace the item. [As will be discussed later, equating assumes that the test scores are based on parallel forms of the test. Thus, if the changes in content are too severe, it is not appropriate to equate.]

INTRODUCTION

The number of people in the United States who carry some responsibility for the writing of examination questions and the construction of tests is unknown. In the Preface to The Construction and Use of Achievement Examinations, published by the American Council on Education in 1936, the authors indicated that the number probably exceeded a million. That number has certainly grown in the past 60 years. Questions are posed to students by teachers at all levels of education; the Armed Forces have people whose job it is to construct tests which are used in the promotion of personnel; over 1,000 occupations are regulated by the states and many, ranging from the professions to the trades, require licensure or certification (Brinegar, 1990). Many licensure and certification decisions are based on test performance.

Over the years, the types of test questions in use have changed, and the emphasis has shifted from performance testing to multiple-choice testing and back to performance assessment. Apprenticeship programs in the trades (a kind of continuous assessment of performance) have been supplemented, or even replaced, by written examinations or by a combination of written and performance tests. More recently, technology has begun to enter the picture. For example, computer administration of questions, interactive video, and CD-ROM are beginning to be used.

Regardless of the type of test, whether it was written 50 years ago or last week, there are some important concerns. Fundamental among these concerns are the reliability and validity of the measures. The purpose of this chapter is to focus on the psychometric issues of reliability and validity of measures as they pertain to licensure examinations. In addition, the chapter examines the relationship of the measures to two sets of guidelines. The first is that of the Equal Employment Opportunity Commission (EEOC, 1975). The second is The Standards for Educational and Psychological Testing, produced by a joint committee of the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME) and published by the APA (1985). (We will refer to the EEOC document as the EEOC Guidelines and the AERA, APA, and NCME document as the Standards.)

Frequent references are made to the reliability and validity of examinations when, in reality, it is the scores, and the decisions made on the basis of the scores, that are or are not reliable and valid. In the context of licensure, scores are used to make decisions. Statistical analysis may show that the scores possess properties indicative of reliability. Studies may be conducted to show that the measures have some type of validity. However, reliable and valid scores may be used inconsistently or incorrectly. When this happens, the decisions made on the basis of the scores may not be reliable or valid decisions.
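One way to make the distinction between reliable scores and reliable decisions concrete is to check how consistently candidates are classified as pass or fail. The check applies when the same candidates take two forms, or take the same form twice. The sketch below computes two indices often used for this purpose. The first is the raw agreement proportion p0. The second is Cohen's kappa, which discounts the agreement expected by chance from the two pass rates. The scores and the cut score of 60 are hypothetical, and the function name is illustrative. Using the same raw cut on both forms presumes the forms have been equated.

```python
import math
from typing import Sequence, Tuple


def decision_consistency(scores_a: Sequence[float], scores_b: Sequence[float],
                         cut_a: float, cut_b: float) -> Tuple[float, float]:
    """Return (p0, kappa) for pass/fail decisions made on two administrations.

    p0    = proportion of candidates classified the same way both times
    kappa = (p0 - pc) / (1 - pc), where pc is the agreement expected by chance
            from the two pass rates
    """
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(scores_a)
    pass_a = [s >= cut_a for s in scores_a]
    pass_b = [s >= cut_b for s in scores_b]
    p0 = sum(a == b for a, b in zip(pass_a, pass_b)) / n
    rate_a = sum(pass_a) / n
    rate_b = sum(pass_b) / n
    pc = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    # If everyone passes (or fails) both times, chance agreement is 1 and kappa is undefined.
    kappa = (p0 - pc) / (1 - pc) if pc < 1 else math.nan
    return p0, kappa


# Hypothetical scores for five candidates on two forms, cut score 60 on both.
form_x = [55, 62, 48, 70, 59]
form_y = [58, 57, 45, 72, 61]
p0, kappa = decision_consistency(form_x, form_y, cut_a=60, cut_b=60)
print(round(p0, 3), round(kappa, 3))  # 0.6 0.167
```

In this toy data, three of five candidates receive the same decision on both forms (p0 = 0.6). Once chance agreement is removed, however, kappa is only about 0.17.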

The discussion of reliability and validity in this chapter focuses on the traditional concepts rather than on a more contemporary approach broadly called generalizability theory. We focus on the traditional concepts simply because most licensure and certification programs with which we are familiar have not yet adopted generalizability theory as their basic approach to reporting the psychometric characteristics of their tests.
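One widely reported statistic in the traditional framework is an internal-consistency coefficient. For tests scored right/wrong, a common choice is Kuder-Richardson formula 20 (KR-20). It uses the number of items k, the proportion correct p_j on each item (with q_j = 1 - p_j), and the total-score variance σ²:

KR-20 = k/(k-1) · (1 - Σ p_j q_j / σ²)

The sketch below assumes a complete candidates-by-items matrix of 0/1 scores with no omitted responses. The four-candidate matrix is a toy example, not data from any licensure program.

```python
from statistics import pvariance
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """KR-20 reliability for a candidates x items matrix of 0/1 item scores.

    KR-20 = k/(k-1) * (1 - sum(p_j * q_j) / var(total)), using population
    variances throughout. Requires at least two items and nonzero
    total-score variance.
    """
    n = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have no variance")
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct on item j
        pq_sum += p * (1 - p)                     # variance of a 0/1 item
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Toy data: 4 candidates x 3 items (1 = correct, 0 = incorrect).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(toy))  # 0.75
```

A coefficient like this describes the consistency of the scores. It says nothing by itself about whether pass/fail decisions based on those scores are consistent.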

Computerized testing has come out of the laboratory and into the field. By rough estimates, over a million licensure and certification examinations are currently given by computer each year, and the number is rising. Computerized testing is not appropriate for every application, however. Computerized tests always result in significantly greater direct costs than paper-and-pencil tests, so to justify its use, a computerized test must produce a net dollar saving. This means that some aspect of computerization must offer a cost reduction that more than offsets the direct cost of computerization. The purpose of this chapter is to identify the areas in which computerization can result in dollar savings. It also aims to help the reader determine if, and in what form, computerized testing is appropriate for a specific application.

It may be possible to argue that a computerized test is useful because it can implement new question types or questioning strategies and thus measure something that cannot be measured by other means. Such an application has yet to be demonstrated in licensing. This chapter therefore ignores that possibility and deals exclusively with computerized administration of traditional test questions as a means of saving costs.

SCHEDULING EFFICIENCY: AN OBVIOUS ADVANTAGE

The success of computerized testing in licensure today is due in large part to the scheduling improvements it offers. Consider a typical paper-and-pencil licensure testing program: Tests are given every 2 weeks and must be scheduled 2 weeks in advance. Say a candidate decides on October 1 to take a licensure test. The scheduling deadline for the October 14 test has just passed, so the first test available is October 28. The candidate takes and fails that test, learns of the failure on November 10, and must reschedule for November 25. A typical computerized testing program is different: Tests are given daily and candidates need to register only one day in advance. Thus, the candidate could fail the first test on October 2, study hard that night, and take the retest on October 3. Assuming the candidate passes on the second attempt in either scenario, computerization saves almost 2 months. If passing a test stands between a candidate and a career, a 2-month time saving can be significant.
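The dates in this scenario follow from simple calendar arithmetic. The sketch below assumes, as the example does, that the candidate fails the first attempt and passes the second. The paper program's 13-day wait for results (October 28 to November 10) is read off the example. Results from the computerized test are assumed to be available the same day. The year and the function names are arbitrary and hypothetical.

```python
from datetime import date, timedelta


def next_session(ready: date, first_session: date,
                 interval_days: int, lead_days: int) -> date:
    """Earliest session on the grid first_session + k * interval_days that a
    candidate who is ready on `ready` can still register for, given that
    registration closes `lead_days` before each session."""
    earliest = ready + timedelta(days=lead_days)
    session = first_session
    while session < earliest:
        session += timedelta(days=interval_days)
    return session


def retest_date(decided: date, first_session: date, interval_days: int,
                lead_days: int, report_lag_days: int) -> date:
    """Date of the second attempt, assuming the first attempt is failed."""
    first = next_session(decided, first_session, interval_days, lead_days)
    learned = first + timedelta(days=report_lag_days)  # candidate learns of failure
    return next_session(learned, first_session, interval_days, lead_days)


YEAR = 2024  # arbitrary; only the month/day arithmetic matters here
decided = date(YEAR, 10, 1)

# Paper: sessions every 14 days from Oct 14, register 14 days ahead, 13-day score lag.
paper = retest_date(decided, date(YEAR, 10, 14), interval_days=14,
                    lead_days=14, report_lag_days=13)
# Computer: daily sessions, register 1 day ahead, results the same day.
computer = retest_date(decided, date(YEAR, 10, 2), interval_days=1,
                       lead_days=1, report_lag_days=0)

print(paper, computer, (paper - computer).days)  # 2024-11-25 2024-10-03 53
```

The 53-day difference is the "almost 2 months" cited in the example. Changing `lead_days` or `report_lag_days` shows which part of the paper schedule contributes most of the delay.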

Why does a computerized testing program offer such scheduling improvements? The direct costs in a testing program can be divided into five categories: (1) registering a candidate to take a test, (2) providing a place for the candidate to take the test, (3) providing a medium on which to present the test, (4) providing someone to proctor the examination, and (5) scoring and reporting the results. An optimal administration design must balance all five of these categories. If the criterion for design is minimal cost, the least expensive combination of elements must be found.

Paper-and-pencil administration offers significant freedom to choose a low-cost design. Administration expense is minimized in several ways:

- requiring the candidate to mail an application and a check (avoiding telephone and credit-card charges);
- administering the test in idle space that is normally used for other purposes (e.g., a high-school cafeteria on a Saturday);
- presenting the questions on an inexpensive medium (e.g., paper);
- using part-time personnel earning supplemental (lower-wage) income to administer the test; and
- limiting expensive equipment to a single site (e.g., scoring and reporting results from a central office).

This economically optimal design produces the familiar massed administration of paper-and-pencil tests and the 2- to 4-week advance registration requirements.

A computerized testing program has less freedom in design. The media for test presentation are not readily portable, which suggests a permanent site. The media, as well as the space to house them, are relatively expensive, which suggests using relatively few of them. When the costs of equipment and space are balanced against the cost of proctoring, the usual result is small, frequent sessions. Even in its optimal configuration, computerized administration is significantly more expensive than paper-and-pencil administration. Historically, this naturally led testing programs to offer candidates improved services such as rapid scheduling and score reporting.

Computerized administration is not essential to achieve the scheduling advantages typically obtained through computerized testing. Those advantages come from a particular design: small, frequent sessions, rapid scheduling, and onsite score reporting. When that design is applied to paper-and-pencil testing, its costs are nearly as great as those of full computerization. The direct cost of a computer system adequate for delivering multiple-choice licensure tests is only about $300 per testing station per year. That works out to about one dollar per test in a center that gives one test per station per day. Thus, if daily testing is implemented, the additional cost of computerization is small.
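The one-dollar figure comes from spreading the annual station cost over the tests the station delivers. In the sketch below, the number of testing days per year is an assumption, because the text gives only the one-test-per-station-per-day rate. The function name is hypothetical. The last call contrasts daily use with a station used only on a biweekly, paper-style schedule, which shows why daily testing keeps the added cost small.

```python
def station_cost_per_test(annual_station_cost: float,
                          tests_per_station_per_day: float,
                          testing_days_per_year: int) -> float:
    """Spread one station's annual direct cost over the tests it delivers."""
    tests_per_year = tests_per_station_per_day * testing_days_per_year
    if tests_per_year <= 0:
        raise ValueError("station must deliver at least one test per year")
    return annual_station_cost / tests_per_year


# $300 per station per year is the text's figure; the day counts are assumptions.
print(station_cost_per_test(300, 1, 300))            # 1.0   (about 50 six-day weeks)
print(round(station_cost_per_test(300, 1, 365), 2))  # 0.82  (every day of the year)
print(round(station_cost_per_test(300, 1, 26), 2))   # 11.54 (biweekly sessions only)
```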

Scheduling improvements, from a scientific perspective, are not very interesting. Psychometric journals rarely publish articles documenting the time saved through efficient handling of candidates. As a point of comparison with psychometric savings discussed below, however, remember that the time savings achieved through scheduling improvements are on the order of 1 to 2 months.

Note, however, that these time savings translate into dollar savings only when the time has value. Time typically has great value when a candidate must pass a test to get a license to practice a profession. When the translation is achieved by comparing the earning power of an unemployed individual with that of a licensed individual, the figures are large enough to defy belief. Anecdotal experience suggests that these savings are meaningful to licensure candidates. Time has less value if the candidate can practice the profession on a provisional license while attempting to pass the test. Similarly, time has less value to certification candidates than to license candidates because the connection between having the certification and earning money is less direct. If the decision to computerize a test is based on the improvements possible in scheduling efficiency, it is wise to first verify that the time saved is truly valuable.
