Comprehension Testing for OTC Drug Labels: Goals,
Methods, Target Population, and Testing Environment

Louis A. Morris, Karen Lechter, Michael Weintraub, and Debra Bowen

Journal of Public Policy & Marketing 1998 Spring; 17(1): 86-96
Rinsho Hyoka (Clinical Evaluation) 2000; 27(Suppl XIV): 115-125


Drug products may be switched from prescription (Rx) to over-the-counter (OTC) status if labeling can be written that ensures that the label information is comprehensible to ordinary consumers, including persons with low literacy ability, under normal conditions of purchase and use. The Food and Drug Administration has been working with sponsors to develop methods to test consumer comprehension of proposed OTC product labels. The authors discuss several conceptual and operational elements of comprehension testing, focusing on the goals, methods, appropriate target audience, and testing environment. The authors also examine areas in need of further research and debate. As more complex products are considered for OTC status, it is even more important to ensure that OTC labels are comprehensible. As understanding and the validity of methods to evaluate consumer comprehension improve, so should the quality of labels offered to consumers.


For many products, it is necessary to ensure that labels are understandable so that consumers can use the product safely and effectively. For pharmaceutical products, however, the development of a comprehensible label can influence whether the product may be sold directly to consumers as an over-the-counter (OTC) drug or whether it must be dispensed by a pharmacist as a prescription (Rx) drug. The distinction between these two product categories is based on both the potential for harm and the ability of the consumer to use the product safely and effectively.

  A drug is appropriate for Rx status if, because of the potential for toxicity or other harmful effects, it cannot be used safely unless such use is supervised by a person who is licensed to make diagnostic and usage decisions, usually a physician (Food, Drug, and Cosmetic Act, Section 503(b)). Proper diagnostic and usage decisions are, in turn, based on the adequacy of the product labeling. Both OTC and Rx drugs are deemed misbranded (an illegal act) unless their labeling bears adequate directions for use. However, adequate directions for laypersons might be different from adequate directions for licensed prescribers.

  Adequate directions enable a layperson to use an OTC drug safely for its intended purposes (21 CFR 201.5). For OTC labeling to be clear and truthful (and not false or misleading), it must contain directions, warnings, and information on intended uses and side effects and be presented in such a manner "as to render the label likely to be understood by ordinary consumers, including individuals with low comprehension ability, as assessed under customary conditions of purchase and use" (21 CFR 330.10 (a)(4)(v)).

  As a practical matter, most newly approved chemical entities initially are marketed as Rx-only products and usually remain Rx-only products. However, marketers can apply to switch certain products from Rx to OTC status. In recent years, many Rx-to-OTC switches have been approved, including drugs that treat heartburn, aid smoking cessation, induce hair restoration, and relieve seasonal nasal allergies (Juhl 1997). To approve a switch, the Food and Drug Administration (FDA) must determine if the product has a favorable benefit-to-risk profile and if it can and will be used safely and effectively at an OTC dosage level in an OTC environment (i.e., without direct supervision of a prescribing health care professional). Recently, the FDA has requested that applicants for new Rx-to-OTC products present evidence of (1) label comprehension and (2) the product's "actual use" in a simulated OTC environment by the intended target population.

  Evidence regarding label comprehension has been provided in the form of label comprehension tests. Recently, Friedman, Romeo, and Hilton (1997) published a label comprehension study that provides an example of this methodology. Comprehension tests often are similar to advertising-copy tests, in particular, to forced-exposure "communication tests" (Stewart 1995). Recruited subjects are shown a proposed OTC drug label or a control label and are asked questions that measure their understanding of the material. Research participants might be asked additional questions about product use or behavioral intentions. Evidence of actual use has been provided in the form of simulation studies, in which recruited subjects are provided an opportunity to purchase and/or use the product and behavioral outcomes are tracked.

  In this article, we discuss several issues involved in testing OTC product labels to determine comprehensibility to "ordinary individuals." The article is divided into four sections. First, we discuss the goals of comprehension testing; second, some general methodological issues; third, the target audience; and fourth, the testing environment and context. In each of these sections, we address conceptual and operational concerns and then review areas in need of further research and debate. Unfortunately, most OTC label comprehension tests are not publicly available. These tests usually are submitted to the FDA in support of a new drug application to switch a product from Rx to OTC status. They are considered trade secrets and are not releasable by the FDA to the public. However, a few examples of specific design considerations or questions posed in these tests have been discussed at FDA advisory committee hearings, and several of these are cited in this article. We address other issues in the development and review of comprehension tests in general terms.

  The FDA has used comprehension-testing results in its reviews of most recent Rx-to-OTC candidate drugs. However, comprehension-testing methodology is still in its earliest stage of development. During this initial phase, the FDA and the nonprescription drug industry have experimented with a variety of testing methods. Results from these tests can help analysts understand consumers' interpretations of labeling information. However, the subjective nature of such testing makes the results particularly sensitive to the reactivity of research methods (e.g., questionnaire wording) and other biases. In this article, we seek to identify controversies in comprehension testing and thereby encourage debate and further development of new approaches to this research methodology.

  Scholars in marketing and public policy have debated the theory and practice of copy testing in the development of evidence to support advertising deception cases (Andrews and Maronick 1995; Maronick 1991; Morgan 1990). Much of the purpose and design of comprehension testing is similar to this form of research. However, there are important differences in the goals of comprehension testing, as described subsequently. In this article, we seek to highlight some of the similarities and differences in these forms of research and thereby ground comprehension testing in an extant literature and foster critical analysis of evolving issues.

Goals of Label Comprehension Testing

Comprehension Sufficiency
Having read an OTC label and presumably having constructed some mental representations (Atman et al. 1994; Bostrom et al. 1994; Payne and Bettman 1992), does the consumer then "comprehend" the presented material? From a purely cognitive perspective, any mental representation is evidence of consumer comprehension. Indeed, advertising-copy tests seek to measure directly what is communicated by an advertisement, particularly those claims that are implied and not directly asserted (Yao and Vecchi 1992).

  If we apply this analysis to comprehension testing, we could conceive of comprehension as what the consumer knows after reading a label. However, the legal concern for product labeling, as opposed to that for advertising deception, is whether the information is comprehensible. Therefore, a more appropriate question for label comprehension is whether consumers can understand the labeling information accurately and sufficiently to use the product safely and effectively. Thus, we conceive of adequate OTC label comprehension as an educational (i.e., does the consumer comprehend the important information presented?), as opposed to a linguistic (i.e., does the consumer correctly decode the message?) or cognitive (i.e., what new mental representations have been formed by the presented material?) outcome (Schellings and Van Hout-Wolters 1995).

  This educational perspective places the burden of comprehension on both the design and the processing of the presented text. It differs from Jacoby and Hoyer's (1987, 1989) concept of comprehension (or miscomprehension), which focuses on whether an extracted meaning of a communication was contained explicitly or implicitly in the communication (i.e., a linguistic outcome). This educational perspective also demands that the important messages be identified and measured by the comprehension test.

Limits of Label Comprehension
Surveys compiled by the Nonprescription Drug Manufacturers Association (1995) show that a high percentage of consumers report reading OTC labels. For example, more than 90% of survey respondents report reading OTC labels before taking their medication (Harris 1991; Heller 1992), 96% report reading the label on their children's medication, more than half state that they always read the label, and almost 90% state that they usually read the label on OTC medication (Princeton Survey Research Associates 1992).

  The large majority of people report reading the labels of their OTC medications. However, even if we ignore issues of respondent recall, we find that these surveys do not provide a clear indication of what people conceive of as label reading. What are the necessary and sufficient behavioral elements of reading for consumers to conclude that they have read an OTC product label? For example, does label reading mean that the consumer visually has (1) examined each of the package panels (directions and warnings customarily are located on the back or side panels but not the front), (2) skimmed each of the sections of the panels, as he or she might skim a print advertisement, or (3) focused on a sufficient number of words for a sufficient length of time to encode an accurate mental representation of the textual information (Rayner and Well 1996)? There is a big difference between "more than 90% report reading a label" and "more than 90% of the label's content is read." Thus, though the majority of consumers report paying some attention to OTC labels, we do not know the level of reading involvement that accompanies customary reading conditions.

  After reading an OTC label, a consumer might retain only a limited amount of the information provided (Mazis and Morris, in press). The label could be considered adequately comprehensible, however, if the consumer retains and infers the most important messages presented in the text. Therefore, regulatory concerns about label comprehensibility can be addressed by examining whether the most important messages, those relevant to the safe and effective use of the product, are retrieved and applied correctly.

Communication Objectives
Before we can design a valid assessment instrument, we must decide what variables are to be measured. The general approach advocated by the FDA is that comprehension assessment should be based on measurement of predefined communication objectives. These are the most important informational elements consumers should know, or behaviors they should follow, to use the drug safely and effectively. Health care and communication professionals at the pharmaceutical company that proposes the switch and at the FDA, which reviews the switch application, decide on these objectives on a case-by-case basis.

  Specific communication objectives (e.g., the consumer should know that the most common side effects are itching and skin irritation) (Lechter 1995c), as opposed to general objectives (e.g., the consumer should know the risks of using the drug), are presumed to lead to better behavioral compliance (Ajzen 1991). They also aid in questionnaire development by enabling the construction of more objective questions. For example, developing a valid set of questions to measure the concept of appropriate usage knowledge requires the selection and validation of a series of test questions that measures this concept (American Psychological Association 1985). A single question, designed to test whether the consumer understands if the drug should be taken on a full or empty stomach, could be scored confidently on the basis of face validity.
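
  To illustrate how predefined communication objectives can drive scoring, the following sketch (Python; the objectives, item numbers, and responses are hypothetical and not drawn from any actual submission) maps each questionnaire item to the objective it is meant to measure and reports the percentage of respondents who answered every item for that objective correctly.

```python
# Hypothetical communication objectives and the questionnaire items
# (by item number) intended to measure each one.
OBJECTIVES = {
    "common_side_effects": [3, 7],       # e.g., itching and skin irritation
    "take_on_empty_stomach": [5],
    "see_doctor_if_pregnant": [2, 9],
}

# Hypothetical per-respondent scoring: item number -> True if answered correctly.
responses = [
    {2: True, 3: True, 5: True, 7: False, 9: True},
    {2: True, 3: True, 5: False, 7: True, 9: True},
    {2: False, 3: True, 5: True, 7: True, 9: False},
]

def objective_pass_rates(objectives, responses):
    """Percentage of respondents answering every item for an objective correctly."""
    rates = {}
    for name, items in objectives.items():
        passed = sum(all(r.get(i, False) for i in items) for r in responses)
        rates[name] = 100.0 * passed / len(responses)
    return rates

print(objective_pass_rates(OBJECTIVES, responses))
```

  Reporting results objective by objective in this way keeps the analysis tied to the messages that matter for safe and effective use rather than to a single overall score.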

Interpreting Test Results
The utility of any comprehension test requires consideration of how the results should be interpreted and applied. If consumers score well on a test, does that mean the label is easy to understand, or does it mean that the test has insufficient discriminant validity (American Psychological Association 1985)? Currently, there are no validity standards for comprehension tests, nor are there consistent criteria for test-score interpretation.

  Sponsors conducting comprehension tests have taken two approaches in interpreting test results. Some have specified minimal success rates prior to testing, in terms of a pass rate (e.g., 80% of subjects will answer correctly all of the questions posed) or in comparison to a control group. Others have used test results as a descriptive measure without prespecifying passing criteria.

  Each of these approaches has strengths and weaknesses. With a prespecified passing rate, success or failure can be determined easily. However, when companies have the clear incentive of proving that their label is comprehensible, assumptions, biases, and question formats are likely to be weighted heavily in the company's favor, which requires careful scrutiny by FDA reviewers to ascertain whether the presented data represent valid measures of label comprehension.

  Using test data as descriptive measures might reduce the incentives for the sponsor to demonstrate differences between the test and control documents. Furthermore, descriptive measures can provide an analyst with diagnostic information about strengths and weaknesses in label design, which can lead to label improvements. Several companies have used an iterative testing approach (Lechter 1995b). They redesign the label on the basis of comprehension tests and subject the "improved" label to subsequent testing. Although this approach does not lead to an affirmative conclusion that a label is comprehensible, it can help improve the comprehensibility of OTC labels.

  Without a prespecified pass rate, however, the determination of whether a label is sufficiently comprehensible is based on a subjective assessment. Jacoby, Hoyer, and Sheluga (1980) suggest that a miscomprehension index could be used to judge the accuracy of advertising communications. The acceptable level of inaccurate communication would vary depending on the consequences of misleading communications. For products such as OTC drugs, a fairly low percentage of incorrect responses (between 5% and 10%) might be deemed appropriate (Jacoby and Small 1975). On occasion, the FDA and the sponsor have established mutually agreed-upon comprehension-test goals prior to conducting the study. The particular goal is based on the nature of the communication objectives, type of questions (e.g., recall, recognition, tests requiring responses to posed scenarios), number of questions that must be answered correctly (or the particular outcome specified), and safety or efficacy concerns about the product. Comprehension-test passing rates in the range of 80% to 95% have been used.
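
  As an illustration of how a prespecified passing criterion might be evaluated, the sketch below (Python; the sample size, number of correct responses, and 80% target are hypothetical assumptions) uses an exact binomial calculation to ask whether the observed correct-response rate supports the conclusion that the true rate exceeds the agreed-upon threshold. This is only one way to frame such a criterion; a sponsor and the FDA could instead specify the target as a simple observed percentage.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Hypothetical results: 255 of 300 respondents answered the key questions correctly.
n, correct = 300, 255
threshold = 0.80  # prespecified passing criterion

observed_rate = correct / n
# One-sided exact test of H0: true rate <= 0.80 against H1: true rate > 0.80;
# p-value = P(X >= correct) when the true rate equals 0.80.
p_value = 1 - binomial_cdf(correct - 1, n, threshold)

print(f"observed rate = {observed_rate:.1%}, one-sided p = {p_value:.4f}")
```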

Research Needs
Is the goal of comprehension testing to provide information that can help determine whether a proposed OTC product label is sufficiently comprehensible, or is it to help design the most comprehensible label? Research designs can vary considerably depending on which of these goals is considered when the tests are planned. Both of these goals, however, appear necessary. It is important to have a predetermined and empirical basis for deciding whether an OTC product label meets regulatory standards. It is also important that consumers receive the most comprehensible label possible when they purchase OTC products. Perhaps further research could focus on how to combine these testing approaches. Could they be applied in an iterative fashion to address both of these goals?

  Another approach to help interpret test results could be to develop some standard for comparison. With an increasing number of comprehension tests, it might be possible to collect and publish normative data. Could such normative data be used to help interpret individual test results? Would it be possible to standardize test results to make them comparable? For example, all comprehension tests could include a set of identical, nonspecific questions about consumers' interpretation of the label. These data then could be used to assess comprehension of individual labels. Whether this approach is desirable, or even feasible, awaits additional research.

  Research to help formulate communication objectives also is needed. Specific comprehension objectives provide clear directions for both the design and testing of proposed labels. However, relying solely on face validity does not ensure that the questions measure what they purport to measure. The alternative, insisting on studies to validate customized questionnaires, appears unreasonably excessive. Developing research methods that provide evidence of validity without excessive time or work is warranted. Perhaps multiple assessment scales or certain analytical methods would increase confidence in measurement validity. Intention-to-heed measures generally have not been developed as thoroughly as the behavioral intention measures used in other contexts (see Ajzen 1991; Ajzen and Madden 1986). For example, in a comprehension test for a proposed label for a men's hair-growth product (which is contraindicated for women), respondents were asked: "Based on the label you just read, would you buy this particular product for your personal use?" (Lechter 1997b). We can question the validity of such singular behavioral intention measures. The current lack of evidence that warning labels necessarily result in behavioral change (Stewart and Martin 1994) warrants additional work to develop these types of measures.

Methods of Comprehension Testing

In many ways, comprehension-test methodology is similar to copy testing. The design of comprehension tests often involves trade-offs and requires procedural choices based on resource considerations. It is beyond the scope of this article to review all the aspects of the methodology needed to design reliable comprehension tests. Several reviews of copy-testing methodology provide important insights that can help assess the rigor of comprehension tests (e.g., Andrews and Maronick 1995; Halley and Baldinger 1991). However, building on the copy-testing literature, we find two aspects of comprehension-testing methodology that require particular attention: (1) the wording of questionnaire items and (2) the nature of the control groups.

Questionnaire Wording
As with copy-test methodology, there are considerable trade-offs in selecting open- or closed-ended questions to measure comprehension. Although open-ended questions are less likely to stimulate prompting effects, they might not probe consumers' knowledge sufficiently. Closed-ended questions might provide more precise and sensitive measures of consumers' knowledge, but teasing out the influence of the label from the prompting effect of the question (or the previous questions) remains a problem.

  In studies of advertising deception, the courts have expressed a clear preference for open-ended questions that are free from prompting biases. Closed-ended questions, if used, are included only after the open-ended questions in a funnel sequence (Jacoby and Szybillo 1995; Maronick 1991; Stewart 1995). Open- and closed-ended questions tap different ways in which consumers understand and retrieve information. The preference for open-ended questions has emerged because such questions are less likely to be challenged in a legal proceeding (Sudman 1995).

  In comprehension testing, either type of question may be appropriate, and funnel sequencing from open- to closed-ended questions seems the preferred method of structuring them. As is noted previously, communication objectives tied to usage situations in which the consumer is apt to consult the label actively can be measured appropriately by closed-ended questions that tap consumer recognition of label information. For conditions in which the label is unlikely to be used as a reference, open-ended questions are appropriate.

  A major difference between copy and comprehension testing is the underlying purpose of the questions posed. Whereas copy tests seek to measure which information is explicitly or implicitly communicated to the consumer by the advertisement, comprehension tests seek to examine if certain predetermined communication elements have been communicated successfully. Thus, more in-depth, probing questions might be needed to identify and solicit responses that more sensitively measure the communication of these elements.

  In addition, following from the educational perspective that underlies comprehension testing, consumer understanding and application of the information could be fair game for comprehension testing. Recently, some studies used scenarios to test consumer application of the label information. For example, in one comprehension test, participants were asked what they would do if they took the drug and noticed certain outcomes (described on the label as side effects). Other studies include scenario questions that assess consumer understanding of product warnings, directions, and dosage calculations. In still other studies, respondents are asked whether they personally could use the product. The response then is analyzed in light of medical information, gathered about the respondent in another section of the questionnaire, that might contraindicate use.

Questionnaire Bias
As in copy tests, it is vitally important to distinguish the influence of the label from the biasing effects of question order and phrasing. Leading questions, yea-saying effects, and other biases can invalidate both copy and comprehension tests (Morgan 1990). For example, Lechter (1997a) questioned the validity of a comprehension test for extra-strength Excedrin®, because the correct answers to the posed questions were invariably simple affirmative responses. However, eliminating all biasing influences from a questionnaire is not feasible. Rather, the FDA's evidentiary standard for "adequate and well-controlled investigations" to support labeling claims appears logically consistent with the judicial standards for a study to have probative value (see 21 CFR 314.126; Morgan 1990).

  Although certain biasing influences can be minimized by careful design, understanding and teasing out the influence of question order and wording can help interpret test findings. For example, in one comprehension test, consumer understanding of a warning was probed in four separate questions that repeatedly asked for knowledge of that specific message. Not surprisingly, the percentage of consumers who "knew" the warning increased after each question. When measured as a whole, a sizable proportion of the test population knew the particular warning. However, the increasingly specific questions might have stimulated the appropriate response. Therefore, the company did not provide convincing data that consumers understood the warning, because it was impossible to separate the influence of the probing questions from that of the label.

  Following Sudman and colleagues (Schwarz and Sudman 1996; Sudman, Bradburn, and Schwarz 1996), we do not conceive of bias as present or absent in a questionnaire. Rather, bias is complex and inherent in the conversational linguistics that are integral to consumer response to questions. Biasing effects result from a variety of factors, and decomposing them entails a multipart and subjective process (Tourangeau and Rasinski 1988).

Control Groups
In experimental research, control groups often are used to isolate, or control for, artifacts that could influence response. Controls, along with random assignment of research participants to groups, help ensure that any observed differences are due to the experimental factors, such as a test label. Control groups also can serve as concurrent comparisons to help interpret observed effects.

  In copy tests, control groups are often vitally important to demonstrate whether a hypothesized message element leads to specific knowledge (Stewart 1995). Creating a control advertisement that isolates and eliminates the possibly offensive material can provide strong evidence that a particular message, contained in the test but not in the control advertisement, is deceptive.

  In comprehension testing, the use of control groups is more problematic. For the most part, companies use control groups as a means of providing overall comparative information about communicated messages rather than attempting to isolate specific communication elements. The choice of a comparator is critical to understanding the comprehensibility of a particular label.

Nonequivalent Comparisons
Some companies use existing OTC labels as controls for comparison purposes. For example, in a study of proposed OTC Nicorette® labeling, the company used the label for Tavist-D® (an OTC antihistamine). This and other, similar studies usually find that proposed labels perform approximately as well as existing labels on comprehension measures. However, the use of a nonequivalent control poses difficult interpretative problems.

  That two labels are not significantly different does not mean that the labels are equivalent. Lack of difference could be due to insensitive measures, small sample size, or other methodological limitations. Consumer motivation for reading the different labels also could vary. For example, labels for products of general interest (e.g., pain relievers) might be read with less involvement than labels for products that treat more serious conditions (e.g., an asthma drug).

  In addition to interpretative problems, there is a public policy concern about the use of nonequivalent controls. Even if two labels (i.e., test and control) are shown to be equivalent using an acceptable methodology, does this mean that the test label is sufficiently comprehensible? Such data demonstrate only that an OTC drug label is not significantly or meaningfully inferior to an existing label. We can make an equity argument for this testing approach (e.g., manufacturers should be held responsible only for demonstrating that their labels are as comprehensible as existing labels). However, as we learn more about how to construct understandable OTC drug labels, there should be an incentive for manufacturers to produce more comprehensible labels. Demonstrating comprehension equivalence likely would lead to stagnation in the comprehension level of OTC labels. Producing more comprehensible labels as we learn more about label comprehension is a more laudable goal.

Equivalent Controls
Some companies use two or more versions, which vary in minor ways, to test a product label (e.g., changed order of information, information presented in a bulleted versus a text format). Often, such test results fail to find significant differences between labels. Such minor variations in label design might not have a major impact on comprehension measures. In addition, if significant differences are found between test and control, we cannot conclude automatically that the variation makes a meaningful difference. The methodological rigor of the test must be reviewed, statistical artifacts ruled out (e.g., alpha inflation due to multiple comparisons), and clinical meaningfulness of the observed difference assessed. Even a small percentage improvement in comprehension rates could mean a large absolute improvement, given the many people who might use an OTC product.
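
  The alpha-inflation concern can be made concrete with a short calculation (an illustrative sketch, not drawn from any submitted test): with many label comparisons each tested at the conventional .05 level, the chance of at least one spurious "significant" difference grows quickly, and a Bonferroni-style adjustment is one simple way to hold the familywise error rate near the nominal level.

```python
alpha = 0.05
for m in (1, 5, 10, 20):                 # number of label comparisons tested
    # Chance of at least one false positive if all null hypotheses are true,
    # assuming independent comparisons.
    familywise = 1 - (1 - alpha) ** m
    bonferroni = alpha / m               # per-comparison level keeping the familywise rate near .05
    print(f"{m:>2} comparisons: familywise error = {familywise:.2f}, "
          f"Bonferroni-adjusted alpha = {bonferroni:.4f}")
```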

  Some companies use product labels as controls that vary the presentation format. For example, some companies use a text-only, gray version, without discernible headings, bolded typeface, or white space, and compare it with a more graphically pleasing format, with bulleted information, clear headings, and ample white space. Research by Lorch and Lorch (1995, 1996) and Lorch, Lorch, and Inman (1993) suggests that the use of graphic devices to signal important information raises the probability that the signaled information will be communicated more fully.

  Although testing a label against such a variation can induce a finding of significant differences, it is unclear how to interpret these findings. Observed differences between test and control might not be because of improvements in label design for the test label, but because of the obscuring of important information in the control version. Rather than raising the water, the company might have engineered differences by lowering the bridge. A test label that performs differently from, or better than, a control on a comprehension measure is not necessarily comprehensible to a large percentage of consumers.

  Therefore, the use of control labels for comparison purposes can be perceived best as a means of understanding how label variations influence comprehension. The mere existence of a statistical difference between test and control versions of a label is not sufficient to conclude that the label is comprehensible.

Control Questions
Because of the problematic nature of control groups in comprehension testing, some newer submissions concentrate on single-group designs. These studies focus on uncovering specific weaknesses in proposed labels rather than evaluating the comprehensibility of the entire label. This type of diagnostic research uses more in-depth questioning of consumer understanding of specific messages. In this type of research, controlling for response biases when specific questions are answered is an important methodological issue.

  In these examples, the use of control questions to check for the level of acquiescence response bias and control for other false positive responses is vitally important (Jacoby and Szybillo 1995). For example, in a study of a proposed OTC cholesterol-lowering drug, the parent company sought to measure consumers' understanding of the reasons why they should see their doctor before initiating treatment. Seven choices were given, of which five were correct and two incorrect. Approximately two-thirds to three-quarters of the respondents endorsed the correct responses, whereas approximately one-sixth to one-third endorsed the incorrect responses (Lechter 1995a). Thus, it appeared that respondents were two to four times more likely to select correct answers to this question, providing some confidence that the label gave the respondents an appreciation of why they should follow the directions.
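
  The arithmetic behind this kind of check can be sketched as follows (the endorsement proportions are hypothetical values chosen to fall within the ranges reported above); comparing average endorsement of correct and incorrect options gives a rough index of whether respondents are discriminating rather than simply acquiescing.

```python
# Hypothetical endorsement proportions for a seven-option control question:
# five options are correct and two are incorrect, as in the example above.
correct_options = [0.74, 0.71, 0.69, 0.68, 0.66]   # assumed endorsement rates
incorrect_options = [0.31, 0.17]                   # assumed endorsement rates

avg_correct = sum(correct_options) / len(correct_options)
avg_incorrect = sum(incorrect_options) / len(incorrect_options)

# A ratio well above 1 suggests respondents distinguish correct from incorrect
# options rather than simply endorsing everything (acquiescence bias).
print(f"average correct endorsement:   {avg_correct:.2f}")
print(f"average incorrect endorsement: {avg_incorrect:.2f}")
print(f"discrimination ratio:          {avg_correct / avg_incorrect:.1f}")
```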

Analytical Issues
Complex statistics generally have not been needed to understand comprehension-test results. Often the data analyses are of a descriptive nature, as opposed to consisting of hypothesis testing. Significance testing might not be perceived as critical to interpreting findings. Conversely, the portrayal of results as significant using nontraditional alpha levels (other than .05), or without correcting for the multiplicity of analyses, has been regarded as overly liberal and not helpful for interpretive purposes.

  Studies seeking to demonstrate equivalence with existing labels have undergone particular scrutiny. Data purporting to show equivalence with other labels require not only sufficient power, but also great care in determining inclusion-exclusion criteria, sensitivity of measurement, and attention to other methodological factors (Jones et al. 1996).
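
  Framing equivalence explicitly requires a prespecified margin rather than a simple test of no difference. The sketch below (Python; the counts and the ±10 percentage-point margin are hypothetical assumptions) applies a two one-sided tests (TOST) procedure to the correct-response proportions for a test and a control label, using a normal approximation.

```python
from math import sqrt, erf

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def tost_two_proportions(x1, n1, x2, n2, margin):
    """Two one-sided tests of equivalence for two proportions (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    p_lower = 1 - norm_cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = norm_cdf((diff - margin) / se)      # H0: diff >= +margin
    return diff, max(p_lower, p_upper)            # equivalence claimed only if this is small

# Hypothetical data: test label 252/300 correct, existing (control) label 247/300,
# with a +/-10 percentage-point equivalence margin.
diff, p_tost = tost_two_proportions(252, 300, 247, 300, margin=0.10)
print(f"difference = {diff:+.3f}, TOST p-value = {p_tost:.4f}")
```

  A nonsignificant ordinary difference test, by itself, would not justify a claim of equivalence.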

Research Needs
Test evaluators often are concerned that open-ended questions do not probe consumers' knowledge adequately, whereas closed-ended questions prompt yea-saying or other response biases. The practice of using both types of questions in an open-then-closed order provides a helpful course of action (Jacoby and Hoyer 1987; Morris, Mazis, and Brinberg 1989). However, teasing out the influence of early questions on later responses remains a difficult dilemma. For example, if a company repeatedly asks a question, funneling from open- to closed-ended questions, most respondents eventually will answer the question correctly. What level of prompting is optimal when measuring comprehension? What type of response indicates comprehension (full-recall, partial-recall, partially prompted [e.g., multiple choice], or fully prompted [e.g., true-false] response)? Research on cognitive responses in survey research could provide general answers to these questions (see Willis 1994). However, more research, specifically targeted at comprehension measurement, is warranted.
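
  One way to make the prompting question explicit is to record, for each communication objective, the level of prompting at which a respondent first produced the correct answer. The sketch below (a hypothetical coding scheme, not an established standard) tabulates how the cumulative proportion of "correct" respondents grows as prompting increases.

```python
from collections import Counter

# Level of prompting at which each respondent first gave the correct answer for one
# hypothetical communication objective, recorded in funnel order; "never" means the
# respondent did not produce the correct answer even when fully prompted.
LEVELS = ["open-ended", "probed open-ended", "multiple choice", "true/false", "never"]

first_correct_at = ["open-ended", "multiple choice", "open-ended", "true/false",
                    "probed open-ended", "never", "multiple choice", "open-ended"]

counts = Counter(first_correct_at)
n = len(first_correct_at)
cumulative = 0
for level in LEVELS[:-1]:                  # exclude "never"
    cumulative += counts.get(level, 0)
    print(f"cumulative correct by '{level}': {cumulative}/{n}")
print(f"never correct: {counts.get('never', 0)}/{n}")
```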

  Perhaps a more dramatic need is research that would inform methodologists about the role of control groups. The search for a single, proper comparison stimulus might be a fruitless endeavor. A more meaningful approach might be to search for the necessary and sufficient conditions that indicate that an OTC label is comprehensible. Until we understand these conditions, research that uses control groups will be interpreted in a vacuum. Perhaps the role of control groups will depend on the type of comprehension test used. Pass/fail tests might require the development of a comparison label that reliably can be perceived as comprehensible when measured by the same questionnaire as the test label. Diagnostic copy tests could benefit most from control questions. In this case, the development of thoughtful control questions might help measure the influence of certain response artifacts. However, to develop more specific messages, diagnostic tests might require the development of customized comparison labels. These labels would test alternative messages or designs that are hypothesized to convey communication elements in a more comprehensible fashion. How to design either the pass/fail or the diagnostic control group labels, however, awaits careful research and development. As was discussed previously, iterative testing might help in this regard. However, the cost-effectiveness of such a research program would need to be assessed.

The Target Population

Comprehension testing generally includes samples of consumers who are potential users of the product category. For many OTC drugs, every adult is a potential user. For other OTC drugs, the population might be restricted to certain demographic categories (e.g., gender specifications for Rogaine®, which has different diagnostic criteria and packaging for men and women) or by the presence of a medical condition treated by the drug. If the goal of a study is to ensure that consumers who have decided to purchase a product can comprehend the label directions adequately, the sample should be composed of actual or potential product users. However, if the goal is to determine whether consumers can accurately self-select as appropriate candidates for the therapy, an actual-use study could be conducted that includes both potential users and nonusers. For example, in testing whether men could choose appropriately between regular Rogaine® and a proposed extra-strength version of the same product, the company selected both current users of the product and nonusers to ensure representation among all potential users (Lechter 1997b).

  The regulatory standard for label comprehension specifies that it be based on samples representative of "ordinary consumers, including individuals with low comprehension ability" (21 CFR 330.10(a)(4)(v)). The sampling procedure that has evolved for copy tests is convenience sampling at shopping malls, usually in at least four geographically dispersed areas (Mazis 1996). Although this sampling method ensures some geographic representativeness, it does not ensure that the sample contains a sufficient number of persons of low comprehension ability. A more sufficient sample combines geographic dispersion with a method to ensure that the sample contains adequate representation of consumers with lower literacy. Furthermore, from a public policy perspective, including lower-literacy samples in comprehension testing can increase the probability that improved label designs will benefit a larger proportion of the population (Adkins and Ozanne 1997).

Literacy Defined
In 1991, the National Literacy Act (PL 102-73) defined literacy as "an individual's ability to read, write, and speak English, and compute and solve problems at levels of proficiency necessary to function on the job and in society, to achieve one's goals and develop one's knowledge and potential." The definition of literacy was developed further by a panel of experts assembled by the Educational Testing Service in preparation for the National Adult Literacy Survey (NALS) (Kirsch et al. 1993). These experts defined literacy as "using printed and written information to function in society, to achieve one's goals, and to develop one's knowledge and potential" (p. 2).

  The NALS conceptualization and definition of literacy focuses on using information, not simply understanding messages. The NALS task force also endorsed the notion that literacy is an ordered set of skills needed to accomplish a diverse set of tasks. They suggested three broad literacy domains: prose (knowledge and skills to understand and use written text), document (knowledge and skills to locate and use information from forms, maps, tables, graphs, and so forth), and quantitative (knowledge and skills to apply arithmetic operations) literacy.

Literacy Assessment
To conduct the NALS, the panel developed a grading system that operationally defines literacy in each of the domains at five levels. Each successive level requires higher-order information processing skills to complete the tasks rated at that level. For example, for prose literacy:

- Level 1 requires completion of simple matching tasks for which respondents read a relatively short text and provide a single piece of information that is identical to or synonymous with information given in the question or directive;
- Level 2 requires respondents to locate information in the text when several distractor items are present in the text;
- Level 3 requires readers to make low-level inferences or integrate information from various sections of the text;
- Level 4 requires more complex integration and synthesis of information from more complex and lengthy passages; and
- Level 5 requires an information search from dense text and high-level inferences.

  The NALS provides reliable estimates of the nature and extent of literacy skills in the United States. It was conducted in 1992 with a nationally representative sample of 13,600 persons age 16 years and older. It used a personal-interview household survey, with oversamples of African-American and Hispanic households, and a separate survey of 1,147 inmates in 80 federal and state prisons.

  Results of the NALS indicate similar distributions for prose, document, and quantitative literacy. Overall, approximately one-fifth of the sample (21%-23%) was graded at Level 1, approximately one-quarter at Level 2 (25%-28%), approximately one-third at Level 3 (31%-32%), and progressively smaller percentages at Level 4 (15%-17%) and Level 5 (3%-4%). Projecting these results onto the national population suggests that nearly half of the U.S. adult population (90 of 191 million adults) is functioning with severely restricted literacy (Levels 1 and 2).

  However, the NALS data also indicate that low literacy stems from a variety of factors. Twenty-five percent of the people rated at Level 1 were immigrants who might not yet have learned to speak or read English; approximately one-third were 65 years of age or older; and one-fourth had physical, mental, or health problems that kept them from participating in a full range of activities. Nearly two-thirds (62%) had not completed high school. For persons at Level 1, simplification of material might have minimal effects, because overcoming their literacy impairments would require more than making OTC labels easier to read. The goal of making OTC labels understandable to persons of low comprehension ability applies more readily to those at Level 2, for whom simplification of reading material could influence comprehension directly.

Making Literacy Assessment Operational
The requirement that samples represent "ordinary consumers, including consumers of low comprehension ability" aptly defines a universe that fairly represents the U.S. population, with accommodation for persons of lower literacy. Even a probability-based national sample could have difficulty fairly representing such a range if it relied only on households with a telephone, included only noninstitutionalized persons, or otherwise avoided sampling lower-literacy persons.

  Most comprehension tests rely on intercept recruitment, usually at geographically dispersed shopping malls. However, the FDA has requested, and the industry has responded with, sampling techniques that ensure the inclusion of research participants with low literacy. These techniques include selecting malls or other sites in locations that serve populations of a low socioeconomic class, recruiting at locations that might serve persons with low reading ability (e.g., adult education classes), stratifying or retrospectively analyzing on the basis of highest educational grade completed, quota sampling on the basis of education, or screening and quota sampling on the basis of a literacy assessment test. Such tests include the Rapid Estimate of Adult Literacy in Medicine, a test that asks participants to pronounce 66 progressively more complex medical terms (Davis et al. 1991, 1993), and the Wide Range Achievement Test, a pronunciation test of general vocabulary with several versions (Jastak and Wilkinson 1984). Other, less rapidly administered literacy tests could be used, such as the Test of Functional Health Literacy in Adults, a 50-item reading comprehension test and 17-item numerical ability test (Parker et al. 1995).
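
  As a sketch of how a literacy quota might be enforced during recruitment (the strata, targets, and score cut points below are illustrative assumptions; the screening score stands in for whatever instrument is used), enrollment can be tracked by stratum and recruitment for a stratum stopped once its quota is filled.

```python
# Hypothetical enrollment quotas by literacy stratum; the 0-66 screening score and
# the cut points are illustrative only.
QUOTAS = {"low": 120, "marginal": 90, "adequate": 90}
enrolled = {stratum: 0 for stratum in QUOTAS}

def stratum_from_score(score):
    """Map a screening score to a literacy stratum (illustrative cut points)."""
    if score <= 44:
        return "low"
    if score <= 60:
        return "marginal"
    return "adequate"

def try_enroll(score):
    """Enroll a screened participant if the quota for his or her stratum is still open."""
    stratum = stratum_from_score(score)
    if enrolled[stratum] < QUOTAS[stratum]:
        enrolled[stratum] += 1
        return True
    return False

# Example: a score of 38 falls in the "low" stratum and is enrolled while that quota is open.
print(try_enroll(38), enrolled)
```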

Inclusion-Exclusion Criteria
It has become judicial dictum that studies that are part of judicial proceedings must be based on the proper universe (Maronick 1991). However, identifying what the proper universe is can be controversial. For example, Jacoby and Szybillo (1995) discuss whether the proper universe for copy testing the Kraft Singles advertisement is people who purchase American cheese, individually wrapped American cheese, or any cheese product. The important question is how specifically the test population must match the potential population of users.

  Many label comprehension studies recruit research participants who have the condition treated by the product. For example, in studies of smoking cessation, only current smokers who admitted a desire to stop smoking were included as test subjects (Lechter 1995b). Other studies attempt to be all-inclusive, with minimal inclusion-exclusion criteria. The inclusion criteria for most comprehension tests include the ability to read English, defined both physiologically (i.e., the person can visually discern the words on the page) and culturally (i.e., the person can understand the English language). Most studies exclude persons who work for market research companies and health care professionals.

  However, for comprehension testing, not all research participants should be potential users of the product. Persons for whom the product is contraindicated usually are not excluded from study because it is important to test whether they correctly exclude themselves from taking the drug. For some tests, quotas are established or special efforts are made to recruit persons with preexisting conditions. For label comprehension testing, the potential universe should be persons who might be drawn to consider using the product, not only those who actually use it.

Research Needs
Although educational attainment is the most frequently used surrogate measure for literacy, studies do not support the conclusion that education is a valid indicator of reading ability (Sawyer 1991). Early experience with vocabulary pronunciation tests suggests that they are easy to administer, and some analyses find them somewhat predictive of literacy (Parker et al. 1995). However, there are few validation studies. Proper pronunciation is one aspect of reading ability, but it is unclear how predictive vocabulary pronunciation is of overall OTC label reading ability. Nor is it clear how domain-specific the vocabulary must be. To predict OTC label reading, must the items tested be common OTC label terms, general health care terms, simple expository prose, or any sample of vocabulary words? Developing measures to assess the validity, reliability, and sensitivity of OTC label reading-ability tests awaits further research. Vocabulary pretests provide an alternative method for assessing whether the recruited sample contains persons with low comprehension ability. However, their advantages or disadvantages relative to other measures have not been demonstrated.

Testing Environment
It generally is assumed that knowledge gained from reading a product label will vary with the attention devoted to the task. Testing consumers' understanding of a label under customary reading conditions requires consideration of the external validity of the testing environment (Cook and Shadish 1994). We seek to simulate customary reading conditions or account for differences between measured comprehension and what would occur under customary reading conditions. Several testing approaches are possible; for example, (1) we could ask research participants to read a proposed label the way they "usually" read such labels; (2) we could attempt to simulate a mundanely realistic environment (Aronson and Carlsmith 1968) by, for example, constructing an in-store stimulus display and observing reading behavior; or (3) we could seek to simulate an ecologically valid OTC environment (Aronson and Carlsmith 1968) by, for example, limiting the time that research participants are given to read a label, adding distractor activities, degrading the reading environment (e.g., reducing the lighting available), or otherwise adding constraints.

  These and other manipulations presumably would influence both the motivation and the opportunity to read and process OTC labels. The regulatory standard that labels be comprehensible implies that comprehension tests measure consumers' potential to understand the label, as opposed to the knowledge gained from a minimally motivated search for label information. If people refuse to read a label, no amount of document simplification will improve comprehension. Therefore, most comprehension tests have asked respondents to read a label as they would if they were purchasing or using a product for the first time. They have not tested the extent to which degraded conditions or reduced motivation influence consumers' willingness or ability to read labels.

  However, this does not mean that other types of simulations have been ignored. For example, when a product primarily provides cosmetic benefits and consumers would not expect serious product warnings, separate naturalistic simulation studies have been conducted to examine the extent to which consumers flip the product over to read the side panels that contain the product warnings.

  Although participants in most label comprehension studies are told to read the label as they usually would, we interpret the findings from such comprehension tests as representing label comprehension under optimal reading conditions. Experimental demand characteristics (Fernandez and Turk 1994; Orne 1969), test sensitization (Lana 1969), and other long-understood behavioral research artifacts (Rosenthal and Rosnow 1969) likely would exaggerate research participants' interest in and effort expended on reading test labels in the research setting.

Customary Conditions of Use
The experimental procedure used to test label comprehension can influence observed results greatly. As is discussed previously, the regulatory standard of "customary conditions of use" is a guiding principle in the design of comprehension tests. Naturalistic studies, such as observing consumers in actual shopping environments, are impossible because the products are not yet marketed. Simulations can be used, for example, by placing a proposed label on a mocked-up shelf, observing actual behavior, and then testing for comprehension. However, such simulations could be costly and inefficient, because many consumers might not select the test product, and no data on comprehension of the test label would be collected.

  Judicial review standards include review of the research environment (Morgan 1990). Results that might be influenced by the research setting, rather than by the intended stimulus, can be dismissed. For example, Morgan (1990) describes a trademark case in which consumers were asked to examine four brands of teddy bears. The court rejected the contention that the results (that consumers were able to identify the four brands) were evidence of a lack of confusion because consumers were able to examine the brand names that appeared on the product containers. The courts are sensitive to research environments that can generate biased or unreliable results. Furthermore, the courts have held that the experimental evidence must be gathered in circumstances that are reflective (to some degree) of the market conditions in which the products are used. In this case, comprehension tests do not seek to simulate naturalistic conditions but more likely are representative of the best of the ordinary circumstances under which labels are read.

Stimuli Presentation
As is the case with copy tests, the purpose of a comprehension test is to determine consumer response to and interpretation of a presented stimulus. For this purpose, the closer the presented stimuli are to their final form, the more confidence we have that the results will predict actual consumer response. Presenting subjects with complete label mockups, in the form of labeled containers, provides a degree of assurance that the legibility of the study stimuli matches actual label characteristics.

Label Status During Test
In most comprehension tests, the participant begins to answer the questionnaire as soon as he or she reads the label. A constant concern is whether the product label should be present during testing. Some sponsors argue that the label should be available to consumers as a resource, because comprehension could be measured accurately by the consumer's ability to find the correct information on the label. Others argue that when the label information is used (e.g., when consumers are about to drive their car), the label usually is not present, and consumers would need to retrieve relevant information from memory.

  Following the logic discussed previously regarding the use of recall and recognition questions, the FDA has suggested that the decision of whether the label is present should be based on the communication objectives and an analysis of how the information would be used by the consumer in drug usage situations. For certain outcomes (e.g., contraindications describing who should not use the medication), decisions would be made immediately by consumers when reading the label (e.g., if the consumer has high blood pressure and the product is contraindicated for this condition, it likely would be recognized immediately). This suggests a label-present test. However, for directions that must be stored in memory and retrieved when needed (e.g., precautions to avoid sunbathing when taking the product), such issues should be measured in a label-absent test.
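
  Purely as an illustration of the decision rule described above (the objectives listed are hypothetical), the mapping from communication objectives to a label-present or label-absent test format can be written down explicitly before the questionnaire is designed.

```python
# Illustrative (hypothetical) communication objectives mapped to the test format suggested
# by how the information would be used: information acted on while the label is in hand is
# tested label-present (recognition); information retrieved later from memory, label-absent (recall).
TEST_FORMAT = {
    "do_not_use_if_you_have_high_blood_pressure": "label-present",  # self-selection at purchase
    "avoid_sunbathing_while_taking_this_product": "label-absent",   # recalled later, during use
    "check_the_label_for_the_maximum_daily_dose": "label-present",  # label consulted at each dose
}

for objective, test_format in TEST_FORMAT.items():
    print(f"{objective:<45} -> {test_format}")
```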

Customization
There are many other design considerations in the development of comprehension-testing procedures. For the most part, these tests are customized to account for particular concerns in the testing of individual products. We consistently find that trade-offs are often necessary. For example, to obtain information about the comprehensibility of a package insert, the insert usually is provided after the participant has been questioned about the outer carton label. This procedure could inflate comprehension of the insert because the research participant already would have reviewed the outer carton label and been exposed to questions about the outer label's content (which would be similar to the insert's content).

  Although every study can be criticized, the FDA has found sponsors' explanations for design considerations helpful in understanding why certain approaches have been taken. Explanations that demonstrate that sponsors have made concerted efforts to measure how well their labels are understood, as opposed to those seeking to prove that consumers understand their labels, generally have been more convincing.

Research Needs
The regulatory standard that labels be comprehensible suggests that consumers should be able to understand such labels if they attempt to read them. Therefore, sponsors usually have performed such tests using a forced-exposure method. However, how forced and how exposed have been matters of methodological concern. For example, instructions to read a label "carefully," which suggest that questions about the label content will follow, and other influences, which would lead to increased attention to the label, are problems. Prolonged exposure to the label could prompt greater processing than would occur under natural conditions. Finding a set of directions that reliably simulates naturalistic reading conditions awaits development and careful testing. Alternatively, assessing reading involvement under various sets of directions could provide a helpful road map for test designers and reviewers.

  Other test procedures raise concerns. In most instances, participants are given a label to read in the presence of the interviewer. During this exposure time, the FDA suggests that the interviewer leave the respondent alone to minimize social facilitation influences (Lechter 1996). In actual shopping and use situations, distractions and competing task demands are likely to guide consumers' reading time and effort. Prolonged and uninterrupted exposure during the study does not seem to simulate "customary conditions for use." Again, developing test procedures to simulate naturalistic conditions, or developing a set of test conditions and assessing their influence on reading conditions, would be helpful.

  Most label comprehension test settings do not try to simulate actual purchase or use conditions. Thus, comprehension-testing results are interpreted more appropriately as measures of whether labels are comprehensible under optimal reading conditions rather than as measures of actual comprehension under ordinary conditions of use. The degree to which comprehension testing varies under optimal and degraded conditions is an important element in developing label assessment testing. Information about the nature and impact of this difference would be important.

Conclusion
As the FDA is asked to consider additional Rx-to-OTC switches, it becomes increasingly important to determine whether consumers will be able to use the products safely and effectively without medical supervision. Developing assessment instruments that can measure consumer comprehension validly is vital to developing product labels that are informative. There are direct public health consequences that flow from consumers' ability to understand and follow label directions.

  Recently, Mitchell, Van Bennekom, and Louik (1995) reviewed the effects of a pregnancy-prevention program for patients who were prescribed isotretinoin (Accutane®). During the evaluation period, the manufacturer redesigned the product labeling to communicate more fully that users needed to conduct a pregnancy test prior to taking the drug, to wait until their next menstrual period before taking the drug, and to use effective birth control for at least one month after taking the drug. The authors found a 10% to 25% increase in reported behavioral compliance, which they attributed to the redesign of the labeling. Although OTC drugs might not present the risks of Rx drugs to the consumer, they are dispensed without the direct supervision of a physician. Therefore, the need for understandable labeling on OTC drugs is imperative.

  Comprehension testing offers the potential to improve the quality of the OTC drug information offered to consumers. However, this methodology is still in its infancy. During this early developmental period, the FDA and sponsors of Rx-to-OTC drugs have examined a wide range of research approaches. As we learn more about procedures to test for consumer comprehension, we hope and expect that the methodology will improve. Thus, the FDA maintains a rolling objective: to improve continually the quality of OTC drug comprehension testing. Newer testing methods are expected to be better than older methods. Testing methods used in the past might no longer be acceptable now that more rigorous research methods have been developed. Furthermore, testing methods that are acceptable currently likely will be replaced by more rigorous methods as the field continues to develop. As testing improves, we hope and expect that the comprehensibility of OTC drug labels also will improve.

  What suggestions can be offered to comprehension-test designers? At this stage in the development of comprehension-testing methodology, we can speculate on four considerations or research approaches for those planning such studies. First, consider the predictive validity of measurements. A better understanding of literacy assessments and intention-to-heed measures is important for interpreting research results. Although comprehension testing reasonably can use more probing questions than advertising-copy tests, biasing or leading questions must be avoided. One cross-cutting approach is the use of scenario questions, in which research participants are asked to apply label information to answer questions about what they would do under certain circumstances.

  Second, diagnostic tests that seek to understand how to improve labels, as opposed to tests designed to prove a label is understandable, offer more immediate, and perhaps long-term, potential for improving labels. Although there is regulatory oversight of label comprehension tests, they are not necessarily reviewed as part of a confrontational legal proceeding. Mutual cooperation between the FDA and the sponsor to produce the most comprehensible label possible (with some understanding of the overall level of performance) offers the best prospect for improving labels.

  Third, fluidity in the use of controls appears to be the most reasonable course for the test designers. The development of specific control questions to help interpret test results can aid in improving understanding of specific communication elements. The value of various types of control groups must be considered as part of the design of individual tests.

  Fourth, consideration of the testing environment and population is important. However, determining the external validity of various reading instructions and environments could be difficult. Here, more developmental work is needed.

References

Adkins, Natalie R. and Julie L. Ozanne (1997), "Johnny's Mom Can't Read: The Stigma of Low Literacy in the Marketplace," in Marketing and Public Policy Conference, Vol. 7. Chicago: American Marketing Association, 9-10.

Ajzen, Icek (1991), "The Theory of Planned Behavior," Organizational Behavior and Human Decision Processes, 50(2), 179-211.

―――and Thomas J. Madden (1986), "Prediction of Goal-Directed Behavior: Attitudes, Intentions, and Perceived Behavioral Control," Journal of Experimental Social Psychology, 22(5), 453-74.

American Psychological Association (1985), Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.

Andrews, J. Craig and Thomas J. Maronick (1995), "Advertising Research Issues from FTC Versus Stouffer Foods Corporation," Journal of Public Policy & Marketing, 14(2), 301-307.

Aronson, Elliot and J. Merrill Carlsmith (1968), "Experimentation in Social Psychology," in The Handbook of Social Psychology, Vol. 2, G. Lindzey and E. Aronson, eds. Reading, MA: Addison-Wesley Publishing Company, 1-79.

Atman, Cynthia, Ann Bostrom, Baruch Fischhoff, and M. Granger Morgan (1994), "Evaluating Risk Communications: Completing and Correcting Mental Models of Hazardous Processes, Part I," Risk Analysis, 14(5), 779-88.

Bostrom, Ann, Cynthia J. Atman, Baruch Fischhoff, and M. Granger Morgan (1994), "Evaluating Risk Communications: Completing and Correcting Mental Models of Hazardous Processes, Part II," Risk Analysis, 14(5), 789-98.

Cook, Thomas D. and William R. Shadish (1994), "Social Experiments: Some Developments Over the Past Fifteen Years," Annual Review of Psychology, 45, 454-80.

Davis, Terry C., Michael A. Crouch, Sandra W. Long, Robert H. Jackson, Pat Bates, Ronald B. George, and Lee E. Bairnsfather (1991), "Rapid Assessment of Literacy Levels in Adult Primary Care Patients," Family Medicine, 23(6), 433-35.

―――, Sandra W. Long, Robert H. Jackson, E.J. Mayeaux, Ronald B. George, Peggy W. Murphy, and Michael A. Crouch (1993), "Rapid Estimate of Adult Literacy in Medicine: A Shortened Screening Instrument," Family Medicine, 25(6), 391-95.

Fernandez, Ephrem and Dennis C. Turk (1994), "Demand Characteristics Underlying Differential Ratings of Sensory Versus Affective Components of Pain," Journal of Behavioral Medicine, 17(4), 375-90.

Friedman, Carola P., Daniel Romeo, and Sandra Smith Hilton (1997), "Healthcare Decisions and Product Labeling: Results of a Consumer Comprehension Study of Prototype Labeling for Proposed Over-the-Counter Cholestyramine," American Journal of Medicine, 102(2A), 50-56.

Haley, Russell I. and Allan Baldinger (1991), "The ARF Copy Research Validity Project," Journal of Advertising Research, 31 (April-May), 11-32.

Harris, Louis (1991), Using Medicine Safely. New York: Louis Harris and Associates.

Heller, Robert (1992), Self-Medication in the '90's: Practices and Perceptions. Washington, DC: Heller Research Group.

Jacoby, Jacob and Wayne D. Hoyer (1987), The Comprehension and Miscomprehension of Print Communications: An Investigation of Mass Media Magazines. New York: The Advertising Educational Foundation Inc.

―――and―――(1989), "The Comprehension / Miscomprehension of Print Communications: Selected Findings," Journal of Consumer Research, 15(March), 434-43.

―――, ―――, and David A. Sheluga (1980), Miscomprehension of Televised Communications. New York: American Association of Advertising Agencies.

―――and Connie B. Small (1975), "The FDA Approach to Defining Misleading Advertising," Journal of Marketing, 39(4), 65-68.

―――and George J. Szybillo (1995), "Consumer Research in FTC Versus Kraft (1991): A Case of Heads We Win, Tails You Lose?" Journal of Public Policy & Marketing, 14(1), 1-15.

Jastak, Sarah and Gary S. Wilkinson (1984), Wide Range Achievement Test-Revised. San Antonio, TX: The Psychological Corp.

Jones, B., P. Jarvis, J.A. Lewis, and A.F. Ebbutt (1996), "Trials to Assess Equivalence: The Importance of Rigorous Methods," British Medical Journal, 313(July), 36-39.

Juhl, Randy (1997), "The OTC Revolution," Drug Topics, 141(5), 124-29.

Kirsch, Irwin, Ann Jungeblut, Lynn Jenkins, and Andrew Kolstad (1993), Adult Literacy in America: A First Look at the Results of the National Adult Literacy Survey. Washington, DC: National Center for Educational Statistics.

Lana, Robert E. (1969), "Pretest Sensitization," in Artifact in Behavioral Research, R. Rosenthal and R.L. Rosnow, eds. New York: Academic Press, 121-46.

Lechter, Karen (1995a), "Review of the Questran® Label Comprehension Study," presentation at the Joint Nonprescription Drug and Metabolic and Endocrine Drug Advisory Committee Meeting, Bethesda, MD, (September 27).

―――(1995b), "Review of the NicoretteR Label Comprehension Study," presentation at the Joint Nonprescription Drug and Drug Abuse Advisory Committee Meeting, Bethesda, MD, (September 28).

―――(1995c), "Review of the RogaineR Label Comprehension Study," presentation at the Joint Nonprescription Drug and Dermatologic/Ophthalimic Drug Advisory Committee Meeting, Bethesda, MD, (November 17).

―――(1996), "OTC Label Comprehension Testing," presentation at the Nonprescription Drug Manufacturers' Association Meeting, Washington, DC, (November 12).

―――(1997a), "Review of the ExcedrinR Label Comprehension Study," presentation at the Joint Nonprescription Drug and Arthritis Drug Advisory Committee Meeting, Bethesda, MD, (July 15).

―――(1997b), "Review of the RogaineR Extra Strength Label Comprehension Study," presentation at the Joint Nonprescription Drug and Dermatologic/Ophthalmic Drug Advisory Committee Meeting, Bethesda, MD, (November 16).

Lorch, Robert F., Jr. and Elizabeth Pugles Lorch (1995), "Effects of Organizational Signals on Text-Processing Strategies," Journal of Educational Psychology, 87(4), 537-44.

―――and―――(1996), "Effects of Organizational Signals on Free Recall of Expository Text," Journal of Educational Psychology, 88(1), 38-48.

―――, ―――, and W. Elliot Inman (1993), "Effects of Signaling Topic Structure on Text Recall," Journal of Educational Psychology, 85(2), 281-90.

Maronick, Thomas J. (1991), "Copy Tests in FTC Deception Cases: Guidelines for Researchers," Journal of Advertising Research, 31(6), 9-17.

Mazis, Michael (1996), "Copy-Testing Issues in FTC Advertising Cases," in Marketing and Public Policy Conference Proceedings, Ronald Paul Hill and Charles Ray Taylor, eds. Chicago: American Marketing Association, 122-30.

―――and Louis A. Morris (in press), "Channel Communications Issues," in Warning and Risk Communication, Michael Wogalter, ed. New York: Raven Press.

Mitchell, Allen A., Carla M. Van Bennekom, and Carol Louik (1995), "A Pregnancy-Prevention Program in Women of Childbearing Age Receiving Isotretinoin," New England Journal of Medicine, 333(2), 101-106.

Morgan, Fred W. (1990), "Judicial Standards for Survey Research: An Update and Guidelines," Journal of Marketing, 54(January), 59-70.

Morris, Louis A., Michael Mazis, and David Brinberg (1989), "Risk Disclosures in Televised Prescription Drug Advertising Directed to Consumers," Journal of Public Policy & Marketing, 8, 64-80.

Nonprescription Drug Manufacturers Association (1995), Comments in Response to FDA Hearing, Docket No. 95N-0259 [60 Fed. Reg. 42578].

Orne, Martin T. (1969), "Demand Characteristics and the Concept of Quasi-Controls," in Artifact in Behavioral Research, R. Rosenthal and R.L. Rosnow, eds. New York: Academic Press, 147-81.

Parker, Ruth M., David W. Baker, Mark V. Williams, and Joanne R. Nurss (1995), "The Test of Functional Health Literacy in Adults: A New Instrument for Measuring Patients' Literacy Skills," Journal of General Internal Medicine, 10(October), 537-41.

Payne, John W. and James R. Bettman (1992), "Behavioral Decision Research: A Cognitive Processing Perspective," Annual Review of Psychology, 43, 87-131.

Princeton Survey Research Associates (1992), Using Medicines Safely, Princeton, NJ: Princeton Survey Research Associates.

Rayner, Keith and Arnold D. Well (1996), "Effects of Contextual Constraint on Eye Movements in Reading: A Further Examination," Psychonomic Bulletin and Review, 3(4), 504-509.

Rosenthal, Robert and Ralph L. Rosnow (1969), Artifact in Behavioral Research. New York: Academic Press.

Sawyer, Mary H. (1991), "A Review of Research in Revising Instructional Text," Journal of Reading Behavior, 23(3), 307-33.

Schellings, Gonny L.M. and Bernadette H.A.M. Van Hout-Wolters (1995), "Main Points in an Instructional Text, as Identified by Students and by Their Teachers," Reading Research Quarterly, 30(4), 742-56.

Schwarz, Norbert and Seymour Sudman (1996), Answering Questions. San Francisco: Jossey-Bass.

Stewart, David W. (1995), "Deception, Materiality, and Survey Research: Some Lessons from Kraft," Journal of Public Policy & Marketing, 14(2), 15-29.

―――and Ingrid M. Martin (1994), "Intended and Unintended Consequences of Warning Messages: A Review and Synthesis of Empirical Research," Journal of Public Policy & Marketing, 13(1), 1-19.

Sudman, Seymour (1995), "When Experts Disagree: Comments on the Articles by Jacoby and Szybillo and Stewart," Journal of Public Policy & Marketing, 14(1), 29-34.

―――, Norman M. Bradburn, and Norbert Schwarz (1996), Thinking About Answers. San Francisco: Jossey-Bass.

Tourangeau, Roger and Kenneth A. Rasinski (1988), "Cognitive Processes Underlying Context Effects in Attitude Measurement," Psychological Bulletin, 103(3), 299-314.

Willis, Gordon B. (1994), "Cognitive Interviewing and Questionnaire Design: A Training Manual," Working Paper Series No. 7. Atlanta, GA: Centers for Disease Control and Prevention.

Yao, Dennis A. and Christa Van Anh Vecchi (1992), "Information and Decisionmaking at the Federal Trade Commission," Journal of Public Policy & Marketing, 11(2), 1-8.

LOUIS A. MORRIS is Senior Vice President of PRR Inc. KAREN LECHTER, MICHAEL WEINTRAUB, and DEBRA BOWEN are from the Division of Drug Marketing, Advertising and Communications (HFD-40) and the Office of Drug Evaluation V (HFD-500), Food and Drug Administration (FDA). Early drafts of this article were written while the first author was employed by the FDA. The authors thank their colleagues at the FDA who provided helpful comments on the form and substance of this article. The views expressed are solely those of the authors and do not reflect the policy of the FDA.
