What is construct validity? (March 8, 2006).
Someone asked me to define face validity, criterion validity, and construct validity.
That's a tall order. In general, validity means that a measurement that we take represents
what we think it should. This is important to establish, because many times we think we are
measuring one thing, but we are measuring something else entirely. It is important to
remember that validity is a journey and not a goal. You never reach a place called the land
of valid measurements. Instead, you gradually strengthen the evidence for validity, but there
is no threshold that you cross where you can say, "We can now conclude that the measure is
valid." Similarly, there is no region we can point to where we can say with confidence "We
have not yet reached the point where we can say that the measure is valid."
Let me tackle the last definition first. Construct validity is the degree to which a
direct measurement represents an unobserved social construct. To establish construct
validity, you demonstrate that the measure changes in a logical way when other conditions
change. For example, in the study described below, faculty viewed videotapes
of "standardized residents" who depicted either unsatisfactory,
marginal/satisfactory, or high satisfactory/superior performance. The
ratings given by the miniCEX reflected the actual performance levels,
which supports the construct validity of the miniCEX.
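One way to picture "the measure changes in a logical way when other conditions change" is a known-groups check: if the scripted performance level goes up, the average rating should go up too. The sketch below uses made-up ratings on a nine-point scale (they are not data from the study), and the function names are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical ratings on a nine-point scale (illustrative numbers only,
# not data from the study described in the text).
ratings = {
    "unsatisfactory": [2, 3, 4, 3, 2],
    "marginal/satisfactory": [5, 4, 6, 5, 6],
    "high satisfactory/superior": [8, 7, 9, 7, 8],
}

# Performance levels listed from lowest to highest.
levels = ["unsatisfactory", "marginal/satisfactory", "high satisfactory/superior"]


def group_means(groups):
    """Return the mean rating for each scripted performance level."""
    return {level: mean(values) for level, values in groups.items()}


def means_increase(groups, ordered_levels):
    """True if the mean rating rises strictly from the lowest level to the highest."""
    means = [mean(groups[level]) for level in ordered_levels]
    return all(a < b for a, b in zip(means, means[1:]))


print(group_means(ratings))
print(means_increase(ratings, levels))
```

A check like this adds only one piece of evidence. A formal analysis would also test whether the differences are larger than chance and look at how much raters disagree with each other.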
- Construct Validity of the MiniClinical Evaluation Exercise (MiniCEX). E. S. Holmboe,
S. Huot, J. Chung, J. Norcini, R. E. Hawkins. Acad Med 2003: 78(8); 826-830.
PURPOSE: To
investigate the construct validity of the miniclinical evaluation exercise (miniCEX). METHOD:
Forty faculty participants from 16 internal medicine residency programs enrolled in a
randomized, controlled trial of faculty development. Using a standard nine-point miniCEX
rating form, participants watched and rated performances of standardized residents on nine
scripted clinical videotapes depicting three levels of performance (unsatisfactory,
marginal/satisfactory, and high satisfactory/superior). The nine-point rating scale was 1-3 =
unsatisfactory, 4-6 = marginal/satisfactory, and 7-9 = superior. The performances were rated
for three clinical skills, history taking, physical examination, and counseling. RESULTS: For
each of the three clinical skills, the faculty participants were able to successfully
discriminate among the three levels of performance using the miniCEX scale. Differences among
ratings of the three performance levels were statistically significant; however, the range in
ratings among the participants for each videotape was wide. CONCLUSION: The authors believe
this to be the first study to document the construct validity of the miniCEX. Although the
miniCEX appears to have reliability and construct validity, further research is needed to
improve individual faculty observation skills and reduce interrater variability.
Further reading
- Examination of instruments used to rate quality of health information on the internet:
chronicle of a voyage with an unclear destination. A. Gagliardi, A. R. Jadad. Bmj 2002:
324(7337); 569-73.
OBJECTIVE: This study updates work
published in 1998, which found that of 47 rating instruments appearing on websites offering
health information, 14 described how they were developed, five provided instructions for use,
and none reported the interobserver reliability and construct validity of the measurements.
DESIGN: All rating instrument sites noted in the original study were visited to ascertain
whether they were still operating. New rating instruments were identified by duplicating and
enhancing the comprehensive search of the internet and the medical and information science
literature used in the previous study. Eligible instruments were evaluated as in the original
study. RESULTS: 98 instruments used to assess the quality of websites in the past five years
were identified. Many of the rating instruments identified in the original study were no
longer available. Of 51 newly identified rating instruments, only five provided some
information by which they could be evaluated. As with the six sites identified in the
original study that remained available, none of these five instruments seemed to have been
validated. CONCLUSIONS: Many incompletely developed rating instruments continue to appear on
websites providing health information, even when the organisations that gave rise to those
instruments no longer exist. Many researchers, organisations, and website developers are
exploring alternative ways of helping people to find and use high quality information
available on the internet. Whether they are needed or sustainable and whether they make a
difference remain to be shown.
- Rating health information on the Internet: navigating to knowledge or to Babel? A.
R. Jadad, A. Gagliardi. Jama 1998: 279(8); 611-4.
CONTEXT: The rapid growth of the
Internet has triggered an information revolution of unprecedented magnitude. Despite its
obvious benefits, the increase in the availability of information could also result in many
potentially harmful effects on both consumers and health professionals who do not use it
appropriately. OBJECTIVES: To identify instruments used to rate Web sites providing health
information on the Internet, rate criteria used by them, establish the degree of validation
of the instruments, and provide future directions for research in this area. DATA SOURCES:
MEDLINE (1966-1997), CINHAL (1982-1997), HEALTH (1975-1997), Information Science Abstracts
(1966 to September 1995), Library and Information Science Abstracts (1969-1995), and Library
Literature (1984-1996); the search engines Lycos, Excite, Open Text, Yahoo, HotBot, Infoseek,
and Magellan; Internet discussion lists; meeting proceedings; multiple Web pages; and
reference lists. INSTRUMENT SELECTION: Instruments used at least once to rate the quality of
Web sites providing health information with their rating criteria available on the Internet.
DATA EXTRACTION: The name of the developing organization, Internet address, rating criteria,
information on the development of the instrument, number and background of people generating
the assessments, and data on the validity and reliability of the measurements. DATA
SYNTHESIS: A total of 47 rating instruments were identified. Fourteen provided a description
of the criteria used to produce the ratings, and 5 of these provided instructions for their
use. None of the instruments identified provided information on the interobserver reliability
and construct validity of the measurements. CONCLUSIONS: Many incompletely developed
instruments to evaluate health information exist on the Internet. It is unclear, however,
whether they should exist in the first place, whether they measure what they claim to
measure, or whether they lead to more good than harm.
- Evidence for the Factorial and Construct Validity of a Self-Report Concussion Symptoms
Scale. S. G. Piland, R. W. Motl, M. S. Ferrara, C. L. Peterson. J Athl Train 2003: 38(2);
104-112.
OBJECTIVE: To evaluate the factorial and construct validity of the Head Injury Scale (HIS)
among a sample of male and female collegiate athletes. DESIGN AND SETTING: Using a
cross-sectional design, we established the factorial validity of the HIS scale with
confirmatory factor analysis and the construct validity of the HIS with Pearson product
moment correlation analyses. Using an experimental design, we compared scores on the HIS
between concussed and nonconcussed groups with a 2 (groups) x 5 (time) mixed-model analysis
of variance. SUBJECTS: Participants (N = 279) in the cross-sectional analyses were
predominately male (n = 223) collegiate athletes with a mean age of 19.49 +/- 1.63 years.
Participants (N = 33) in the experimental analyses were concussed (n = 17) and nonconcussed
control (n = 16) collegiate athletes with a mean age of 19.76 +/- 1.49 years. MEASUREMENTS:
All participants completed baseline measures for the 16-item HIS, neuropsychological testing
battery, and posturography. Concussed individuals and paired controls were evaluated on days
1, 2, 3, and 10 postinjury on the same testing battery. RESULTS: Confirmatory factor analysis
indicated that a theoretically derived, 3-factor model provided a good but not excellent fit
to the 16-item HIS. Hence, the 16-item HIS was modified on the basis of substantive arguments
about item-content validity. The subsequent analysis indicated that the 3-factor model
provided an excellent fit to the modified 9-item HIS. The 3 factors were best described by a
single second-order factor: concussion symptoms. Scores from the 16-item HIS and 9-item HIS
were strongly correlated, but there were few significant correlations between HIS scores and
scores from the neuropsychological and balance measures. A significant group-by-day
interaction was noted on both the 9-item HIS and 16-item HIS, with significant differences
seen between groups on days 1 and 2 postconcussion. CONCLUSIONS: We provide evidence for the
factorial and construct validity of the HIS among collegiate athletes. This scale might aid
in return-to-play decisions by physicians and athletic trainers.
- A systematic review of the content of critical appraisal tools. P. Katrak, A. E.
Bialocerkowski, N. Massy-Westropp, S. Kumar, K. A. Grimmer. BMC Med Res Methodol 2004: 4(1);
22.
BACKGROUND: Consumers of research (researchers, administrators,
educators and clinicians) frequently use standard critical appraisal tools to evaluate the
quality of published research reports. However, there is no consensus regarding the most
appropriate critical appraisal tool for allied health research. We summarized the content,
intent, construction and psychometric properties of published, currently available critical
appraisal tools to identify common elements and their relevance to allied health research.
METHODS: A systematic review was undertaken of 121 published critical appraisal tools sourced
from 108 papers located on electronic databases and the Internet. The tools were classified
according to the study design for which they were intended. Their items were then classified
into one of 12 criteria based on their intent. Commonly occurring items were identified. The
empirical basis for construction of the tool, the method by which overall quality of the
study was established, the psychometric properties of the critical appraisal tools and
whether guidelines were provided for their use were also recorded. RESULTS: Eighty-seven
percent of critical appraisal tools were specific to a research design, with most tools
having been developed for experimental studies. There was considerable variability in items
contained in the critical appraisal tools. Twelve percent of available tools were developed
using specified empirical research. Forty-nine percent of the critical appraisal tools
summarized the quality appraisal into a numeric summary score. Few critical appraisal tools
had documented evidence of validity of their items, or reliability of use. Guidelines
regarding administration of the tools were provided in 43% of cases. CONCLUSIONS: There was
considerable variability in intent, components, construction and psychometric properties of
published critical appraisal tools for research reports. There is no "gold standard" critical
appraisal tool for any study design, nor is there any widely accepted generic tool that can
be applied equally well across study types. No tool was specific to allied health research
requirements. Thus interpretation of critical appraisal of research reports currently needs
to be considered in light of the properties and intent of the critical appraisal tool chosen
for the task.
- The measurement and monitoring of surgical adverse events. J. Bruce, E. M. Russell, J. Mollison, Z. H.
Krukowski. Accessed on 2003-08-15. [Excerpt] BACKGROUND:
Surgical adverse events contribute significantly to postoperative morbidity, yet the
measurement and monitoring of events is often imprecise and of uncertain validity. Given the
trend of decreasing length of hospital stay and the increase in use of innovative surgical
techniques--particularly minimally invasive and endoscopic procedures--accurate measurement
and monitoring of adverse events is crucial. OBJECTIVES: The aim of this methodological
review was to identify a selection of common and potentially avoidable surgical adverse
events and to assess whether they could be reliably and validly measured, to review methods
for monitoring their occurrence and to identify examples of effective monitoring systems for
selected events. This review is a comprehensive attempt to examine the quality of the
definition, measurement, reporting and monitoring of selected events that are known to cause
significant postoperative morbidity and mortality. METHODS - SELECTION OF SURGICAL ADVERSE
EVENTS: Four adverse events were selected on the basis of their frequency of occurrence and
likelihood of evidence of measurement and monitoring: (1) surgical wound infection; (2)
anastomotic leak; (3) deep vein thrombosis (DVT); (4) surgical mortality. Surgical wound
infection and DVT are common events that cause significant postoperative morbidity.
Anastomotic leak is a less common event, but risk of fatality is associated with delay in
recognition, detection and investigation. Surgical mortality was selected because of the
effort known to have been invested in developing systems for monitoring surgical death, both
in the UK and internationally. Systems for monitoring surgical wound infection were also
included in the review. METHODS - LITERATURE SEARCH: Thirty separate, systematic literature
searches of core health and biomedical bibliographic databases (MEDLINE, EMBASE, CINAHL,
HealthSTAR and the Cochrane Library) were conducted. The reference lists of retrieved
articles were reviewed to locate additional articles. A matrix was developed whereby
different literature and study designs were reviewed for each of the surgical adverse events.
Each article eligible for inclusion was independently reviewed by two assessors. METHODS -
CRITICAL APPRAISAL: Studies were appraised according to predetermined assessment criteria.
Definitions and grading scales were assessed for: content, criterion and construct validity;
repeatability; reproducibility; and practicality (surgical wound infection and anastomotic
leak). Monitoring systems for surgical wound infection and surgical mortality were assessed
on the following criteria: (1) coverage of the system; (2) whether or not denominator data
were collected; (3) whether standard and agreed definitions were used; (4) inclusion of risk
adjustment; (5) issues related to data collection; (6) postdischarge surveillance; (7) output
in terms of feedback and wider dissemination. RESULTS - SURGICAL WOUND INFECTION: A total of
41 different definitions and 13 grading scales of surgical wound infection were identified
from 82 studies. Definitions of surgical wound infection varied from presence of pus to
complex definitions such as those proposed by the Centres for Disease Control in the USA. A
small body of literature has been published on the content, criterion and construct validity
of different definitions, and comparisons have been made against wound assessment scales and
multidimensional indices. There are examples of comprehensive hospital-based monitoring
systems of surgical wound infection, mainly under the auspices of nosocomial surveillance. To
date, however, there is little evidence of systematic measurement and monitoring of surgical
wound infection after hospital discharge. RESULTS - ANASTOMOTIC LEAK: Over 40 definitions of
anastomotic leak were extracted from 107 studies of upper gastrointestinal,
hepatopancreaticobiliary and lower gastrointestinal surgery. No formal evaluations were found
that assessed the validity or reliability of definitions or severity scales of anastomotic
leak. One definition was proposed during a national consensus workshop, but no evidence of
its use was found in the surgical literature. The lack of a single definition or gold
standard hampers comparison of postoperative anastomotic leak rates between studies and
institutions. RESULTS - DEEP VEIN THROMBOSIS: Although a critical review of the DVT
literature could not be completed within the realms of this review, it was evident that a
number of new techniques for the detection and diagnosis of DVT have emerged in the last 20
years. The group recommends a separate review be undertaken of the different diagnostic tests
to detect DVT. RESULTS - SURGICAL MORTALITY MONITORING SYSTEMS: The definition of surgical
mortality is relatively consistent between monitoring systems, but duration of follow-up of
death postdischarge varies considerably. The majority of systems report in-hospital mortality
rates; only some have the potential to link deaths to national death registers. Risk
assessment is an important factor and there should be a distinction between recording
pre-intervention factors and postoperative complications. A variety of risk scoring systems
was identified in the review. Factors associated with accurate and complete data collection
include the employment of local, dedicated personnel, simple and structured prompts to ensure
that clinical input is complete, and accurate and automated data capture and transfer.
CONCLUSIONS: The use of standardised, valid and reliable definitions is fundamental to the
accurate measurement and monitoring of surgical adverse events. This review found
inconsistency in the quality of reporting of postoperative adverse events, limiting accurate
comparison of rates over time and between institutions. The duration of follow-up for
individual events will vary according to their natural history and epidemiology. Although
risk-adjusted aggregated rates can act as screening or warning systems for adverse events,
attribution of whether events are avoidable or preventable will invariably require further
investigation at the level of the individual, unit or department. CONCLUSIONS -
RECOMMENDATIONS FOR RESEARCH: (1) A single, standard definition of surgical wound infection
is needed so that comparisons over time and between departments and institutions are valid,
accurate and useful. Surgeons and other healthcare professionals should consider adopting the
1992 Centers for Disease Control (CDC) definition for superficial incisional, deep incisional
and organ/space surgical site infection for hospital monitoring programmes and surgical
audits. There is a need for further methodological research into the performance of the CDC
definition in the UK setting. (2) There is a need to formally assess the reliability of
self-diagnosis of surgical wound infection by patients. (3) There is a need to assess
formally the reliability of case ascertainment by infection control staff. (4) Work is needed
to create and agree a standard, valid and reliable definition of anastomotic leak which is
acceptable to surgeons. (5) A systematic review is needed of the different diagnostic tests
for the diagnosis of DVT. (6) The following variables should be considered in any future DVT
review: anatomical region (lower limb, upper limb, pelvis); patient presentation
(symptomatic, asymptomatic); outcome of diagnostic test (successfully completed,
inconclusive, technically inadequate, negative); length of follow-up; cost of test; whether
or not serial screening was conducted; and recording of laboratory cut-off values for
fibrinogen equivalent units. (7) A critical review is needed of the surgical risk scoring
used in monitoring systems. (8) In the absence of automated linkage there is a need to
explore the benefits and costs of monitoring in primary care. (9) The growing potential for
automated linkage of data from different sources (including primary care, the private sector
and death registers) needs to be explored as a means of improving the ascertainment of
surgical complications, including death. This linkage needs to be within the terms of data
protection, privacy and human rights legislation. (10) A review is needed of the extent of
the use and efficiency of routine hospital data versus special collections or voluntary
reporting. www.ncchta.org/fullmono/mon522.pdf
- Validation of an Index of the Quality of Review Articles. Andrew D. Oxman, Gordon
H. Guyatt. Journal of Clinical Epidemiology 1991: 44(11); 1271-1278. ABSTRACT: The objective of this study was to assess the validity of an
index of the scientific quality of research overviews, the Overview Quality Assessment
Questionnaire (OQAQ). Thirty-six published review articles were assessed by 9 judges using
the OQAQ. Authors' reports of what they had done were compared to OQAQ ratings. The
sensibility of the OQAQ was assessed using a 13 item questionnaire. Seven a priori hypotheses
were used to assess construct validity. The review articles were drawn from three sampling
frames: articles highly rated by criteria external to the study, meta-analyses, and a broad
spectrum of medical journals. Three categories of judges were used to assess the articles:
research assistants, clinicians with research training and experts in research methodology,
with 3 judges in each category. The sensibility of the index was assessed by 15 randomly
selected faculty members of the Department of Clinical Epidemiology and Biostatistics at
McMaster. Authors' reports of their methods related closely to ratings from corresponding
OQAQ items: for each criterion, the mean score was significantly higher for articles for
which the authors' responses indicated that they had used more rigorous methods. For 10 of the
13 questions used to assess sensibility the mean rating was 5 or greater, indicating general
satisfaction with the instrument. The primary shortcoming noted was the need for judgement in
applying the index. Six of the 7 hypotheses used to test construct validity held true. The
OQAQ is a valid measure of the quality of research overviews.
- Construct Validity in Psychological Tests. Lee J. Cronbach, Paul E. Meehl. Psychological Bulletin
1955: 52; 281-302. [Excerpt] Validation of
psychological tests has not yet been adequately conceptualized, as the APA Committee on
Psychological Tests learned when it undertook (1950-54) to specify what qualities should be
investigated before a test is published. In order to make coherent recommendations the
Committee found it necessary to distinguish four types of validity, established by different
types of research and requiring different interpretation. The chief innovation in the
Committee's report was the term construct validity.[2] This idea was first formulated by a
subcommittee (Meehl and R. C. Challman) studying how proposed recommendations would apply to
projective techniques, and later modified and clarified by the entire Committee (Bordin,
Challman, Conrad, Humphreys, Super, and the present writers). The statements agreed upon by
the Committee (and by committees of two other associations) were published in the Technical
Recommendations (59). The present interpretation of construct validity is not "official" and
deals with some areas where the Committee would probably not be unanimous. The present
writers are solely responsible for this attempt to explain the concept and elaborate its
implications.
- Depth of sedation in children undergoing computed tomography: validity and reliability
of the University of Michigan Sedation Scale (UMSS). S. Malviya, T. Voepel-Lewis, A. R.
Tait, S. Merkel, K. Tremper, N. Naughton. Br J Anaesth 2002: 88(2); 241-5.
BACKGROUND: Safe care of sedated
children requires ongoing assessment of the depth of sedation to permit early recognition of
progression to over-sedation. This study evaluated the validity and reliability of the
University of Michigan Sedation Scale (UMSS) as a measure of sedation during procedures. The
UMSS is a simple observational tool that assesses the level of alertness on a five-point
scale ranging from 1 (wide awake) to 5 (unarousable with deep stimulation). METHODS:
Thirty-two children aged 4 months to 5 yr (mean 1.5 yr), sedated for computed tomography
(CT), were studied prospectively. The CT nurse assessed sedation using the UMSS before
sedative administration and every 10 min thereafter. The child was videotaped during each
assessment, and segments were edited and their order was randomized. Four nurses blinded to
sedative administration viewed the segments and scored sedation using the UMSS. One of these
nurses also scored sedation using a visual analogue scale (VAS) and another using the
Observer's Assessment of Alertness/Sedation Scale (OAAS). To examine the test-retest
reliability, 75 randomly selected video segments were viewed and scored on a second occasion.
RESULTS: Changes in scores from baseline to discharge supported construct validity
(P<0.0001). Criterion validity was demonstrated by significant correlations between the UMSS
and the VAS and OAAS. There was good interobserver agreement between blinded observers'
scores for each level of sedation and at discharge, and between blinded observers and the CT
nurse for scores of 0 and 1 (lighter levels of sedation), but less agreement for scores 2 and
3 (deeper sedation) and discharge scores. Test-retest reliability was supported by agreement
in the observers' UMSS scores. CONCLUSION: The UMSS is a simple, valid and reliable tool that
facilitates rapid and frequent assessment and documentation of depth of sedation in children.
- A patient survey system to measure quality improvement: questionnaire reliability and
validity. R. G. Carey, J. H. Seibert. Med Care 1993: 31(9); 834-45. This study describes the results of a four-year research effort
to develop inpatient and outpatient questionnaires that have sufficient validity and
reliability to be used to measure patient perceptions of quality. As part of this effort,
over 50,000 inpatients, emergency room patients, and ambulatory surgery patients from over
300 hospitals representing every US census region were surveyed. Separate questionnaires,
called Quality of Care Monitors, were developed for inpatients and outpatients. The inpatient
questionnaire consisted of 8 scales: Physician Care, Nursing Care, Medical Outcome, Courtesy,
Food Service, Comfort and Cleanliness, Admissions/Billing, and Religious Care. The outpatient
questionnaire had 7 scales: Physician Care, Nursing Care, Medical Outcome, Facility
Characteristics, Waiting Time, Testing Services and Registration Process. The study found
strong evidence of construct validity, predictive validity, and internal consistency for both
questionnaires. Each questionnaire is capable of measuring separate dimensions of patient
experience. A data bank developed from these questionnaires is currently accessed regularly
by participating hospitals to assess quality improvement and to make benchmark comparisons
with similar hospitals.
- A Plethora of
Threats: A Mildly Amusing Guide for the Weary Student and Anyone Else Encountering the How
To's and What If's of Construct Validity. Nicole M. Driebe. Accessed on
2003-09-17. [Excerpt] Warning: This web page may
cause severe gastrointestinal disorders, bloodshot eyes and various other stress-related
pains -- particularly for those who are just about to engage in their thesis research (and
thought they had thought of everything!). Anyone planning on finishing graduate school in
less than 10 years should consult Dr. Daniels (Jack, of course) before reading further. Also
note: The events and characters portrayed here are purely fictional. If anyone or any
situation resemble you or your own situation in any way -- join the club.
trochim.human.cornell.edu/tutorial/driebe/tweb1.htm
- Reliability and validity of the Children's Health Survey for Asthma. L. Asmussen,
L. M. Olson, E. N. Grant, J. Fagan, K. B. Weiss. Pediatrics 1999: 104(6); e71.
OBJECTIVE: Describe the psychometric
properties of the Children's Health Survey for Asthma (CHSA), a condition-specific,
self-report, functional health measure for parents of children 5 to 12 years of age with
chronic asthma. METHOD: Data from two cross-sectional and one longitudinal study were used to
assess internal consistency reliability, test-retest reliability, and validity of the CHSA.
Over 275 parents and guardians of children with asthma completed the CHSA in one of three
studies. The combined samples included a heterogenous mix of respondents by child age and
race/ethnicity and parental marital and socioeconomic status. Five domain scores were
computed: physical health, activity (child), activity (family), emotional health (child), and
emotional health (family). Raw scale scores were transformed from 0 to 100 with higher scores
indicating better or more positive outcomes. RESULTS: Across the three samples, mean scale
scores ranged from a low of 61.5 (emotional health of the child) to a high of 86.1 (activity
[family]). Internal consistency reliability for each of the scales was high (Cronbach's alpha
= .81-.92), and test-retest reliability (correlation between forms) ranged from .62 to .86.
Significant differences in mean scores for four of five scales were noted between those with
low versus moderate to high recent symptom activity. CONCLUSION: In three tests, the CHSA
displays strong reliability and validity. Descriptive statistics demonstrate a range of scale
scores. Internal consistency is good to excellent and short-term test-retest reliability is
good for each of the five scales. Construct validity is demonstrated by the ability of CHSA
to distinguish levels of disease severity, defined by symptom activity.
- Reliability and validity of the Women's Health Initiative Insomnia Rating Scale. D.
W. Levine, D. F. Kripke, R. M. Kaplan, M. A. Lewis, M. J. Naughton, D. J. Bowen, S. A.
Shumaker. Psychol Assess 2003: 15(2); 137-48.
Reliability and construct validity
of the 5-item Women's Health Initiative Insomnia Rating Scale (WHIIRS) were evaluated in 2
studies. In Study 1, using a sample of 66,269 postmenopausal women, validity of the WHIIRS
was assessed by examining its relationship to other measures known to be related to sleep
quality. Reliability of the WHIIRS was estimated using a resampling approach; the mean alpha
coefficient was .78. Test-retest reliability coefficients were .96 for same-day administration
and .66 after a year or more. Correlations of the WHIIRS with the other measures were in the
predicted directions. Study 2 used a sample of 459 women and compared the WHIIRS with
objective indicators of sleep quality. Results showed that differences in the objective
indicators could be detected by the WHIIRS. Findings suggest that a between-group mean
difference of approximately 0.50 of a standard deviation on the WHIIRS may be clinically
meaningful.
- Validation of 2 pain scales for use in the pediatric emergency department. B.
Bulloch, M. Tenenbein. Pediatrics 2002: 110(3); e33.
OBJECTIVE: To determine the construct,
content, and convergent validity of 2 self-report pain scales for use in the untrained child
in the emergency department (ED). METHODS: A prospective study was conducted of all children
who presented to an urban ED between 5 and 16 years of age inclusive after written informed
consent was obtained. Children were excluded if they were intoxicated, had altered sensorium,
were clinically unstable, did not speak English, or had developmental delays. Children marked
their current pain severity on a standardized Color Analog Scale (CAS) and a 7-point Faces
Pain Scale (FPS). They were then asked whether their pain was mild, moderate, or severe.
Children were then administered an analgesic at the discretion of the attending physician and
asked to repeat these measurements. For assessing content validity, the scales were also
administered to age- and gender-matched children in the ED for nonpainful conditions.
Convergent validity was assessed by determining the Spearman correlation coefficient between
the 2 pain scales. RESULTS: A total of 60 children were enrolled, 30 with pain and 30
without, with a mean age of 9.3 +/- 3.3 years. Boys accounted for 38 of the enrollees
(63.3%). The median score before analgesic administration was 6.0 cm (interquartile range [IQR]:
4.0-8.0) on the CAS and 3.0 faces (IQR: 2.0-5.0) on the FPS; after analgesic administration,
the median scores decreased to 3.1 cm (IQR: 1.1-4.3) and 2.0 faces (IQR: 1.0-3.0),
respectively. As the reported pain intensity increased, so did the scores on the 2 pain
scales. The 30 children with no pain had a median score on the CAS of 0.0 (IQR: 0.0-1.0) and
on the FPS of 0.0 (IQR: 0.0-1.0), whereas the 13 children with severe pain had a median CAS
of 7.0 (IQR: 6.0-8.0) and a median FPS of 5.0 (IQR: 4.0-6.0). The Spearman correlation
coefficient between the CAS and the FPS was positive and strong (r = 0.894). CONCLUSION: The
CAS and the FPS exhibit construct, content, and convergent validity in the measurement of
acute pain in children in the ED.
- Cross-validation of a composite pain scale for preschool children within 24 hours of
surgery. S. Suraseranivongse, U. Santawat, K. Kraiprasit, S. Petcharatana, S. Prakkamodom,
N. Muntraporn. Br J Anaesth 2001: 87(3); 400-5.
This study was designed to cross-validate a composite
measure of the pain scales CHEOPS (Children's Hospital of Eastern Ontario Pain Scale), OPS
(Objective Pain Scale, simplified for parent use by replacing blood pressure measurement with
observation of body language or posture), TPPPS (Toddler Preschool Postoperative Pain Scale)
and FLACC (Face, Legs, Activity, Cry, Consolability) in 167 Thai children aged 1-5.5 yr. The
pain scales were translated and tested for content, construct and concurrent validity,
including inter-rater and intra-rater reliabilities. Discriminative validity in immediate and
persistent pain for the age groups ≤3 and >3 yr were also studied. The children's
behaviour was videotaped before and after surgery, before analgesia had been given in the
post-anaesthesia care unit (PACU), and on the ward. Four observers then rated pain behaviour
from rearranged videotapes. The decision to treat pain was based on routine practice and was
made by a researcher unaware of the rating procedure. All tools had acceptable content
validity and excellent inter-rater and intra-rater reliabilities (intraclass correlation >0.9
and >0.8 respectively). Construct validity was determined by the ability to differentiate the
group with no pain before surgery and a high pain level after surgery, before analgesia
(P<0.001). The positive correlations among all scales in the PACU and on the ward
(r=0.621-0.827, P<0.0001) supported concurrent validity. Use of the kappa statistic indicated
that CHEOPS yielded the best agreement with the routine decision to treat pain. The younger
and older age groups both yielded very good agreement in the PACU but only moderate agreement
on the ward. On the basis of data from this study, we recommend CHEOPS as a valid, reliable
and practical tool.
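Two statistics recur in the abstracts above: a Spearman correlation between two scales (offered as evidence of convergent or concurrent validity) and a kappa statistic for agreement between two sets of categorical decisions. The sketch below shows one plain-Python way to compute each. The function names are hypothetical, the example data are invented, and this is not the code or the exact procedure used in any of the cited studies.

```python
from collections import Counter
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    return (observed - expected) / (1 - expected)


# Invented scores on two pain scales for the same six children.
scale_1 = [0.5, 2.0, 3.5, 6.0, 7.0, 8.5]
scale_2 = [0, 1, 2, 4, 3, 6]
print(spearman(scale_1, scale_2))

# Invented treat / no-treat decisions from a scale cutoff and from routine practice.
by_scale = ["treat", "treat", "none", "none", "treat", "none"]
by_routine = ["treat", "treat", "none", "treat", "treat", "none"]
print(cohen_kappa(by_scale, by_routine))
```

Kappa is undefined when chance agreement equals 1 (for example, when both raters use a single category throughout), so a production version would need to guard against division by zero.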
07/08/2008.