Stats #45: So you want to write a questionnaire
Content: This class will introduce you to the statistical issues important in developing a survey or questionnaire. This class is useful for anyone who participates in the planning of a research study that uses a survey or questionnaire to collect data. There are no pre-requisites for this class.
Teaching strategies: Didactic lectures and small group exercises.
Objectives: In this class you will learn how to:
- identify various research designs and their limitations;
- recognize factors that influence the sample size of a study;
- assess how restrictions on your sample can hamper generalizability; and
- identify ethical issues associated with randomization and blinding.
This class qualifies for 3 IRB Education Credits (IRBECs).
Contents
- Overview of the STATS web pages
- Consulting services that I provide
- So you want to write a questionnaire
- Privacy concerns in research
- Please fill out an evaluation form
In addition, you will receive the following publications:
Selecting, designing, and developing your questionnaire. Boynton PM, Greenhalgh T. Bmj 2004: 328(7451); 1312-5. [PDF]
IRB Frequently Asked Questions (FAQs). American Association for Public Opinion Research. Accessed on 2004-12-24. www.aapor.org/default.asp?page=survey_methods/IRBS_faq
Overview of the STATS web pages (January 21, 2000)
What are the STATS web pages?
The STATS pages are a collection of handouts that I use in my job as a statistical consultant. The web provides a nice home for these handouts, because as I update my material, the newest version is immediately available to anyone who is interested.
Where can I find STATS?
If you have a web browser, like Internet Explorer or Netscape Navigator, you can surf on over to my site,
which is also found at http://internet1/stats, if you are attached to the Children's Mercy Hospital network. There are two obsolete sites: http://www.cmh.edu/stats and http://simon/stats. Do not use either of these sites.
Some of the fun stuff you can find on the STATS web pages.
Ask Professor Mean. For the tough Statistics questions that Dear Abby won't touch.
Planning Your Research Study. Things you need to plan for before you start collecting your data.
Selecting An Appropriate Sample Size. How much data do you really need?
Managing Your Research Data. Everything you want to know before you step to the keyboard.
Steps In a Typical Data Analysis. I have my data on the computer. Now what?
How to Read a Medical Journal Article. Reading a journal is hard work. Here's some help.
Professor Mean's Library. Good books and good web sites about Statistics.
... and even more good stuff!!!
This webpage was written, edited by Linda Foland, and was last modified on 07/08/2008. Category: Website details
For CMH employees only: Statistical Consulting Services.
You can get free statistical consulting if you work for Children's Mercy Hospital. Ashley Sherman provides a wide range of statistical consulting services to help you with your research projects. This help can start as early as the initial planning of your research. We also help with the analysis of your data, using SPSS or other statistical software. We can also provide assistance with the preparation of your presentations and publications.
Here are some examples of the services that we have provided:
- setting up your research hypothesis,
- selecting and justifying your sample size,
- writing the statistical methods section for your grant,
- preparing randomization tables for your study,
- reviewing your surveys for content and quality,
- developing a system for entering your data,
- choosing an appropriate statistical model for your data,
- establishing validity and/or reliability for your measurement scales,
- checking for violations of statistical assumptions in your data,
- producing graphs and tables for your research publication, and
- providing references for new and unusual statistical methods.
Specific statistical advice has been outlined on a series of web pages which can be found at http://www.childrensmercy.org/stats/. The pages provide advice about planning your research, selecting an appropriate sample size, managing your research data, performing a variety of data analyses, presenting research data, and writing research papers.
This webpage was written on 2003-04-30 and was last modified on 2008-07-08. Category: Professional details
Directions to my new office (April 25, 2008).
I have moved to a new office. It is a modular building just north of Children's Mercy Hospital. It is between 23rd and 22nd street, just off of Kenwood Avenue (Kenwood is a small north/south street just west of Holmes). If you need to get from your office to mine, here are some directions written by my Administrative Assistant, Judy Champion.
- Take the elevator of the research tower down to the yellow level. Exit the employee parking garage on 23rd Street, walk to Kenwood, and cross 23rd Street. Your destination is Building M 3, which is the building closest to 22nd Street. However, the entrance to our building faces Building M 2. It's best to walk into the parking area that is just north of Building M 1 and follow the sidewalk around the west side of Building M 2 in order to get to our building's entrance on its south side. Another route would be to exit the Hospital Hill Center Building on Holmes and then walk ½ block north to 23rd Street, cross 23rd Street, walk west to Kenwood, and then walk north to Building M 3 (address: 2220 Kenwood).
2008-07-14. Category: Professional details
So you want to write a questionnaire (July 12, 2002)
Dear Professor Mean, I need to write a questionnaire for a research study I am conducting. Can you help me write it? -- Cautious Carmen
Dear Cautious,
If I wrote the questionnaire, it would include a bad joke on every page.
Short answer
You need to think about several issues while writing a questionnaire:
- What is the purpose of your questionnaire?
- What level of anonymity can you provide?
- How will you minimize non response?
- Are you asking the right questions?
Also make sure that you do some pilot testing of your questionnaire.
I'll presume, for the most part, that you are giving a questionnaire to individual patients, but you can also send questionnaires out to groups and organizations.
What is the purpose of your questionnaire?
You need to identify the purpose of your survey. Are you trying to identify
- attitudes
- needs
- behavior
- demographics
or some combination of the above?
Who are you sampling from, and who do you want to generalize these results to? Will you make any extrapolations from this survey?
Will you collect the data
- through a web page
- through postal mail
- through FAX
- through email
- through the phone
- through a face to face interview
Have you considered a focus group as an alternative method for data collection?
What level of anonymity can you provide?
You should provide the greatest degree of anonymity possible and you should inform your patients what level of anonymity you can provide.
For many questionnaires, you will publish only aggregate results; individual responses will not be reported. If you have to link the questionnaire data to a medical record or other source of information, then you need to inform your patients.
Sometimes you need to track the survey in order to find out who to send a follow up reminder notice to. This could be done with the use of a code number on the survey, but if you can, you should assure the patient that this code will not be used beyond the use of reminder notices.
I am starting to write up a web page about privacy concerns in research.
You need to identify any sensitive questions. Some examples include
- genetic information
- information about mental illnesses
- information about sexual attitudes, preferences or practices
- information on the use or abuse of alcohol and other drugs
- information on illegal activities
What is sensitive may depend on what group you are asking. Questions about smoking and alcohol consumption might be more threatening if your population is a group of pregnant women.
Ask yourself if the disclosure of this information might embarrass or harm the respondent. If it can, then you need to take special precautions.
How will you minimize non response?
Some of the people you send your questionnaire to will not receive it. Some of them will not return the questionnaire. And those who do respond may not respond to all the questions. All of this can cause a serious bias in your data analysis.
Think first about motivation. Why would anyone take the time to fill out and return your questionnaire? You need to give them some incentive.
- You might include something of value with your survey such as cash or a gift certificate. Your budget probably can't afford a large incentive, but a large incentive might be considered coercive anyway.
- Sometimes people are motivated by altruism, so you should explain how your questionnaire will help make the world a better place. Make sure that your patients see a link between this questionnaire and something that is important to them.
- Curiosity can also motivate; consider offering a summary of your research findings after the questionnaires have been analyzed.
Also be sure to avoid common demotivators.
- Don't give your patients an undue work burden with an overly long and complex questionnaire.
- Don't ask for information that you don't need or which you already have.
- Don't make your patients pay for a stamp or a long distance phone call.
- If you are sending a survey to an organization, don't send it to the wrong department and certainly don't send it to a general address with the hope that it will find its way to the right person.
If possible, use follow up reminders by phone, email, or postal mail to those who do not respond by the deadline. These reminders can sometimes raise concerns about anonymity, so be careful about this. Use coded numbers on the surveys to track who has responded and let them know that the link between the codes and any personal identifiers will be destroyed.
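The coded tracking described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names and the patient labels are hypothetical, and in practice the link between codes and names would live in a securely stored file that is kept apart from the survey responses.

```python
import secrets

def assign_codes(names):
    """Give every person on the mailing list a random survey code.

    The returned dictionary (code -> name) is the only link between
    a survey and a person, so it should be stored separately.
    """
    link = {}
    for name in names:
        code = secrets.token_hex(4)
        while code in link:  # very unlikely, but keep the codes unique
            code = secrets.token_hex(4)
        link[code] = name
    return link

def needs_reminder(link, returned_codes):
    """Return the names whose coded survey has not come back yet."""
    returned = set(returned_codes)
    return sorted(name for code, name in link.items() if code not in returned)

# Hypothetical mailing list of three patients.
link = assign_codes(["Patient A", "Patient B", "Patient C"])

# Suppose only the first survey has come back by the deadline.
first_code = next(iter(link))
reminders = needs_reminder(link, [first_code])

# Once the reminders have gone out, destroy the link as promised.
link.clear()
```

Because the codes are random rather than derived from a name or record number, the code printed on a returned survey reveals nothing once the link has been destroyed.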
The best way to minimize the number of non respondents is to make the survey clean, simple, and easy to respond to.
- Most of us have limited attention spans. Be brief.
- Most of us are easily confused and befuddled. Ask one question at a time. If you are using conjunctions (and/or) in a question, try splitting it into two or more simpler questions.
- Most of us have dreadful memories. Minimize the amount of recollection that your patients have to do. Don't ask for exact numbers when a range will do.
- Most of us do not handle abstractions well. Try to ask questions about tangible items and give examples. Avoid questions about concepts that are not encountered in daily living.
- Most of us are impatient. Ask questions that your patients can answer rapidly and without much mental effort. Avoid questions that involve arithmetic computations, such as adding up several sources of income. Avoid questions that involve ranking or selecting preferences from a long list.
Finally, be sure to choose an appropriate language level. For many questionnaires, you should write at a fifth grade reading level.
In spite of all this, some people will not respond. If you can get some abbreviated information from them, such as demographics or their reasons for not participating, that might help. This information might point to a good statistical adjustment for your data. Even if you can't adjust for it, the information might help you determine the direction and severity of any bias caused by non response.
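One common adjustment, offered here only as an illustration and not necessarily the one best suited to your study, is a weighting class adjustment. Within each demographic group, every respondent is weighted by the number of people sampled divided by the number who responded. The group labels and counts below are made up for the sketch.

```python
from collections import Counter

def nonresponse_weights(sampled_groups, responder_groups):
    """Weight for each group = number sampled / number who responded."""
    sampled = Counter(sampled_groups)
    responded = Counter(responder_groups)
    weights = {}
    for group, n_sampled in sampled.items():
        n_responded = responded[group]
        if n_responded == 0:
            raise ValueError(
                f"no respondents in group {group!r}; merge it with another group"
            )
        weights[group] = n_sampled / n_responded
    return weights

# Hypothetical survey: 100 people sampled, 40 responded.
sampled = ["under 40"] * 60 + ["40 and over"] * 40
responders = ["under 40"] * 30 + ["40 and over"] * 10

weights = nonresponse_weights(sampled, responders)
# weights == {"under 40": 2.0, "40 and over": 4.0}
```

The adjustment only helps if respondents and non-respondents within the same group would have answered similarly, which is an assumption you cannot verify from the survey itself.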
Are you asking the right questions?
Use standardized questions and scales whenever you can. These standards were developed and tested over a long period of time, so you know how they will behave. By using standardized questions, you also make it easier for anyone who might incorporate your research into a systematic review or meta-analysis.
A good example of a standardized scale is the Burns Anxiety Inventory. This is a series of 33 questions about anxious feelings, anxious thoughts, and physical symptoms. Here are six of the items:
- Feeling that things around you are strange, unreal or foggy.
- Apprehension or a sense of impending doom.
- Racing thoughts or having your mind jump from one thing to the next.
- Feeling that you're on the verge of losing control.
- Butterflies or discomfort in the stomach.
- Tight, tense muscles.
By asking a wide range of questions about anxiety, you are helping to get an accurate assessment of anxiety, especially for those patients who might show anxiety in some ways but not in others.
Lack of standards can cause problems. Jadad and Gagliardi (1998) criticize scales used to rate web sites providing health information. There were too many of them, most of them did not present any justification for their
When you are using categories, use the same categories that others use. Birthweights, for example, are classified as low (LBW) if less than 2500 grams, very low (VLBW) if less than 1500 grams, and extremely low (ELBW) if less than 1000 grams.
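The birthweight cutoffs above translate directly into code. In this minimal sketch the function name and the "not low" label for weights of 2500 grams or more are assumptions. The standard categories are nested (an ELBW infant is also VLBW and LBW), so the function returns the most specific label.

```python
def classify_birthweight(grams):
    """Return the most specific standard birthweight category."""
    if grams < 1000:
        return "ELBW"   # extremely low birthweight
    if grams < 1500:
        return "VLBW"   # very low birthweight
    if grams < 2500:
        return "LBW"    # low birthweight
    return "not low"    # label chosen for this sketch only

categories = [classify_birthweight(g) for g in (900, 1200, 2400, 3300)]
# categories == ["ELBW", "VLBW", "LBW", "not low"]
```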
Running a pilot study of the questionnaire
Pilot test your questionnaire.
- What are you thinking?
- Remember to read aloud for me.
- Can you tell me more about that?
- Could you describe that for me?
- Remember to tell me what you are doing.
-- Dillman, page 143.
Here are some other issues to examine during a pilot.
- Were any items skipped frequently?
- Were any items answered incorrectly or ambiguously?
- Were any items redundant (no variation, or perfect correlation with another item)?
- Should you add extra categories to certain questions?
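Some of these pilot checks can be automated once the pilot responses are entered. The sketch below assumes that each response is a dictionary of numeric answers with None marking a skipped item. The item names, the data, and the function names are all hypothetical.

```python
from itertools import combinations
from math import isclose, sqrt

def skip_rates(responses, items):
    """Fraction of respondents who skipped each item."""
    n = len(responses)
    return {i: sum(r.get(i) is None for r in responses) / n for i in items}

def constant_items(responses, items):
    """Items where every respondent who answered gave the same answer."""
    flagged = []
    for item in items:
        answers = {r[item] for r in responses if r.get(item) is not None}
        if len(answers) <= 1:
            flagged.append(item)
    return flagged

def pearson(x, y):
    """Pearson correlation, or None if either variable has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def perfectly_correlated(responses, items):
    """Pairs of items whose answers are perfectly correlated."""
    pairs = []
    for a, b in combinations(items, 2):
        both = [(r[a], r[b]) for r in responses
                if r.get(a) is not None and r.get(b) is not None]
        if len(both) < 3:  # too few complete pairs to judge
            continue
        rho = pearson([p[0] for p in both], [p[1] for p in both])
        if rho is not None and isclose(abs(rho), 1.0, abs_tol=1e-9):
            pairs.append((a, b))
    return pairs

# Made-up pilot data from four respondents.
pilot = [
    {"q1": 1, "q2": 2, "q3": 3, "q4": None},
    {"q1": 2, "q2": 4, "q3": 3, "q4": None},
    {"q1": 3, "q2": 6, "q3": 3, "q4": 5},
    {"q1": 4, "q2": 8, "q3": 3, "q4": 2},
]
items = ["q1", "q2", "q3", "q4"]

often_skipped = [i for i, rate in skip_rates(pilot, items).items() if rate >= 0.5]
no_variation = constant_items(pilot, items)
redundant_pairs = perfectly_correlated(pilot, items)
```

With so few respondents, a flag is a prompt to look at the item again rather than proof that it is redundant. The 0.5 cutoff for "frequently skipped" is arbitrary.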
Try to estimate the resources you need to conduct this questionnaire.
Summary
[This section is not yet available.]
Thank you for filling out this survey. We don't have enough money to include a pre-printed envelope. Be sure to use extra postage, since the survey weighs more than one ounce. We're not sure how we will use this data and maybe we'll disclose this information to other researchers. This survey hasn't had any pilot testing, so if we goof up badly, you might have to fill out a better one later. No one has used the survey form before, so we're not sure if we'll find out anything interesting.
Further reading
- Survey Research Methods Second Edition. Earl Babbie (1990) Belmont, California: Wadsworth Publishing Company.
- Examination of a survey methodology. Dillman's Total Design Method. FE Crosby, MR Ventura, MJ Feldman. Nurs Res 1989: 38; 56-58.
- Mail and Internet Surveys: The Tailored Design Method. Don A. Dillman (2000) Canada: John Wiley & Sons, Inc.
- Mail and telephone surveys: the total design method. DA Dillman (1978) New York: John Wiley & Sons.
- How Surveys Answer A Key Question: Are Consumers Satisfied With Managed Care?. Karen Donelan. Accessed on 2003-10-20. www.managedcaremag.com/archiveMC/9602/MC9602.survey.shtml
- Survey Research Methods Second Edition. Floyd J. Jr. Fowler (1993) Newbury Park, CA: Sage Publications, Inc.
- Improving Survey Questions: Design and Evaluation. Floyd J. Jr. Fowler (1995) Thousand Oaks, CA: Sage Publications, Inc.
- A Brief Guide to Questionnaire Development. Robert Frary. Accessed on 2001-01-04. www.testscoring.vt.edu/fraryquest.html
- Survey research. JA Krosnick. Annu Rev Psychol 1999: 50; 537-67. [Abstract]
- Research Resources. Laurier Institute for the Study of Public Opinion and Policy. Accessed on 2003-10-20. www.wlu.ca/lispop/lispop.htm
- How to Measure Survey Reliability and Validity. Mark S. Litwin (1995) Thousand Oaks, CA: Sage Publications.
- How to Conduct your Own Survey. Priscilla Salant, Don A. Dillman (1994) Toronto: John Wiley & Sons, Inc.
- Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context. Howard Schuman, Stanley Presser (1996) Thousand Oaks, CA: Sage Publications.
- Health Measurement Scales A Practical Guide to Their Development and Use. David L. Streiner, Geoffrey R. Norman (1989) New York: Oxford University Press, Inc.
- Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. Michael W. Traugott, Murray Edelman, Warren J. Mitofsky, The American Association for Public Opinion Research. Accessed in 2000. www.aapor.org/default.asp?page=survey_methods/standards_and_best_practices/standard_definitions
- The Survey Research Handbook Second Edition Guidelines and Strategies for Conducting a Survey. Alreck PL, Settle, Robert B. (1995) Chicago, IL: Irwin Professional Publishing.
- PEDAKSI: methodology for collecting data about survey non-respondents [pdf]. Lynn PJ, Institute for Social & Economic Research. Accessed on 2005-04-06. www.iser.essex.ac.uk/pubs/workpaps/pdf/2002-05.pdf
- Separating Refusal Bias and Non-Contact Bias: Evidence from UK National Surveys [pdf]. Lynn PJ, Clarke P, Institute for Social & Economic Research, Working Paper 2001-24 (November 2001). Accessed on 2005-04-06. www.iser.essex.ac.uk/pubs/workpaps/pdf/2001-24.pdf
- Brochures about Survey Research. ASA Series: What is a Survey? American Statistical Association Survey Research Methods Section. http://www.amstat.org/sections/SRMS/whatsurvey.html
Further reading -- Empirical evidence of response bias
- Effect of UK national guidelines on services to treat patients with acute low back pain: follow up questionnaire survey. A. G. Barnett, M. R. Underwood, M. R. Vickers. British Medical Journal 1999: 318(7188); 919-20. [Medline] [Full text] [PDF]
- Mortality and cancer rates in nonrespondents to a prospective study of older women: 5-year follow-up. K. M. Bisgard, A. R. Folsom, C. P. Hong, T. A. Sellers. American Journal of Epidemiology 1994: 139(10); 990-1000. [Medline]
- Characteristics of non-responders and the impact of non-response on prevalence estimates of dementia. F. Boersma, J. A. Eefsting, W. van den Brink, W. van Tilburg. International Journal of Epidemiology 1997: 26(5); 1055-62. [Medline]
- Non-response bias in a lifestyle survey. A. Hill, J. Roberts, P. Ewings, D. Gunnell. J Public Health Med 1997: 19(2); 203-7.
- The Tromso Heart Study: responders and non-responders to a health questionnaire, do they differ? B. K. Jacobsen, D. S. Thelle. Scand J Soc Med 1988: 16(2); 101-4.
- Do safety practices differ between responders and non-responders to a safety questionnaire? D. Kendrick, R. Hapgood, P. Marsh. Injury Prevention 2001: 7(2); 100-3. [Medline] [Abstract] [Full text] [PDF]
- Nonresponse bias in a national study of dentists' infection control practices and attitudes related to HIV. G. M. McCarthy, J. K. MacDonald. Community Dent Oral Epidemiol 1997: 25(4); 319-23.
- Comparison of early and late respondents to a postal health survey questionnaire. A. Paganini-Hill, G. Hsu, A. Chao, R. K. Ross. Epidemiology 1993: 4(4); 375-9.
- Quality of response in different population groups in mail and telephone surveys. J. Siemiatycki, S. Campbell, L. Richardson, D. Aubert. Am J Epidemiol 1984: 120(2); 302-14.
- Representativeness and response rates from the Domestic/International Gastroenterology Surveillance Study (DIGEST). J. G. Tijssen. Scand J Gastroenterol Suppl 1999: 231; 15-9. [Medline]
- What are the characteristics of general practitioners who routinely do not return postal questionnaires: a cross sectional study. N. Stocks, D. Gunnell. J Epidemiol Community Health 2000: 54(12); 940-1. [Medline]
Further reading -- Ambiguous Questions
- Would you say you "had sex" if...? S. A. Sanders, J. M. Reinisch. Jama 1999: 281(3); 275-7.
- Collection of Race and Ethnicity Data in Clinical Trials. U.S. Food and Drug Administration. Accessed on 2003-02-25. www.fda.gov/cber/gdlns/racethclin.htm
Further reading -- Format of Your Survey
- Different response rates in a trial of two envelope styles in mail survey research. D. A. Asch, N. A. Christakis. Epidemiology 1994: 5(3); 364-5. [Medline]
- A comparison of responses to mailed questionnaires and telephone interviews in a mixed mode health survey. D. J. Brambilla, S. M. McKinlay. American Journal of Epidemiology 1987: 126(5); 962-71. [Medline]
- Increasing response rates to postal questionnaires: systematic review. Phil Edwards, Ian Roberts, Mike Clarke, Carolyn DiGuiseppi, Sarah Pratap, Reinhard Wentz, Irene Kwan. BMJ 2002: 324(7347); 1183-. [Abstract] [Full text] [PDF]
- Measuring later health status of high risk infants: randomised comparison of two simple methods of data collection. D. Field, E. S. Draper, M. J. Gompels, C. Green, A. Johnson, D. Shortland, M. Blair, B. Manktelow, C. R. Lamming, C. Law. British Medical Journal 2001: 323(7324); 1276-81.
- Increasing response rates for mailed surveys of Medicaid clients and other low-income populations. P. J. Gibson, T. D. Koepsell, P. Diehr, C. Hale. Am J Epidemiol 1999: 149(11); 1057-62.
- Do postage-stamps increase response rates to postal surveys? A randomized controlled trial. R. A. Harrison, D. Holt, P. J. Elton. Int J Epidemiol 2002: 31(4); 872-4. [Medline]
- A comparison of nonresponse in mail, telephone, and face-to-face surveys. J. J. Hox, D De Leeuw. Quality and Quantity 1994: 28(4); 329-344.
- Does length of questionnaire matter? A randomised trial of response rates to a mailed questionnaire. C. Iglesias, D. Torgerson. J Health Serv Res Policy 2000: 5(4); 219-21. [Medline]
- Improving the measurement of quality of life in older people: the York SF-12. C.P. Iglesias, Y.F. Birks, D.J. Torgerson. QJM 2001: 94(12); 695-698. [Abstract]
- Increasing response rates to postal questionnaires. Cynthia P Iglesias, Yvonne F Birks, David J Torgerson, Paula-J Roberts, Chris Roberts, Bonnie Sibbald. BMJ 2002: 325(7361); 444-. [Full text]
- Response rate according to title and length of questionnaire. E. Lund, I. T. Gram. Scand J Soc Med 1998: 26(2); 154-60.
- Comparability of telephone and household breast cancer screening surveys with differing response rates. R. M. Mickey, J. K. Worden, P. M. Vacek, J. M. Skelly, M. C. Costanza. Epidemiology 1994: 5(4); 462-5.
- Methods for the design and administration of web-based surveys. T. K. Schleyer, J. L. Forrest. J Am Med Inform Assoc 2000: 7(4); 416-25. [Medline] [Abstract] [Full text] [PDF]
- Understanding Implementation. The mechanics of polling. The Statistical Assessment Service. Accessed on 2003-10-20. www.stats.org/record.jsp?type=news&ID=378
- Question Time. The Statistical Assessment Service. Accessed on 2003-10-20. www.stats.org/record.jsp?type=news&ID=382
- Response to mail surveys: effect of a request to explain refusal to participate. The ARIC Study Investigators. E. Shahar, K. M. Bisgard, A. R. Folsom. Epidemiology 1993: 4(5); 480-2.
- A comparison of mail, telephone, and home interview strategies for household health surveys. J. Siemiatycki. Am J Public Health 1979: 69(3); 238-45.
- Nonresponse bias and early versus all responders in mail and telephone surveys. J. Siemiatycki, S. Campbell. Am J Epidemiol 1984: 120(2); 291-301.
- Improving the response rates to questionnaires. Liam Smeeth, Astrid E Fletcher. BMJ 2002: 324(7347); 1168-1169. [Medline] [Full text] [PDF]
- Increasing response rates in telephone surveys: a randomized trial. W. Smith, T. Chey, B. Jalaludin, G. Salkeld, T. Capon. J Public Health Med 1995: 17(1); 33-8.
- Using the Visual Analog Scale. Chad Starkey, Pete Koehneke, Daniel Sedory, Paula Turocy. Accessed on 2003-06-23. www.cewl.com/clined/acpm/app_c.html
- Is Shorter Always Better? Relative Importance of Questionnaire Length and Cognitive Ease on Response Rates and Data Quality for Two Dietary Questionnaires. Amy F. Subar, Regina G. Ziegler, Frances E. Thompson, Christine Cole Johnson, Joel L. Weissfeld, Douglas Reding, Katherine H. Kavounis, Richard B. Hayes. Am. J. Epidemiol. 2001: 153(4); 404-409.
- Comparative Response to a Survey Executed by Post, E-mail, & Web Form. Gi Woong Yun, Craig W. Trumbo. JCMC 2000: 6(1); [Full text]
- Refusal and information bias associated with postal questionnaires and face-to-face interviews in very elderly subjects. R. Hebert, G. Bravo, N. Korner-Bitensky, L. Voyer. J Clin Epidemiol 1996: 49(3); 373-81.
Further reading -- Fraud
- Interviewer Falsification in Survey Research. Section on Survey Research Methods, American Statistical Association. Accessed on 2003-05-15. www.aapor.org/interviewfalse.pdf
Further reading -- Interviewer Effects
- Do interviewers' Health Beliefs and Habits Modify Responses to Sensitive Questions? A Study using Data Collected from Pregnant Women by Means of Computer-assisted Telephone Interviews. Anne-Marie Nybo Anderson, Jorn Olsen. American Journal of Epidemiology 2002: 155(1); 95-100.
Further reading -- Nonresponse Bias
- Response and nonresponse bias in oral health surveys. D Locker. Journal of Public Health Dent 2000: 60; 72-81. [Medline]
- Separating Refusal Bias and Non-Contact Bias: Evidence from UK National Surveys. Peter J. Lynn, Paul Clarke, Institute for Social & Economic Research. Accessed on 2003-10-20. www.irc.essex.ac.uk/pubs/workpaps/wp2001-24.php
- PEDAKSI: methodology for collecting data about survey non-respondents. Peter J. Lynn, Institute for Social & Economic Research. Accessed on 2002-February. www.irc.essex.ac.uk/pubs/workpaps/2002-05.php
Further reading -- Response Rates
- Response rates to mail surveys published in medical journals. D. A. Asch, M. K. Jedrziewski, N. A. Christakis. Journal Clinical Epidemiology 1997: 50(10); 1129-36. [Medline]
- Reported response rates to mailed physician questionnaires. S. M. Cummings, L. A. Savitz, T. R. Konrad. Health Serv Res 2001: 35(6); 1347-55. [Medline]
Further reading -- Reliability and Validity
- seamonkey.ed.asu.edu/~alex/teaching/assessment/reliability.html Reliability and Validity by Chong Ho (Alex) Yu. This page discusses the issues surrounding reliability and validity.
- trochim.human.cornell.edu/kb/measure.htm Measurement by Bill Trochim. This page discusses various research topics in psychology including the various types of validity.
- www.yorku.ca/dept/psych/classics/Cronbach/construct.htm Construct Validity in Psychological Tests, Lee J. Cronbach and Paul E. Meehl (1955). First published in Psychological Bulletin, 52, 281-302. The full text of this classic paper on validity is available on the Internet.
- Rating health information on the Internet: navigating to knowledge or to Babel? Jadad, A. R. and A. Gagliardi (1998). Jama 279(8): 611-4.
- Developing a scale for measuring the barriers to condom use in Nigeria. Sunmola, Adegbenga M. Bull World Health Organ, 2001, vol.79 no.10. ISSN 0042-9686.
This webpage was written by Steve Simon on 2002-07-12 and was last modified on 2008-07-14. Category: Ask Professor Mean, Category: Survey design
Stats #45: Practice Exercises
1. Select a disease (e.g., asthma) which affects patients that you work with regularly. Develop a list of five or more questions that you might ask on a survey that addresses quality of life issues associated with that particular disease. Lay these questions out on the flipchart provided to your group.
Privacy concerns in research (July 12, 2002)
Dear Professor Mean, I want to do some research using tissue samples, but the Institutional Review Board has said that I have to get consent first, because the data are not anonymized. They also told me that I might be able to get a waiver from consent if I de-identify the data. What's up with all these privacy concerns in research? -- Doubting Denise
Dear Doubting,
When you are a statistician, it's hard to understand privacy concerns, because no one is interested in us. Even Professor Mean himself finds it difficult to attract any awareness. He was talking about statistics to his cat and she slept through the whole discussion.
Short answer
Privacy is indeed a major concern, and there is much that we statisticians can do to help preserve privacy. Every research project has different privacy concerns, but here are some general suggestions.
- Strip out direct identifiers
- Beware of indirect identifiers
- Securely store any linking data
- Use computer algorithms to preserve confidentiality
Understanding privacy concerns
It's easy to overlook the importance of privacy. You can think to yourself, I don't care if anyone knows that I received a flu shot on November 7, 2003, that the nurse used a Big Bird bandage and that I left with a lollipop even though those were only intended for little kids. But a story I read in the newspapers about a decade ago made me realize the importance of privacy.
The Internal Revenue Service had to fire several employees because they abused the confidentiality of U.S. tax returns. These employees were amusing themselves during work breaks by browsing through the computerized tax returns of rich and famous people. My first reaction was to think how much fun it would be to look at information about Julia Roberts or Steven Spielberg. But then I realized that these people gave this information to the U.S. government reluctantly. They would tolerate the use of these returns for official government business, but they didn't provide this information to entertain every curious civil servant who had access to these records.
If people are protective about their financial records, they are often far more protective of their medical information. They provide this information only reluctantly to health care professionals and they want some assurance that this information is not abused. Inappropriate disclosure of health information can often cause embarrassment, or even financial problems if it gets into the hands of your employer or your insurance company.
Privacy is also a very individual issue. Some people talk openly about the type of birth control they use, for example, and others consider this top secret. A friend of mine (who I shall not name) did not want any of her co-workers to know she was pregnant until it was clearly visible. She was at high risk for miscarriage and didn't want to deal with the public fallout if her pregnancy ended early.
Strip out direct identifiers
No statistical analysis will require the use of the actual names of the patients, but people routinely send me data files with names. I will immediately strip out the names, because I don't want to accidentally see the name of one of my neighbor's kids in the data set.
When you are working with a data set of your own patients, it may help you during the data collection and entry phase to include the patient's name. You might possibly catch some errors during data entry because you remember some of the details of a particular patient.
Even so, you might want to avoid recording patient names anyway. The incremental gain in data quality is probably not worth the risk of accidentally disclosing private information.
There is no justification for using patient names if these aren't your patients to begin with. Strip out those names and have a talk with the person who sent you the data.
There are other obvious examples of direct identifiers, such as a full face photograph of the patient, a social security number, or a fingerprint.
A medical record number gives some semblance of anonymity, but according to HIPAA guidelines the medical record number is a direct identifier because it allows anyone with access to medical records to identify the individuals.
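Stripping direct identifiers from a data file is easy to automate. The sketch below assumes that the data arrive as a list of dictionaries. The column names and the patient record are hypothetical, and the set of identifiers would need to be adapted to whatever appears in your own files.

```python
# Hypothetical column names; extend this set to match your own data.
DIRECT_IDENTIFIERS = {"name", "ssn", "medical_record_number"}

def strip_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with the identifying columns removed."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]

# Made-up record for illustration only.
raw = [
    {"name": "Jane Doe", "medical_record_number": "000123",
     "age_years": 7, "fev1": 1.4},
]
clean = strip_direct_identifiers(raw)
# clean == [{"age_years": 7, "fev1": 1.4}]
```

The function leaves the original records untouched, so the file that still contains names has to be deleted or secured separately.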
Marketing identifiers
You should also be sensitive about phone numbers, emails, and addresses because they can often identify an individual. My email address, for example, includes my first initial and last name. Even if it cannot directly identify an individual, a phone number or address is open to abuse by unscrupulous marketers.
For a drug company, private health information is almost an irresistible temptation. Wouldn't it be nice to have a mailing list narrowly targeted to a disease that your new blockbuster drug can help cure? But people who suffer the heartbreak of psoriasis do not want to be reminded of this when they open their mailboxes.
People have also learned to closely guard their email addresses. Once your name gets on a spam list, you will never see the end of offers for mail order sales of Viagra and other annoying things.
Indirect identifiers can also compromise privacy
Although a person's name is an obvious identifier, there are other pieces of information that are indirect identifiers that could potentially compromise patient privacy. An indirect identifier is a data value that narrows down the scope of possible patients so that, when combined with other information, it could allow you to identify an individual patient.
As an example (this example is only meaningful if you live in the Kansas City area), I have a famous neighbor who lives in the next cul-de-sac. I could tell you two things about him, and either fact alone would tell you nothing about him but the combination identifies him exactly. This person currently plays wide receiver for the Kansas City Chiefs football team. This person also used to play for the Denver Broncos football team.
If you are a big football fan, you will know who I am talking about. Even if you are not a football fan, this is a potential disclosure. Keep this in mind for health information; just because you personally are unable to identify someone from a piece of information does not mean that the information is private.
For example, I have been told that if you know someone's birthdate and their zip code, you would be able to identify that person. I don't know how you could do this, but I'm sure that it could be done.
How to handle indirect identifiers
Be sure to keep information about indirect identifiers inside the hospital. If you must share some of this information for research purposes you should get a limited data set use agreement (see below).
Geographic divisions: Any information that allows you to limit the location of someone to a very narrow geographical region has the potential to compromise security. This includes zip codes, a piece of information commonly used in statistical analyses. If your data are coded into very large geographic groups, such as the first three digits of the zip code or the state of residence, then you should not have any concerns about privacy, unless you are dealing with a very rare disease or a very exclusive population.
Birthdates and ages: Knowing when a person was born can also compromise confidentiality. You can also implicitly calculate a birthdate if you know that person's exact age on a certain date. This is especially relevant to studies of infants, where you might often measure the age in weeks or days. Rounding ages to the year will usually avoid privacy issues.
Dates of exams and procedures: The actual dates of certain procedures can also sometimes cause problems. Dates of admission to a hospital or surgery dates should be treated with caution, unless they are recorded in very broad categories such as the year.
For all of the information described above, context is important. In some situations, the restrictions described above have to be tightened. For example, there are so few people aged 90 and above that it is not a good idea to identify their ages, even rounded to the year. When you are dealing with a very rare disease, like certain childhood cancers, or a very rare procedure, like a heart-lung transplant, there are so few people in the population that you need to be extra cautious about indirect identifiers.
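The coarsening steps described above (three-digit zip codes, ages in whole years, dates reduced to the year, and a single group for ages 90 and above) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the field names (zip_code, age_years, admit_date) and the function name are made up, and whether this level of coarsening is enough still depends on the context discussed above.

```python
from datetime import date


def coarsen_record(record, top_code_age=90):
    """Return a copy of one record with its indirect identifiers coarsened.

    The field names used here are hypothetical, chosen for illustration.
    """
    age = record["age_years"]
    return {
        # Keep only the first three digits of the zip code.
        "zip3": record["zip_code"][:3],
        # Report age in completed years, with one group for the oldest patients.
        "age": f"{top_code_age}+" if age >= top_code_age else str(int(age)),
        # Keep only the year of the admission date.
        "admit_year": record["admit_date"].year,
    }


example = {"zip_code": "64108", "age_years": 2.7, "admit_date": date(2003, 5, 14)}
print(coarsen_record(example))
```

For a study of infants, where age in weeks or days matters, you would need a different rule and probably a closer look at the privacy risk.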
Exceptions: In some situations, you can loosen these standards, but you have to get some certification from a professional that the information has very little chance of compromising security.
As a very rough rule of thumb, if a combination of variables can only narrow down the list of people to groups of three or more, then some semblance of confidentiality is maintained. So, as an example, suppose you want to include information about the location of a patient who has had a bone marrow transplant, as well as their gender and their age rounded to the year. If you report location at the county level, then you need to guarantee that every combination of age, gender, and county contains at least three patients. If certain combinations are rare, such as only two male bone marrow transplant patients of age 2 in Jackson County, then you need to create bigger groups. Make the geographic region larger (Western Missouri) or the age group broader (0-3 years of age) so that the minimum number in a crosstabulation is large enough. If you had hundreds of patients in each group, then you could afford to go to a higher level of detail, like zip code instead of county, or age in months rather than age in years.
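The rule of thumb above amounts to counting how many patients fall into each combination of the indirect identifiers and flagging any combination with fewer than three. Here is a minimal sketch in Python; the function name and the tiny data set are invented for illustration, and the cutoff of three is the rough rule from the text, not an official standard.

```python
from collections import Counter


def small_cells(records, keys, minimum=3):
    """Return each combination of `keys` that has fewer than `minimum` records."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < minimum}


# Made-up records, not real patients.
patients = [
    {"county": "Jackson", "gender": "M", "age": 2},
    {"county": "Jackson", "gender": "M", "age": 2},
    {"county": "Jackson", "gender": "F", "age": 5},
    {"county": "Jackson", "gender": "F", "age": 5},
    {"county": "Jackson", "gender": "F", "age": 5},
]

# Any combination listed here needs a broader grouping before release.
print(small_cells(patients, ["county", "gender", "age"]))
```

If the result is not empty, broaden one of the variables (a larger region or a wider age band) and run the check again.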
Sharing data outside the hospital
When you share data with researchers outside the hospital, you have three choices.
At one extreme, you could ask for permission from the patient before you share this information. This is a reasonable approach for some prospective studies where you know in advance what information needs to be shared.
At the other extreme, you could strip out any direct or indirect identifiers in the data set. This creates a de-identified data set that can be shared without any privacy concerns.
In between the two extremes is a limited data set. This is a data set without any direct identifiers and with only those indirect identifiers that are needed for research purposes. The person getting the data has to sign an agreement to use the data only as permitted, limit who else can see the data, and promise to not identify or contact anyone in the data set.
With some data that you share, you might need to link the information back to individual patient records at a later date. You can develop a code link for this, but the code cannot be something like a medical record number. Store the linking information in a secure location, like a locked file cabinet.
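One way to build such a code link is to assign each patient a random code that has no relationship to the medical record number. The sketch below is only one possible approach: the function name and the eight-character code format are my own choices, and the record numbers are made up.

```python
import secrets


def build_code_link(medical_record_numbers):
    """Map each medical record number to a unique random study code."""
    link = {}
    used = set()
    for mrn in medical_record_numbers:
        if mrn in link:
            continue  # this patient already has a code
        code = secrets.token_hex(4)  # 8 random hexadecimal characters
        while code in used:  # redraw in the unlikely event of a repeat
            code = secrets.token_hex(4)
        used.add(code)
        link[mrn] = code
    return link


# Made-up record numbers for illustration.
link = build_code_link(["MRN001", "MRN002", "MRN003"])
print(link)
```

Only the study codes go into the shared data set. The link table itself is the sensitive part and stays in the secure location.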
Use computer security algorithms to preserve confidentiality
If you regularly store confidential information on your computer, you can take some common sense security measures. One very simple and effective approach is to put password protection on your screen saver. This will provide assurance that no one can rummage around on your computer while you are gone.
You should also avoid storing confidential information on a floppy disk that could easily be left somewhere public or on a laptop that could get stolen. There are, however, password protection systems available for laptops and floppy disks. Just make sure that your password is not left up on a sticky note by your computer monitor.

A database can help ensure security by segregating private and open data. For people who need to see both types of data, you can link the private and open data (see above), but when you send the database outside, you can ensure privacy simply by deleting the table with private information.
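Here is a rough sketch of that idea using SQLite from Python. The table names, column names, and the single row of data are all invented for illustration, and the database lives in memory so that the example is harmless to run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE private_info (study_code TEXT PRIMARY KEY, patient_name TEXT);
    CREATE TABLE open_data (study_code TEXT, visit_month INTEGER, score REAL);
    INSERT INTO private_info VALUES ('a1b2', 'Example Patient');
    INSERT INTO open_data VALUES ('a1b2', 1, 7.5);
""")

# Inside the hospital: join the two tables when both are needed.
linked = conn.execute(
    "SELECT p.patient_name, o.visit_month, o.score "
    "FROM open_data o JOIN private_info p USING (study_code)"
).fetchall()
print(linked)

# Before sending the database outside: remove the private table.
conn.execute("DROP TABLE private_info")
remaining = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(remaining)
```

With a real database file, it is probably safer to copy only the open table into a fresh file than to delete the private table from the original, because deleted data can sometimes linger inside the file.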
There are other computer methods for security that you might want to investigate. In particular, public key cryptography is a really useful technology for maintaining security.
Cryptography is the use of codes to hide information from the eyes of people you don't trust. Most code systems involve the use of a key that allows you to encode and decode information. Public key cryptography is different. There is one key that will allow you to encode information and a different key that will allow you to decode information. This allows a lot of options for one way information transfer that maintains security.
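To make the two-key idea concrete, here is a toy version of one well-known public key system, RSA, written in Python. It uses tiny prime numbers and encodes one character at a time, so it is far too weak for real use; treat it purely as a classroom illustration. The function names are my own, and it needs Python 3.8 or later for the modular inverse.

```python
# Toy textbook RSA with tiny primes. For illustration only; NOT secure.
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # encoding exponent (public)
d = pow(e, -1, phi)        # decoding exponent (private)

public_key = (e, n)
private_key = (d, n)


def encode(text, key):
    """Encode each byte of the text with the public key."""
    exponent, modulus = key
    return [pow(b, exponent, modulus) for b in text.encode("utf-8")]


def decode(numbers, key):
    """Recover the text from the coded numbers with the private key."""
    exponent, modulus = key
    return bytes(pow(c, exponent, modulus) for c in numbers).decode("utf-8")


coded = encode("ProfMean", public_key)
print(coded)
print(decode(coded, private_key))
```

Anyone holding only the public key can produce the coded values, but turning them back into text requires the private key.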
Suppose, for example, you are collecting monthly survey information from Professor Mean and others and would like to track the results for each individual without having to keep the personal identifiers for Professor Mean and others on hand.
Every month when you get Professor Mean's survey, use the encryption key to translate the string "ProfMean" to the coded string "GaEtNrIuUeS" and use that value in the database. Anyone working with the database will know which surveys belong to "GaEtNrIuUeS" but will not be able to translate it back to the original "ProfMean" unless they already know the names of some of the participants or they know the value of the decryption key.
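If you never need to translate the code back to the name, a simpler substitute is a keyed hash. To be clear, this is a different technique from the public key approach described above: it uses a single secret key and cannot be reversed at all, but it gives the same coded value for "ProfMean" every month. The sketch below uses Python's standard hmac module; the function name, the key, and the 12-character length are assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonym(identifier, secret_key, length=12):
    """Return a stable code for the identifier, derived with a secret key."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


# Placeholder key for the example; a real key should be long, random, and kept secret.
key = b"replace-with-a-long-random-secret"

january = pseudonym("ProfMean", key)
february = pseudonym("ProfMean", key)
print(january == february)  # the same person gets the same code each month
```

As with the encryption key, anyone who obtains the secret key and a list of likely names could recompute the codes, so the key needs the same protection as the names themselves.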
Public key encryption can help with other situations, like multi-center trials. In these trials you often have multiple people entering data, but you still need a strong level of security. All of these data entry specialists could encrypt the data and store it in a common location, but they would be unable to decode the master file.
The technical details of public key encryption are quite CaObNsFtUrSaIcNtG, so you should consult with a programming expert to set this up properly.
Further reading
- Committee on Privacy and Confidentiality. American Statistical Association. Accessed on 2003-09-08. users.erols.com/dewolf/pchome.htm
- Data Encryption Tutorial - Lesson 1. Julie Meloni. Accessed on 2003-03-18. hotwired.lycos.com/webmonkey/00/20/index3a.html?tw=programming
- The Effect of the New Federal Medical-Privacy Rule on Research. J. Kulynych, D. Korn. NEJM 2002: 346(3); 201-204.
- Health Insurance Portability and Accountability Act Privacy Regulations: Consequences for Use and Disclosures of Patient Information for Research Purposes. Michele Garvin, Jessica Lind, The National Council of University Research Administrators. Accessed on 2003-09-08. www.ncura.edu/newsroom/enews/August2001/HIPAA.doc
- The High Cost of Skepticism. Carol Tavris. Skeptical Inquirer 2002: 25(4); 41-44. [Full text]
- How Do the Federal Regulations Define "Research with Human Subjects"?. MCO Research & Grants Administration. Accessed on 2003-01-23. www.mco.edu/research/query7.html
- Information for Covered Entities and Researchers on Authorizations for Research Uses or Disclosures of Protected Health Information. U.S. Department of Health and Human Services. Accessed on 2003-07-21. www1.od.nih.gov/osp/ospp/hipaa/authorization.pdf www1.od.nih.gov/osp/ospp/hipaa/authorization.asp
- Investigator Checklist for HIPAA Privacy Rule Compliance. Partners Human Research Committee. Accessed on 2003-03-14. healthcare.partners.org/phsirb/hipatodo.htm
- Issues to Consider in the Research Use of Stored Data or Tissues. Office for Protection from Research Risks, U.S. Department of Health and Human Services. Accessed on 2003-07-28. ohrp.osophs.dhhs.gov/humansubjects/guidance/reposit.htm
- Medical Privacy - National Standards to Protect the Privacy of Personal Health Information. Office for Civil Rights, U.S. Department of Health and Human Services. Accessed on 2003-03-14. www.hhs.gov/ocr/hipaa/privacy.html
- Medical privacy and medical research--judging the new federal regulations. G. J. Annas. New England Journal of Medicine 2002: 346(3); 216-20. [Abstract]
- PGP Corporation. Protecting Confidential Information. In Transit, In Storage, Everywhere, All the Time. PGP Corporation. Accessed on 2003-09-08. www.pgp.com/
- Protecting Personal Health Information in Research: Understanding the HIPAA Privacy Rule. U.S. Department of Health and Human Services. Accessed on 2003-04-22. privacyruleandresearch.nih.gov/pr_02.asp
- Welcome to the American Statistical Association's Privacy, Confidentiality, and Data Security Website. Committee on Privacy and Confidentiality, American Statistical Association. Accessed on 2003-08-11. www.amstat.org/comm/cmtepc/
- What's so important about conducting research involving third parties? L. Murrelle, C. R. McCarthy. J Contin Educ Health Prof 2001: 21(4); 198-202. [Medline]
This page was last modified on 04/28/08. Category: Ask Professor Mean, Category: Privacy in research
Please fill out an evaluation form. Your input is important. These evaluation forms also ensure that we can offer Continuing Medical Education credits for this class.
