Children's Mercy Hospital

Stats #01: Using SPSS to Manage Your Research Data

Content:  This three-hour training class will give you a general introduction to using SPSS software to manage your research data. This class is useful for anyone who needs to use SPSS to enter or analyze research data. Students should know how to use a mouse and how to open applications within Microsoft Windows. No statistical experience is necessary. This class will provide hands-on computer experience using SPSS software. You will also use a simple Excel spreadsheet (bf.xls) and a Microsoft Access database (practice.mdb) in some of the practice exercises.

Objectives:  In this class, you will learn how to:

  • import data from Excel into SPSS;
  • add documentation to your data set, and
  • enter and manipulate dates in SPSS.

Teaching strategies:  Didactic lectures and individual computer exercises.

IRB Education Credits:  This class does not qualify for IRB Education Credits (IRBECs).

Outline:

  • Seating in the computer lab
  • Overview of the STATS web pages
  • Consulting services that I provide
  • Installing SPSS terminal server (draft)
  • Importing spreadsheet data into SPSS
  • Importing database files into SPSS
  • General guide to data entry
  • Spreadsheet or database
  • Documenting your SPSS data sets
  • Inputting a two-by-two table into SPSS
  • Date calculations in SPSS
  • Stats #01: Practice exercises
  • Please fill out an evaluation form

Welcome to this SPSS computer training class! Please be seated in front of any computer which has a monitor turned on. If the monitor is turned off, that means the computer is not working properly today.


Overview of the STATS web pages (January 21, 2000)

What are the STATS web pages?

The STATS pages are a collection of handouts that I use in my job as a statistical consultant. The web provides a nice home for these handouts, because as I update my material, the newest version is immediately available to anyone who is interested.

Where can I find STATS?

If you have a web browser, like Internet Explorer or Netscape Navigator, you can surf on over to my site,

http://www.childrensmercy.org/stats

which is also found at http://internet1/stats, if you are attached to the Children's Mercy Hospital network. There are two obsolete sites: http://www.cmh.edu/stats and http://simon/stats. Do not use either of these sites.

Some of the fun stuff you can find on the STATS web pages.

Ask Professor Mean.  For the tough Statistics questions that Dear Abby won't touch.

Planning Your Research Study.  Things you need to plan for before you start collecting your data.

Selecting An Appropriate Sample Size.  How much data do you really need?

Managing Your Research Data.  Everything you want to know before you step to the keyboard.

Steps In a Typical Data Analysis.  I have my data on the computer. Now what?

How to Read a Medical Journal Article.  Reading a journal is hard work. Here's some help.

Professor Mean's Library.  Good books and good web sites about Statistics.

... and even more good stuff!!!

This webpage was written, edited by Linda Foland, and was last modified on 07/08/2008. Category: Website details


For CMH employees only: Statistical Consulting Services.

You can get free statistical consulting if you work for Children's Mercy Hospital. Ashley Sherman and I provide a wide range of statistical consulting services to help you with your research projects. This help can start as early as the initial planning of your research. I also help with the analysis of your data, using SPSS or other statistical software. We can also provide assistance with the preparation of your presentations and publications.

Here are some examples of the services that we have provided:

  • setting up your research hypothesis,
  • selecting and justifying your sample size,
  • writing the statistical methods section for your grant,
  • preparing randomization tables for your study,
  • reviewing your surveys for content and quality,
  • developing a system for entering your data,
  • choosing an appropriate statistical model for your data,
  • establishing validity and/or reliability for your measurement scales,
  • checking for violations of statistical assumptions in your data,
  • producing graphs and tables for your research publication, and
  • providing references for new and unusual statistical methods.

Specific statistical advice has been outlined on a series of web pages which can be found at http://www.childrensmercy.org/stats/. The pages provide advice about planning your research, selecting an appropriate sample size, managing your research data, performing a variety of data analyses, presenting research data, and writing research papers.

This webpage was written on 2003-04-30 and was last modified on 2008-07-08. Category: Professional details


Directions to my new office (April 25, 2008).

I have moved to a new office. It is a modular building just north of Children's Mercy Hospital, between 22nd and 23rd Streets, just off Kenwood Avenue (Kenwood is a small north/south street just west of Holmes). If you need to get from your office to mine, here are some directions written by my Administrative Assistant, Judy Champion.

  • Take the elevator in the research tower down to the yellow level. Exit the employee parking garage on 23rd Street, walk to Kenwood, and cross 23rd Street. Your destination is Building M 3, which is the building closest to 22nd Street. However, the entrance to our building faces Building M 2. It's best to walk into the parking area just north of Building M 1 and follow the sidewalk around the west side of Building M 2 to reach our building's entrance on its south side. Another route is to exit the Hospital Hill Center Building on Holmes, walk ½ block north to 23rd Street, cross 23rd Street, walk west to Kenwood, and then walk north to Building M 3 (2220 Kenwood).

2008-07-14. Category: Professional details


Terminal Server (February 3, 2003)

Terminal server is a new and improved approach to using SPSS, SigmaPlot, and other programs. You log on to a dedicated computer and run your programs on that computer rather than running SPSS or SigmaPlot through the network.

Terminal server offers several advantages:

  • Because the code runs on a dedicated computer, SPSS and SigmaPlot will load faster and run faster.
  • You will no longer have to worry about upgrading to new versions of SPSS and SigmaPlot. The upgrades will be handled for you.
  • If you have an older computer with compatibility problems, you will encounter fewer difficulties with terminal server.

Listed below are instructions on how to load terminal server on your computer. It is very easy, even for someone who is not a computer nerd. If you prefer to have someone else load terminal server for you, please ask your contact person in Information Systems for help.

We have done some work with the test version of terminal services. You don't need to use the test version, except for special and unusual situations.



Downloading and installing terminal server.

The software to load terminal server is located on an internal web site. Open Internet Explorer and type

http://10.1.20.59/ts/install.exe

in the address bar. You will get a FILE DOWNLOAD dialog box (see below) that will ask you what to do with the file.

It might look slightly different, depending on the version of Internet Explorer that you are using. Click on the OPEN button. If you don't see an OPEN button, click on the RUN button.

If you see a SECURITY dialog box and/or WINZIP dialog box, click on the YES button to continue.

Once the installation is complete, click on the START button and select Programs | Terminal Service Client | Client Connection Manager. Then right-click on the CMH TERMINAL icon. This brings up a pop-up menu (see below). Select PROPERTIES from the pop-up menu.

This will open up the PROPERTIES dialog box. Select the CONNECTION OPTIONS tab and click on the FULL SCREEN option button. This will ensure that terminal server will use your full screen rather than just part of your screen.

Click on the OK button to close this dialog box.

For a second time, right click on the CMH terminal icon to bring up the popup menu. Select CREATE SHORTCUT ON THE DESKTOP from the popup menu. If you do not see an option for "Create Shortcut on the Desktop", you can select "Send To" and then "Desktop (create shortcut)".


Can I load terminal server on my laptop? Yes, as long as your laptop is attached directly to the hospital network or connected through high-speed internet access. Follow the same steps described above. You can then use SPSS on your laptop whenever you have one of these connections.

This webpage was written on 2003-06-06 and was last modified on 2008-07-08.


Logging on to terminal server.

Click on the TRMSERV icon and a TRMSERV - Terminal Services Client window will appear. In the Log On to Windows dialog box (see below), type the same user name and password that you use when you turn on your computer in the morning.

You will now see the desktop of terminal server (see below). This looks very similar to your own desktop, except that it has a different background color and it has the SPSS 11.5 for Windows icon.

You are now connected to terminal server. Double click to open the SPSS folder and then click on the SPSS icon to run SPSS.

How do I exit from terminal server?

At the bottom of the terminal screen is a START button that looks just like the START button on your regular computer. Click on START and select Shut Down from the menu. The dialog box that appears will have either DISCONNECT or LOG OFF selected (see below).

Either one works. Click on the OK button, then close the Connect to Terminal Server window.



Using files with terminal server.

You cannot use your floppy disk drive or your local hard drive directly from terminal server. Instead, you must save the file on a network drive. The best location is probably your user folder.

You have to tell terminal server the network name of your user folder. Do this once and your computer will remember from that moment forward.

Connect to terminal server and click on the MY COMPUTER icon. This will bring up a folder labeled My Computer (see below).

From the menu, select Tools | Map Network Drive. This will bring up the Map Network Drive dialog box (see below).

You need to assign a drive letter to the location of your network files. It would be best to set the drive letter to V:, but you can use a different letter if you like. Then type in the name of your folder. For me, it would be \\cmhsan08\users\ssimon.

After you have saved to the network, you can copy the file to a floppy disk or a local hard drive.

How do I open files in SPSS terminal server?

You can only open files located on the network. Before you connect to terminal server, copy the file from your floppy disk to a location on the network.

How do I use the training example data sets?

Training example data sets are in the SPSS Examples folder on the desktop. Double-click on this folder to open it. You can also find this folder on the D drive at D:\SPSS Examples.



Printing from terminal server.

You have to tell terminal server the network name of the printer that you normally use. Do this once and your computer will remember from that moment forward.

Open Internet Explorer and select File | Print from the menu.

Click on the ADD PRINTER icon and follow the instructions. You will get a series of dialog boxes labeled Add Printer Wizard. The instructions are mostly straightforward. After an introductory screen (not shown), you will get a dialog box asking whether you are adding a local printer or a network printer. You must choose the Network printer option (see below), since terminal server does not work with local printers.

It helps if you know the exact name of your printer (one of the printers I use is named \\hpprint02\Medrsrch2).

If you know the exact name of your printer, you can type it in the above dialog box. If you are not 100% certain about the name of your printer, check the option anyway and leave the name blank. You will get a list of printers and print servers to browse through (see below). Do not use the Find a printer in the Directory option button, as that does not work well (at least not for me).

Once you have selected your printer, you should decide if this is the default printer, the one that SPSS terminal server will try to use as its first choice.

When you click on the Next button, you will get a dialog box summarizing your choices. If these choices appear reasonable, click on the Finish button. If something appears to be wrong, use the Back button to fix things.



Removing terminal server from your computer.

Click on Start | Settings | Control Panel | Add/Remove Programs. Find Terminal Services Client on the list of programs and click on the Change/Remove button.



Terminal server--What if I get an error message?

First of all, don't panic. Some of these error messages occur because efforts to protect against viruses, trojan horses, and other malicious software also interfere with the normal operations of SPSS and Terminal Server. Here are some of the messages that I have encountered already with a brief explanation of what causes the message and how to work around it.

If you encounter an error message other than the ones described here, please contact me.

Administrator access message. I have never seen this message on my computer, but when you try to install terminal server, your computer might display a dialog box saying, more or less, that you don't have sufficient access, permission, or administrator privileges to run SPSS. When IS set up your computer, they added a security layer that protects you against malicious viruses and other computer threats but also disables your ability to install software on your own. You need to call the help desk, and they will temporarily grant you "king/queen for a day" privileges that will enable you to install your own programs for a limited time.

"The client software could not initialize with SPSS Server at ." When loading SPSS, I would get a dialog box that says: "The client software could not initialize with SPSS Server at ." The folks at SPSS told me the solution. "This is the result of either a missing or corrupted file named 'registry.txt' in the SPSS program folder. This problem can be fixed by either reinstalling SPSS or obtaining a new copy of that file from our FTP site and replace it with the one in your SPSS directory. That file is located at ftp://ftp.spss.com/pub/spss/windows. Please locate the one that's specific to your SPSS version." -- SPSS Web Support, personal communication, September 25, 2002.

"This action has been cancelled due to restrictions put on this computer." You should not be getting this error message anymore, but I am keeping it here just in case. This message is actually a paper tiger. What happened a while back is that someone was running terminal server from home and thought it would be fun to download some games to run on terminal server. You know what happened next, of course. Virus attack on terminal server! So our IS folks decided that they had to add some major security restrictions to terminal server. The restrictions interfere with some of the minor bookkeeping activities with SPSS as it starts up. Apparently when SPSS checks for a proper license, it touches a part of terminal server that raises a security flag. But whatever happens on terminal server stays on terminal server. If you click OK on the dialog box, everything in SPSS works just fine.

"Windows cannot access the specified device, path, or file. You may not have the appropriate permissions to access." At CMH, we have created an SPSS group for security reasons. If you are not part of the SPSS group, you cannot access SPSS and you will get an error message along the lines of the above. Call the help desk (5-3454) and ask to be added to the SPSS group. You may need to reboot your computer afterwards.

"You do not have sufficient access to your machine to connect to the selected printer."  This message appears when you are trying to get Terminal Server to recognize and print to your networked printer. This occurs when IS has not installed the appropriate printer drivers on terminal server. Tell me the brand name of your printer, and we can fix it from our end.




This webpage was written and was last modified on 07/08/08.


Importing spreadsheet data into SPSS.

Dear Professor Mean, I need to import data in an Excel spreadsheet, but I can't get SPSS to read this data properly. Can you help? -- Stumped Stan

Dear Stumped,

Nothing works right the first time, but you can do a few things that will make it easier to import your Excel data. Every data set has its own unique problems, of course, so there is no foolproof way to ensure that your data will move perfectly.

Here are the four general steps that I recommend for someone who is importing Excel data.

  1. Close the Excel file before you try to import it into SPSS.
  2. Arrange the data in a rectangular grid
  3. Don't mix strings and numbers.
  4. Put descriptive names in your first row.

Rectangular grid

A rectangular grid is a systematic layout of your data so that the intersection of every column and row contains a single number. The data should start in the first row of the spreadsheet, or the second row, if you use the first row as column labels. Don't leave any "holes" in the spreadsheet.

Be sure to delete any rows of your spreadsheet that contain summary data like totals or means. You don't want SPSS to think that a summary row is just another row of data.
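If your spreadsheet is large, a quick automated check can catch these layout problems before you import. The sketch below is a minimal Python illustration, not an SPSS feature. It assumes the spreadsheet has already been read into a list of rows with the variable names in the first row; the function name and the list of summary-row labels are hypothetical.

```python
# Words that often label summary rows; this list is a hypothetical heuristic.
SUMMARY_LABELS = {"total", "totals", "sum", "mean", "average"}

def check_rectangular(rows):
    """Return a list of layout problems in a spreadsheet given as a list of rows.

    Row 0 is assumed to hold the variable names; every later row is one case.
    """
    problems = []
    width = len(rows[0])
    # start=2 so messages use spreadsheet row numbers (row 1 is the header)
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"row {number}: {len(row)} cells, expected {width}")
        if any(cell is None or str(cell).strip() == "" for cell in row):
            problems.append(f"row {number}: contains an empty cell")
        if row and str(row[0]).strip().lower() in SUMMARY_LABELS:
            problems.append(f"row {number}: looks like a summary row")
    return problems

sheet = [
    ["id", "momage", "bwt"],
    [1, 18, 1.550],
    [2, 33, ""],            # a "hole"
    ["Total", 51, 1.550],   # a summary row that should be deleted
]
for problem in check_rectangular(sheet):
    print(problem)
```

Each message names the spreadsheet row to fix, so you can correct the layout in Excel and then import.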

Don't mix strings and numbers.

A mixture of strings and numbers in a single column will confuse SPSS. SPSS uses the first value that it sees in a column to decide if that column should be stored using string, date, or numeric format. If any further values in that column do not match the format of your first value, SPSS will convert that value to missing.

Here's an example of a mixture of strings and numbers: "1", "2", "3 or more". This type of coding will ensure that a large amount of your data gets converted to missing values. Which values stay and which ones don't depends on what appears first in your column.
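To see why the first value matters so much, here is a toy Python model of the rule described above: the first value fixes the column type, and later values of the other type become missing. It is a simplified illustration under that assumption, not SPSS's actual import code, and it ignores dates; the function names are hypothetical.

```python
def classify(value):
    """Return 'numeric' if the cell reads as a number, else 'string'.

    Dates are ignored in this simplified sketch.
    """
    try:
        float(value)
    except (TypeError, ValueError):
        return "string"
    return "numeric"

def import_column(values):
    """Mimic the rule described above: the first value fixes the column type,
    and any later value of the other type becomes missing (None)."""
    column_type = classify(values[0])
    converted = []
    for value in values:
        if classify(value) != column_type:
            converted.append(None)          # type mismatch -> missing
        elif column_type == "numeric":
            converted.append(float(value))
        else:
            converted.append(value)
    return column_type, converted

print(import_column(["1", "2", "3 or more", "1"]))
print(import_column(["3 or more", "1", "2"]))
```

In the first call, only "3 or more" is lost; in the second, every number is lost. Entering 3 instead of "3 or more" and explaining the code in a value label keeps the whole column numeric.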

Provide brief descriptive names

SPSS can use the first row of your spreadsheet as variable names, as long as you stay within the proper restrictions. Keep your names short. Although SPSS no longer restricts names to eight characters or fewer, names that are very long become unwieldy and don't display well in graphs and tables. You can (and should) use the variable label in SPSS to provide a longer and more detailed description of this variable.

The name has to be one word with no blanks. You can use the underscore symbol "_" or the dot to simulate blanks. You can also use MixedCapitalization to simulate blanks.

Avoid special symbols (other than the underscore and dot). Symbols like the dash (-) and the slash (/) cause problems because they imply some sort of arithmetic operation.

A variable name like "Mother's Age" causes problems because it includes a special symbol (the apostrophe) and it has a blank. If you tried to use this name, SPSS would create a generic name like VAR00001 and use "Mother's Age" as a variable label. A name that SPSS will tolerate would be "mom_age" or "MomAge" or "mom.age".

It takes some creativity to describe a variable well in only a few characters. Do the best you can. Remember that you can always add a lengthy variable label later that includes blanks and special symbols.
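The naming advice above can be turned into a quick check you run over your column headings before importing. This is a minimal Python sketch that encodes only the guidelines in this handout (a letter first, then letters, digits, underscores, or dots, and a length limit); SPSS's own rules allow a few more characters, so treat it as a conservative check. The function name is hypothetical, and `max_len=8` reflects the older limit mentioned above.

```python
import re

# Conservative pattern based on the advice above: a letter first, then only
# letters, digits, underscores, or dots (no blanks, dashes, slashes, or
# apostrophes). SPSS's own rules allow a few more characters.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")

def name_problems(name, max_len=8):
    """List the reasons a proposed variable name breaks the guidelines."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("use a letter first, then only letters, digits, '_' or '.'")
    if len(name) > max_len:
        problems.append(f"longer than {max_len} characters")
    return problems

for candidate in ["Mother's Age", "mom_age", "MomAge", "mom.age", "birth-wt"]:
    print(repr(candidate), name_problems(candidate) or "OK")
```

"Mother's Age" fails on both counts, "birth-wt" fails because of the dash, and the three suggested alternatives pass.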

Close the Excel file

Here is an Excel spreadsheet with data from a breast feeding study. I have already arranged the data in a rectangular grid and placed brief descriptive names in the first row.

Make sure that Excel is closed before you try to import the file into SPSS. SPSS is very jealous. It will not want to open your data file if it knows that some other software is currently using it. SPSS will warn you about a "sharing violation".

Open SPSS and select FILE | OPEN from the menu. Here is the SPSS dialog box that you will see. Click on the down arrow in the FILE OF TYPE field and select the EXCEL (*.XLS) option. Find your file on the proper drive and folder of your computer.

When you click on the OPEN button, you get the dialog box shown below. Click on the READ VARIABLE NAMES button if the first row of your spreadsheet has variable names. Then click on the OK button.

Check if you got the correct number of variables (columns) and cases (rows). A common problem is that SPSS will sometimes import a bunch of extra blank rows. You can delete the blank rows manually.

Here is what the SPSS data window looks like. We are now ready to do things like adjusting the number of decimal places displayed and adding documentation.
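The check for extra blank rows described above can also be automated. The Python sketch below is a hypothetical helper, not part of SPSS: it assumes the imported data are available as a list of rows with the variable names first, drops every row in which all cells are blank, and then reports the number of variables and cases so you can compare them with your spreadsheet.

```python
def drop_blank_rows(rows):
    """Keep the header row plus every data row that has at least one non-blank cell."""
    header, data = rows[0], rows[1:]
    kept = [row for row in data
            if any(cell is not None and str(cell).strip() != "" for cell in row)]
    return [header] + kept

imported = [
    ["id", "momage", "bwt"],
    [1, 18, 1.550],
    [2, 33, 1.990],
    [None, None, None],   # extra blank rows picked up from the spreadsheet
    ["", "", ""],
]
cleaned = drop_blank_rows(imported)
print(f"{len(cleaned[0])} variables, {len(cleaned) - 1} cases")
```

Rows with only some blank cells are kept on purpose: those are missing values, which need their own codes rather than deletion.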

Summary

If you want to import Microsoft Excel data into SPSS, follow these four steps:

  1. Close the Excel file
  2. Arrange the data in a rectangular grid
  3. Don't mix strings and numbers.
  4. Put descriptive names in your first row.

Once you have done this, select FILE | OPEN DATA from the SPSS menu. Then click on the FILE OF TYPE field and select the EXCEL (*.XLS) option.

This webpage was written on 1999-08-20 and was last modified on 2008-07-08. Category: Ask Professor Mean, Category: Data management, Category: SPSS software


Importing database files into SPSS.

Dear Professor Mean, How do I import database files into SPSS? I don't want to re-type everything, because there are 70,000 records. The data are stored in a Microsoft Access file. -- Vexed Vidya

SPSS can import data from a variety of sources using a system known as ODBC (Open Database Connectivity). ODBC drivers are available for just about every database that you would ever need to use.

Short explanation

I'll show you an example using Microsoft Access, but this would work just as well on other database systems, such as Oracle and Informix. To import data from Access, select FILE | DATABASE CAPTURE | NEW QUERY from the SPSS menu.

More details

When you import data using ODBC, SPSS asks you what type of data source you want to import from. On my system, I have the ability to import Access, Excel, and FoxPro files. Through Microsoft Windows, I can add the capability of importing from other sources like Oracle or Informix if I needed these formats.

I can also specify a particular location that I want to import from on a repeated basis. In the example I show later, you will see that I have defined data sources labelled "ghstudy", "Menninger", "patient complaints", "Santos" and "x". Providing a pre-specified location for my import is especially useful for databases that are updated on a regular basis. If you want to define such a source, you can click on the ADD DATA SOURCE button, but I will not provide any details about it in this handout.

After you specify the type of data you want to import, SPSS will ask you for the following details.

  • Where the data are located
  • The table or tables in your database you want to import

You also have the following options

  • Specifying relationships between tables
  • Selecting a subset of your data
  • Renaming some of the variables
  • Saving the query for re-use

Saving the query for re-use is another way of simplifying repeated imports from the same data set. Saving the query will save not only the location of the database you want to import from, but also the information about subsets, changes in variable names, etc.

Example

Here is an example of importing an Access database with data from a growth hormone study. Select FILE | DATABASE CAPTURE | NEW QUERY from the SPSS menu.


The dialog box shown above allows you to select your data source. Click on ACCESS 97 and then click on the NEXT button.


The dialog box shown above asks you for a location for your Access database. Be sure that you select the correct drive and folder. Then click on the file, and click on the OK button.

SPSS gives you a list of all available tables and queries within this database.


Drag the table from the AVAILABLE TABLES field into the RETRIEVE FIELDS IN THIS ORDER field. If you want data from more than one table/query, repeat this process. If there are some variables you do not want to import, drag them out of the RETRIEVE FIELDS IN THIS ORDER field.

If you have a simple import, you can click on the FINISH button now. If you click on the NEXT button instead, SPSS will give you some options to fine-tune your import. You can

  • specify relationships between tables,
  • select a subset of your data,
  • rename some of the variables, and
  • save the query for re-use.


What should I do if Access files are not listed as a data source?

On some computers, Access files are not listed as a data source in the database capture wizard dialog box. An example of this appears below.

In order to use these Microsoft Access files, you need to click on the ADD DATA SOURCE button. This calls up the following dialog box.

In the ODBC DATA SOURCE ADMINISTRATOR dialog box, you want to add a new data source. Click on the ADD button. You also have the option of removing a data format you no longer need (REMOVE button), or changing some of the options in a data format you already have (CONFIGURE button). When you click on the ADD button, you get the following dialog box.

In the CREATE A NEW DATA SOURCE dialog box, you have a list of ODBC data sources which you can add to your system. Highlight the driver you want (in this case the Access driver) and click on the FINISH button.

As the dialog box above shows, you're not quite finished yet. Tell the system a name and description for this format. A good name would be "Access files" and a good description would be "Microsoft Access 97 files (*.mdb)". If you repeatedly use the same database, you could even have the system select this database automatically (SELECT button). It's a good idea to set up a general driver first, so add a name and description and then click on the OK button.

Now your dialog box has the format you need. Click on the OK button to finish up. This format will now appear every time you run a database capture in SPSS.

What should I do if I can't find the driver for Microsoft Access in the CREATE NEW DATA SOURCE dialog box?

If you are still having problems, please let me know. I can suggest several options that might work.

  1. Call our tech support line and ask for help. Explain that your system does not have the ODBC drivers for Microsoft Access installed.
  2. The CD ROM for SPSS has a special folder called ODBC Drivers. Find the file dataacc.exe and run it. I have this CD ROM and will be willing to help you get it running.
  3. You can also go to http://www.microsoft.com/data/, the Microsoft Universal Data Access Web Site. Download the latest version of MDAC (Microsoft Data Access Components). I have not tried this, so I don't know how easy it is to install.
  4. You might also find MDAC on the Microsoft Access or Microsoft Office CD-ROM.

There are some more details in some email messages that appeared on the SPSSX-L listserver on September 10-11, 2000.

What if SPSS asks for a password?

On some systems, SPSS will ask you for a password, even when the database you are trying to import does not have a password. There are several ways to work around this problem.

1. Configure a specific data source for your particular database. In the ODBC MICROSOFT ACCESS 97 dialog box, there is a SELECT button. Click on this button and tell SPSS where to find the particular database you are working with.

2. You can also use SPSS syntax to open the database. Here is an example.

GET DATA
/TYPE = ODBC
/CONNECT = "DSN=MS Access Database;"
"DBQ=C:\Windows\Desktop\test.mdb;"
/SQL = "SELECT * FROM [YOUR_TABLE_NAME]".
CACHE.
EXECUTE.

The above code uses a database language known as SQL. It is a very easy and very powerful language. Here is an example of using SQL to join two tables, rename some variables, and sort the results.

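GET DATA
/TYPE = ODBC
/CONNECT = "DSN=MS Access Database;"
"DBQ=C:\Windows\Desktop\test.mdb;"
/SQL = "SELECT "
"T1.[Employee ID] AS idnum, "
"T2.[FullName] AS empname "
"FROM Table1 AS T1 INNER JOIN Table2 AS T2 "
"ON T1.[Employee ID] = T2.[Employee ID] "
"ORDER BY T2.[LastName], T2.[FirstName]".
CACHE.
EXECUTE.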

The information described above is taken from two web sites: the SPSS AnswerNet and Raynald Levesque's SPSS Syntax web page.
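If you want to experiment with this kind of SQL before pointing SPSS at a real Access file, any SQL database will do. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Access; the tables, columns, and rows are made up to match the SQL example above. It shows the same three ideas: an inner join on a shared key, renaming with AS, and sorting with ORDER BY.

```python
import sqlite3

# In-memory SQLite database standing in for the Access file. The tables,
# columns, and rows are made up to match the SQL example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 ("Employee ID" INTEGER, Dept TEXT);
    CREATE TABLE Table2 ("Employee ID" INTEGER, FullName TEXT,
                         LastName TEXT, FirstName TEXT);
    INSERT INTO Table1 VALUES (1, 'Lab'), (2, 'Clinic'), (3, 'Admin');
    INSERT INTO Table2 VALUES (1, 'Ann Zimmer', 'Zimmer', 'Ann'),
                              (2, 'Bob Adams', 'Adams', 'Bob');
""")

# Same join/rename/sort logic as the SQL above: square brackets quote names
# containing blanks (SQLite accepts them for Access compatibility), and AS
# renames the output columns.
query = """
    SELECT T1.[Employee ID] AS idnum,
           T2.[FullName]    AS empname
    FROM Table1 AS T1 INNER JOIN Table2 AS T2
      ON T1.[Employee ID] = T2.[Employee ID]
    ORDER BY T2.[LastName], T2.[FirstName]
"""
cursor = conn.execute(query)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)  # ['idnum', 'empname']
print(rows)     # [(2, 'Bob Adams'), (1, 'Ann Zimmer')]
```

Employee 3 does not appear because an inner join keeps only the IDs found in both tables.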

Further reading

  1. SPSS AnswerNet. SPSS, Inc. Accessed June 25, 2002. The SPSS AnswerNet allows you to search the same SPSS Technical Support database we use to locate solutions to problems. http://www.spss.com/tech/answer/index.cfm
  2. SPSS syntax is a must! Levesque, Raynald. Accessed June 25, 2002. Don't satisfy yourself with the Graphic User Interface (GUI)! The GUI is fine (I use it every day); however, using syntax in addition to the GUI can easily increase productivity by a factor of 5 to 10 times for simple jobs. The increase can easily be 50 times or more for larger, complex jobs. Furthermore some of SPSS's features are only available through syntax. As a "bonus", syntax files work on all versions of SPSS, not just on Windows.
    There is something for everybody in the sample syntax's included here: some do simple things, are easy to understand and have a lot of comments; some do complex things and have either no comments or a lot of comments; others fall between these two extremes. Suggestions and code contributions are welcomed. Share what you know! Learn what you don't!
    http://pages.infinit.net/rlevesqu/SampleSyntax.htm

Summary

Vexed Vidya wants to import a Microsoft Access table into SPSS. To import Access or other database formats, you use a system called ODBC. Select FILE | DATABASE CAPTURE | NEW QUERY from the SPSS menu. You will then specify where the data are located and the table or tables in your database you want to import.

This webpage was written by Steve Simon on 1999-08-18 and was last modified on 2008-07-14. Category: Ask Professor Mean, Category: Data management, Category: SPSS software


General guide to data entry.

Dear Professor Mean, I'm about to start typing in my research data. Do you have any general guidelines for data entry?

Spreadsheets allow you enormous flexibility in how you enter your data. But beware, for if your spreadsheet is loosely structured, you could encounter difficulties when you import the data into statistical software like SPSS. If you follow these general guidelines for data entry, the data import will go smoothly.

  1. Arrange your data in rectangular format.
  2. Create codes for any missing values.
  3. Create variable names (8 characters or less).
  4. Assign number codes for categorical data.
  5. Provide a unique id for each row of your data.

Here are more details about each guideline.

1. Arrange your data in rectangular format.

Arrange your data in a rectangular format. The intersection of each row and column should contain a single number. Here's an example of data which does not fit into a rectangular format. These data are loosely based on a study of breast feeding in pre-term infants. The data have been shortened and modified to serve as a simple example of data entry.

Breast feeding status at six months

No                   Yes                  Lost to follow-up

Mom's Marital Birth  Mom's Marital Birth  Mom's Marital Birth
Age   Status  Weight Age   Status  Weight Age   Status  Weight

 18   Married 1.550   28   Single  2.381   28   Married 1.685
 33   Single  1.990                1.130                2.435
 34   Married         26   Married 2.060
 36   Married 1.640

Notice the jagged shape of the data. There is a 4 by 3 block of data (the No group), and then a 3 by 3 block of data (the Yes group), and then a 2 by 3 block of data (the Lost to follow-up group). If we stack these blocks one beneath another rather than one beside another, we will get a rectangular shape. When we re-arrange the data, however, we need to include an extra column of information to designate the specific block/group.

Here is what the data looks like after we re-arrange it into a rectangular format.

Breast
Feeding  Mom's Marital Birth
Status   Age   Status  Weight

 No       18   Married 1.550
 No       33   Single  1.990
 No       34   Married      
 No       36   Married 1.640
 Yes      28   Single  2.381
 Yes                   1.130
 Yes      26   Married 2.060
 Lost     28   Married 1.685
 Lost                  2.435
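The re-arrangement above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each group's block is typed in as a list of rows of (mom's age, marital status, birth weight), None stands in for a blank cell, and the names `blocks` and `stack_blocks` are hypothetical.

```python
# Each group's block from the jagged layout: rows of
# (mom's age, marital status, birth weight); None marks a blank cell.
blocks = {
    "No": [
        (18, "Married", 1.550),
        (33, "Single", 1.990),
        (34, "Married", None),
        (36, "Married", 1.640),
    ],
    "Yes": [
        (28, "Single", 2.381),
        (None, None, 1.130),
        (26, "Married", 2.060),
    ],
    "Lost": [
        (28, "Married", 1.685),
        (None, None, 2.435),
    ],
}


def stack_blocks(blocks):
    """Stack the blocks one beneath another, adding a column that names the group."""
    rows = []
    for group, block in blocks.items():
        for age, status, weight in block:
            rows.append((group, age, status, weight))
    return rows


for row in stack_blocks(blocks):
    print(row)
```

Each stacked row starts with the group name, which plays the role of the extra Breast Feeding Status column.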

2. Create codes for missing values.

Even after re-arranging the data in rectangular format, there are still some blank spots in the data. These represent missing data. Never let an empty field represent missing data. Explicitly create a code for missing, and be sure to explain why the data are missing to anyone involved with the analysis of your data. In this example, let -1 represent a missing value for Mom's Age and Birth Weight. Let 9 represent a missing value for Marital Status.

Here's what the data looks like when we plug up the missing value holes.

Breast
Feeding  Mom's Marital Birth
Status   Age   Status  Weight

 No       18   Married 1.550
 No       33   Single  1.990
 No       34   Married    -1
 No       36   Married 1.640
 Yes      28   Single  2.381
 Yes      -1      9    1.130
 Yes      26   Married 2.060
 Lost     28   Married 1.685
 Lost     -1      9    2.435
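A minimal Python sketch of plugging the holes, assuming each row is held as a dictionary in which None marks a blank cell. The variable names (mom_age, mar_st, birth_wt) are the ones chosen in step 3 below, and `fill_missing` is a hypothetical helper, not part of any package.

```python
# Missing value codes from the text: -1 for mom's age and birth weight,
# 9 for marital status.
MISSING_CODES = {"mom_age": -1, "mar_st": 9, "birth_wt": -1}


def fill_missing(row, codes=MISSING_CODES):
    """Return a copy of row with each blank (None) replaced by its explicit code."""
    return {
        name: codes[name] if value is None and name in codes else value
        for name, value in row.items()
    }


row = {"br_feed": "Yes", "mom_age": None, "mar_st": None, "birth_wt": 1.130}
print(fill_missing(row))
# {'br_feed': 'Yes', 'mom_age': -1, 'mar_st': 9, 'birth_wt': 1.13}
```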

3. Create variable names.

If you are using a spreadsheet, place a descriptive variable name at the top of each column. If you are using a database, provide a descriptive name for each field. You will use this variable or field name in statistical software like SPSS to specify the variables that you want to analyze. Try to be reasonably descriptive with your variable names; avoid generic names like VAR01, VAR02, etc.

While a spreadsheet or a database generally places few restrictions on variable names, most statistical software (including SPSS) will not be able to handle long names or names with special symbols. Here are some general guidelines that will help avoid trouble.

Use eight characters or less. If you try to use a longer name, SPSS and most other statistical software will truncate the name to the first eight characters. It's a challenge to provide a descriptive name when you are limited to eight characters, but try your best. SPSS will later allow you to provide a lengthier and more detailed description in the variable label.

A mixture of numbers and letters is okay, but avoid special symbols such as $, &, or %. Most statistical software will reserve these special symbols for other purposes. The one major exception is the underscore (_), which usually shares a key with the minus sign. In fact, when SPSS imports names with special characters, it replaces them with the underscore character.

Avoid embedded blanks. In most statistical software, an embedded blank will cause the software to presume that you are referring to two variables. SPSS, for example, gets confused when you ask for a histogram for mom age and will try instead to produce two histograms, one for mom and one for age. Here's where the underscore comes in handy. The variable mom_age is easy to read. Compare this to the alternative, momage, which looks like a nonsense word rhyming with homage.

Finally, don't rely on upper/lower case to distinguish among variable names (for example, don't name one variable x and the next one X). Some packages are case insensitive. SPSS, for example, will convert your variable name to all lower case.

Here's what the data set looks like with variable names.

br_feed  mom_age  mar_st   birth_wt
 No        18     Married   1.550
 No        33     Single    1.990
 No        34     Married      -1
 No        36     Married   1.640
 Yes       28     Single    2.381
 Yes       -1        9      1.130
 Yes       26     Married   2.060
 Lost      28     Married   1.685
 Lost      -1        9      2.435

4. Assign number codes for categorical data

If you have categorical data, assign a code to each category level. Use the code during data entry to save time and minimize errors.

Here are some examples of codes: Gender 1=Male, 2=Female, 9=Unknown; Race 1=White, 2=Black, 3=Asian, 4=Hispanic, 5=Native American, 8=Multiracial, 9=Unknown; Likert scale 1=Strongly Disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly Agree, 9=No answer.

While I prefer to use number codes, there are some advantages to using short letter codes. Here are some examples of letter codes: Gender M, F, and U (Male, Female, and Unknown); Race W, B, A, H, N, M, and U (White, Black, Asian-American, Hispanic, Native American, Mixed, and Unknown); Likert scale SD, D, N, A, SA, NA (Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree, No Answer). Letter codes are easier to remember, and sometimes can be used effectively as plotting symbols.

I prefer number codes because they offer more flexibility during statistical analysis. For example, SPSS will not allow you to draw a scatterplot when one of your variables uses letter codes. Other software will alphabetize your letter codes, which may not be what you intended. For example, an alphabetized Likert scale would be printed in the following meaningless order: Agree, Disagree, Neutral, No Answer, Strongly Agree, Strongly Disagree.

Let's assign number codes for the categorical variables in the breast feeding data example. For br_feed, let 0=No, 1=Yes, and 9=Lost. For mar_st, let 0=Single, 1=Married, and 9=Missing. With this change, the data will look like this:

br_feed  mom_age  mar_st   birth_wt
   0        18       1       1.550
   0        33       0       1.990
   0        34       1          -1
   0        36       1       1.640
   1        28       0       2.381
   1        -1       9       1.130
   1        26       1       2.060
   9        28       1       1.685
   9        -1       9       2.435

Whether you use number or letter codes, though, do make it a habit to document those codes. In SPSS, that means defining value labels for each and every categorical variable.
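The recoding can be sketched as a lookup table, assuming each row is a Python dictionary with letter codes in br_feed and mar_st. The dictionaries and the `recode` helper are hypothetical; reversing a lookup table gives you the value labels you will want to document.

```python
# Number codes from the text: br_feed 0=No, 1=Yes, 9=Lost;
# mar_st 0=Single, 1=Married (9, the missing value code, is already a number).
BR_FEED_CODES = {"No": 0, "Yes": 1, "Lost": 9}
MAR_ST_CODES = {"Single": 0, "Married": 1}


def recode(row):
    """Return a copy of row with letter codes replaced by number codes."""
    out = dict(row)
    out["br_feed"] = BR_FEED_CODES[row["br_feed"]]
    # .get() leaves values that are already numbers (such as 9) unchanged.
    out["mar_st"] = MAR_ST_CODES.get(row["mar_st"], row["mar_st"])
    return out


# Reversing the lookup gives value labels to document the codes.
BR_FEED_LABELS = {code: label for label, code in BR_FEED_CODES.items()}

print(recode({"br_feed": "Yes", "mom_age": 28, "mar_st": "Single", "birth_wt": 2.381}))
# {'br_feed': 1, 'mom_age': 28, 'mar_st': 0, 'birth_wt': 2.381}
print(BR_FEED_LABELS)  # {0: 'No', 1: 'Yes', 9: 'Lost'}
```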

5. Provide a unique id for each row of your data

If practical, place a unique code in each row. A unique ID makes it much easier to track down errors during data entry. Unique codes are critical if you plan to combine your data with data from another source. The data we have been using above came from a medical records system called Meditech. In Meditech, each patient is assigned a special id code. Here is the data set with those id codes inserted.

  id        br_feed  mom_age  mar_st   birth_wt
J760223         0        18       1       1.550
J676434         0        33       0       1.990
J689673         0        34       1          -1
J785310         0        36       1       1.640
J703538         1        28       0       2.381
J675836         1        -1       9       1.130
J785827         1        26       1       2.060
J562494         9        28       1       1.685
J675320         9        -1       9       2.435

With these id codes in place, we can quickly investigate any outliers that we encounter during our data analysis.

Summary

Before you start entering data, organize your data in the following ways.

  1. Arrange your data in rectangular format.
  2. Create codes for any missing values.
  3. Create variable names (8 characters or less).
  4. Assign number codes for categorical data.
  5. Provide a unique id for each row of your data.

Follow these steps before entering your data and you will simplify the process of importing your data into a statistical software package like SPSS.

This webpage was written on 1999-09-03 and was last modified on 2008-07-08. Category: Ask Professor Mean, Category: Data management


Spreadsheet or Database?

Dear Professor Mean, should I use a database or a spreadsheet to enter my data?

For data entry, a database has several advantages. Databases easily allow you to implement quality checks. They also allow you to easily integrate data from multiple sources. Finally, they are more effective in handling very large data sets.

On the other hand, spreadsheets are faster to set up and allow easier copying and duplication for data with repetitive patterns. Before you choose, make sure that the statistical software can import data from your database or spreadsheet.

Advantages of a database

First, databases allow you to implement quality checks in the data. For example, one of your variables might be gender. It might be coded 1=Male, 2=Female, 9=Unknown (though if gender is unknown, you might want to look at the credentials of the doctor doing the examination). With a database, you could set up data entry in that field so that it would beep anytime you tried to enter something other than a 1, 2, or a 9.

Another quality check found in databases is ensuring that the same id code is not assigned to two different subjects. Database specialists refer to this as checking for unique primary keys.

It's also possible to program a database to check for consistencies in dates. If the birth date is in 1994, for example, and the examination date is in 1987, then either your data are in error or you have an extremely far-sighted pre-natal care program.

Yet another example is checking the gender or age of the subject before allowing certain data to be entered. Male subjects, for example, would not normally have a hysterectomy in their medical history. Five year olds are rarely married or widowed. The range of quality checks you can include in a database is limited only by your imagination.
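The quality checks described above can be sketched in Python. In a real database you would build these into the field validation rules and the primary key, so that bad entries are refused as they are typed; here, as an assumption for illustration, the same checks run over a list of already-entered records. The field names and the three records are made up.

```python
from datetime import date

VALID_GENDER = {1, 2, 9}  # 1=Male, 2=Female, 9=Unknown


def check_records(records):
    """Return (id, problem) pairs for records that fail three simple checks."""
    problems = []
    seen_ids = set()
    for rec in records:
        rid = rec["id"]
        # Unique primary key: the same id must not appear twice.
        if rid in seen_ids:
            problems.append((rid, "duplicate id"))
        seen_ids.add(rid)
        # Range check: only the allowed gender codes.
        if rec["gender"] not in VALID_GENDER:
            problems.append((rid, "gender code not 1, 2, or 9"))
        # Date consistency: no examination before birth.
        if rec["exam_date"] < rec["birth_date"]:
            problems.append((rid, "examination before birth"))
    return problems


# Made-up records for illustration.
records = [
    {"id": "A001", "gender": 1, "birth_date": date(1994, 3, 2), "exam_date": date(1995, 1, 10)},
    {"id": "A002", "gender": 3, "birth_date": date(1994, 5, 9), "exam_date": date(1987, 6, 1)},
    {"id": "A001", "gender": 2, "birth_date": date(1993, 8, 20), "exam_date": date(1994, 2, 14)},
]

for problem in check_records(records):
    print(problem)
```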

Second, a database is effective at integrating data coming from a variety of sources. For example, you might have data coming from a laboratory, a questionnaire, and from the medical records. A database makes it easy to properly link the information from all three sources.

Another example of where a database is extremely useful is in a multi-center clinical trial. The database offers a standard way for data entry that helps avoid the inconsistencies that can plague such studies.

Of course, if you have a data set so complex as to take information from three different sources, then you should definitely consult an expert early in the design of your study. Databases are nice, but they are no substitute for careful planning.

Third, a database is more effective at handling very large data sets. Unlike a spreadsheet, the entire data set does not have to fit into computer memory. Of course, this is a factor only when the data set is on the order of tens of thousands of records or more. If your data set is smaller than this, then fitting all of the data into computer memory is unlikely to be a problem.

Advantages of a spreadsheet

On the other hand, spreadsheets can be up and ready for data entry faster than databases. The extra time required by a database might be beneficial, but for a simple data entry situation, it might just as easily be overkill.

Spreadsheets also are more efficient at copying and duplicating blocks of information. This can be a time-saver for data sets with repetitive patterns, such as multi-factorial experiments.

Other considerations

Before you choose, check to make sure that the statistics software can import your version of the spreadsheet or database. SPSS for Windows, for example, can import Excel and Lotus spreadsheets, dBase format databases like FoxPro, as well as any database or spreadsheet which supports the ODBC (Open Database Connectivity) standard, like Access.

Before you make your choice, be sure to factor in any human considerations. If the person doing data entry is much more comfortable with a spreadsheet than a database (or vice versa), that might outweigh some of the computer efficiencies. On the other hand, keep in mind that software in general, and database software in particular, is getting easier to use. Don't let lack of experience keep you from trying a database. It's easier than you think.

Summary

In summary, databases allow for better error checking, for better integration of data from multiple sources, and for better handling of very large files. Spreadsheets are faster to get up and running, which can be an advantage for small tasks. Spreadsheets also have an advantage when there are repetitive patterns in the data.

Which to choose? It may come down to how large and complex your data set is. The bigger and messier the data, the more a database can help.

This webpage was written on 2000-01-28 and was last modified on 2008-07-08. Category: Ask Professor Mean, Category: Data management


Documenting your SPSS data sets.

Dear Professor Mean, I need to add some documentation for SPSS data sets that I am creating. I know you covered this in your "Gentle Introduction to SPSS" class, but I've already forgotten everything. Can you review this for me? -- Baffled Bill

Dear Baffled,

The only things people remember from my training classes are my jokes. The rest is unimportant.

It's great that you want to take the time and effort to document your data. SPSS can rapidly produce dozens of graphs and hundreds of tables. Without good documentation as part of these graphs and tables, you can easily get lost. A little time spent now in data documentation will save you a lot of time later when you are interpreting your output.

Short explanation

In versions 10 and later of SPSS, there are two tabs at the bottom of the SPSS screen:

  • DATA VIEW
  • VARIABLE VIEW

Use the DATA VIEW tab to enter your data, or to view your data. Use the VARIABLE VIEW tab to add documentation to your data.

From the VARIABLE VIEW tab, you can tell SPSS how to display the data in the SPSS data editor window (how many decimal places are shown, how dates are displayed, and how wide the columns are). You can also provide SPSS with informational labels that will appear in your output window (labels for the variable itself and, if needed, labels for category levels). You would also use this tab to specify any codes that represent missing data.

More details

There are a lot of important ways in which you can document your data. Start with the variable name itself, a brief description of up to eight characters of what this column of data contains. Then specify the format type (numeric, string, or date) for the data in this column. Next, provide a longer description of your variable in the variable label. If your data are categorical, you can describe those categories using value labels. Finally, make sure that SPSS knows which code, if any, you use to designate missing values. Repeat this task for each column of data.

Variable name

When documenting your data, your first step should be to provide a brief but descriptive variable name. This goes into the NAME column of the VARIABLE VIEW tab. SPSS provides a default of VAR00001 for the first column, VAR00002 for the second column and so forth. This coding is convenient in that it allows you to produce names for up to 99,999 columns of data. But if you have that much data, I hope you will get some help with your data entry.

Please spend some time to provide descriptive variable names. These names have to be 8 characters or less. They can be a mixture of numbers and letters, but the very first character has to be a letter (A1 is okay, but 1A is not). The variable name can't include any blanks because they confuse SPSS (when SPSS sees a variable name like "BIRTH WT" it interprets this as two variables, "BIRTH" and "WT"). For the most part, SPSS allows no special symbols in variable names. There is one exception, though: the underscore character (_). I'm glad they allowed this exception, because it makes it easy to create a variable name that looks like it has a blank (e.g., BIRTH_WT). Also, don't bother mixing upper and lower case here; SPSS converts everything to lower case.

The variable name restrictions on length, blanks, special symbols and case can be frustrating at times, but don't worry. When you type in a variable label, you can type in just about anything that makes you happy.
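The naming rules above can be collected into a small checker. This is a simplified sketch that enforces only the rules described on this page (a letter first, then letters, digits, or underscores, at most 8 characters, converted to lower case); `check_name` is a hypothetical helper, and newer releases of SPSS relax some of these limits, so treat it as an illustration rather than SPSS's exact rule set.

```python
import re

# A letter first, then up to 7 more letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def check_name(name):
    """Return the name in lower case if it follows the rules, else raise ValueError."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid variable name: {name!r}")
    return name.lower()


for candidate in ["BIRTH_WT", "A1", "1A", "BIRTH WT", "birthweight", "mom$age"]:
    try:
        print(candidate, "->", check_name(candidate))
    except ValueError as err:
        print(err)
```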

After you provide a variable name, take a look at the other columns in the VARIABLE VIEW tab. Of special interest are the following:

  • TYPE
  • WIDTH
  • DECIMALS
  • LABELS
  • VALUES
  • MISSING

In versions of SPSS before version 10, you add this documentation through the DATA | DEFINE VARIABLE dialog box instead. In the middle of that dialog box are four buttons: TYPE, LABELS, MISSING VALUES, and COLUMN FORMAT.

Format type

Click in the TYPE column to add or change the format type. You will notice a gray button on the right hand side. Click on it to get the VARIABLE TYPE dialog box.

This dialog box has information about the type of data that you want to use. The most common data type is NUMERIC, which is used for any data that can be represented solely by numbers. If you have numeric data, you should tell SPSS the width of your numbers and how many decimal places you want displayed. Unless you are dealing with unusually large numbers, the default width of 8 works well. For some situations, you might be tempted to use a smaller width, but this can make it more difficult to view the variable name and the value labels. Be sure to set the number of decimal places appropriately.

For data with number codes or count data, you should change the number of decimal places to zero from the default of two. It's a minor point, but the superfluous .00 at the end of every number will make your data harder to read. For some data, you may instead need to display more than two decimal places. Keep in mind, though, that this dialog box controls how the data are displayed in the data editor window and not (for the most part) how they are displayed in the output window.

Select the STRING option for data that is all letters or a mixture of letters and numbers. When you select this option, SPSS lets you specify how long the strings are.

In general, I encourage people to use number codes instead of letter codes for categorical data. SPSS gets confused sometimes by letter codes and restricts their use in certain procedures. Also keep in mind that SPSS places even greater restrictions on long strings (more than 8 characters in length). This is a holdover from the days of FORTRAN and IBM mainframe computers, where strings longer than 8 bytes could not be easily manipulated.

If you click on the DATE option, you will be given choices between various display formats (month names versus month numbers, two digit versus four digit years, etc.). After all the publicity about the year 2000 problem, I don't need to lecture you on being careful with dates. But also remember that people disagree over whether the month or the day should appear first.

Variable and value labels

Click on the LABELS button to add a variable label. A variable label is a longer description of your data. Variable labels appear in your output and make it easier to follow what is going on. You can use a mixture of upper and lower case here, which I recommend for improving readability. AVOID USING ALL UPPERCASE HERE BECAUSE IT IS FAR LESS READABLE THAN A MIXTURE OF CASES.

You can put blanks and special symbols in your variable label. If you are very excited about a variable, spice it up with a couple of exclamation points. Go ahead and type to your heart's content. Just a small warning though. A variable label that is too long can make your output look a bit unwieldy. Although you can type up to 255 characters here, it looks strange to have a six inch label underneath a two inch histogram. A variable label of around 20 to 40 characters in length works well in practice.

You can also specify value labels in this dialog box. Value labels provide informative names for levels in any categorical variable. Leave the value labels blank for continuous data like weight or height. They do make sense, though, for categorical data like gender, where a value label serves as a reminder that a data value of 1 represents males and 2 represents females. The last thing you want is for people to think that you can't tell the difference between males and females.

Value labels have to be defined one by one. Type in the number (or letter) code for your category in VALUE field, the value label in the VALUE LABEL field just beneath it and then click on the ADD button. Repeat this for your second category level and so forth.

Missing value codes

If needed, click on the MISSING VALUES button to designate missing value codes. Missing value codes are useful for designating data in SPSS where the value is unknown, not applicable or otherwise not provided.

Be careful about missing values. Values can be missing because the subject dropped out of the study. Perhaps you are looking for chemical concentrations that are sometimes too low for a laboratory to detect. Perhaps a subject refused to respond to a certain question. Perhaps you are asking for something like a spouse's age that is not applicable for a single person. Make sure you understand why your data is missing and discuss this issue with anyone you are consulting with. The statistical handling of missing values can vary greatly depending on how the value came to be missing.

When you are planning your project, it is a good idea to select a very clearly impossible code for your missing value. For example, use -1 for a birth weight because any infant with a negative birth weight would float up to the ceiling after delivery. Use a value of 9 to code missing for gender, since it is obvious to most of us that the number of possible genders is much smaller than 9.
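To see why SPSS needs to know your missing value code, here is a small Python illustration using the birth weights from the breast feeding example, where -1 codes a missing weight. Averaging the raw column treats -1 as a real weight; dropping it first is, roughly, what declaring -1 as a missing value does for you.

```python
from statistics import mean

# Birth weights from the breast feeding example; -1 is the missing value code.
birth_wt = [1.550, 1.990, -1, 1.640, 2.381, 1.130, 2.060, 1.685, 2.435]
MISSING = -1

naive = mean(birth_wt)  # wrongly treats -1 as a real birth weight
valid = mean(w for w in birth_wt if w != MISSING)  # skips the missing code

print(f"mean with -1 included: {naive:.3f}")  # 1.541
print(f"mean with -1 excluded: {valid:.3f}")  # 1.859
```

One undeclared code pulls the average down by about 0.3, which is easy to miss in a long output file.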

Column format

I usually ignore the COLUMN FORMAT button, but you can click here if you like. If you did not change the width from the SPSS default of 8 earlier, you can do so here. You can also tell SPSS to left justify, center, or right justify this column of data in the data editor window. SPSS chooses a logical default of left justification for strings and right justification for just about everything else.

Example

Let's see how to document a column of data that represents marital status. Marital status is a categorical variable with five codes (1=single, 2=married, 3=divorced, 4=widowed, 9=unknown). First we choose "marit_st" as a variable name. The eight character limit forces us to select an abbreviated description like this or mar_stat.

We use numeric codes for this variable, so we keep the NUMERIC option selected. With no values larger than 9, we could reduce the WIDTH field a little, but anything much smaller than 8 makes it difficult to see the variable name and the value labels later. The number codes here do not require any decimals, so we change the DECIMAL PLACES field from 2 to 0.

A nice variable label is "Marital Status of the Infant's Mother". Notice that we can include an apostrophe here. I also used a mixture of upper and lower case. This is easier to read than all lower case and much easier than all upper case.

The value labels are "Single"; "Married"; "Divorced"; "Widowed"; and "Unknown". Notice again that I use mixed case. Value labels are appropriate here because this is a categorical variable. For a continuous variable like birth weight, we would leave the value labels blank.

Finally, I designate 9 as a missing value.
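As a rough analogy only, the documentation for marit_st can be pictured as a small record holding a few of the VARIABLE VIEW columns. The `VariableDoc` class below is hypothetical (it is not how SPSS stores anything internally); it just shows how the name, label, decimals, value labels, and missing code fit together.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDoc:
    """Hypothetical stand-in for a few VARIABLE VIEW columns (not an SPSS API)."""
    name: str
    label: str
    decimals: int = 2
    value_labels: dict = field(default_factory=dict)
    missing: set = field(default_factory=set)

    def describe(self, value):
        """Show a value the way output would: its value label if any, else the number."""
        if value in self.value_labels:
            return self.value_labels[value]
        return f"{value:.{self.decimals}f}"

    def is_missing(self, value):
        """True if value is one of the declared missing value codes."""
        return value in self.missing


marit_st = VariableDoc(
    name="marit_st",
    label="Marital Status of the Infant's Mother",
    decimals=0,
    value_labels={1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed", 9: "Unknown"},
    missing={9},
)

print(marit_st.label)          # Marital Status of the Infant's Mother
print(marit_st.describe(2))    # Married
print(marit_st.is_missing(9))  # True
```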

Summary

Baffled Bill needs to provide some documentation for SPSS data sets that he is creating. Professor Mean explains that you add documentation in the VARIABLE VIEW tab (in versions before 10, by selecting DATA | DEFINE VARIABLE from the SPSS menu) or by double clicking on the column header. You can then provide information about the variable name, the format type, the variable label, the value labels, and the missing value code. You should invest some time now with documentation because SPSS can easily produce dozens of graphs and hundreds of tables. Good documentation will help you keep your bearings in all of this output.

This webpage was written on 1999-08-18 and was last modified on 2008-07-08. Category: Ask Professor Mean, Category: Data management, Category: SPSS software


Inputting a two-by-two table into SPSS.

Dear Professor Mean, I have the following data in a two by two table:

         D+    D-  Total
F+       34    23     57
F-      139   119    258
Total   173   142    315

When I try to enter this data into SPSS, I can't get it to compute risk ratios and confidence intervals. What am I doing wrong? -- Jinxed Jason

Dear Jinxed,

You have values ranging from F- to D+? I hope this isn't data on the grades you received in college.

Actually these data are from a paper: Sands et al (1999). F+ represents presence of a risk factor (in this case, previous miscarriage) and F- represents absence of that risk factor. D+ represents presence of a defect (ventricular septal defect or VSD) and D- represents absence of that defect.

Risk factor   Group     Number/Total (Percent)   Odds Ratio (95% CI)
Miscarriage   VSD       34/173 (20%)             1.3 (0.7, 2.3)
              Control   23/142 (16%)
Female        VSD       84/173 (49%)             2.1 (1.3, 3.2)
              Control   60/142 (42%)
Low parity    VSD       76/173 (44%)             1.1 (0.7, 1.8)
              Control   58/142 (41%)
Smoking       VSD       41/173 (24%)             0.8 (0.5, 1.3)
              Control   39/139 (28%)
Alcohol       VSD       18/173 (10%)             0.7 (0.4, 1.5)
              Control   20/139 (14%)

Notice that we have to do a bit of arithmetic to get all the values. If 34 out of 173 VSD cases had a previous miscarriage, then 139=173-34 did not. If 23 out of 142 controls had a previous miscarriage, then 119=142-23 did not.

For data like this, you have to re-arrange things and then apply weights. The following discussion talks about SPSS, but the general method works for most other statistical software.

To re-arrange the data, you need to specify three variables: F, D, and COUNT. F takes the value of 1 for F+ and 0 for F-. D takes the value of 1 for D+ and 0 for D-. The 0-1 coding has some nice mathematical properties, but you could use 1 and 2 instead. For each combination of F and D we will record the sample size in COUNT.

Here's what your re-arranged data would look like:

  F   D   COUNT
  1   1      34
  1   0      23
  0   1     139
  0   0     119

Enter the data, and tell SPSS that COUNT represents a weighting variable, and you're ready to rock and roll. You do this by selecting Data | Weight Cases from the SPSS menu.
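Weighting cases amounts to letting each row count COUNT times. The Python sketch below, which assumes the data are typed in as (F, D, COUNT) tuples, rebuilds the two by two table from the four rows; the unweighted tally at the end shows the "one observation in each cell" problem mentioned later. `weighted_table` is a hypothetical helper, not an SPSS function.

```python
from collections import Counter

# The re-arranged data: one row per cell of the two by two table.
rows = [
    # (F, D, COUNT)
    (1, 1, 34),
    (1, 0, 23),
    (0, 1, 139),
    (0, 0, 119),
]


def weighted_table(rows):
    """Add up COUNT for each (F, D) combination, as Data | Weight Cases does."""
    table = Counter()
    for f, d, count in rows:
        table[(f, d)] += count
    return table


table = weighted_table(rows)
print(table[(1, 1)], table[(1, 0)], table[(0, 1)], table[(0, 0)])  # 34 23 139 119
print(sum(table.values()))  # 315

# Without weights, each row is a single observation: one per cell.
unweighted = Counter((f, d) for f, d, _ in rows)
print(set(unweighted.values()))  # {1}
```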

Then select Analyze | Descriptive Statistics | Crosstabs from the SPSS menu to create a two by two table.

Be sure to click on the Statistics button and select the Risk option box to ask SPSS to compute the risk ratios.

I also usually find it useful to display the row percentages. To do this, click on the Cells button.

In the Crosstabs: Cell Display dialog box, select the Row Percentages option box.

Here's what the first part of the output looks like.

Notice that the rows and columns are reversed in this table. There are several ways to change how the table is displayed, but the table shows essentially the same information either way.

Here is what the second part of the output looks like.

By the way, if you tried to use the crosstabs procedure without weighting, you would get exactly one observation in each cell. Pretty boring, eh?
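The Risk option asks SPSS for an odds ratio and its confidence interval. As a cross-check, the Python sketch below uses the standard log-scale (Woolf) formula; I'm assuming this is close to what SPSS prints, and for the miscarriage counts it reproduces the 1.3 (0.7, 2.3) reported in the paper. `odds_ratio_ci` is a hypothetical helper.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and log-scale (Woolf) confidence interval for a two by two table.

    a = F+ D+, b = F+ D-, c = F- D+, d = F- D-; z=1.96 gives a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


or_, lo, hi = odds_ratio_ci(34, 23, 139, 119)
print(f"{or_:.1f} ({lo:.1f}, {hi:.1f})")  # 1.3 (0.7, 2.3)
```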

Summary

Jinxed Jason can't figure out how to enter data from a two by two table into SPSS. Professor Mean explains that you need three variables to represent a two by two table. The first variable indicates the specific column of your table and the second variable indicates the specific row (or vice versa). The third variable indicates the count or frequency for each intersection of row and column. You do not include the row or column totals in your data entry. You can then select Analyze | Descriptive Statistics | Crosstabs from the SPSS menu to analyze the data from your two by two table. You get additional analyses by selecting the Risk and/or Chi-square option boxes.

Further reading

  1. Incidence and risk factors for ventricular septal defect in "low risk" neonates. Sands AJ, Casey F, Craig B, Dornan J, Rogers J and Mulholland H. Arch Dis Child Fetal Neonatal Ed 1999:81(1);F61-F63. This paper is available on the web.

This webpage was written by Steve Simon on 1999-08-18 and was last modified on 2008-07-14. Category: Ask Professor Mean, Category: Data management, Category: SPSS software


Date calculations in SPSS.

Dear Professor Mean, I am trying to use dates in SPSS for certain calculations. For example, I want to use a compute statement in SPSS to create a new variable called duration of injury (durinj). I know that I must subtract the date of injury from the date of interview. However, when I do this, I get a number in the millions. What am I doing wrong? -- Stumped Sharon

Dear Stumped,

Maybe your patients were waiting for their HMO to approve a visit to a specialist.

Short explanation

SPSS stores date/time values as the number of seconds since October 14, 1582 (the start of the Gregorian calendar). If you specify only a date and not a time, then SPSS sets the time to midnight. When you subtract two dates, you get the duration of injury in seconds. Divide by 86,400 (=24*60*60) to get the duration of injury in days. Divide again by 7, 30, or 365.25 to get duration in weeks, months, or years.
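Here is a Python sketch of the same arithmetic, assuming (as described above) that a date is stored as seconds since midnight, October 14, 1582. The dates of injury and interview are made up, and `spss_seconds` is a hypothetical helper, not an SPSS function.

```python
from datetime import datetime

# Origin described in the text: midnight, October 14, 1582.
SPSS_ORIGIN = datetime(1582, 10, 14)
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def spss_seconds(when):
    """Seconds between the origin and `when`, the way the text says SPSS stores dates."""
    return (when - SPSS_ORIGIN).total_seconds()


print(spss_seconds(datetime(2000, 1, 1)))  # 13166064000.0, a little over 13 billion

injury = datetime(1999, 3, 1)     # hypothetical date of injury
interview = datetime(1999, 6, 1)  # hypothetical date of interview

durinj_seconds = spss_seconds(interview) - spss_seconds(injury)
durinj_days = durinj_seconds / SECONDS_PER_DAY
print(durinj_seconds, durinj_days)  # 7948800.0 92.0
print(durinj_days / 7, durinj_days / 365.25)  # weeks and years
```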

More details

To see what SPSS is doing, reformat the date as a number. You will see something with a whole lot of digits. The date of 1/1/2000, for example, the date when all the antiquated software in the world will crash, is just a little more than 13 billion seconds to SPSS. Fortunately, SPSS allocates more than two digits here.

To subtract one column of numbers from another in SPSS, you select TRANSFORM | COMPUTE from the menu. Tell SPSS what name you want for this difference in the TARGET VARIABLE field. Then select the first variable and add it to the NUMERIC EXPRESSION field. Type in a minus sign (or click on the minus button in the mini calculator). Finally, select the second variable and add it to the NUMERIC EXPRESSION field after the minus sign.

If you are using dates, then this time interval is expressed in seconds. Place parentheses around the entire expression. Then place a slash at the end, followed by 86400. Dividing by 86400 changes the units from seconds to days.

Example

A common example using dates is computing length of stay in the hospital.

Figure 3.1

The data shown above represent the birthdate (dob), date of admission to the hospital (dateadm), and date of discharge from the hospital (datedsc) for newborn babies admitted with a diagnosis of dehydration. To compute length of stay, you need to select TRANSFORM | COMPUTE from the SPSS menu.

Figure 3.2

The figure shown above is the dialog box that you get. Type in a new name in the TARGET VARIABLE field. The formula for computing length of stay is

(datedsc-dateadm)/(24*60*60)

which you should type into the NUMERIC EXPRESSION field. Then click on the OK button.

Figure 3.3

The figure shown above indicates that you have successfully computed length of stay as the difference between two date values. Congratulations.

Summary

Stumped Sharon is having problems with some calculations using dates in SPSS. Professor Mean explains that SPSS stores date values as the number of seconds since October 14, 1582. So when you calculate the difference between two date values, you see the number of seconds between the two events. Divide this difference by 86,400 (=24 hours * 60 minutes * 60 seconds) to re-express it in days.

Further reading

Raynald Levesque has a nice web tutorial about dates in SPSS.

  1. Dates Tutorial. Raynald Levesque. Accessed September 7, 2001.
    http://pages.infinit.net/rlevesqu/LearningSyntax.htm#DateTutorial

Update

I attended a web seminar on the new enhancements in version 13.0 of SPSS software. The most notable change is in date calculations.

Date and time variables in SPSS have always been difficult. I have a web page showing some of the issues involved with computing the difference between two dates. SPSS has now added a Date and Time Wizard. Select TRANSFORM | DATE/TIME from the menu. Here's the first dialog box from that menu.

This is a very pleasant surprise, since dates are a source of constant confusion for me and for the people I work with. There were other enhancements, but to me this is the only important one.

This webpage was written by Steve Simon on 1999-08-18 and was last modified on 2008-07-14.


Stats #01: Practice exercises

1. On your floppy disk, you will find a file: bf.xls. This file is a Microsoft Excel spreadsheet that has been stored in version 3.0 format. Import this data into SPSS.

2. On your floppy disk, you will find a file: practice.mdb. This file is a Microsoft Access database. Import data from the demographics table into SPSS.

3. Define a variable named "Race" in SPSS. Set the type to numeric with zero decimal places. Use "Race/Ethnicity" as the variable label. Use the following value labels:

  • 1=White
  • 2=Black
  • 3=Hispanic
  • 8=Other
  • 9=Unknown

Declare the value 9 as a missing value.

4. Use the data from Table 1, at the end of these exercises. Create five columns of data in SPSS following the general form shown earlier in this class.

There are five variables in the data set:

  • Anonymized Identification Code
  • Breast Feeding Status
  • Mother's Age (years)
  • Marital Status
  • Birth Weight (kilograms)

Select brief names for each of these variables.

Define the five variables, in order, with the following format types:

  • string of length 7
  • numeric with zero decimals
  • numeric with zero decimals
  • numeric with zero decimals
  • numeric with three decimals

Be sure to add variable labels, using the five descriptions in the list above.

Leave the value labels undefined for the first, third, and fifth columns. Define the following value labels for the second column:

  • 0=no BF
  • 1=partial/exclusive BF
  • 9=unknown

Define the following value labels for the fourth column:

  • 0=unmarried
  • 1=married
  • 9=unknown

Declare missing values for the last four columns as follows:

  • 9
  • -1
  • 9
  • -1

The first column does not have any missing values.

After you have documented this information, but before you have entered any data, save the file. Then enter the data as it appears in Table 1, and save the file again.
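To see why declaring missing values matters, the Python sketch below uses the mother's age, birth weight, and breast feeding values from Table 1. SPSS excludes declared missing values from its calculations; here that is imitated by hand with a hypothetical helper, `valid_values`. The Python names are placeholders, since choosing brief names is part of the exercise, and the dictionary plays the role of SPSS value labels.

```python
from statistics import mean

# Table 1 columns: mother's age (years) and birth weight (kg); -1 = missing.
age = [18, 33, 34, 36, 26, 28, -1]
weight = [1.550, 1.990, -1, 1.640, 2.060, 1.685, 2.435]

# Value labels for breast feeding status, as a code -> label mapping.
bf_labels = {0: "no BF", 1: "partial/exclusive BF", 9: "unknown"}
bf = [0, 0, 0, 0, 1, 9, 9]

def valid_values(values, missing_code):
    """Hypothetical helper: drop the declared missing-value code."""
    return [v for v in values if v != missing_code]

print(round(mean(age), 2))                     # -1 wrongly treated as an age
print(round(mean(valid_values(age, -1)), 2))   # missing value excluded
print(round(mean(valid_values(weight, -1)), 3))
print([bf_labels[code] for code in bf])
```

If the -1 is not declared missing, it is averaged in as a real age and pulls the mean down by more than four years.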

5. Table 2 shows the dates of birth and dates of admission to the hospital for the first ten infants in a study of jaundice. Enter these data into SPSS and add a variable label to each variable. Then compute the age at admission, in days, using the formula

(adm_date-bir_date)/(24*60*60)
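If you want to check your SPSS results for exercise 5, the Python sketch below applies the same formula to the Table 2 dates. It parses the month/day/year text with `datetime.strptime`; subtracting two `datetime` values gives a time difference whose `total_seconds()` is then divided by 24*60*60, mirroring the SPSS formula.

```python
from datetime import datetime

# Table 2: id, bir_date, adm_date (mm/dd/yyyy).
table2 = [
    (11, "03/10/1998", "03/16/1998"),
    (12, "01/26/2001", "02/01/2001"),
    (13, "07/18/2000", "07/23/2000"),
    (14, "07/11/2002", "07/15/2002"),
    (15, "01/17/2001", "01/20/2001"),
    (16, "02/06/2001", "02/20/2001"),
    (17, "08/20/2002", "08/24/2002"),
    (18, "01/15/2001", "01/21/2001"),
    (19, "09/19/2001", "09/23/2001"),
    (20, "12/05/2002", "12/08/2002"),
]

def to_date(text):
    """Parse a mm/dd/yyyy string; with no time given, it is midnight."""
    return datetime.strptime(text, "%m/%d/%Y")

# (adm_date - bir_date) / (24*60*60), computed for each infant.
age_at_admission = {
    infant_id: (to_date(adm_date) - to_date(bir_date)).total_seconds() / (24 * 60 * 60)
    for infant_id, bir_date, adm_date in table2
}

for infant_id, days in age_at_admission.items():
    print(infant_id, days)
```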


Table 1. Use this data for exercise #4.

     J760223         0        18       1       1.550
     J676434         0        33       0       1.990
     J689673         0        34       1          -1
     J785310         0        36       1       1.640
     J785827         1        26       1       2.060
     J562494         9        28       1       1.685
     J675320         9        -1       9       2.435

Table 2. Use this data for exercise #5.

     id bir_date   adm_date
     11 03/10/1998 03/16/1998
     12 01/26/2001 02/01/2001
     13 07/18/2000 07/23/2000
     14 07/11/2002 07/15/2002
     15 01/17/2001 01/20/2001
     16 02/06/2001 02/20/2001
     17 08/20/2002 08/24/2002
     18 01/15/2001 01/21/2001
     19 09/19/2001 09/23/2001
     20 12/05/2002 12/08/2002


Please fill out an evaluation form. Your input is important. These evaluation forms also ensure that we can offer Continuing Medical Education credits for this class.