1. The Minnesota Population Center has an excellent website that helps guide you through downloading and understanding census extracts. This tutorial should help walk you through those steps.
The first thing you need to do is register with the Minnesota Population Center to use their IPUMS-USA (Integrated Public Use Microdata Series) data. The stipulations allowing use of the IPUMS data are laid out here, so take a moment and read up on what they require. They also clarify what you can use the data for (for us, merely teaching purposes) at the bottom of the sample registration form here. Essentially, they want you to know that this is not a genealogical tool (you can't trace your own or anyone else's family), that you must notify and cite IPUMS if you publish any of your findings, and that you cannot use the data for commercial purposes. If you stick to the confines of this class, you'll be fine.
You will, however, need to follow the steps to apply for an account, using your NYU email. Request an IPUMS account here. Note that this may take a little while to approve, so expect a delay before confirmation.
2. Once you receive your approval and can log in, you can start "shopping" for the variables you want in your extract. For the purpose of this tutorial, you'll assemble a small, specific extract for Manhattan and Brooklyn that will familiarize you with the process well enough to apply it in the labs.
Go to "Select Data" from the main menu at the top of the header. This will bring up the variables drop-down lists. Try browsing through the lists of variables, which are organized into Household variables and Person (i.e. individual) variables.
IPUMS has its website nicely organized to give you important summary information about each variable so you know what you are working with. Take a look at the Technical variables under Household, for example. Here you find that the data will contain items like the census year (1850, 1860, 1880, etc.), and that this variable is named YEAR. Similarly, every household in the census samples was given an identifying number, SERIAL. If you click on the hyperlinked "codes," it will tell you everything you need to know about how the census treated that information, how IPUMS coded the values, what values are available (if it is a limited nominal or ordinal variable), and any issues to look out for when analyzing that variable in your data. This is handy information, so refer to it often!
Note in the Technical variables that some are "preselected." This means that those variables are essential to your use of the data, and must be included in every extract. One of those variables is HHWT, or household weight. This is a very important variable because most census extracts available on IPUMS are samples (1%, 2%, 5%, 10%, etc.) of a population. Thus, in order to get accurate statistics for 100% of the U.S. in a given year, you must apply the IPUMS HHWT as a multiplier to your case counts. IPUMS calculates the weights in a more complex fashion, but the basic idea is that the number of cases in a 1% sample has to be multiplied by 100 to get an accurate count. Thus, if you have 210 German schoolteachers in your 1% sample, there would be about 21,000 in all of the U.S.
The 1880 full census count, however, is already a 100% sample, so no weighting is needed (specifically, every individual record's weight is 1).
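The weighting arithmetic can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: every record in a 1% sample carries the same weight of 100, whereas the real IPUMS weights vary from record to record. The function name weighted_count is made up for this example and is not part of IPUMS or SPSS.

```python
def weighted_count(weights):
    """Estimate a population count by summing the weight of each sampled case."""
    return sum(weights)

# Assumption: a flat 1% sample, so each of the 210 sampled cases stands for 100 people.
one_percent_sample = [100] * 210
estimate_1pct = weighted_count(one_percent_sample)

# In a 100% count such as 1880, every weight is 1,
# so the sum is simply the number of records.
full_count = [1] * 210
estimate_full = weighted_count(full_count)

print(estimate_1pct)  # 21000
print(estimate_full)  # 210
```

Summing weights rather than multiplying the case count by a constant is what lets the same calculation work when the weights differ between records.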
3. Back on the main Select Variables page, click on the button marked "Select Samples." This allows you to choose a census year (or years).
Clear all of the boxes by unchecking "Default U.S. Sample from Each Year." Scroll down and select the 1880 100% sample near the bottom of the page. Click on "Submit sample selections."
Next, add a few variables to analyze. To add a variable, find it in the drop-down menu and click on the yellow circle to its left. Add the following:
- From Household >> Geographic, select the variable CITY (click on the yellow plus circle to add)
- From Person >> Race, Ethnicity, and Nativity, select BPL
- From Person >> Demographic, select AGE, SEX, and BIRTHYR
- From Person >> Work, select OCC
Click on "View Cart" in the upper right-hand corner. You will see a list of your variables. Click on the green bar at the top that says "Create Data Extract."
Limiting cases: You can reduce the number of cases or focus on a certain sub-area (though it is advisable not to select too small an area, because this is a national-level dataset) by instructing IPUMS to give you only the records from certain geographic areas or those fulfilling certain conditions (e.g. the Irish-born). To do this, click on "Select Cases."
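Conceptually, a case selection is just a filter on the records. The Python sketch below illustrates the idea with a handful of invented records. Everything in it is hypothetical: the records are toy data, the function name select_cases is made up, and the birthplace values are written as plain labels, whereas a real extract stores numeric codes that you would look up on the variable's "codes" page.

```python
# Hypothetical toy records; a real extract stores numeric codes, not these labels.
records = [
    {"AGE": 34, "BPL": "Ireland"},
    {"AGE": 12, "BPL": "New York"},
    {"AGE": 51, "BPL": "Germany"},
    {"AGE": 27, "BPL": "Ireland"},
]

def select_cases(rows, variable, allowed):
    """Keep only the rows whose value for `variable` is in `allowed`."""
    return [row for row in rows if row[variable] in allowed]

irish_born = select_cases(records, "BPL", {"Ireland"})
print(len(irish_born))  # 2
```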
4. After you make any case selections and hit submit, a data screen pops up, and there will be a long delay (as long as an hour for large files) while IPUMS assembles your extract. You may need to wait at least several minutes until you receive a confirmation email. When the extract is ready, you'll see a screen listing it.
On the left, you'll see a hyperlink marked "Data," and next to it your command files for SPSS, etc. Right-click on the "Data" hyperlink and select "Save Link As." Save the file to a directory on your computer that you will know how to get to, such as the desktop. Do the same for the "SPSS" hyperlink.
5. Lastly, find the data file on your computer, which will be called usa_00001.dat.gz, right-click on it, and unzip (extract) its contents to your computer. Make a note of the directory containing the unzipped file (it should be listed in the address bar of Windows Explorer).
Note: You may need to download WinZip in order to extract your files if your computer isn't already equipped with a file extractor.
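If you have Python installed, its standard library can also unpack the .gz archive without any extra software. The sketch below is one way to do it, not an official IPUMS procedure. It assumes the archive sits in the folder you run the script from and carries the file name used in this tutorial; the helper name gunzip is made up for this example.

```python
import gzip
import shutil
from pathlib import Path

def gunzip(gz_path):
    """Decompress a .gz file next to the original and return the new path."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # usa_00001.dat.gz -> usa_00001.dat
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the data so large files fit in memory
    return out_path

if __name__ == "__main__":
    # Assumption: the download is in the current folder under the name used above.
    archive = Path("usa_00001.dat.gz")
    if archive.exists():
        print(gunzip(archive))
```

The unpacked .dat file lands in the same folder as the archive, so the directory you noted in step 5 stays the same.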
Opening the extract in SPSS through your Virtual Computer Lab
1. Open up SPSS through your Virtual Computer Lab. If prompted to open a previous file, click cancel and bring up the default data window. Go to File >> Open >> Syntax. In the dialog window, browse to find the syntax file (recognizable because of its .sps file ending). This is the file you downloaded from the SPSS hyperlink from IPUMS.
2. Near the top of the opened SPSS syntax command file you will find a line that says:
data list file = "usa_00001.dat" /
Change that line so that it points to the data file. This is a little tricky since we are working through the VCL, so here's an example. If you saved your usa_00001.dat file to C:\Users\Desktop\, you must enter the path \\Client\C$\Users\Desktop\usa_00001.dat between the quotation marks, like this:
data list file = "\\Client\C$\Users\Desktop\usa_00001.dat" /
On a Mac, where the root directory is slightly different, it would be as follows for a .dat file saved on your desktop:
data list file = "\\Client\C$\Desktop\usa_00001.dat" /
Basically, put \\Client\ before the drive letter you used, and replace the colon after the drive letter with a $ sign.
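The rewriting rule for Windows paths can be expressed as a small Python function, which may help if you want to double-check a path before pasting it into the syntax file. This is a sketch of the rule as described in this tutorial, not a VCL or SPSS utility. The function name to_vcl_path is made up, and it only handles paths that begin with a drive letter, so it does not cover the Mac example.

```python
import re

def to_vcl_path(local_path):
    r"""Rewrite a drive-letter path such as C:\Users\... as \\Client\C$\Users\..."""
    match = re.match(r"^([A-Za-z]):\\(.*)$", local_path)
    if match is None:
        raise ValueError("expected a path that starts with a drive letter, e.g. C:\\...")
    drive, rest = match.groups()
    # Prefix \\Client\, keep the drive letter, and swap the colon for a $ sign.
    return "\\\\Client\\" + drive.upper() + "$\\" + rest

print(to_vcl_path(r"C:\Users\Desktop\usa_00001.dat"))
# \\Client\C$\Users\Desktop\usa_00001.dat
```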
3. Go to Run on the main menu and select "All." SPSS will then open your file (there will be a delay because of the large number of cases, about 733,000 individuals, but it should eventually open). If you have any trouble, follow the IPUMS instructions.
You can close the syntax editor now, since you won't need it as long as you save the new data as an SPSS .sav file.