This homework assignment requires several data files:
• wdi_data.dta, data on the World Development Indicators.
• yemen_dipl6310.dta, data from the Yemen survey. (See the attached survey instrument.)
• Data of your own choosing from the ICPSR, Dataverse, or another database.
This homework should be completed in Stata (or possibly Excel), and written up in Word.
• You should turn this assignment in as a Word .doc or .docx. You will have to find a way to copy graphs and tables into your document. Do not turn in a Stata file. Your document should be clean and easy to read.
• All plots may be either color or grayscale, should have a legend (or have each bar labeled), should have an accurate title, and should have a meaningful and easy-to-interpret y-axis.
• You will run into challenges during this assignment. Be ready to Google and consult Stata (or Excel) forums to figure out how to complete the assignment. Be resourceful.
Part A (30 points). Using the Yemen example data.
1. The independent variable “X2_dev” classifies each household as being in a “low” or “high” development area. The dependent variable “Q8_1” is a continuous measure of household food expenditure in Yemeni rials (YER), and ranges from 0 to 300,000.
a. Create a histogram of “Q8_1”. Explain what you see in this graph.
b. Create a bar plot or box plot showing the relationship between “X2_dev”
and “Q8_1”.
c. At first glance, does it look like development level has an effect on food
spending? Why do you think that?
2. State a general hypothesis about the relationship between the “X2_dev” variable
and the dependent variable “Q8_1”. How do you expect development level to relate to spending?
a. State the null hypothesis that matches this hypothesis.
b. What is the appropriate kind of difference of means test for these
data/samples?
3. Conduct a difference of means test comparing the two development groups.
a. Copy-paste these two tables into your write-up.
b. Interpret the tables in 1-3 sentences. Do you reject or fail to reject the null
hypothesis? In plain language, what do the results tell us?
4. Can we conclude that there is a causal relationship between development and
household spending on food? Why or why not? What can we conclude?
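For intuition only, the arithmetic behind one common difference of means test (Welch's unequal-variance test for two independent groups) can be sketched in Python. The numbers below are made up and are not from the Yemen survey, the function name `welch_t` is purely illustrative, and the sketch is not a statement about which test fits question 2b. The assignment itself should still be completed in Stata (or Excel).

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Each group's contribution to the squared standard error of the difference
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical spending figures for two groups (illustration only)
low_dev = [20, 25, 30, 35, 40]
high_dev = [30, 40, 50, 60, 70]

t_stat, df = welch_t(low_dev, high_dev)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

A t statistic far from zero relative to its degrees of freedom is evidence against the null hypothesis of equal means; the p-value that statistical software reports is computed from these two numbers.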
DIPL6310, Assignment 3
Part B (30 points).
5. Using data from the World Development Indicators (wdi_data.dta):
a. Create a histogram of GDP per capita for 2012. Then create a variable for logged GDP per capita and plot its histogram as well. Do the two histograms look different? How?
b. What is the median GDP per capita in this dataset in 2012? What is the mean GDP per capita in 2012? Are they different? Why or why not?
c. Plot GDP per capita over time (2006-2017) for three different countries on the same figure. Label it clearly and make it easy to read and interpret.
6. Do the following:
a. Create a scatterplot showing the relationship between oil rents and
democracy for the year 2006.
b. Explain what you see in this graph.
7. State a simple theory and hypothesis about the relationship between GDP per capita and fertility. (e.g. “As GDPPC increases/decreases, fertility should _______ because ________.”)
8. Create a scatterplot of GDP per capita and fertility for the year 2012.
a. Use Stata to draw a best-fit line through this scatterplot with 95%
confidence intervals. (It takes a lot of work to add 95% confidence intervals in Excel; instead, you may just report the 95% confidence intervals from the regression model.)
b. Is the slope of this line positive or negative? What does this mean for your hypothesis? Does it look like a linear relationship?
9. Report the regression table for fertility (Y) regressed on GDP per capita (X) for 2012, and write out the fitted regression equation (e.g. y = 15 - 2.4x).
10. BONUS (2 points): Show me a graph with a line that fits better. This can be done by transforming variables or through fancy graphwork in Stata.
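To see why transforming a variable can change how well a straight line fits, here is a minimal Python sketch of a bivariate least-squares fit on made-up numbers (not the WDI data). The helper name `ols` and the values of `x` and `y` are assumptions chosen so that the outcome is exactly linear in log10(x); real data will not behave this cleanly.

```python
import math


def ols(x, y):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical data: the outcome falls by one unit each time x grows tenfold
x = [100, 1000, 10000, 100000]
y = [6, 5, 4, 3]

_, slope_raw, r2_raw = ols(x, y)
_, slope_log, r2_log = ols([math.log10(xi) for xi in x], y)

print(f"raw x:    slope = {slope_raw:.6f}, R2 = {r2_raw:.3f}")
print(f"log10(x): slope = {slope_log:.3f}, R2 = {r2_log:.3f}")
```

Both slopes are negative, but the line through the logged variable fits much more closely, because the relationship in this toy example is linear in the log of x rather than in x itself.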
Part C (20 points).
11. Access the ICPSR data or other resource on SHU libraries (or from another discussed source) and download a dataset relevant to your research design project. To keep it simple, you may want to use data with country or country-year as the unit of analysis. (I highly recommend completing this portion with data you can use in your project. However, if you cannot think of any data that would both serve your project and help you complete this assignment, you may continue using other variables in the WDI data or our survey data.)
a. Write a short paragraph explaining the variables you have chosen. (What is the unit of analysis? Does it cover multiple countries? Multiple years? Talk about at least three key variables within. You may need to locate the data’s codebook.)
b. Import the data into Stata. Select a non-dichotomous variable (in other words, it should have more than two values). Explain what the variable measures, and specify whether it is nominal, ordinal, or continuous (i.e. interval or ratio).
c. Create a histogram or barplot showing the distribution of the variable (that is, the frequency or percent of observations for each value).
d. Create a scatterplot or boxplot showing the relationship between two variables. This should not be a change in one variable over time.
i. Explain what the graph shows.
Part D (20 points). Using any of the data sources.
12. Select and explain two variables. State a simple theory and hypothesis about the relationship between these two variables.
13. Using linear regression analysis, show the relationship between these two variables. Your independent variable should be either dichotomous, ordinal, or continuous. (If you want to use nominal variables, visit my office hours for further instruction.)
a. Copy-paste the regression table into your write-up.
b. Interpret the table in 1-3 sentences. What does it tell us? Does it
provide evidence for your hypothesis? (You may want to insert a
scatterplot with line of best fit to demonstrate your point.)
c. Insert a scatterplot with fit line (and confidence intervals) and
regression equation.
14. Add 2-4 control variables to your regression. Explain briefly why you think each
is an important variable to account for.
a. Copy-paste the multivariate regression table into your write-up.
b. Reinterpret the table in 1-3 sentences. Has controlling for other
variables changed your result for the independent variable of interest? How much has your measure of fit (R²) improved?
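As a rough illustration of what "controlling for" a variable does to a coefficient and to R², the Python sketch below fits a regression with and without one control on made-up numbers. The function `fit_ols`, the variable names, and the data are all hypothetical; the outcome is constructed as exactly 2 + 1*x1 + 3*x2, so the full model fits perfectly, which real data will not do.

```python
def fit_ols(columns, y):
    """Regress y on an intercept plus the given predictor columns.

    Returns (coefficients with the intercept first, R-squared).
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_val = aug[col][col]
        aug[col] = [v / pivot_val for v in aug[col]]
        for idx in range(k):
            if idx != col:
                factor = aug[idx][col]
                aug[idx] = [v - factor * p for v, p in zip(aug[idx], aug[col])]
    coefs = [aug[i][k] for i in range(k)]
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot


# Hypothetical data: x2 is a control that is correlated with x1
x1 = [1, 2, 3, 4, 5, 6]
x2 = [0, 0, 1, 0, 1, 1]
y = [3, 4, 8, 6, 10, 11]  # built as 2 + 1*x1 + 3*x2

coefs_simple, r2_simple = fit_ols([x1], y)
coefs_full, r2_full = fit_ols([x1, x2], y)

print(f"bivariate:    slope on x1 = {coefs_simple[1]:.2f}, R2 = {r2_simple:.3f}")
print(f"with control: slope on x1 = {coefs_full[1]:.2f}, R2 = {r2_full:.3f}")
```

Because x2 is correlated with x1 and also affects y, leaving it out inflates the slope on x1; adding it moves the slope toward the value used to build the data and raises R².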