Census: Characteristics of Older Americans
In this article, we analyze another of the New York Times training data files, called Census: Characteristics of Older Americans. These files are a collection of data sets that the NYT uses to train its journalists and editors in data analysis. Each file comes with a series of questions that can be answered by analyzing the data.
Loading the CSV data into the grid editor
We begin by loading the CSV data into the grid editor as explained here.
Which states had the largest percentage increase in 60+ residents between 2009 and 2016?
- To compute the percentage increase in the 60+ population for each state, we need to add a new column. Click the menu-bars icon and select New Column from the menu.
- In the Add Column dialog, enter the name PopChange and click the Add Column button.
- Let us now update the column value to reflect the population change. Click the newly added column header to show the dropdown and choose Update Value.
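- In the Update Column Value dialog, enter the following expression into the text box and click Update. It computes the change in the number of residents aged 60 and over between 2009 and 2016: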
(value, row) => row["60 years and over"] - row["60 years and over (2009)"]
- The PopChange column now holds the absolute change. However, we need the percentage change, so click the column dropdown and choose Update Value again. Change the expression in the Update Column Value dialog to the following and click Update:
(value, row) => sprintf("%.2f", 100*(row["60 years and over"] - row["60 years and over (2009)"])/row["60 years and over (2009)"])
- The PopChange column now holds the percentage change in the 60+ population.
- Let us now sort this column in descending order to find the top states with the largest percentage increase in 60+ residents between 2009 and 2016.
- The sorted column shows Alaska, Colorado and Nevada as the three states with the largest percentage increase in 60+ residents.
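The steps above can be sketched in plain JavaScript (Node.js), outside ArgonStudio, to make the arithmetic explicit. This is a minimal sketch, not ArgonStudio's API: the rows are hypothetical, with column names copied from the article and invented numbers. Because `sprintf` is not part of standard JavaScript, the sketch uses `Number.prototype.toFixed` for display and keeps the value numeric for sorting, which avoids any risk of text-style ordering.

```javascript
// Hypothetical sample rows: column names follow the article,
// but the numbers are invented and are NOT census values.
const rows = [
  { State: "A", "60 years and over (2009)": 100, "60 years and over": 150 },
  { State: "B", "60 years and over (2009)": 200, "60 years and over": 260 },
  { State: "C", "60 years and over (2009)": 50, "60 years and over": 55 },
];

// Percentage change from 2009 to 2016, mirroring the Update Value expression.
function percentChange(row) {
  const before = row["60 years and over (2009)"];
  const after = row["60 years and over"];
  return (100 * (after - before)) / before;
}

// Add the PopChange field (without mutating the input rows) and sort descending.
const ranked = rows
  .map((row) => ({ ...row, PopChange: percentChange(row) }))
  .sort((a, b) => b.PopChange - a.PopChange);

// Round to two decimals only when printing.
for (const row of ranked) {
  console.log(row.State, row.PopChange.toFixed(2));
}
```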
Biggest difference between the mean earnings of the general population versus those over 60
To find the difference in the mean earnings of the general population versus those over 60, we need to add a new column for computing the difference.
- Click the grid menu bar and choose New Column.
- In the New Column dialog, enter DiffMeanEarnings for the name, enter the following expression for the value, and click the Add Column button:
(value, row) => row["Pop; mean earnings "] - row["60+ Pop; mean earnings"]
- After the new column is added, click its header to sort it in descending order. The states at the top have the biggest difference between the mean earnings of the general population and those of residents over 60.
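For readers who want to check the DiffMeanEarnings logic outside the grid editor, here is a minimal plain-JavaScript sketch. The rows and earnings figures are hypothetical. The column names are copied from the article's expression, including the trailing space in "Pop; mean earnings ", which should be checked against the actual CSV header. A negative difference would mean residents over 60 earn more on average than the general population.

```javascript
// Column names as written in the article's expression; the trailing space
// in the first one is copied as-is and should be verified against the file.
const POP_EARNINGS = "Pop; mean earnings ";
const SENIOR_EARNINGS = "60+ Pop; mean earnings";

// Hypothetical sample rows with invented earnings figures.
const rows = [
  { State: "A", [POP_EARNINGS]: 52000, [SENIOR_EARNINGS]: 47000 },
  { State: "B", [POP_EARNINGS]: 61000, [SENIOR_EARNINGS]: 50000 },
  { State: "C", [POP_EARNINGS]: 45000, [SENIOR_EARNINGS]: 44000 },
];

// Same computation as the DiffMeanEarnings column, then a descending sort.
const byDifference = rows
  .map((row) => ({
    State: row.State,
    DiffMeanEarnings: row[POP_EARNINGS] - row[SENIOR_EARNINGS],
  }))
  .sort((a, b) => b.DiffMeanEarnings - a.DiffMeanEarnings);

console.log(byDifference);
```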
In which Southern state are seniors most responsible for non-adult grandchildren?
This is quite easy. All you have to do is filter the records for southern states and sort by Percent of 60+; living with non-adult grandchild.
- To filter for southern states, click the column drop-down on the Region column and select Apply Filter. The Search Column dialog opens.
- In the Search Column dialog, drag the values to search for from the Available list to the Filter for list. Here we have dragged S, which marks the southern states. Click Apply to perform the search.
- We now have only the southern states after filtering.
- We now need to sort these results by the column Percent of 60+; living with non-adult grandchild to find the top states where seniors are responsible for non-adult grandchildren. At the top are Texas (7.7%), Georgia (6.9%) and Mississippi (6.2%).
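The filter-then-sort steps above amount to two array operations. The following minimal sketch assumes the percentages are stored as plain numbers and that Region uses the single-letter code "S" for the South, as in the filter above. The rows and values are hypothetical. Row "B" has the highest value but is excluded by the filter, which is why the filter has to run before the sort.

```javascript
const GRANDCHILD = "Percent of 60+; living with non-adult grandchild";

// Hypothetical rows with invented percentages; "W" is an assumed
// non-southern region code used only for illustration.
const rows = [
  { State: "A", Region: "S", [GRANDCHILD]: 5.1 },
  { State: "B", Region: "W", [GRANDCHILD]: 9.0 },
  { State: "C", Region: "S", [GRANDCHILD]: 6.4 },
  { State: "D", Region: "S", [GRANDCHILD]: 4.2 },
];

// Equivalent of Apply Filter (Region = S) followed by a descending sort.
const southern = rows
  .filter((row) => row.Region === "S")
  .sort((a, b) => b[GRANDCHILD] - a[GRANDCHILD]);

console.log(southern.map((row) => row.State)); // [ 'C', 'A', 'D' ]
```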
Which Southern state has the lowest percentage of people over 60 living alone?
For this statistic, we apply the same filter from the previous section to keep only the southern states, and then sort by the column Percent of 60+; Living alone in ascending order. The states with the lowest percentages are Texas (37.3%), Georgia (38.4%) and South Carolina (38.4%).
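Georgia and South Carolina tie at 38.4%, so their relative order depends on how the sort handles equal values. This minimal sketch, with hypothetical rows, sorts in ascending order and breaks ties alphabetically by state name so the result is deterministic. The tie-break is an assumption added for illustration; the article does not specify one.

```javascript
const LIVING_ALONE = "Percent of 60+; Living alone";

// Hypothetical rows with invented values, including a deliberate tie;
// "W" is an assumed non-southern region code.
const rows = [
  { State: "Q", Region: "S", [LIVING_ALONE]: 40.2 },
  { State: "M", Region: "S", [LIVING_ALONE]: 38.4 },
  { State: "K", Region: "S", [LIVING_ALONE]: 38.4 },
  { State: "Z", Region: "W", [LIVING_ALONE]: 30.0 },
];

// Ascending by percentage; when the difference is 0 (a tie), fall back
// to comparing state names alphabetically.
const lowestAlone = rows
  .filter((row) => row.Region === "S")
  .sort(
    (a, b) => a[LIVING_ALONE] - b[LIVING_ALONE] || a.State.localeCompare(b.State)
  );

console.log(lowestAlone.map((row) => row.State)); // [ 'K', 'M', 'Q' ]
```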
Which state's over 60 population has the highest share of black residents?
Sort by the column Percent of 60+; black in descending order and you have your answer: the District of Columbia is first with 59.5%, followed by Mississippi with 37.5% and Louisiana with 25%.
In most states, there is a higher percentage of white people in the over 60 population than in the general population. In which state is that not the case?
We add a column to compute the difference between the columns Percent of pop; white and Percent of 60+; white as shown below.
(value, row) => row["Percent of pop; white"] - row["Percent of 60+; white"]
Next we sort by this column in descending order. A positive value means the general population has a higher percentage of white people than the 60+ population, and the top result is the District of Columbia.
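Sorting puts the exception at the top, but filtering for a positive difference lists every state that breaks the pattern, not just the first one. This minimal sketch uses hypothetical rows with invented percentages and the column names from the expression above.

```javascript
const POP_WHITE = "Percent of pop; white";
const SENIOR_WHITE = "Percent of 60+; white";

// Hypothetical rows with invented percentages.
const rows = [
  { State: "A", [POP_WHITE]: 70.0, [SENIOR_WHITE]: 78.5 },
  { State: "B", [POP_WHITE]: 45.0, [SENIOR_WHITE]: 40.0 },
  { State: "C", [POP_WHITE]: 82.0, [SENIOR_WHITE]: 88.0 },
];

// A positive difference means the general population has a higher share of
// white residents than the 60+ population, i.e. the exception in question.
const exceptions = rows
  .map((row) => ({ State: row.State, diff: row[POP_WHITE] - row[SENIOR_WHITE] }))
  .filter((row) => row.diff > 0);

console.log(exceptions.map((row) => row.State)); // [ 'B' ]
```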
Census: Characteristics of Older Americans is a data file used by The New York Times in training its journalists and editors in data analysis. We used these materials to show how the questions posed in it can be answered using ArgonStudio. We covered the following topics:
- Adding a new column whose value is computed using the values of other columns. These can include simple calculations or complex computations.
- Viewing outliers in the data values of a numeric column is useful for checking the validity of the data or spotting values that fall outside the normal range.
- Freezing and unfreezing columns helps when viewing data in columns of interest and relating it to the frozen column values.
We hope you have enjoyed the material presented in this article. You can now continue to try some of this functionality on your data.