Firearm Injuries 2004 - 2014
Firearm Injuries is a data file with data from the National Archive of Criminal Justice Data regarding non-fatal firearm injuries in the US between 2004 and 2014. This data file is used by the New York Times to train journalist and editors in data analysis and visualization. In this post, we attempt to answer some of the questions posed by the training materials using ArgonStudio.
Which gun type is the most common?
- First step is to load the CSV
- Note that all the columns are entirely numeric. Some of the
column values are indicators which are referenced
in this document. For example
the column FIREARM_C is the firearm type code and is
defined in the document as shown below.
- We extract this data as CSV and use it to join with our
file. Here is the extracted CSV.
- Let us now do the join. Click on the grid menu bars
(1) and choose Join CSV
- In the Join CSV dialog, choose the CSV file to join
by click the file open button
(1). Select the CSV file and
open it. Pick the source column
(3) to join with the target
column (2). Leave the Join
Type at Inner Join
(4) and click Join
- After joining (and changing the name to firearm), we
see that firearm type is easier to identify.
- Let us now compute some metrics using this data. Navigate to
the Analytics tab and ensure that Grouped pill
- Specify terms for the bucketing function and choose firearm for the bucketing column (1).
- Choose count for the metric function (2) and firearm for the metric field (3).
- This procedure groups records using the firearm field and counts instances of firearm for each group. The result is the number of firearm related incidents by firearm type.
Who is most likely to shoot you?
- Let us now look at the most common aggressor in
firearm-related injuries. This value is stored in the
field WHO_C with the values defined
in the reference document as
- We again need to join the CSV defining the WHO_C
column values with the injuries CSV as above. The result
after joining the who definitions are shown below.
- As before, we need to aggregate this data in
the Analytics panel as follows - basically
setting bucket field and metric field
to who and sorted by descending.
In what percentage of shootings did victims wait 3 or more days before getting treatment?
This question needs a couple more steps compared to the ones above. The relevant fields are: TRDATE which is the date the injury in question was treated, and the INJDT_C which is the date of the injury. So the difference of these two dates is the number of days the victim waited to get the injury treated. We need to know the percentage of cases where the victim waited more than 3 days to seek treatment.
- First we need to convert the above fields to the date
type. These fields are specified in the format YYYYMMDD
and we can convert the column to the date type. The following
image shows the date columns (before conversion).
- To convert these columns to the date type, click the
column header menu-bar and choose Update Value.
- In the dialog that pops up, ensure
that Pre-configured tab is selected
(1). Choose Convert to
date from the drop-down
(2). Click the drop-down at the
end of the format input field
(3) and choose a date format
that matches the column data
(4). The appropriate date
format is copied to format-input field
(4). Click the Update
button in the dialog which applies the update.
- Repeat the same for the INJDT_Ccolumn. The result of
converting to date is shown below.
- Let us now add a new column for computing the difference
between treatment date and injured date. Fill the column as
shown below. The column name is WAITED_FOR_TREATMENT
(1), added after INJDT_C
(2), and computes whether the
victim waited for more than 3 days with the expression
row => datediff(row.TRDATE, row.INJDT_C, 'day') >= 3
- In the following picture, we see that the
column WAITED_FOR_TREATMENT has been added and set
to true or false depending on whether the victim
waited for 3 or more days to seek treatment.
- We now need to create an aggregation to find out in how many
cases did the victim wait for 3 or more days. Click
the Analytics tab (1)
and make the following selections in the Grouped tab
(2). Choose WAITED_FOR_TREATMENT
for bucket terms
(3), count for the
metric function and WAITED_FOR_TREATMENT again for the
metric field (4). The results
appear in the table below (5).
- Finally we have the answer. In 350 cases, the victim waited for 3 days or more to seek treatment out of a total of 45158 for a percentage of 0.775%.