Firearm Injuries 2004 - 2014

Firearm Injuries is a data file with data from the National Archive of Criminal Justice Data regarding non-fatal firearm injuries in the US between 2004 and 2014. This data file is used by the New York Times to train journalist and editors in data analysis and visualization. In this post, we attempt to answer some of the questions posed by the training materials using ArgonStudio.

Which gun type is the most common?

  • First step is to load the CSV into the grid editor.
  • Note that all the columns are entirely numeric. Some of the column values are indicators which are referenced in this document. For example the column FIREARM_C is the firearm type code and is defined in the document as shown below.
  • We extract this data as CSV and use it to join with our file. Here is the extracted CSV.
  • Let us now do the join. Click on the grid menu bars (1) and choose Join CSV (2).
  • In the Join CSV dialog, choose the CSV file to join by click the file open button (1). Select the CSV file and open it. Pick the source column (3) to join with the target column (2). Leave the Join Type at Inner Join (4) and click Join Columns (5).
  • After joining (and changing the name to firearm), we see that firearm type is easier to identify.
  • Let us now compute some metrics using this data. Navigate to the Analytics tab and ensure that Grouped pill is active.
  • Specify terms for the bucketing function and choose firearm for the bucketing column (1).
  • Choose count for the metric function (2) and firearm for the metric field (3).
  • This procedure groups records using the firearm field and counts instances of firearm for each group. The result is the number of firearm related incidents by firearm type.

Who is most likely to shoot you?

  • Let us now look at the most common aggressor in firearm-related injuries. This value is stored in the field WHO_C with the values defined in the reference document as follows:
  • We again need to join the CSV defining the WHO_C column values with the injuries CSV as above. The result after joining the who definitions are shown below.
  • As before, we need to aggregate this data in the Analytics panel as follows - basically setting bucket field and metric field to who and sorted by descending.

In what percentage of shootings did victims wait 3 or more days before getting treatment?

This question needs a couple more steps compared to the ones above. The relevant fields are: TRDATE which is the date the injury in question was treated, and the INJDT_C which is the date of the injury. So the difference of these two dates is the number of days the victim waited to get the injury treated. We need to know the percentage of cases where the victim waited more than 3 days to seek treatment.

  • First we need to convert the above fields to the date type. These fields are specified in the format YYYYMMDD and we can convert the column to the date type. The following image shows the date columns (before conversion).
  • To convert these columns to the date type, click the column header menu-bar and choose Update Value.
  • In the dialog that pops up, ensure that Pre-configured tab is selected (1). Choose Convert to date from the drop-down (2). Click the drop-down at the end of the format input field (3) and choose a date format that matches the column data (4). The appropriate date format is copied to format-input field (4). Click the Update button in the dialog which applies the update.
  • Repeat the same for the INJDT_Ccolumn. The result of converting to date is shown below.
  • Let us now add a new column for computing the difference between treatment date and injured date. Fill the column as shown below. The column name is WAITED_FOR_TREATMENT (1), added after INJDT_C (2), and computes whether the victim waited for more than 3 days with the expression (3):
    row => datediff(row.TRDATE, row.INJDT_C, 'day') >= 3
  • In the following picture, we see that the column WAITED_FOR_TREATMENT has been added and set to true or false depending on whether the victim waited for 3 or more days to seek treatment.
  • We now need to create an aggregation to find out in how many cases did the victim wait for 3 or more days. Click the Analytics tab (1) and make the following selections in the Grouped tab (2). Choose WAITED_FOR_TREATMENT for bucket terms (3), count for the metric function and WAITED_FOR_TREATMENT again for the metric field (4). The results appear in the table below (5).
  • Finally we have the answer. In 350 cases, the victim waited for 3 days or more to seek treatment out of a total of 45158 for a percentage of 0.775%.