# Grouping CSV data for summarization

Grouping (or aggregation) is a useful operation when attempting to understand trends in data. Some examples:

- Assuming you have a file of purchase transactions, you may
want to sum sales by date. This is an example of applying
a
*metric*of*sum*to a set of transactions that have been*aggregated*by*date*. - Another example would be: finding the average life
expectancy of males and females from death records. In this
case, we would like to compute a
*metric*of*average*for a set of records*aggregated*by*sex*.

In this post, we show you how to load a CSV file into the grid editor, and compute aggregations on the data.

## Loading a CSV file into the grid editor

Let us begin by loading a CSV data concerning the currently ongoing COVID-19 crisis. This data file contains the number of cases by US state and county since Jan 21, 2020, as well as the number of deaths. The data has been compied by New York Times and is available here.

To learn how to load the file into the grid editor, see this article.

## Exploring the Analytics interface

- Once the file is loaded into the grid editor, click on
the
*Analytics*tab. - Ensure that the
*Grouped*pill is active (1). This panel allows you to aggregate data and compute metrics. - The way aggregation works is as follows: group the data
into
**buckets**, and compute a metric function for the rows in each bucket. - The bucketing function is currently set to
*terms*(2 and 3) which requests grouping a row by the value of a column. - Choose the column you want to group-by from the dropdown
(4). Let us
choose
*state*here. - The
*metric function*refers to the method used to compute the metric (5). We choose*sum*here to compute the total of the cases for each state. - The
*metric field*is the field for which the metric function is to be applied (6). We choose*cases*for the metric field.

## Computing totals for grouped data

We would like to determine the number of cases of COVID-19 for
each state. This means we need to *sum*
(2) the field *cases*
(3) for each *state*
(1). We also choose to sort by
state (*sort* set to *key
ascending*)(4).

After making all the selections (*sort* is optional), the
results appear in the table as shown.

## Grouping by multiple columns

Let us now group by the *state* as well as *county*
to determine county-level totals.

- To add multiple fields for grouping, click the
*add*button (1) at the end of the row as shown. - A new bucketing row is added. Make the selections as shown:
choose
*state*for the top row (1) and*county*for the bottom row (2). The results are updated as shown above.

## Summary

Grouping and aggregation is a crucial step in data analysis. In this article, we introduced how to use ArgonStudio to perform data analysis on CSV data. We learnt the following:

- Loading CSV data into the grid editor.
- Computing an aggregation using a metric function on a column value with grouped data.
- Using multiple columns in grouping.