Tallinn University
School of Natural Sciences and Health

Project summary

To find out whether the average age of homicide offenders differs from the average age of victims, a two-sample t-test was planned. However, problems with transforming the data made the task more difficult than first anticipated, and the test could not be carried out.

Introduction

The FBI’s Crime Data Explorer (CDE) contains a large amount of data on criminal, noncriminal and law enforcement related activities. For the data analysis, the expanded homicide offender and victim counts in the United States (2021), broken down by age, were chosen. After viewing the data, a research question was set: Is there a difference between the average age of offenders and the average age of victims?

Dataset description

Offender vs. victim demographics (Table 1) based on age were explored. The data describe expanded homicide offense characteristics in the United States (2021) and are provided by the UCR (Uniform Crime Reporting) Program. The data can be viewed and downloaded from the following link: https://cde.ucr.cjis.gov/LATEST/webapp/#/pages/explorer/crime/shr

The two datasets had to be downloaded separately. The variables are the age category of the offender or victim and the number of people in that category (‘value’). The knitr and kableExtra packages were used to format the tables, making them easier to read.

Table 1. The number of offenders and victims in each age category
Age          Offenders  Victims
20 to 29          4914     4434
30 to 39          2913     3414
10 to 19          2490     1782
Unknown           2169      191
40 to 49          1304     1999
50 to 59           745     1291
60 to 69           266      741
70 to 79            82      304
80 to 89            33       88
0 to 9              29      401
90 to Older         18       32


Each dataset has 11 entries, and the combined table has 3 columns in total.

The five age categories with the most offenders and the most victims were also viewed (Tables 2 and 3).

Table 2. Offenders
Age          value
20 to 29      4914
30 to 39      2913
10 to 19      2490
Unknown       2169
40 to 49      1304

Table 3. Victims
Age          value
20 to 29      4434
30 to 39      3414
40 to 49      1999
10 to 19      1782
50 to 59      1291
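
The tables above can be rebuilt in a few lines of base R. This is a minimal sketch rather than the code used in the project: the counts are typed in from Table 1, and the object name homicide and the column names are chosen here for illustration.

```r
# Counts copied from Table 1; object and column names are illustrative.
homicide <- data.frame(
  Age = c("20 to 29", "30 to 39", "10 to 19", "Unknown", "40 to 49",
          "50 to 59", "60 to 69", "70 to 79", "80 to 89", "0 to 9",
          "90 to Older"),
  Offenders = c(4914, 2913, 2490, 2169, 1304, 745, 266, 82, 33, 29, 18),
  Victims   = c(4434, 3414, 1782, 191, 1999, 1291, 741, 304, 88, 401, 32)
)

# Number of entries (rows) and columns
dim(homicide)

# Five largest categories for each group, as in Tables 2 and 3
top_offenders <- head(homicide[order(-homicide$Offenders), c("Age", "Offenders")], 5)
top_victims   <- head(homicide[order(-homicide$Victims), c("Age", "Victims")], 5)

print(top_offenders)
print(top_victims)
```

Sorting by the negated count column puts the largest categories first, so head() with 5 returns the rows shown in Tables 2 and 3.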


Offenders are most commonly between 10 and 49 years old, and ‘Unknown’ age is also among the five largest offender categories. Victims are most commonly between 10 and 59 years old.

Scatter plots of the offender and victim counts by age, created with the ggplot2 package, show that both distributions are positively skewed (Figures 1 and 2).

Figure 1. Age range of the offenders

Figure 2. Age range of the victims

Methods used

A two-sample t-test was considered the most suitable way to answer the research question: Is there a difference between the average age of offenders and the average age of victims? Although the data do not follow a normal distribution, both groups are larger than 15, which means they are large enough to give valid results. A second assumption also holds: the two samples are independent. However, a problem arose when checking the third assumption, equality of variances.

To check it, the data should have been transformed into a long format in which each row is one person, the first column holds the age category and the second column holds the group (offender or victim), so that the values are listed separately rather than summarized as counts. Unfortunately, no working transformation was found, and therefore the F-test could not be carried out. The p-value of the F-test indicates whether the two groups have equal or different variances, which in turn helps in choosing the correct version of the t-test. On correctly shaped data, the F-test would have been performed with the following code: var.test(Age ~ Group, data = data).
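
One possible way to carry out the transformation described above is sketched below in base R. It rests on assumptions that are not part of the original analysis: exact ages are not published, so each person is given the midpoint of their age category, ‘90 to Older’ is treated as 90 to 99, and the ‘Unknown’ category is left out. The object names wide and long are illustrative. Any result obtained this way is therefore only an approximation based on binned ages.

```r
# Counts from Table 1, without "Unknown" (no usable age), ordered by age.
wide <- data.frame(
  Age = c("0 to 9", "10 to 19", "20 to 29", "30 to 39", "40 to 49",
          "50 to 59", "60 to 69", "70 to 79", "80 to 89", "90 to Older"),
  Offenders = c(29, 2490, 4914, 2913, 1304, 745, 266, 82, 33, 18),
  Victims   = c(401, 1782, 4434, 3414, 1999, 1291, 741, 304, 88, 32)
)

# ASSUMPTION: every person is assigned the midpoint of their category
# (4.5, 14.5, ..., 94.5); "90 to Older" is treated as 90 to 99.
wide$Midpoint <- seq(4.5, 94.5, by = 10)

# One row per person: repeat each midpoint as many times as its count.
long <- data.frame(
  Age = c(rep(wide$Midpoint, times = wide$Offenders),
          rep(wide$Midpoint, times = wide$Victims)),
  Group = factor(rep(c("Offender", "Victim"),
                     times = c(sum(wide$Offenders), sum(wide$Victims))))
)

# F-test for equal variances: numeric response on the left, group on the right.
f_result <- var.test(Age ~ Group, data = long)
print(f_result$p.value)
```

The formula interface of var.test() expects a numeric response on the left-hand side and a two-level grouping variable on the right-hand side, which is why the age categories are converted to numbers first.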

When data are skewed, a two-sample Wilcoxon test, computed with the wilcox.test() function, is usually used instead. The code for the two-sample t-test and the Wilcoxon test is similar to the code for the F-test, so both would likewise require the data to be transformed into a correctly shaped dataset first.
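
For illustration, the calls for both tests could look like the sketch below. It is not the project's own code and depends on the same assumptions about the binned ages: category midpoints stand in for exact ages, ‘90 to Older’ is treated as 90 to 99, and ‘Unknown’ is dropped. The names age_mid, long, welch, pooled and rank_sum are illustrative, and alpha is set to 0.05.

```r
# ASSUMPTION: bin midpoints replace exact ages; "Unknown" is excluded.
age_mid   <- seq(4.5, 94.5, by = 10)   # 0-9, 10-19, ..., 90 and older
offenders <- c(29, 2490, 4914, 2913, 1304, 745, 266, 82, 33, 18)
victims   <- c(401, 1782, 4434, 3414, 1999, 1291, 741, 304, 88, 32)

# Long format: one row per person.
long <- data.frame(
  Age   = c(rep(age_mid, times = offenders), rep(age_mid, times = victims)),
  Group = factor(rep(c("Offender", "Victim"),
                     times = c(sum(offenders), sum(victims))))
)

# Welch two-sample t-test (does not assume equal variances; R's default).
welch <- t.test(Age ~ Group, data = long)

# Classic two-sample t-test, appropriate only if the variances are equal.
pooled <- t.test(Age ~ Group, data = long, var.equal = TRUE)

# Two-sample Wilcoxon (rank-sum) test as an alternative for skewed data.
rank_sum <- wilcox.test(Age ~ Group, data = long)

# Decision rule at the 5 % significance level.
alpha <- 0.05
reject_null <- welch$p.value < alpha
print(reject_null)
```

Whether to report the Welch or the pooled version would follow from the F-test; the Welch version is the safer default when the variances may differ.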

Analysis

The hypotheses were set:

The null hypothesis: There is no difference between the means of the two groups.

The alternative hypothesis: There is a difference between the means of the two groups.

Discussion of the results

If the t-test gave a p-value below 0.05, the chosen significance level (alpha), the null hypothesis would be rejected in favour of the alternative hypothesis. Judging by the plots and the counts, it looks as if there would not be a significant difference between the average ages of offenders and victims. However, there are no test results to confirm this assumption.

Several factors should be considered when discussing the results: the data cover only the year 2021, so an offender may also have murdered other people in other years; some offenders and victims from that year may not yet have been discovered when the data were published; and one offender could have murdered multiple victims in that same year.

Conclusion

The age distributions of both homicide offenders and victims were positively skewed, which was to be expected, as older people have less energy, means and strength to carry out such a crime. As for victims, possible reasons range from older people having fewer years left to live to their being placed in retirement homes, where they are more likely to stay out of conflicts.

All in all, to get more telling results, other data should be included in such a project. For instance, the FBI’s Crime Data Explorer also has data on victim circumstances, the victim’s relationship to the offender, and the weapons used in murders. Other aspects to consider are areas with different population densities, economic factors and employment rates, among others. However, this kind of in-depth analysis would require better planning in terms of time and would be better done in a team.