Data retrieved from: https://www.kaggle.com/sathutr/global-suicide-data?select=gender_rates.csv
import pandas as pd
import matplotlib.pyplot as plt
Here I load three DataFrames with the pd.read_csv function.
df = pd.read_csv('archive/suicide_by_age.csv')
df2 = pd.read_csv('archive/suicide-rates-by-country.csv')
df3 = pd.read_csv('archive/Male-Female-Ratio-of-Suicide-Rates.csv')
Calling the head method on df2 and df3 shows the columns and first rows of the data that we are dealing with.
df2.head()
| | Entity | Code | Year | suicide rate (age-adjusted suicides per 100,000 people) |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2002 | 6.867054 |
| 1 | Afghanistan | AFG | 2004 | 6.684385 |
| 2 | Afghanistan | AFG | 2005 | 6.684385 |
| 3 | Albania | ALB | 2002 | 2.792918 |
| 4 | Albania | ALB | 2004 | 7.699330 |
df3.head()
| | Entity | Code | Year | Male-Female Ratio of Suicide Rate |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2004 | 0.566016 |
| 1 | Albania | ALB | 2004 | 1.539476 |
| 2 | Algeria | DZA | 2004 | 1.118298 |
| 3 | Andorra | AND | 2004 | 2.772599 |
| 4 | Angola | AGO | 2004 | 2.636721 |
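Beyond head, a quick side-by-side summary can make the comparison more concrete. The sketch below is only an illustration: the two small frames are hypothetical stand-ins built from a few of the preview rows above, the shortened column names rate and ratio are assumptions, and summarize is a made-up helper, not part of pandas.

```python
import pandas as pd

# Hypothetical stand-ins for df2 and df3, using values from the previews above.
rates = pd.DataFrame({
    "Entity": ["Afghanistan", "Afghanistan", "Albania"],
    "Code": ["AFG", "AFG", "ALB"],
    "Year": [2002, 2004, 2004],
    "rate": [6.867054, 6.684385, 7.699330],
})
ratios = pd.DataFrame({
    "Entity": ["Afghanistan", "Albania"],
    "Code": ["AFG", "ALB"],
    "Year": [2004, 2004],
    "ratio": [0.566016, 1.539476],
})


def summarize(frame):
    """Hypothetical helper: row count, year span, and total null count."""
    return {
        "rows": len(frame),
        "first_year": int(frame["Year"].min()),
        "last_year": int(frame["Year"].max()),
        "nulls": int(frame.isna().sum().sum()),
    }


rates_summary = summarize(rates)
ratios_summary = summarize(ratios)
print(rates_summary)
print(ratios_summary)
```

Running the same helper on the real df2 and df3 would show whether the two files cover the same years before they are merged.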
This cell merges df2 and df3 into a single data frame named df4. The merge is an outer join on three key columns (Entity, Code, and Year). Calling dropna then removes every row with a missing value, which leaves only the country-year pairs that have both measures. Finally, the rows are grouped by Entity and Year and averaged, so that each country-year pair has one value for the age-adjusted suicide rate per 100,000 people and one for the Male-Female Ratio of Suicide Rate. This cell demonstrates construction and manipulation of data frames.
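# numeric_only=True skips the text Code column, which recent pandas versions refuse to average
df4 = pd.merge(df2, df3, how='outer', on=['Entity', 'Code', 'Year']).dropna().groupby(by=['Entity', 'Year']).mean(numeric_only=True)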
df4
| Entity | Year | suicide rate (age-adjusted suicides per 100,000 people) | Male-Female Ratio of Suicide Rate |
|---|---|---|---|
| Afghanistan | 2004 | 6.684385 | 0.566016 |
| Albania | 2004 | 7.699330 | 1.539476 |
| Algeria | 2004 | 4.848770 | 1.118298 |
| Andorra | 2004 | 5.362179 | 2.772599 |
| Angola | 2004 | 14.554677 | 2.636721 |
| ... | ... | ... | ... |
| Yugoslavia | 1988 | 14.999390 | 2.527963 |
| | 1989 | 15.048520 | 2.631449 |
| | 1990 | 13.880890 | 2.669948 |
| Zambia | 2004 | 12.019036 | 2.077427 |
| Zimbabwe | 2004 | 13.905267 | 2.008948 |

2504 rows × 2 columns
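To see what the outer merge followed by dropna keeps, it can help to run the same steps on a tiny example. The sketch below uses hypothetical miniature frames with the assumed column names rate and ratio. The indicator=True option of pd.merge labels each row by where it came from, and the last lines check that, when neither input has missing values of its own, outer-then-dropna gives the same rows as an inner merge.

```python
import pandas as pd

# Hypothetical miniature inputs; neither frame contains nulls of its own.
rates = pd.DataFrame({
    "Entity": ["Afghanistan", "Afghanistan", "Albania"],
    "Code": ["AFG", "AFG", "ALB"],
    "Year": [2002, 2004, 2004],
    "rate": [6.867054, 6.684385, 7.699330],
})
ratios = pd.DataFrame({
    "Entity": ["Afghanistan", "Albania", "Algeria"],
    "Code": ["AFG", "ALB", "DZA"],
    "Year": [2004, 2004, 2004],
    "ratio": [0.566016, 1.539476, 1.118298],
})
keys = ["Entity", "Code", "Year"]

# indicator=True adds a _merge column: left_only, right_only, or both.
outer = pd.merge(rates, ratios, how="outer", on=keys, indicator=True)
counts = outer["_merge"].astype(str).value_counts().to_dict()
print(counts)

# Outer merge then dropna, compared against a plain inner merge.
outer_then_drop = (
    pd.merge(rates, ratios, how="outer", on=keys)
    .dropna()
    .sort_values(keys)
    .reset_index(drop=True)
)
inner = (
    pd.merge(rates, ratios, how="inner", on=keys)
    .sort_values(keys)
    .reset_index(drop=True)
)
same = outer_then_drop.to_dict("records") == inner.to_dict("records")
print(same)
```

On the real files the two approaches could differ, because dropna also discards rows that already had a missing value (for example a missing Code) before the merge.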
Here we can see the suicide rate per 100,000 people and the Male-Female Ratio of Suicide Rate, averaged across all countries in our dataset for each year. Are suicide rates increasing?
df5 = df4.groupby(by=['Year']).mean()
df5
Year | suicide rate (age-adjusted suicides per 100,000 people) | Male-Female Ratio of Suicide Rate |
---|---|---|
1950 | 10.026776 | 3.008680 |
1951 | 10.333570 | 2.811640 |
1952 | 11.264735 | 2.919882 |
1953 | 11.479295 | 3.042418 |
1954 | 11.707912 | 3.129976 |
1955 | 11.626099 | 3.279391 |
1956 | 12.068923 | 3.300673 |
1957 | 11.916282 | 3.515100 |
1958 | 11.965727 | 3.075630 |
1959 | 11.942115 | 3.056767 |
1960 | 11.745655 | 2.972248 |
1961 | 11.001529 | 3.190191 |
1962 | 10.803649 | 3.198725 |
1963 | 10.773288 | 3.927470 |
1964 | 10.929189 | 3.177087 |
1965 | 11.061357 | 3.699283 |
1966 | 11.250243 | 3.316163 |
1967 | 11.240608 | 3.555859 |
1968 | 11.752552 | 3.035522 |
1969 | 11.332905 | 3.510673 |
1970 | 11.587252 | 3.207020 |
1971 | 11.774202 | 3.101760 |
1972 | 11.683335 | 2.908741 |
1973 | 12.318205 | 2.826642 |
1974 | 12.424822 | 2.874226 |
1975 | 11.646933 | 2.827183 |
1976 | 11.647635 | 2.735638 |
1977 | 11.989533 | 2.843289 |
1978 | 11.742691 | 2.997005 |
1979 | 11.929195 | 3.062952 |
1980 | 13.218345 | 3.105267 |
1981 | 14.295451 | 3.481311 |
1982 | 14.763732 | 3.293275 |
1983 | 13.664348 | 3.182687 |
1984 | 13.268984 | 3.304341 |
1985 | 14.218020 | 3.581695 |
1986 | 13.487080 | 3.441512 |
1987 | 12.844557 | 3.222015 |
1988 | 13.063391 | 3.483667 |
1989 | 13.485136 | 3.533190 |
1990 | 13.219594 | 3.615464 |
1991 | 13.113166 | 3.696503 |
1992 | 13.258250 | 3.771750 |
1993 | 13.719073 | 3.809640 |
1994 | 13.705408 | 3.993234 |
1995 | 13.914370 | 3.986267 |
1996 | 12.919068 | 4.198938 |
1997 | 12.870577 | 4.128673 |
1998 | 13.001752 | 4.148647 |
1999 | 12.685070 | 4.206645 |
2000 | 12.000572 | 4.429818 |
2001 | 12.138550 | 4.247924 |
2002 | 12.503788 | 4.103080 |
2003 | 12.494491 | 4.310938 |
2004 | 9.816060 | 3.652665 |
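A yearly mean is easier to interpret when we also know how many countries it averages over. The df4 preview suggests that many countries contribute only a 2004 row, so the 2004 average may cover a different set of countries than earlier years. The sketch below shows one way to check this, using a hypothetical miniature of df4 with made-up values and an assumed column name rate.

```python
import pandas as pd

# Hypothetical miniature of df4: (Entity, Year) index and one rate column.
mini = pd.DataFrame(
    {"rate": [10.0, 12.0, 14.0, 6.0, 8.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", 2003), ("B", 2003), ("A", 2004), ("B", 2004), ("C", 2004)],
        names=["Entity", "Year"],
    ),
)

# Report the mean together with the number of countries behind it.
per_year = mini.groupby(level="Year")["rate"].agg(["mean", "count"])
print(per_year)
```

Applied to the real df4, a large jump in the count column for one year would be a hint that its mean is not directly comparable with the years around it.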
From the plotted line graph, males appear to have a higher suicide rate than females on average across countries, since the male-female ratio stays above 1 in every year. The ratio rose through the 1990s and into the early 2000s, while the suicide rate per 100,000 people was highest around 1982. This demonstrates the ability to compare data when plotting.
plt.plot(df5['suicide rate (age-adjusted suicides per 100,000 people)'], label='Suicide Rate Per 100,000 People')
plt.plot(df5['Male-Female Ratio of Suicide Rate'], label='Male-Female Ratio of Suicide')
plt.xlabel('Year')
plt.ylabel('suicide rates')
plt.legend(loc='upper right')
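The two series use different units (suicides per 100,000 people versus a ratio), so sharing one y-axis flattens the ratio line. One possible alternative, sketched here as an assumption rather than as the approach used above, is to give each series its own y-axis with twinx. The small yearly frame is a hypothetical stand-in for df5 that reuses three rows from the table above with shortened column names.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df5, using three rows from the yearly table.
yearly = pd.DataFrame(
    {
        "rate": [13.219594, 12.000572, 9.816060],
        "ratio": [3.615464, 4.429818, 3.652665],
    },
    index=pd.Index([1990, 2000, 2004], name="Year"),
)

fig, ax_rate = plt.subplots()
ax_ratio = ax_rate.twinx()  # second y-axis that shares the same x-axis

(line_rate,) = ax_rate.plot(
    yearly.index, yearly["rate"],
    color="tab:blue", label="Suicide rate per 100,000 people",
)
(line_ratio,) = ax_ratio.plot(
    yearly.index, yearly["ratio"],
    color="tab:orange", label="Male-female ratio of suicide rate",
)

ax_rate.set_xlabel("Year")
ax_rate.set_ylabel("Suicides per 100,000 people")
ax_ratio.set_ylabel("Male-female ratio")

# One combined legend for the lines drawn on both axes.
ax_rate.legend(handles=[line_rate, line_ratio], loc="upper right")
```

With separate axes, each label can state its own unit, which the single 'suicide rates' label cannot do for both lines.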