This shows that we have 162382 observations, with dates between 2020-01-30 and 2021-06-25.
Wrangling the Data
To keep things simple, we first pick a small number of areas to compare, then wrangle the data slightly to get what we'll need.
Filter our areas. Just for local interest in this overview, we will look at Crawley, Horsham and Mid Sussex.
Select the columns we are interested in.
Mutate the data to add columns for 7-day Rolling Average of cases and deaths.
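The three steps above can be sketched roughly as follows. This is a minimal illustration, not the original code: the data frame `covid` is made-up stand-in data, the column names `areaName`, `date`, `newCases` and `newDeaths` are assumptions, and `zoo::rollmean()` is just one of several ways to compute a rolling average. Only the names `cases7` and `deaths7` come from the text.

```r
library(dplyr)
library(zoo)

# Hypothetical stand-in for the full data set (column names are assumptions).
set.seed(1)
covid <- expand.grid(
  date = seq(as.Date("2021-01-01"), by = "day", length.out = 14),
  areaName = c("Crawley", "Horsham", "Mid Sussex", "Brighton and Hove"),
  stringsAsFactors = FALSE
)
covid$newCases  <- rpois(nrow(covid), 30)
covid$newDeaths <- rpois(nrow(covid), 2)
covid$areaCode  <- "unused"  # an extra column that select() will drop

local_areas <- c("Crawley", "Horsham", "Mid Sussex")

covid_local <- covid %>%
  # 1. Filter our areas.
  filter(areaName %in% local_areas) %>%
  # 2. Select the columns we are interested in.
  select(date, areaName, newCases, newDeaths) %>%
  # 3. Add 7-day rolling averages, computed separately for each area.
  group_by(areaName) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    cases7  = rollmean(newCases,  k = 7, fill = NA, align = "right"),
    deaths7 = rollmean(newDeaths, k = 7, fill = NA, align = "right")
  ) %>%
  ungroup()
```

Grouping by area before the `mutate()` matters: without it, the rolling window would run across the boundary between one area's rows and the next. The first six days of each area are `NA` because a full 7-day window is not yet available.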
Crudely Plot our Data
To avoid a jagged chart, we plot the 7-day rolling averages of cases and deaths rather than the daily counts.
The death date is the date on which a person died, counted where they had Covid-19 within the preceding 28 days.
Chart for Cases and Deaths
This is not the best way to display the data, but it shows the number of cases at a given time and how deaths followed the cases.
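A chart of this kind could be drawn along the following lines. This is only a sketch under assumptions: the data frame `covid_local` here is randomly generated stand-in data, and the choice of solid lines for cases and dashed lines for deaths is illustrative rather than a description of the original chart.

```r
library(ggplot2)

# Hypothetical, already-wrangled data: cases7 and deaths7 hold rolling averages.
set.seed(2)
dates <- seq(as.Date("2021-01-01"), by = "day", length.out = 30)
covid_local <- data.frame(
  date     = rep(dates, times = 3),
  areaName = rep(c("Crawley", "Horsham", "Mid Sussex"), each = 30),
  cases7   = runif(90, 10, 60),
  deaths7  = runif(90, 0, 5)
)

# Both measures share one y-axis, which is what makes this a crude plot:
# the much smaller death counts are squashed towards the bottom.
p <- ggplot(covid_local, aes(x = date, colour = areaName)) +
  geom_line(aes(y = cases7)) +
  geom_line(aes(y = deaths7), linetype = "dashed") +
  labs(
    x = "Date", y = "7-day rolling average", colour = "Area",
    title = "Cases (solid) and deaths (dashed)"
  )
p
```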
A boxplot with jitter shows the ‘range’ of the data. Notice that the areas have similar medians and quartiles, and that even the outliers are similar.
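A boxplot with jittered points can be sketched as below. The data are again randomly generated stand-ins, and the `width` and `alpha` values are arbitrary choices made for readability, not settings taken from the original.

```r
library(ggplot2)

# Hypothetical stand-in data with one rolling-average column per area.
set.seed(3)
covid_local <- data.frame(
  areaName = rep(c("Crawley", "Horsham", "Mid Sussex"), each = 50),
  cases7   = rexp(150, rate = 1 / 20)
)

box <- ggplot(covid_local, aes(x = areaName, y = cases7)) +
  # The box shows the median and quartiles; outliers are drawn as points.
  geom_boxplot() +
  # Jitter spreads the individual observations sideways so they do not overlap.
  geom_jitter(width = 0.2, alpha = 0.4) +
  labs(x = "Area", y = "Cases (7-day rolling average)")
box
```

Note that `geom_boxplot()` draws the median rather than the mean as the centre line.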
Maybe a better way would be to facet this data. Unfortunately, at the moment, our data is not truly ‘tidy’. In order to use a facet wrap, we need to tidy the cases7 and deaths7 columns.
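As a rough sketch of what tidying these two columns could look like, `tidyr::pivot_longer()` can stack `cases7` and `deaths7` into a single value column with a label column beside it, which `facet_wrap()` can then split on. The data frame and the new column names `measure` and `rolling_avg` are hypothetical, chosen only for this illustration.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one column per measure.
set.seed(4)
dates <- seq(as.Date("2021-01-01"), by = "day", length.out = 30)
covid_local <- data.frame(
  date     = rep(dates, times = 3),
  areaName = rep(c("Crawley", "Horsham", "Mid Sussex"), each = 30),
  cases7   = runif(90, 10, 60),
  deaths7  = runif(90, 0, 5)
)

# Stack the two measure columns: each row now holds one value for one measure.
covid_long <- pivot_longer(
  covid_local,
  cols      = c(cases7, deaths7),
  names_to  = "measure",
  values_to = "rolling_avg"
)

# One panel per measure, each with its own y-axis scale.
facets <- ggplot(covid_long, aes(x = date, y = rolling_avg, colour = areaName)) +
  geom_line() +
  facet_wrap(~ measure, scales = "free_y")
facets
```

Using `scales = "free_y"` lets the deaths panel use its own axis, so the smaller numbers are no longer flattened by the scale of the cases.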
“Tidy” data follows these rules:
Each variable in the data set is placed in its own column