Statistics Canada has made available a case-level dataset of COVID-19 cases in Canada. It covers cases for which a case report was submitted to the Public Health Agency of Canada (PHAC). The dataset includes 3,093 cases, which at the time of publication accounted for just under half of all reported cases in Canada. The dataset’s website includes the following disclaimer:
Because the COVID-19 pandemic is rapidly evolving, these data are considered preliminary. The data published by Statistics Canada only account for those where a detailed case report was provided by the provincial or territorial jurisdiction to the Public Health Agency of Canada (PHAC). Statistics Canada’s detailed preliminary confirmed cases will not match the total case reporting done at the provincial and territorial levels which are reported daily by each jurisdiction and compiled by the PHAC. The discrepancy is due to delays associated with the submission of the detailed information, its capture and coding. Hence, Statistics Canada’s file on detailed case reporting is a subset of the total counts reported by the health authorities across Canada.
It’s important to remember that we don’t know the criteria for a case report to be submitted to PHAC. Since the data doesn’t include geographic information, some regions may well be over- or underrepresented. Severe or otherwise notable cases may also be overrepresented. Despite these caveats, the dataset may still provide useful insights into the situation in Canada.
Transmission types over time
The dataset includes a transmission type column that gives an inferred transmission type for each case. Its values include “Travel”, which indicates that the individual had contact with a travel-related case or had travelled outside of Canada within 14 days prior to illness onset, and “Community”, which indicates that the case could not be attributed to travel.
The following chart plots the proportion of new cases attributed to community spread over time. The size of each point indicates the number of new cases on that date. I’ve also highlighted March 18th, which is the date that strong travel restrictions were enacted.
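The numbers behind a chart like this can be computed with a few lines of base R. The sketch below uses a small invented data frame; the column names `date` and `transmission` are hypothetical stand-ins, not necessarily the names used in the Statistics Canada file. For each date it counts new cases (the point size) and takes the share labelled “Community”, then flags dates on or after March 18th.

```r
# Hypothetical toy data; `date` and `transmission` are assumed column names,
# not necessarily those used in the Statistics Canada file.
cases <- data.frame(
  date = as.Date(c("2020-03-16", "2020-03-16", "2020-03-18",
                   "2020-03-18", "2020-03-18", "2020-03-20")),
  transmission = c("Travel", "Community", "Travel",
                   "Community", "Community", "Community")
)

# Number of new cases per date (the point size in the chart)
n_cases <- tapply(cases$transmission, cases$date, length)

# Share of each day's cases labelled "Community"; any other label
# (e.g. a missing-value code) still counts toward the denominator
community_share <- tapply(cases$transmission == "Community", cases$date, mean)

daily <- data.frame(
  date = as.Date(names(n_cases)),
  n_cases = as.vector(n_cases),
  community_share = as.vector(community_share)
)

# Flag dates on or after March 18th, when strong travel restrictions were enacted
daily$after_restrictions <- daily$date >= as.Date("2020-03-18")
print(daily)
```

Plotting `community_share` against `date` with point size mapped to `n_cases` would then reproduce the layout described above.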
Hospitalization Rates by Age, and Age Rates by Hospitalization
In this section I look at how hospitalization rates vary by age group, and how the age distribution varies by hospitalization status. The following table shows the number of cases in each combination of age bucket and hospitalization status.
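A cross-tabulation like this can be produced with base R’s `table()`. The sketch below is illustrative only: it uses invented rows, reuses the `age_group` and `hospitalization` column names from the post’s own code, and assumes unknown statuses are stored as a label such as "Not Stated".

```r
# Hypothetical toy data: column names match the post's code, but the values
# (including the "Not Stated" label for unknown status) are invented.
cases <- data.frame(
  age_group = c("20-29", "20-29", "50-59", "50-59", "70-79", "70-79"),
  hospitalization = c("No", "Not Stated", "No", "Yes", "Yes", "Not Stated")
)

# Count cases in every combination of age bucket and hospitalization status
counts <- table(age_group = cases$age_group,
                hospitalization = cases$hospitalization)
print(counts)
```

With dplyr loaded, `coviddf %>% count(age_group, hospitalization)` gives the same counts in long format, which is often easier to feed into ggplot.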
Hospitalization status was not recorded for a large number of cases. In the next two charts I filter out cases with unknown status, but it’s important to note that this missing data could affect these estimates.
This chart shows the proportion of each age group that was hospitalized, out of cases where hospitalization data was recorded.
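The filtering step and the per-group rate can be sketched in base R as follows. The data are invented, and treating only "Yes" and "No" as recorded statuses is an assumption about how the file codes unknown values.

```r
# Hypothetical toy data; "Not Stated" stands in for an unrecorded status
cases <- data.frame(
  age_group = c("20-29", "20-29", "50-59", "50-59", "70-79", "70-79"),
  hospitalization = c("No", "Not Stated", "No", "Yes", "Yes", "Not Stated")
)

# Drop cases whose hospitalization status was not recorded
known <- cases[cases$hospitalization %in% c("Yes", "No"), ]

# Proportion of each age group that was hospitalized, among known statuses
hosp_rate <- tapply(known$hospitalization == "Yes", known$age_group, mean)
print(hosp_rate)
```

In this toy example the 70-79 rate rests on a single recorded case, which illustrates why groups with few recorded statuses give noisy estimates.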
The next chart shows the cumulative proportion of hospitalizations accounted for by each age bucket together with all younger buckets. For instance, 18.3% of hospitalizations are people under 50, and 35.4% are people under 60. Again, it’s important to remember that hospitalization status was not recorded for a large number of cases.
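The cumulative shares come from a running total over age buckets sorted from youngest to oldest. In this sketch the hospitalized cases and bucket labels are invented; setting the factor levels explicitly keeps the buckets in age order rather than alphabetical order, where a label such as "<20" might not sort first.

```r
# Hypothetical age buckets, listed from youngest to oldest
age_levels <- c("<20", "20-29", "30-39", "40-49", "50+")

# Invented age buckets of hospitalized cases
hospitalized_ages <- factor(
  c("<20", "20-29", "20-29", "30-39", "40-49",
    "40-49", "50+", "50+", "50+", "50+"),
  levels = age_levels
)

# Hospitalizations per bucket, in the order given by age_levels
counts <- table(hospitalized_ages)

# Share of all hospitalizations in each bucket or any younger bucket
cum_share <- cumsum(counts) / sum(counts)
print(cum_share)
```

Here `cum_share["40-49"]` is 0.6, meaning 60% of these toy hospitalizations are people under 50; the 18.3% figure above is the same quantity computed on the real data.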
The following chart shows the rate of intensive care among hospitalized cases, by age bucket. Unlike the previous two charts, here I do not omit cases where intensive care status was not recorded, so those cases are effectively counted as not requiring intensive care.
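library(dplyr)
library(ggplot2)

# CITATION is assumed to hold the data-source caption string defined elsewhere in the post's source
coviddf %>%
  # Keep only hospitalized cases
  filter(hospitalization == 'Yes') %>%
  group_by(age_group) %>%
  # Share of hospitalized cases with intensive == 'Yes';
  # cases with an unrecorded status count toward the denominator
  summarise(intensive = mean(intensive == 'Yes')) %>%
  ggplot(aes(x = age_group, y = intensive,
             label = scales::percent(intensive, accuracy = 1))) +
  geom_col() +
  geom_text(nudge_y = .01) +
  labs(title = 'Proportion of hospitalized cases requiring intensive care by age group',
       y = 'Proportion requiring intensive care',
       x = 'Age Group',
       caption = CITATION) +
  scale_y_continuous(labels = scales::percent)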
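Keeping unrecorded intensive care statuses in the denominator treats them as “did not require intensive care”, so, assuming the recorded “Yes” values are accurate, each bar is a lower bound on the true proportion. The base-R sketch below contrasts that with the alternative of dropping unrecorded statuses; the data and the "Not Stated" label are invented for illustration. If unknowns were stored as `NA` instead, `intensive == 'Yes'` would yield `NA` and `mean()` would return `NA` for that group, so `intensive %in% 'Yes'` would be the safer comparison.

```r
# Hypothetical hospitalized cases; "Not Stated" stands in for an unrecorded status
hospitalized <- data.frame(
  age_group = c("40-49", "40-49", "40-49", "40-49", "60-69", "60-69"),
  intensive = c("Yes", "No", "Not Stated", "Not Stated", "Yes", "No")
)

# As in the chart: unrecorded statuses stay in the denominator
rate_all <- tapply(hospitalized$intensive == "Yes", hospitalized$age_group, mean)

# Alternative: drop unrecorded statuses before taking the mean
known <- hospitalized[hospitalized$intensive %in% c("Yes", "No"), ]
rate_known <- tapply(known$intensive == "Yes", known$age_group, mean)

print(rbind(rate_all, rate_known))
```

The gap between the two rows grows with the share of unrecorded statuses in a group, which is one way to gauge how much the missing data could matter.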
The dataset that I’m using can be viewed and downloaded below. Feel free to use this for your own analysis.
You can view the source code for this post on GitHub.
- Source: Public Health Agency of Canada, COVID-19 epidemiological reports, with contribution from Provincial/Territorial Ministries of Health.