As COVID-19 spreads across the globe, President Donald Trump has suggested that the virus will “go away” in April due to the warmer weather. The apparent reasoning behind that claim is that other respiratory viruses, such as those that cause the common cold and seasonal flu, tend to subside during the warmer months. Experts tend to agree that there is not yet any evidence of seasonality for SARS-CoV-2, the coronavirus that causes COVID-19. Cases of COVID-19 have also been appearing in the southern hemisphere during its summer months, which suggests that season has little effect on the transmission of COVID-19.
This article discusses an analysis of the quantitative effects of climate on the transmission rate of COVID-19, using viral tracking and weather data in R. All code and data used in this analysis can be found here: https://github.com/QuantumAbyss/COVID19-Temperature-Analysis
The COVID-19 tracking site from Johns Hopkins University (https://coronavirus.jhu.edu/map.html) links to a GitHub repository that is updated daily with reports from the WHO, CDC, ECDC, NHC, and DXY to support their visualization. A significant assumption in this analysis is that issues with testing and varying reporting practices across the different entities can be ignored.
Weather data for the US was pulled from an online tool published by ADM Associates (http://shinyapps.admenergy.com/app/getNOAA) that pulls from the National Oceanic & Atmospheric Administration (NOAA) database. Weather data for other countries was populated manually using Weather Underground (https://www.wunderground.com/).
Population density information for the highest population cities in the evaluated regions was manually compiled.
A region or country was included in the analysis only if it had over 150 confirmed cases and weather data was available from the primary airport station in the region's highest population city. The final set of regions provided a reasonable range of temperatures across widely varying climate zones.
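The selection rule above can be sketched as a simple filter. This is a minimal illustration, not the code from the repository: the `regions` data frame, its column names, and all values in it are hypothetical.

```r
# Hypothetical per-region summary; names and values are illustrative only.
regions <- data.frame(
  region      = c("Region A", "Region B", "Region C", "Region D"),
  confirmed   = c(1200, 90, 400, 151),
  has_weather = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Threshold stated in the text: over 150 confirmed cases.
min_cases <- 150

# Keep regions that pass both criteria.
selected <- subset(regions, confirmed > min_cases & has_weather)
print(selected$region)
```

Region B is dropped for having too few cases and Region C for lacking weather data.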
Considerable effort went into preparing the COVID-19 tracking data set, as its line items contain an irregular timestamp field along with the number of confirmed cases at that time for a particular geographic area. How that geographic area is defined depends on the reporting entity responsible for the region. In the US, the number of confirmed cases was available at the county level for a short time but, due to issues with timely reporting, tracking transitioned to the state level. To track confirmed cases across time with reasonable accuracy, historical county-level data was aggregated up to the state or country level.
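A rough sketch of that aggregation step in base R follows. The `reports` data frame and its columns are hypothetical stand-ins for the tracking data, and the example assumes a single timestamp format and one cumulative report per county per day, which is simpler than the real data.

```r
# Hypothetical line items: irregular timestamps, county-level rows.
reports <- data.frame(
  state     = c("NY", "NY", "NY", "NY", "WA", "WA"),
  county    = c("Kings", "Queens", "Kings", "Queens", "King", "King"),
  timestamp = c("2020-03-10T08:13:00", "2020-03-10T19:40:00",
                "2020-03-11T07:05:00", "2020-03-11T22:30:00",
                "2020-03-10T11:00:00", "2020-03-11T11:00:00"),
  confirmed = c(10, 7, 15, 12, 100, 130),
  stringsAsFactors = FALSE
)

# Collapse the irregular timestamps to calendar dates.
reports$date <- as.Date(substr(reports$timestamp, 1, 10))

# Sum county counts within each state and date.
by_state <- aggregate(confirmed ~ state + date, data = reports, FUN = sum)
by_state <- by_state[order(by_state$state, by_state$date), ]
print(by_state)
```

If a county reported more than once on the same day, the rows would first need to be reduced to the latest report per county and date before summing.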
Determining Transmission Rate
To determine a value for the viral growth rate, an exponential growth model was fit to each aggregated region of the form shown below. It is important to note that the exponential growth function here is not used for the purposes of forecasting, but merely to calculate a current growth rate.
In R, a Nonlinear Least Squares function was selected to fit this equation:
Here, the model is free to select the values of r and x that optimize the fit. The model fit the viral growth data well enough in all cases that the estimate for r could be used to compare the speed of viral transmission across the evaluated regions.
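As an illustration of this kind of fit, the sketch below uses base R's `nls()` on synthetic data. The functional form `cases = x * (1 + r)^day` is an assumption chosen to match the parameter names r and x; the exact equation and code used in the analysis may differ. All data here is simulated.

```r
set.seed(42)  # reproducible noise

# Synthetic daily case counts for one hypothetical region.
# nls() is documented not to work on zero-residual data, so noise is added.
days   <- 0:14
true_x <- 150
true_r <- 0.25
cases  <- true_x * (1 + true_r)^days * exp(rnorm(length(days), sd = 0.03))
region <- data.frame(day = days, cases = cases)

# Starting values from a log-linear fit, which helps nls() converge.
init <- coef(lm(log(cases) ~ day, data = region))
start_vals <- list(x = unname(exp(init[1])), r = unname(exp(init[2]) - 1))

# Assumed model form: cases = x * (1 + r)^day
fit <- nls(cases ~ x * (1 + r)^day, data = region, start = start_vals)

# Estimated daily growth rate for this region.
r_hat <- unname(coef(fit)["r"])
print(coef(fit))
```

Repeating the fit for each region would give one `r_hat` per region, which is the quantity compared in the rest of the analysis.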
For each region, a viral growth rate r was determined, along with an average temperature, average relative humidity, and population density.
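One way to assemble such a per-region table is sketched below. The three input data frames, their column names, and their values are hypothetical; the real inputs would come from the fitted growth rates, the weather sources, and the manually compiled density figures.

```r
# Hypothetical daily weather observations per region.
weather <- data.frame(
  region   = c("A", "A", "B", "B"),
  temp_c   = c(10, 14, 25, 27),
  humidity = c(60, 70, 40, 50),
  stringsAsFactors = FALSE
)
# Hypothetical fitted growth rates and population densities.
growth  <- data.frame(region = c("A", "B"), r = c(0.30, 0.22),
                      stringsAsFactors = FALSE)
density <- data.frame(region = c("A", "B"), pop_density = c(2500, 900),
                      stringsAsFactors = FALSE)

# Average temperature and relative humidity per region.
avg_weather <- aggregate(cbind(temp_c, humidity) ~ region,
                         data = weather, FUN = mean)

# One row per region with all model inputs.
model_data <- merge(merge(growth, avg_weather, by = "region"),
                    density, by = "region")
print(model_data)
```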
A linear regression model was applied to this data set with this set of terms, but it produced unusual results at first. Namely, the correlation between growth rate and population density was unusually low. It is admittedly a failing of this analysis that I am unable to control for additional potential high-influence factors on the growth rate, such as the unique responses that each region will have in terms of preventative measures. However, that variable may have appeared naturally out of this analysis anyway. The plot below shows that there does appear to be a linear relationship between the growth rate and population density, but there appear to be two distinct relationships. New York, France, Greece, Iran and Japan all have significantly lower growth rates for their population densities relative to the other evaluated regions. There are three possible reasons I can think of for this:
1) The population density information may not be accurate. Recall that this data was pulled from each region’s most densely populated city, as it was theorized that this would be the most accurate given that viruses tend to favor high population areas. However, the most densely populated city of each country/state may not be reflective of the actual conditions of the virus, which may be spread across many different cities.
2) The virus response, in terms of the implementation of social distancing, hygienic practices, etc., may vary between these two groups to such a degree that it has slowed the viral growth rate in one of them.
3) The virus testing and reporting practices that are used to develop the estimate for the growth rate may be substantially different between these two groups.
Regardless of the reason, I introduced an additional term into the regression to account for this separation so that the features of interest, temperature and humidity, could be better isolated to determine potential impacts on the growth rate. The results of the linear regression model are shown below.
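A regression of this shape could be set up as in the sketch below. How the additional term was actually coded is not stated, so the 0/1 indicator `low_group` is an assumption, as are all column names. The data is simulated so that growth depends only on density and group, not on temperature or humidity.

```r
set.seed(1)  # reproducible synthetic data

n <- 16
model_data <- data.frame(
  temp_c      = runif(n, 0, 30),
  humidity    = runif(n, 30, 90),
  pop_density = runif(n, 500, 10000),
  low_group   = rep(c(0, 1), length.out = n)  # assumed 0/1 group indicator
)

# Synthetic growth rate: driven by density and group membership only.
model_data$r <- 0.15 + 2e-5 * model_data$pop_density -
  0.10 * model_data$low_group + rnorm(n, sd = 0.01)

# Linear regression with the extra group term.
fit <- lm(r ~ temp_c + humidity + pop_density + low_group, data = model_data)

# Coefficient table: estimate, standard error, t value, p-value.
coefs <- summary(fit)$coefficients
p_values <- coefs[, "Pr(>|t|)"]
print(round(coefs, 4))
```

The p-values for `temp_c` and `humidity` in the coefficient table are what indicate whether either weather term has a statistically significant effect once density and group are accounted for.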
It is interesting to note that while the relative humidity term is not statistically significant, its estimate indicates that viral growth is slower in high humidity areas. This generally agrees with one of the reasons the common cold and seasonal flu become more common in the winter months: cold air tends to be less humid, so pathogenic particles are less likely to bind with atmospheric water vapor, which would otherwise reduce their mobility.
Based on these results, and in particular the high p-values associated with the temperature and humidity terms, this analysis finds no statistically significant effect of temperature or relative humidity on COVID-19 transmission. This indicates that we should not expect the growth of COVID-19 to be slowed by seasonal changes.