The model for the early-warning system uses real-time data from Twitter, Google searches, and mobility data from smartphones, among other data streams.
Deciding when and what to reopen has become something like a big game of Russian roulette, and as states such as Florida, Texas, and California have seen with huge spikes in coronavirus cases, reopening plans can backfire, and some already are.
But a new model for a COVID-19 early-warning system that could predict outbreaks about two weeks before they occur is drawing praise in the tech community. The model would give officials time to put effective containment measures in place.
In a paper posted on arXiv.org last week, an international team of scientists presented an algorithm that registered danger 14 days or more before case counts began to increase. The system uses real-time monitoring of Twitter, Google searches, and mobility data from smartphones, among other data streams.
“We estimate the timing of sharp changes in each data stream using a simple Bayesian model that calculates in near real-time the probability of exponential growth or decay,” the researchers wrote.
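The paper's exact formulation is not described here, so the following is only a minimal sketch of the general idea, not the researchers' model. It assumes that a data stream growing exponentially looks like a straight line on a log scale, fits that line over a hypothetical sliding window of recent days, and, under a flat prior with Gaussian noise (noise level plugged in from the residuals), reports the approximate posterior probability that the growth rate is positive. The function name `prob_exponential_growth` and the `window` parameter are illustrative assumptions.

```python
import math
from typing import Sequence


def prob_exponential_growth(counts: Sequence[float], window: int = 7) -> float:
    """Hypothetical sketch: approximate P(growth rate > 0) over the last `window` days.

    Fits log(count + 1) = a + b * t by least squares. With a flat prior on (a, b)
    and Gaussian noise whose variance is estimated from the residuals, the
    posterior for the growth rate b is approximately Normal(b_hat, se^2),
    so P(b > 0) is approximately Phi(b_hat / se).
    """
    if window < 3:
        raise ValueError("window must be at least 3")
    recent = list(counts[-window:])
    if len(recent) < window:
        raise ValueError("not enough data for the requested window")

    # Log-transform so exponential growth or decay becomes a straight line;
    # adding 1 avoids log(0) on days with no activity.
    y = [math.log(c + 1.0) for c in recent]
    t = list(range(window))
    t_mean = sum(t) / window
    y_mean = sum(y) / window
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))

    b_hat = sxy / sxx             # estimated daily growth rate on the log scale
    a_hat = y_mean - b_hat * t_mean

    # Residual variance with n - 2 degrees of freedom gives the slope's standard error.
    rss = sum((yi - (a_hat + b_hat * ti)) ** 2 for ti, yi in zip(t, y))
    se = math.sqrt((rss / (window - 2)) / sxx)
    if se == 0.0:
        # Perfect fit: only the sign of the slope matters.
        return 1.0 if b_hat > 0 else 0.0 if b_hat < 0 else 0.5

    # Standard normal CDF evaluated at b_hat / se, via the error function.
    return 0.5 * (1.0 + math.erf((b_hat / se) / math.sqrt(2.0)))


rising = [10, 12, 15, 19, 24, 30, 38]
falling = list(reversed(rising))
print(prob_exponential_growth(rising), prob_exponential_growth(falling))
```

Run on each data stream every day, a probability that climbs toward 1 would flag likely exponential growth, and one near 0 would flag decay. A real system would use a more careful likelihood (for example, count-based noise) and choose the window length deliberately.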
The team analyzed COVID-19-related real-time activity in social network posts; Google searches; anonymized mobility data from smartphones; and readings from the Kinsa Smart Thermometer. “This showed exponential growth roughly two to three weeks before comparable growth in confirmed COVID-19 cases and three to four weeks prior to comparable growth in COVID-19 deaths across the US over the last six months,” according to the paper.
Lian Jye Su, a principal analyst at ABI Research, said this approach could be very effective.
“In Singapore, for example, the government utilizes a combination of contact tracing, temperature monitoring, and anonymized mobility data to control the virus spread,” Su said. “While the count is still high among existing clusters, there have been no new clusters appearing despite the recent reopening. All new community cases are followed by swift actions from the authority to contain the virus spread.”
These measures are very effective, but only when used in combination with one another and alongside steps such as wearing masks and regular hand washing, he added.
Daniel Elman, a senior analyst at Nucleus Research, said it is impressive that the model provides visibility into what is happening before someone gets into a clinical setting. “This model is able to incorporate what people are doing so you have insight and now we have data” of people searching for “flu” or “COVID,” he said. There are no data points available if someone hasn’t gone to a doctor, he noted.
“So within that one to two weeks you may be worried…now they’re able to better consider those cases and incorporate them into the model—and maybe it’s not perfect but it’s better than it was.”
The data sources the researchers chose are all public and easy to get, and that adds value, Elman added.
“It can be continuously improved, and they can add new data to keep [the model] updated,” he explained. “The more data sources, the better. When you have to start anonymizing data and start buying it, that can get messy and it’s a great approach and they can get it out in the world quickly.”
Dr. Mauricio Santillana, one of the lead researchers, told the New York Times, “In most infectious-disease modeling, you project different scenarios based on assumptions made up front…The difference is that our methods are responsive to immediate changes in behavior and we can incorporate those.”
Su said this reminds him of Google Flu Trends, a model Google developed in the 2000s. One reason for the failure of Google’s model “was model decay, contributed by assumptions that no longer hold up due to a shift in human behavior and lack of adjustments to existing parameters,” he noted. “Given that COVID-19 is still new and the data that the researchers are using are both highly relevant and dynamic … the model is adaptive and responsive to immediate behavioral change.”
But Su added that he wonders whether the model accounts for varying degrees of non-pharmaceutical intervention (NPI), such as mask-wearing and hand-washing mandates, at the city, county, or even community level, “because that will also have an impact on the spread of the virus.”
Elman said he hopes the researchers will further refine their data sets and incorporate consumer buying behavior. “We’re trying to look at any change in people’s behaviors and … COVID is a big thing that would cause changes,” and such data could be added to the model’s probability measures, he explained.
The data streams the researchers used were integrated with a sophisticated prediction model, developed at Northeastern University, that is based on how people move and interact in their communities, according to the Times article.
The team tested the predictive value of trends in the data streams by examining how each correlated with case counts and deaths in each state during March and April.
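The article does not say how those correlations were computed. One common way to estimate how far one series leads another is lagged correlation: shift the case counts back by a candidate number of days, correlate them with the data stream, and keep the lag with the strongest correlation. The sketch below assumes that approach; it is not necessarily the researchers' method, and the names `best_lead`, `max_lag`, and `min_overlap` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lead(signal: Sequence[float], cases: Sequence[float],
              max_lag: int = 21, min_overlap: int = 5) -> Tuple[int, float]:
    """Hypothetical sketch: return (lag_in_days, correlation) for the lag at which
    signal[t] best correlates with cases[t + lag]."""
    if len(signal) != len(cases):
        raise ValueError("series must have the same length")
    best_lag, best_r = 0, -2.0  # any real correlation beats -2
    for lag in range(max_lag + 1):
        x = signal[: len(signal) - lag]  # data stream, earlier days
        y = cases[lag:]                  # case counts, `lag` days later
        if len(x) < min_overlap:
            break
        r = pearson(x, y)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic example: case counts repeat the signal's spike 10 days later.
base: List[float] = [0] * 5 + [1, 2, 4, 8, 16, 32, 16, 8, 4, 2, 1] + [0] * 20
signal = base
cases = [0] * 10 + base[:-10]
print(best_lead(signal, cases))
```

On real state-level data, the correlations would be noisier, and a single best lag is only a rough summary; the point is that a stream whose best lag is well above zero carries advance warning.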
New York experienced a significant increase in COVID-related Twitter posts more than a week before case counts exploded in mid-March, the Times article noted, and relevant Google searches and Kinsa thermometer readings spiked several days before that.
Now, the algorithm is predicting that even though Nebraska and New Hampshire have flat case counts, they are likely to see a rise in cases in the coming weeks, the article said.
Su said he can’t comment on the model’s accuracy because he has no visibility into how it was trained and developed. Still, he said, “my assumption is, if the mobility level remains high and NPI continues to be lacking in these two states, there will be a rise in cases. This is because even states with strict stay-at-home orders and mandatory face coverings previously [are seeing a] second wave of COVID-19 cases as these states reopen their schools and businesses.”