

A female researcher gestures at water data displayed on a computer screen on board a research ship.

Credit: Monty Rakusen/Getty

Last Thursday, 30 January, bioinformatician Niema Moshiri received a late-night message from a long-time collaborator urging him to back up the website of the United States Centers for Disease Control and Prevention (CDC). Rumours had been circulating that the public-health agency, which tracks disease outbreaks and makes its data publicly available, would start removing pages from its website in response to executive orders issued by President Donald Trump. Those orders direct government departments to take down public information on gender and diversity.

Moshiri, who is based at the University of California, San Diego, describes himself as a “data hoarder” who backs up his personal videos, online receipts and bills. He was happy to help. “I never thought I would have to do it for information pages from the federal government,” he says.

In the past week, some US federal government websites containing important data sets and information on public health and demographics, such as global programmes to combat HIV and national surveys of chronic disease, have been taken down. Some have been reinstated, others not. “It was kind of shocking to me that they would just delete pages willy-nilly,” says Moshiri.

Moshiri is one of dozens of researchers in the United States and around the world scrambling to capture public information on US federal government websites before it is tampered with or disappears. “A lot of people did similar parallel efforts, especially within their own areas of expertise,” says virologist Angela Rasmussen at the University of Saskatchewan in Saskatoon, Canada, the collaborator who messaged Moshiri. She stayed up until 2 a.m. last Friday manually downloading data sets, such as those on influenza surveillance. Now that they have created these backups, researchers are working out ways to make them publicly available, she adds.

The CDC and US Department of Health and Human Services, which is the parent agency to the CDC, say all changes to their websites are in accordance with Trump’s executive orders.

Global effort


Over the weekend, Moshiri got in touch with Charles Gaba, a health-care policy and data analyst based near Detroit, Michigan.

Moshiri helped Gaba create an alphabetical list of every CDC web link, more than 7,000 pages in all. Gaba then manually redirected each link to its archived copy on the Wayback Machine, a service run by the non-profit organization Internet Archive in San Francisco, California, which regularly archives web pages, including material from government websites such as that of the CDC. Gaba posted the entire list on his blog. “It took a couple days,” says Gaba. “A lot of it is vital, and you don’t know what’s been, what’s still there, what isn’t there.” Gaba has since posted a similar list, organised by subject, covering the entire US Food and Drug Administration (FDA) website.

On his hard drive, Moshiri now has backups of the CDC website and all of the CDC’s data sets, as well as of the FDA website and other government websites. He downloaded some of these himself; others were downloaded and shared online by other people. He is also backing up the US Department of Agriculture website. Together, these sites amount to hundreds of thousands of files and more than 130 gigabytes of uncompressed data, yet all of it could fit on a USB flash drive. “These are pretty tiny,” he says.

Moshiri hasn’t shared his backups publicly. If his university agrees that doing so falls within his role as a faculty member, Moshiri wants to publish an exact, untouched copy of the CDC website. His long-term goal is to back up every federal government website. “I have 100 terabytes of storage under my desk. In theory, I could stick all of these on.”

Kendra Albert, an attorney at the public-interest technology and media law firm Albert Sellars LLP in Philadelphia, Pennsylvania, says works produced by federal government employees as part of their jobs are in the public domain. Generally speaking, it is legal to download government data sets, make backups of government websites and share them, they say. Where copyrighted material is included in that data, making and sharing copies would often fall under the doctrine of fair use if it is done for the purposes of research or advocacy, or as a historical record of what the site looked like at an earlier date, says Albert.

The Internet Archive also supports an initiative to archive government web pages at the beginning and end of every US presidential term. James Jacobs, a librarian at Stanford University who works on this End of Term Web Archive, hopes that it could become a central place where the public can access scientists’ archives of government websites.


