Page 50 - GCN, Oct/Nov 2016

Editor’s Choice: Open Data
Usability project unlocks Commerce data
By making its data more usable, the Commerce Department is reaching more people and improving the quality of the insights they derive from that information
The Commerce Department has tens of thousands of datasets, thanks in part to its data-rich agencies such as the Census Bureau, the National Institute of Standards and Technology and the National Oceanic and Atmospheric Administration. Although the information is highly sought after, it hasn’t always been easy to get or use. The Commerce Data Usability Project changed that.
Launched in January, CDUP is a collection of online data tutorials that introduce a problem, explain how data solves it and provide step-by-step instructions on accessing data and making it actionable. That approach cuts the time it takes users to collect and process Commerce’s information from months to moments.
“Even though there’s this wealth of data, getting to it and using it is highly specialized,” said Tyrone Grandison, former deputy chief data officer at Commerce. “It’s not completely open to everybody.”
Many companies, some of which built multibillion-dollar industries using Commerce data, had to spend a fortune outsourcing the time-consuming work of collecting and sorting the data, for instance.
“It’s now a community initiative,” he said. “You have a groundswell of people who are using multiple different datasets from Commerce in different ways now wanting to share knowledge.”
The project started with four tutorials and now offers 12 on topics such as using NOAA’s severe-weather data for risk analysis, exploring American innovation via the U.S. Patent and Trademark Office’s data and tapping data from Census’ American Community Survey to focus nonprofits’ resources.
BRIDGING — AND SECURING — DHS’ DATA SILOS
The DHS data framework makes it easier to search across classified and unclassified datasets
Some of the data used to fight terrorism is classified, but much of it is not. That makes it difficult to cross-reference and share information while still enforcing the appropriate level of security.
To address that problem, the Department of Homeland Security created the DHS Data Framework, which consists of two Hadoop data lakes (or data management platforms) that can handle large volumes of information. It also uses attribute-based access controls so that designated users can see data while protecting privacy, civil rights and civil liberties.
“There are a number of different problems that we’re looking to solve with the data framework,” said Paul Reynolds, director of the DHS Data Framework. “Many of them can’t be solved unless you bring the data into one location.”
Law enforcement officials who are investigating a terrorism suspect, for instance, need to look at classified and unclassified data. Until the data framework, there wasn’t an efficient way to do that, especially not in real time, Reynolds said.
The system takes the unclassified data and moves it up to the classified networks, “so the data itself is still unclassified, but it’s sitting in a classified spot,” he said.
The classified and unclassified data sit in two separate Hadoop data lakes that use a cross-domain guard to share data in near-real time. When the framework is fully operational, DHS officials expect to have 20 to 25 databases in the lakes. Right now, four are fully operational and nine are being populated.
And they aren’t small databases. Reynolds said one of them has about 70 billion records in it.
The framework is currently being used only for counterterrorism purposes, but he said he expects that it will ultimately be used for additional mission areas.
— Matt Leonard