It’s often hard to find good data sets.
But it just got a lot easier.
Earlier this week Google announced Dataset Search, a new tool that lets you do a Google search just for data sets.
My colleague Natasha Noy wrote on the Google Blog:
“There are many thousands of data repositories on the web, providing access to millions of datasets; and local and national governments around the world publish their data as well. To enable easy access to this data, we launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.”
For people looking for online data, this is a godsend.
To use it, visit:
toolbox.google.com/datasetsearch
and do a search. Here’s one of the first things I tried. (Naturally, I checked local data that I would probably recognize.)
Notice that Dataset Search, like regular Google Search, uses Autocomplete. This is a wonderful behavior that lets you explore a data space very quickly. (Caution: It doesn’t seem to show ALL of the possible completions, so use this feature carefully.)
And, naturally, when you get a dataset, read the metadata carefully. (We discussed this a while ago, Feb 15, 2014–“Read metadata carefully.”)
The next search I tried was for Stanford’s dataset of global warming information. I’ve always wanted a copy of it so I could do my own analysis.
I did the obvious search, and found not only three different providers of the datasets, but other, related datasets as well.
On the left-hand side you’ll see a scrolling list of related datasets (that is, other data sets that match the query but, like regular Google results, are not ranked quite as high as the first hit).
You can use Control-F to do a text search within the page, but notice that the scrolling list of related datasets might go on-and-on-and-on… You can’t trust Control-F to search everything in that list. (It’s a “scroll on demand” list; Control-F only searches what’s “visible” and not the entire contents of the list.)
Note that you can ALSO use site: to restrict the sites that are searched for data.
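Here’s a small sketch of what a site:-restricted query looks like if you want to build one in a script. The search terms and the site are just example values, and the helper name `dataset_query` is my own invention. The `/search` path and the `query` URL parameter are assumptions on my part, not something documented in this post, so treat the printed link as a guess and paste the query text into the search box if it doesn’t work.

```python
from urllib.parse import urlencode

# ASSUMPTION: the "/search" path and the "query" parameter name are guesses;
# only the host and "/datasetsearch" come from the post itself.
BASE_URL = "https://toolbox.google.com/datasetsearch/search"


def dataset_query(terms, site=None):
    """Return the query text, optionally restricted to one site with site:."""
    query = terms.strip()
    if site:
        # site: works the way it does in regular Google Search.
        query = f"{query} site:{site}"
    return query


# Example values only: look for sea level data published on one site.
query = dataset_query("sea level rise", site="noaa.gov")
url = f"{BASE_URL}?{urlencode({'query': query})}"

print(query)
print(url)
```

The query text is the part that matters; the URL is only a convenience if the assumed parameter name happens to be right.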
A search tool like this one is only as good as the metadata that data publishers create. Maybe some enterprising SRS folk will publish a data set or two. (I certainly will try!)
If you publish data and don’t see it in the results, visit the instructions on our developers site, which also includes a link for asking questions and providing feedback. You can learn all about how to publish your own datasets on the dataset publishing guidelines page.
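To give a feel for what that metadata looks like: the publishing guidelines are built on the schema.org Dataset vocabulary, usually embedded in the dataset’s web page as JSON-LD. Below is a minimal sketch that generates such a block. Every value in it (the name, description, URL, and creator) is a made-up placeholder, and the set of fields is my own selection, so check the guidelines page for what is actually required or recommended.

```python
import json

# All values below are hypothetical placeholders for illustration.
dataset_metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example daily rainfall readings",
    "description": (
        "Daily rainfall totals in millimeters from a single backyard "
        "rain gauge, recorded by hand. Placeholder description."
    ),
    "url": "https://example.org/data/rainfall",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "Example Author"},
}

# Serialize as JSON-LD and wrap it in the script tag that goes in the page.
json_ld = json.dumps(dataset_metadata, indent=2)
html_snippet = f'<script type="application/ld+json">\n{json_ld}\n</script>'

print(html_snippet)
```

The printed snippet would go into the HTML of the page that describes the dataset. The better the name and description, the easier the dataset is to find.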
I’m sure we’ll have some future Challenges that use Dataset Search.
Search on (for data)!