Big data is watching you

JCC U talk to focus on the role of Google analysis in social research

Dr. Seth Stephens-Davidowitz and the 1999 Tenafly High School JV baseball team. He’s second from left in the bottom row.

People lie. This is not news.

They lie for all sorts of reasons, chief among them the need to look good to other people or to themselves. They lie to protect themselves, to protect other people, to show themselves as the people they would like to be, or as they know other people would like them to be.

Given all that, it is no surprise that people lie on surveys and to pollsters, and of course surveys and polls can be only as accurate as the information they are given. And they lie on social media, all the time, through their teeth, because there is nothing more important on social media than a properly curated life.

But there is something that people don’t lie to.

They don’t lie to Google.

They don’t lie because the only way to get the information they need is to ask for it, as clearly and directly as possible. And they don’t lie because they think no one is watching.

And that’s only partly true.

Google, as it turns out, can be a shockingly reliable source of information and predictions. It is also one of the frontiers where the battle between individual privacy and the need for data that will improve lives more broadly will be fought.

Seth Stephens-Davidowitz grew up in Alpine and went to high school in Tenafly, where he played on the baseball team. He earned an undergraduate degree at Stanford and a Ph.D. in economics from Harvard, worked at Google (and he’s just 35!), and now studies and writes about big data, in outlets including the New York Times. He will be at the Kaplen JCC in Tenafly to explore how big data changes our understanding of the world (and also how it does not). (See box for more information.)

He’ll be talking about his new book, “Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are.” The impetus for the book, and for his work since 2011, when he still was in graduate school, was his discovery of Google Trends. “I became obsessed with it,” he said; that’s because he realized that it is so honest and therefore so revealing. “If you ask people in Mississippi, where it is hard to be gay, if they are gay, for a survey, very few people will say yes,” he said. “But there are the same percentage of searches for gay porn online from Mississippi as from New York. Online, people are honest.”

His doctoral dissertation was based on data he gathered from Google Trends. He examined three areas: racism, child abuse, and voting behavior. “It’s hard to predict who will turn out to vote from surveys, but it turns out that you can predict voter turnout with high accuracy from searches about how and where to vote,” he said. Neither people who always vote nor people who never vote make such searches, he added, because neither group has to scramble for that information. It’s the swing voters who make the difference.

“One of the reasons that Hillary did worse than expected in the election was because black turnout was lower than the polls suggested, but there was a huge drop in searches about where and when to vote” from areas with large African-American populations, he said. And it predicted the voter turnout early; “by mid-October, those searches already had predictive power,” Dr. Stephens-Davidowitz said.

It’s fairly easy to track racism through searches too, he added. “I am shocked by how often people make racist searches on Google, particularly searches mocking African-Americans. And I also was surprised by the location of the searches. I thought that they would be concentrated in the Deep South, but there also are high numbers of them from upstate New York, western Pennsylvania, industrial Michigan, and eastern Ohio. A lot of places where Trump’s support was highest.

“Google Trends shows that the searches rise every year on Martin Luther King Day, by 30 percent. And then they rose a lot when Obama first was elected; they rose to historic levels. No matter what people were saying publicly, a lot of people had a racist response to his election.”

Of course, trends and discoveries on Google Trends can be only relatively historic; the tool itself has been around only since 2008, and the data it shows goes back only to 2004, Dr. Stephens-Davidowitz said.

Although he is an economist by training, Dr. Stephens-Davidowitz considers himself to be a data scientist. It’s a new field, he said; it didn’t exist even as recently as when he was in graduate school. And it’s a burgeoning one.

To be clear, he said, Google does not provide individual information. Researchers can tell the geographic area and the time, but nothing more granular than that. “It shows you patterns in the aggregate, but it doesn’t tell you how each individual will behave.” His own behavior is proof of that. Last weekend, Dr. Stephens-Davidowitz, who also is a New York Times columnist, wrote a piece about what data analysis shows about music preferences. It is clear, he found, that people fall in love with the music of their early teens. And his favorite song is Bruce Springsteen’s “Born to Run,” popular well before he was a fetus, much less a teen, even an early one. “Most people, I think, are more standard, but I’m an outlier,” he said.

Although the data he searches are anonymous, there are tensions between privacy and availability, he added. “You can know the number of searches made, but you can’t know who made the search,” he said. “Google is sensitive to that. They want to protect users’ privacy, so you can’t figure anything out about any individual. That data does exist, but Google is very protective of it, even internally, because of its sensitive nature.

“People ask if they should even be making Google searches, and I always tell them that it probably is the best place to leave your data, because of the company’s financial incentive to protect its users’ privacy. You want to be more careful using smaller websites, which have less money” to spend on protecting data. “The big data companies are the best places to leave your private data.”

There is a moral incentive to share aggregated data, he said. “There are diseases that can be cured. There are some diseases that doctors have figured out how to cure by figuring out their causes, and they have done that by looking at the places in the world that do and do not suffer from it.” And they do that by seeing where searches for information about diseases and their symptoms originate. “It would be unfortunate if this information couldn’t be made available.”

Take, for example, pancreatic cancer, which can be cured if it is caught early enough, Dr. Stephens-Davidowitz said. “Researchers have looked at the symptoms people search for and have found really subtle patterns. They have found that if people search for indigestion, followed by abdominal pain, that is a risk factor, and it was unknown to the medical community. That shows the power of the data.

“The question is how do we harness this data while protecting user privacy?

“The ideal would be to have scientists have access to the data, and that user privacy would be protected. We are trying to find that balance. But if you have a relative with pancreatic cancer, you would want Google to have been studying that data.”

Once his talk is over, Dr. Stephens-Davidowitz will hold a question-and-answer period. He’ll be glad to take questions on any subject, but because the talk is in Tenafly, there is one subject in particular that he’d like to discuss. “I hope someone asks me about my Tenafly baseball career,” he said.

Who: Dr. Seth Stephens-Davidowitz

What: Will talk about big data

Where: At the Kaplen JCC on the Palisades, 411 East Clinton Ave. in Tenafly

When: On Thursday, February 22, at 10:45 a.m.

Why: For the JCC U’s first session that day


Who: Dr. Brian Rose

What: Will talk about the Hollywood star system

When: at 12:45 p.m.

How much: $35 for JCC members; $45 for nonmembers for the whole day, which begins with coffee at 10:30 and includes a break for lunch (which is not included).

For more information: Call (201) 408-1454.
