Is It OK that Google Owns Us?

eWEEK.com - June 17, 2007
Lisa Vaas

Given Google's overwhelming popularity, chances are that most consumers are going to put their privacy on the line.

Google's continuously raked over the coals regarding the massive amounts of PII (personally identifiable information) it collects, what it does with it, how long it retains that data and what the company might do with it if its merger with DoubleClick goes ahead.

That's all been ratcheted up to fever pitch over the past few weeks, with two new privacy headlines: complaints being voiced about Google's new Street View service's photographs getting too close for comfort and Privacy International's having flunked Google on its privacy policies and procedures in a report published June 9.

The fury boils down to one question: whether or not it's OK for Google to own us.

Make no mistake, Google owns you. The ways in which it owns you are laid out in a complaint filed by EPIC (Electronic Privacy Information Center) and other privacy groups with the Federal Trade Commission over Google's proposed merger with targeted advertising company DoubleClick. Here's the list of data that Google collects and retains and the technologies through which the company gets it, from the complaint:

Google search: any search term a user enters into Google;
Google Desktop: an index of the user's computer files, e-mails, music, photos, and chat and Web browser history;
Google Talk: instant-message chats between users;
Google Maps: address information requested, often including the user's home address for use in obtaining directions;
Google Mail (Gmail): a user's e-mail history, with default settings set to retain emails "forever";
Google Calendar: a user's schedule as inputted by the user;
Google Orkut: social networking tool storing personal information such as name, location, relationship status, etc.;
Google Reader: which ATOM/RSS feeds a user reads;
Google Video/YouTube: videos watched by user;
Google Checkout: credit card/payment information for use on other sites.

That list at some point will also likely include Google Gears (now in beta), an open-source browser extension that uses JavaScript APIs to allow users to work on Web applications when they're offline. Google Gears will be the mesh between the Internet and a local store of data kept in a user's fully searchable relational database, and thus the search giant will also gain access to data at the desktop application level.

What does Google do with all that data?

Then there are the sins that Google commits with all that data, according to Privacy International. PI's list, which the NGO says is "by no means" complete, as quoted from its site:

Google account holders that regularly use even a few of Google's services must accept that the company retains a large quantity of information about that user, often for an unstated or indefinite length of time, without clear limitation on subsequent use or disclosure, and without an opportunity to delete or withdraw personal data even if the user wishes to terminate the service.
Google maintains records of all search strings and the associated IP addresses and time stamps for at least 18 to 24 months (although Google recently announced that it would only retain data for 18 months) and does not provide users with an expungement option. While it is true that many U.S.-based companies have not yet established a time frame for retention, there is a prevailing view among privacy experts that 18 to 24 months is unacceptable and possibly unlawful in many parts of the world.
Google has access to additional personal information, including hobbies, employment, address and phone number, contained within user profiles in Orkut. Google often maintains these records even after a user has deleted his profile or removed information from Orkut.
Google collects all search results entered through Google Toolbar and identifies all Google Toolbar users with a unique cookie that allows Google to track the user's Web movement. Google does not indicate how long the information collected through Google Toolbar is retained, nor does it offer users a data expungement option in connection with the service.
Google fails to follow generally accepted privacy practices such as the OECD Privacy Guidelines and elements of European Union data protection law. As detailed in the EPIC complaint, Google also fails to adopt additional privacy provisions with respect to specific Google services.
Google logs search queries in a manner that makes them personally identifiable but fails to provide users with the ability to edit or otherwise expunge records of their previous searches.
Google fails to give users access to log information generated through their interaction with Google Maps, Google Video, Google Talk, Google Reader, Blogger and other services.

Few would contest the fact that Google collects a vast array of PII. Whether Google can be trusted not to do evil with that laundry list of PII is debatable. For a demonstration of Google's trustworthiness, the Google faithful point to the search company's having refused to comply with a subpoena from the U.S. Department of Justice demanding log entries on its searches—a demand that Google competitors AOL, Microsoft and Yahoo obeyed as the government investigated how often children might stumble upon pornography while using search engines.

Even those who want Google to retain PII for far less time than it does give credit to Google for refusing to comply with the subpoena. But, privacy advocates say, the fact that the DOJ subpoenaed the data in the first place proves that government officials are hungry to track citizens' and noncitizens' Internet doings—as are, of course, law enforcement agencies and criminals, as well.

"We supported [Google] when they made that decision" about refusing the subpoena, said Marc Rotenberg, executive director of EPIC. "We also said we thought it was a mistake for Google to retain so much user information [in the first place]. As long as they do retain it, privacy will be at risk."

One only has to look to AOL's August 2006 publishing of confidential information belonging to 658,000 of its subscribers to know what Rotenberg's talking about. Accidents happen. AOL went on to apologize for the inadvertent disclosure, but even its unpublishing of the database didn't make things better—mirror sites were already up. Once data is out, it's out.

An unfair portrayal of Google?

Still, Google stands by its pledge to do no evil, and when it comes to privacy, the company insists it's being misrepresented, saying that the PI report had inaccuracies and mistakes. In a recent discussion with eWEEK, Google Deputy General Counsel Nicole Wong couldn't specify any mistakes per se in the report but had much to say about what she called Google's unfair portrayal in it.

For one thing, there's the data retention policy. Google in May announced it would anonymize its search logs after 18 to 24 months.

Google's proud of that. "As far as I know we are [the] only major [search] company to announce a log anonymization policy and limit it to 24 months," Wong said.

Privacy International and other privacy groups, of course, didn't think much of Google drawing the shades of anonymity across PII after keeping it in the limelight of its research and potential mishandling for between one and a half and two years, and they were equally unimpressed with Google's more recent decision to anonymize data after 18 months, as opposed to up to 24.
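To make "anonymizing a log after a retention period" concrete, here is a minimal Python sketch. It is not Google's method, which the company has not detailed here: the `LogEntry` fields, the 548-day cutoff (roughly 18 months), and the choice to blank the last octet of the IP address and drop the cookie ID are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical search-log record: query, IPv4 address, cookie ID, time stamp."""
    query: str
    ip: str
    cookie_id: Optional[str]
    timestamp: datetime


# Assumption: "18 months" approximated as 548 days.
RETENTION = timedelta(days=548)


def anonymize(entry: LogEntry) -> LogEntry:
    """Blank the last octet of the IP and drop the cookie ID (assumed technique)."""
    octets = entry.ip.split(".")
    octets[-1] = "0"
    return replace(entry, ip=".".join(octets), cookie_id=None)


def apply_retention(log: List[LogEntry], now: datetime,
                    retention: timedelta = RETENTION) -> List[LogEntry]:
    """Return a copy of the log with entries older than the cutoff anonymized."""
    cutoff = now - retention
    return [anonymize(e) if e.timestamp < cutoff else e for e in log]


now = datetime(2007, 6, 17)
log = [
    LogEntry("dolphins", "192.0.2.17", "abc", datetime(2005, 1, 1)),
    LogEntry("apples", "192.0.2.44", "def", datetime(2007, 6, 1)),
]
cleaned = apply_retention(log, now)
```

Note that in this sketch the query text itself is left untouched. Only the identifiers attached to it are altered, and only once the entry has aged past the cutoff.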

Beth Givens, director and founder of the Privacy Rights Clearinghouse, points to European metasearch engine Ixquick as an example of a search company that manages to do just fine without lengthy retention of data. As Ixquick states on one of its privacy policy pages, the search company deletes users' personal data within 48 hours, which it claims makes it the only search engine to do so.

As Google has pointed out in its blogs, the rationale behind keeping the data so long is that Google uses it to improve services and protect them against security and other abuses, Wong said. "We tried to balance providing robust service and improving our service."

To what extent does Google go to improve search, such that it needs more than a year to squeeze the pulp out of our PII? Google, understandably, keeps its work on mathematical algorithms close to the vest. The company does point to a June 3 article in The New York Times on the topic, which was published after the newspaper was given a rare inside look at what The Times called "a crucial part of Google's inner sanctum, a department called 'search quality' that the company treats like a state secret." (For The Times' article, free registration is required.)

According to The Times, Google's search-quality team makes on average a half-dozen major and minor changes weekly to the "vast nest" of mathematical formulas powering the company's search engine. As Google engineer Amit Singhal told the newspaper, in Google's quest to fend off competitors' search engine efforts, such as those from Yahoo and Microsoft, the goal of search has evolved from finding a user what he or she typed to finding what he or she wants. For example, Google's formulas have evolved to the point of knowing that users who run a search on "apples" are likely interested in fruit, whereas those who type in "Apple" are thinking about iPods or Macs. Google has even enabled its search engine to compensate for searches that are hazily worded or even mistakenly typed in.

What's confronting the search giant now are problems such as Web spam, where pages filled with ads manage to pop up at the top of search listings. Another problem the search team recently worked on was that of users trying to find local businesses that lack a substantial number of links to them, obviously an issue for an advertising-fueled business such as Google.

The local-business dilemma was discovered after the company received a complaint about a Palo Alto shop failing to come up in searches. Of course, the company can't rush to fix every complaint, because each fix could break something else in search, much as Microsoft, when it patches a vulnerability, has to test scores of operating systems and application/operating system combinations. What that means, Singhal told The Times, is that the company doesn't react to the first complaint. Instead, it lets things "simmer."

Simmering takes time. Search company Ixquick, the one touted by privacy groups, hadn't responded to inquiries about the depth and nature of its search optimization by the time this article was posted, but chances are good that 48 hours of data retention doesn't allow for much simmering.

One particular area of focus for Google that involves very personal information (a user's individual searches as determined by IP address, which only works for users of Gmail) is the vast number of "signals" the company uses to determine page ranking. One signal, among the more than 200 the company now identifies, is a person's individual search history. History is taken into account to determine whether a search return is appropriate for an individual in the context of his or her past searches. The example given by The Times is the search history of a marine biologist compared with that of a sports fan when either searches on the term "dolphins."
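The "dolphins" example can be sketched in a few lines of Python. This is a toy illustration of search history acting as one ranking signal, not a description of Google's formulas, which the company keeps secret. The result fields (`base_score`, `topics`), the 0.1 weight and the word-overlap scoring are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def personalize(results: List[Dict], history: List[str]) -> List[Dict]:
    """Re-rank results by boosting those whose topics appear in past queries.

    Hypothetical scoring: base score plus 0.1 for each time a result's
    topic word occurred in the user's search history.
    """
    seen = Counter(word.lower() for query in history for word in query.split())

    def score(result: Dict) -> float:
        boost = sum(seen[topic] for topic in result["topics"])
        return result["base_score"] + 0.1 * boost

    return sorted(results, key=score, reverse=True)


# Two candidate results for the query "dolphins", equal before personalization.
results = [
    {"title": "Miami Dolphins schedule", "base_score": 1.0, "topics": ["football", "nfl"]},
    {"title": "Dolphin biology", "base_score": 1.0, "topics": ["marine", "biology"]},
]

biologist_history = ["marine mammal sonar", "coral reef biology"]
fan_history = ["nfl scores", "football draft"]

for_biologist = personalize(results, biologist_history)
for_fan = personalize(results, fan_history)
```

Even in this toy version, the ranking only works if the searcher's past queries are kept and tied to that searcher.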

This, of course, is an area that greatly concerns privacy advocates. Multiple organizations have proved that individuals can be identified through their search strings, given that we tend to search on friends, relatives, local addresses and businesses, and more.

Danny Sullivan, a blogger who concentrates on search at searchengineland.com, has written a post examining the Privacy International report. He refutes most of its charges against Google and points out the report's weaknesses, including a reliance on subjective, unmeasurable input such as newspaper articles to come up with its rankings.

Even Sullivan considers personally identifiable profiles of individual searchers to be a legitimate concern for privacy advocates, certainly more legitimate than what he calls "old-school" concerns about "fairly anonymous" cookie data and IP addresses. But, he pointed out, if privacy advocates are going to be concerned about those individual profiles, they should also start worrying about similar profiles kept by Microsoft and Yahoo, both of which passed PI's privacy ranking.

Enter the law

But beyond the search optimization rationale, Google has another justification behind its 18 months of data retention: the law.

"While shorter retention periods are good for privacy, longer retention periods are needed for security, innovation and compliance reasons," wrote Peter Fleischer, Google's global privacy counsel, in the posting in which he announced that Google would anonymize data after 18 months.

Legal compliance is a compelling justification for data retention. The problem is, nobody seems to be able to locate the laws that Google is talking about. Google acknowledges that its data retention period is based on parameters being discussed now in the European Union as opposed to any existing laws. A Google spokesperson points to a site run by European Digital Rights that tracks legal maneuverings around data retention in the EU, providing a round-up of implementation status on a country-by-country basis. "The status is changing almost daily," she wrote in an e-mail exchange.

No doubt that's true, but the majority of data retention laws being discussed or implemented pertain to ISPs or telephony providers; only one, in Germany, appears to pertain to e-mail providers.

"I would like Google to point out specific legislation that requires a private company in the search business to retain data," said PI's Simon Davies. "I can't. I'm not aware of any such law. There is data retention in Europe, but it doesn't apply to keeping search strings for 18 months. If we're talking about a week, perhaps we'll have room for negotiation. But I suspect Google, like other major players, is on the wrong highway. Whatever techniques they're requiring shouldn't require retention for that long a time."

Either way, Davies said, the process of data retention requires "full scrutiny."

What does Google have to say about the validity of other criticisms in the PI report? The one thing that Google grants it could do better on—maybe, if the charge is in fact legitimate—is being clear on its policies.

"If we're not being clear, shame on us because we should be," Wong said. "We try hard to be."

One thing privacy advocates would like to see Google do is to get a privacy czar. One of PI's complaints was that nobody at Google got back to the organization when contacted about privacy concerns.

"Google was invited to provide any data that would help its case," Davies said. "We tried to reach Google at Mountain View…I suppose it would have been [in May]. Five, six days before publication of the report I called Peter Fleischer, [Google's] global privacy lead, and warned him the report was coming out and Google wasn't looking good. I asked Peter to send me anything we could take into consideration in finalizing the report, and nothing came back. Peter did ask me to come to Paris to meet, but it was a busy week. The last thing I was going to do was come to Paris to be one of 23 other organizations."

If Google had provided PI with a response, privacy advocates say, the company likely would have come off looking a lot better in the report. They point to this omission as an indication that the company needs a clearer path for reporting privacy issues.

"Google needs a privacy officer," said Beth Givens, director and founder of the Privacy Rights Clearinghouse, when asked what steps privacy experts believe would help Google shape up.

Google finds the notion odd, pointing not only to on-staff privacy experts Fleischer and Wong but to the product development lifecycle now in place at Google, instituted when Wong was brought on-board, in which every product launched includes on its team a lawyer trained on privacy issues who works with product development from the get-go.

The back and forth will continue for the foreseeable future, particularly given Google's proposed merger with DoubleClick. Some say that in the end it's up to consumers to police the information they give to Google or to anybody, but in fact Google garners information from an action as simple as performing a search.

Consumers always have alternatives to Google. Or, rather, when it comes to privacy, given that Yahoo and Microsoft are hardly more privacy-sensitive, there is only one: Ixquick.

Take your pick: If the choice comes down to being owned by Google or using Ixquick, given Google's overwhelming popularity, chances are that most consumers are going to put their privacy on the line.

Editor's Note: This story was updated to include comments from Google.
