
The Danger of Google’s Massive Harvesting

It’s pretty safe to say that most users are well aware that companies like Google, Facebook, LinkedIn and hundreds of others are harvesting data from their customers’ end-user devices. What many aren’t aware of is that you don’t even need to be visiting their websites or actively using their services for them to be constantly streaming data from your Internet-connected device. For example, I haven’t intentionally visited Google.com today, yet look at the top two hosts my laptop is reaching out to this morning (.1e100.net = Google):

Looking down through the hosts listed in the NetFlow analyzer report above, you can see Akamai Technologies, Amazon AWS and GoToMeeting. Some of the IP addresses that didn’t resolve to a hostname belong to companies we know well (e.g., #24, 23.97.61.137, is Microsoft). Other unresolved IP addresses go to sites such as McAfee.

Some Harvesting is Good

I agree that some of this data transfer is necessary; I certainly want McAfee making sure that everyone at our company has the latest antivirus updates. However, some vendors seem to be doing it excessively. For example, if I filter on just 1e100.net (Google), the biggest gap between data transfers from my PC to Google is 14 minutes. I’d like to know why they need to connect this frequently and what data they are taking from my devices.
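The “biggest gap” check boils down to sorting the flow start times for one domain and finding the widest spacing between neighbors. Below is a minimal sketch, assuming the flow records have already been exported with resolved hostnames; the sample records and the `largest_gap` helper are hypothetical, not output from the NetFlow analyzer shown here.

```python
from datetime import datetime, timedelta

# Hypothetical sample: (flow start time, destination hostname) pairs, as a
# flow collector might report them after reverse-DNS lookup.
flows = [
    (datetime(2024, 1, 8, 8, 0), "host-a.1e100.net"),
    (datetime(2024, 1, 8, 8, 5), "host-b.1e100.net"),
    (datetime(2024, 1, 8, 8, 7), "edge-1.akamaitechnologies.com"),
    (datetime(2024, 1, 8, 8, 19), "host-a.1e100.net"),
    (datetime(2024, 1, 8, 8, 30), "host-c.1e100.net"),
]


def largest_gap(flows, suffix):
    """Longest interval between consecutive flows whose destination hostname
    ends with `suffix`; None if fewer than two flows match."""
    times = sorted(ts for ts, host in flows if host.endswith(suffix))
    if len(times) < 2:
        return None
    # Pair each timestamp with the next one and keep the widest spacing.
    return max(later - earlier for earlier, later in zip(times, times[1:]))


print(largest_gap(flows, ".1e100.net"))  # 0:14:00
```

The leading dot in ".1e100.net" keeps the match anchored to whole domain labels, so a name such as "not1e100.net" would not be counted.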

[Figure: NetFlow report]

You might think you could simply ask the IT team to block the IP address above, but it isn’t that easy. Google appears to be sending the data to multiple hosts. Google probably did this to scale for the massive volume of uploads from hundreds of millions of users. It may also make the connections harder to block, especially if the IP addresses change or if the same addresses are used by services you want to keep (e.g., Gmail).
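Grouping destinations by the domain their reverse-DNS (PTR) names belong to, rather than by individual IP address, makes the multiple-hosts pattern visible. This is a minimal sketch, assuming the PTR lookups have already been done: the `ptr_records` mapping and the `ips_by_domain` helper are hypothetical, the hostnames are made up, and the addresses come from the RFC 5737 documentation ranges rather than real Google or Akamai hosts.

```python
from collections import defaultdict

# Hypothetical reverse-DNS results for destination IPs seen in flow data.
# RFC 5737 documentation addresses; hostnames are illustrative only.
ptr_records = {
    "192.0.2.10": "host-a.1e100.net",
    "192.0.2.47": "host-b.1e100.net",
    "198.51.100.3": "host-c.1e100.net",
    "203.0.113.8": "edge-1.akamaitechnologies.com",
}


def ips_by_domain(ptr_records, levels=2):
    """Group IPs by the last `levels` labels of their PTR hostname."""
    groups = defaultdict(set)
    for ip, hostname in ptr_records.items():
        # Strip a trailing root dot, then keep e.g. "1e100.net".
        domain = ".".join(hostname.rstrip(".").split(".")[-levels:])
        groups[domain].add(ip)
    return dict(groups)


for domain, ips in sorted(ips_by_domain(ptr_records).items()):
    print(domain, len(ips))
# 1e100.net 3
# akamaitechnologies.com 1
```

Keeping the last two labels is a rough heuristic; it would mislabel names under suffixes such as co.uk, where a public-suffix list would be needed. Blocking everything under one domain can also cut off services you rely on, such as Gmail.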

A 7×24 Data Heist

All day long, even when we walk away from our computers, Google, Apple and others are taking data from our devices over encrypted connections. Even on my iPhone, which doesn’t use Chrome, I see connections to Google and Apple. In one hour, my iPhone sent data to 17 different IP addresses, all controlled by Apple. What are they taking when I’m not even using my phone?

[Figure: NetFlow and IPFIX]

Some vendors may or may not share details of what they are harvesting, but you can be sure you agreed to it in the End User License Agreement (EULA). Check out this page on “What Google Knows About You”. I make a habit of logging out of Gmail and Facebook, but sometimes I forget. I learned that Google had been saving the websites I’ve visited since 2008.

What is at Risk

In speaking with friends and co-workers about this massive data collection, I find that most suspected it was going on, but most are not aware of its frequency or the extent of the detail. Some even shrug and say that they have nothing to hide, or ask, “What do I care?” I want to point out that consumers do need to be concerned, for a few different reasons:

  • On mobile devices, all of this data transfer is chewing up your data plan and could lead to overage charges.
  • In many companies, each employee has both a laptop and a mobile device on the network. If each device were constantly streaming 100 kilobits/second back to the cloud, this adds up: 10,000 devices × 100 kb/s = 1 Gb/s.
  • Intellectual property theft could be a problem as well. Imagine a company with a team of engineers researching a technology it plans to build and release to market. If companies like Google are collecting every website the engineers visit, Google probably has decent insight into what the company’s next product or strategic move might be. If Google is hacked and the data is compromised, that sensitive information could end up for sale on the dark web.
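The bandwidth figure in the list above is simple arithmetic. This sketch reproduces the 1 Gb/s total and, under the same assumption of a device streaming 100 kb/s around the clock, estimates what that would mean for a single data plan over 30 days (decimal units, 1 GB = 10^9 bytes).

```python
# Assumptions taken from the list above: 100 kb/s per device, 10,000 devices,
# streaming continuously. Decimal units throughout (1 kb = 1,000 bits).
RATE_BITS_PER_SEC = 100_000
DEVICES = 10_000
SECONDS_PER_DAY = 86_400

# Aggregate load on the company's Internet link.
aggregate_bps = RATE_BITS_PER_SEC * DEVICES

# Data one device would push in 30 days, converted from bits to bytes.
per_device_bytes_30d = RATE_BITS_PER_SEC * SECONDS_PER_DAY * 30 // 8

print(f"{aggregate_bps / 1e9:.1f} Gb/s")       # 1.0 Gb/s
print(f"{per_device_bytes_30d / 1e9:.1f} GB")  # 32.4 GB
```

Real devices don’t upload at a constant rate, so treat the 30-day number as an upper-bound illustration rather than a measurement.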

As shown below, my laptop is averaging over 100 kb/s across all of these uploads.

[Figure: hunt for malware]

Harvesting Worse than Malware

This harvesting of data from end users is increasing every year. It’s increasing the volume of NetFlow and IPFIX records that need to be collected, it’s increasing our operational costs, and it’s increasing our risk of intellectual property theft. Some might argue that the exfiltration of data by companies we agreed to let take it could be worse than what the industry considers to be malware.

Suggestions

Assume your company will be compromised. Make sure your incident response plans include reviewing the collected NetFlow and IPFIX records. Network traffic analytics has become an important part of security planning. Much of the data being collected from us goes to sites hosted by Akamai and Amazon, and we can be sure that some malware will upload what it steals to these same platforms. Because of this, be prepared to review DNS records and correlate them with flow data. Watch this webcast on Pivoting During the Hunt for Malware to learn more.
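Correlating DNS records with flow data can be as simple as labeling each flow’s destination IP with the name the client resolved just before connecting, which helps separate traffic that shares Akamai or Amazon address space. This is a minimal sketch under stated assumptions: the `DnsAnswer` and `Flow` record types, the `label_flows` helper, the one-hour look-back window, and all sample names and addresses are hypothetical, and real DNS logs and flow exports would first need parsing into this shape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DnsAnswer:
    ts: datetime  # when the answer was seen
    client: str   # IP of the device that asked
    qname: str    # name that was looked up
    answer: str   # IP address returned


@dataclass
class Flow:
    ts: datetime  # flow start time
    src: str
    dst: str
    octets: int


def label_flows(flows, answers, max_age=timedelta(hours=1)):
    """Pair each flow with the most recent name its source resolved to the
    flow's destination IP, no older than max_age; None when nothing matches."""
    # Index answers by (client, answer IP), in time order.
    index = {}
    for a in sorted(answers, key=lambda a: a.ts):
        index.setdefault((a.client, a.answer), []).append(a)
    labeled = []
    for f in flows:
        name = None
        # Walk candidate answers newest-first; the first one at or before
        # the flow start is the lookup that most likely led to it.
        for a in reversed(index.get((f.src, f.dst), [])):
            if a.ts <= f.ts:
                if f.ts - a.ts <= max_age:
                    name = a.qname
                break
        labeled.append((f, name))
    return labeled


# Hypothetical sample data (RFC 5737 addresses, example.com names).
t0 = datetime(2024, 1, 8, 9, 0)
answers = [DnsAnswer(t0, "10.0.0.5", "updates.example.com", "203.0.113.20")]
flows = [
    Flow(t0 + timedelta(minutes=10), "10.0.0.5", "203.0.113.20", 48_000),
    Flow(t0 + timedelta(minutes=12), "10.0.0.5", "198.51.100.7", 9_000),
    Flow(t0 + timedelta(hours=2), "10.0.0.5", "203.0.113.20", 1_500),
]
for flow, name in label_flows(flows, answers):
    print(flow.dst, name)
# 203.0.113.20 updates.example.com
# 198.51.100.7 None
# 203.0.113.20 None
```

Flows that come back unlabeled, such as connections made straight to an IP address with no preceding lookup, may be the ones worth a closer look during a hunt.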