> This ... header ... will not contain any personally identifiable information
> a seed number which is randomly selected on first run ... chosen between 0 and 7999 (13 bits of entropy)
They are not including any PII... while creating a new identifier for each installation. 13 bits of entropy probably isn't a unique identifier iff you only look at that header in isolation. Combined with at least 24 additional bits[1] of entropy from the IPv4 Source Address field Google receives >=37 bits of entropy, which is almost certainly a unique ID for the browser. Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.
> Experiments may be further limited by country (determined by your IP address)
They even admit to inspecting the IP address...
> operating system, Chrome version and other parameters.
...and many additional sources of entropy.
[1] Why 24 bits instead of 32? The least significant byte of the address might be zeroed if the packet is affected by Google's faux-"anonymization" feature ( https://news.ycombinator.com/item?id=15167059 )
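For anyone who wants to check the arithmetic, here is a rough sketch. It assumes the seed and the address prefix are independent and uniformly distributed, which real traffic is not, so treat the result as an upper bound. The function names are made up for illustration. Note that 8000 values is slightly under 13 bits, so the combined figure rounds to 37.

```python
import ipaddress
import math

SEED_VALUES = 8000  # seed is "chosen between 0 and 7999"


def seed_bits(values: int = SEED_VALUES) -> float:
    """Entropy in bits of a uniformly chosen seed."""
    return math.log2(values)


def zero_last_octet(addr: str) -> str:
    """Zero the least significant byte of an IPv4 address (keep the /24)."""
    net = ipaddress.ip_network(f"{addr}/24", strict=False)
    return str(net.network_address)


def combined_bits(prefix_bits: int = 24) -> float:
    """Upper bound on entropy of (seed, /24 prefix), assuming independence."""
    return seed_bits() + prefix_bits


if __name__ == "__main__":
    print(f"seed alone: {seed_bits():.2f} bits")
    print(f"seed + /24: {combined_bits():.2f} bits")
    print(zero_last_octet("203.0.113.77"))
```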
> > Experiments may be further limited by country (determined by your IP address)
> They even admit to inspecting the IP address...
I don't think that sentence admits what you say? Chrome could be determining which experiments to run client-side.
Of course, when you visit a Google property, they needs must inspect your IP address to send a response to you, at a minimum. That goes for any site you might choose to visit. The existence of sufficient entropy to personally identify a site visitor is not a state secret. They do not need this Chrome experiment seed to identify you, if that's a goal.
Yeah, it's not a "state secret" but it's not common knowledge either. Their privacy policy says that specific header can't be used to identify you, but fails to mention it can be combined with other information to make browser fingerprinting trivial.
If you don't know how all this works, which is true for most human beings, their privacy policy might give you the wrong impression.
> says that specific header can't be used to identify you
That's not what it says. It says the header won't contain PII, which is true. It can be linked to PII, but so can literally every bit of information you send to Google while logged into or otherwise using their services. A disclaimer to this effect would not have any purpose.
That's the whole point. Using any Google service means they can easily personally identify you, that's what the privacy policy should explain.
That's their policy towards privacy, you don't have any. For some reason I can't fathom, you claim mentioning this in their privacy policy "would not have any purpose". Instead of honesty, their privacy policy is a wonder of public relations where it seems like they care deeply about protecting your privacy.
We disagree about the purpose of privacy policies. I believe that privacy policies should describe how data will be used, not how it could be used. I just don't think a policy describing how data could be used is very useful, because it's going to be the same for all services.
Under this formulation, Google's policy is (presumably, lacking any data to the contrary) honest with respect to this value.
"I believe that privacy policies should describe how the data will be used, not how it could be used."
Google's policy does not tell the user how her data will be used by Google's customers. The policy states Google will use the data to "provide better services". That is deliberately vague. That is the "purpose", but how exactly is the data used to achieve that purpose? There are no specifics to which a user could object.
Google does not only serve the search engine user, the email user, the YouTube user, etc. Its business is not free services. As such the policy is misleading as to what are the "Services" it may use the data to improve. Google's business is providing online ad services.
The truth is that Google collects data to provide better services to advertisers. The policy reads as if it only collects data to provide better services to users. The "free" services are just bait to draw users in. The data is collected to improve online ad services.
> The truth is that Google collects data to provide better services to advertisers.
I understand that that is what you believe, but I do not think this is factually true about the data collected from this Chrome header. I believe that Chrome team collects it in order to understand the impact of Chrome experiments on performance.
> I believe that privacy policies should describe how data will be used, not how it could be used.
This is key. If you subscribe to the "how it could be used" version, then even say possessing an android phone would be a violation of the privacy policy. Which is absurd.
Per your observation, I would argue that the intent of the privacy policy as quoted above is pretty clear. When the policy says that the identifier doesn't contain PII, I believe that is meant to convey that it will not be used to identify you. But it's true that that use is not explicitly excluded. I'm not a lawyer so I couldn't tell you if being weasely in this way would count as fraud or not. Otoh, I suspect that Google is actually abiding by the spirit of the policy they wrote because honestly they have little to gain and much to lose by violating it.
If I log in to my Google account once, they can associate that browser id with my account. Even if I log out, clear my cookies (and probably use the incognito mode), Google will be able to identify and follow me all over the Web.
I don't know about your PII thing, but it's personal data under the GDPR.
AIUI GDPR restricts the handling and use of PII, not its existence. So it's PII under GDPR. Is Google misusing it? If so, that's an issue. If not, then it's kinda pointless to observe that it's PII under a legal definition that may differ from the one Google is using in its privacy policy.
I don't math very much, but I would guess the intersection of these sets of people is nil: people who 1) use VPN to avoid tracking by Google 2) still log in to Google services from one of their networks and not the other 3) use the same Chrome profile on both. But suppose some small number exist who adopt this illogical and contradictory pattern of behavior. If Google is using this token for the purpose of tracking this tiny set of people when the vast majority could be tracked more easily via conventional means, it would imply that they are far more competent than I give them credit for.
> They are not including any PII... while creating a new identifier for each installation. 13 bits of entropy probably isn't a unique identifier iff you only look at that header in isolation. Combined with at least 24 additional bits[1] of entropy from the IPv4 Source Address field Google receives >=37 bits of entropy, which is almost certainly a unique ID for the browser. Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.
Now this is interesting. Without those 13 bits of entropy, what would Google lose? Is it only because of these 13 bits that Google is suddenly able to track what it could not before? If the IPv4 address, user-agent string, or some other behavior is sufficient to reveal a great deal, we have a more serious problem than those 13 bits. I agree that the 13-bit seed is a concern. But I am wondering whether it is a concern per se, or only in combination with something else. Of course, how and whether Google keeps that data also matters.
> Now this is interesting. Without those 13 bits of entropy, what would Google lose? Is it only because of these 13 bits that Google is suddenly able to track what it could not before?
At the very least, having those 13 bits of entropy along with a /24 subnet allows you to have device-level granularity, whereas a /24 subnet may be shared by hundreds of households.
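To put a rough number on "device-level granularity", here is a small sketch. It assumes n Chrome installations sit behind the same /24 and that each one picked its seed uniformly and independently from 8000 values. The device counts are hypothetical, not measurements.

```python
SEED_VALUES = 8000  # size of the seed space quoted above


def p_device_unique(n: int, values: int = SEED_VALUES) -> float:
    """Chance that one given device shares its seed with none of the other n-1."""
    return (1 - 1 / values) ** (n - 1)


def p_all_distinct(n: int, values: int = SEED_VALUES) -> float:
    """Chance that all n devices have pairwise different seeds (birthday problem)."""
    p = 1.0
    for i in range(n):
        p *= (values - i) / values
    return p


if __name__ == "__main__":
    for n in (10, 100, 300):
        print(n, round(p_device_unique(n), 4), round(p_all_distinct(n), 4))
```

Under those assumptions, some collision within the subnet becomes fairly likely once there are a hundred or so devices, but any one given device is still very likely to be the only one with its seed.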
... which is crazy unrealistic, since it's "PII" that can only stay "private" by collective agreement of every node in the network, but no accounting for the reality of network architecture in passing law, I guess.
Maybe a deep expectation of anonymity while accessing a worldwide network of cooperative machines is something people should stop telling the public they should expect?
Under GDPR you can use all the PII you reasonably need to provide expected services, you don't even need separate consent. But, if you have PII, the moment you use it for other purposes, or obtain/retain/share without proper cause, you are breaking the law.
IMHO, that is very reasonable.
Real-world example: giving your phone number and information to your car mechanic / doctor / bank teller / plumber is reasonable. Using that information to score girls or ask for donations to a puppy shelter would be considered improper.
I totally agree, and I think the GDPR is also reasonable in that it allows you to use the IP address for essential security reasons, such as blocking bad actors based on IP address - it doesn't say "thou shalt not track IP addresses", it says you need consent if you're going to use it for anything that isn't essential for security or in your end user's best interest.
Or they can stay 'private' by not being stored or correlated with other user data. GDPR doesn't apply to the network itself, it applies to whoever is using it.
"Stored" is definitely the purpose of a router. "Correlated" can be necessary for debugging routing issues (or client-server connection issues that are tied to the intermediary fabric near the client doing something weird; hard to determine if an entire subnet is acting up if you aren't allowed to maintain state on errors correlated to IP address).
I care. I care that even if I log off, even if I use a VPN, even if I go into incognito mode, they can still associate my requests with the account I initially logged in to.
The problem is any website can do that. Incognito-bypassing fingerprinting is difficult to prevent, unless you use something like uMatrix to disallow JavaScript from everything but a few select domains.
This is a collection of random-ish unique-ish attributes. Any collection of such things can be used to track you, like installed fonts, installed extensions, etc. If this were just a set of meaningless encoded random numbers, then it's essentially a kind of cookie, but that's not what it is. This is (claimed to be) a collection of information that's useful and possibly needed by some backends when testing new Chrome features. It tells servers what your Chrome browser supports. The information is probably similar to "optimizeytvids=1,betajsparser=1".
So, the only question is if Google is actually using this to help fingerprint users in addition to the pragmatic use case. It certainly could be used that way, and it's possible they are, but they have so many other ways of doing that with much higher fidelity / entropy if they want to. If this were intended as a sneaky undisclosed fingerprinting technique, I think they would've ensured it was actually 100% unique per installation, with a state space in the trillions, rather than 8000.
Yes, this could be so sneaky that they took this into consideration and made it low-entropy to create plausible deniability while still being able to increase entropy when doing composite fingerprinting, but I think it's pretty unlikely. Also, 99% of the time they could probably just use Google Analytics and Google login cookies to do this anyway.
Maybe one actually useful non-advertising usage could be reCAPTCHA ?
If you read carefully, nowhere does it say that there is a general limit of 8000. That limit applies only if you disable usage statistics / crash reports.
Sorry about that, too late to edit it now. That is an important detail. If there are 32 or more different feature flags, then that's 4 billion unique states, which would be an effective fingerprint.
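For reference, a quick sketch of where the 4 billion figure comes from. It assumes every flag is an independent on/off toggle, which is my simplification; the real experiment configuration may be structured differently.

```python
def state_count(n_flags: int) -> int:
    """Number of distinct combinations of n independent on/off flags."""
    return 2 ** n_flags


def flags_needed(states: int) -> int:
    """Smallest number of on/off flags that can encode `states` distinct values."""
    return (states - 1).bit_length()


if __name__ == "__main__":
    print(state_count(32))     # size of the state space with 32 flags
    print(flags_needed(8000))  # flags needed to cover the 8000-value seed space
```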
I still think it's pretty unlikely they're using it in that way or would in the future, and I think Google fuzzing this for those who opt out of telemetry is probably a signal of good faith in this instance. They realize the privacy implications and provide a way to disengage, even if they don't intend to abuse the information.
But of course the potential for abuse always remains. And the potential for (arguably) non-abusive tracking, like the possibility of it being used for bot detection by reCAPTCHA, as you say.
reCAPTCHA is the most abusive type of tracking. Google simply denies you use of the captcha if you do not give them enough personal information. It doesn't matter if you enter the captcha correctly 20 times. It won't let you in.
This is part of the bot detection, though. It's probably not "not enough personal information", it's "this truly seems like it is unlikely to be a legitimate device/person", due to the huge datasets they're working with. Same with Cloudflare and Tor. Once you operate a security service anywhere near that scale, you start to understand there are inherent challenges and tradeoffs like these.
reCAPTCHA increasingly doesn't even give me a captcha. Instead, they simply stop me from even trying. They send this instead of the challenge:
<div>
<noscript>
Please enable JavaScript to
get a reCAPTCHA challenge.<br>
</noscript>
<div class="if-js-enabled">
Please upgrade to a
<a href="[1]">supported browser</a>
to get a reCAPTCHA challenge.
</div>
<br><br>
<a href="[2]" target="_blank">
Why is this happening to me?</a>
</div>
They probably don't like my non-standard user agent string and they definitely don't like that I block a lot of their spyware, but reCAPTCHA used to work properly for many years with the same/similar browser configuration.
Normally you would only expect to be identified and tracked when using Google services when logged in. The significance of this post is that they would be able to identify and track you across all your usage of that browser installation regardless of if you've logged out, or say in an incognito window.
Yes you are missing something important. Once they've tied the browser ID to your personal account they can track you across all google properties, even the ones that you didn't log into.
Unless you're running some extension that emulates FF's container tabs or something, it logs you into all G services. It would matter, though, if this header is still sent in incognito sessions.
I still don't understand. When I log into gmail, it logs me into all Google services. If I am worried about being tracked, surely my first mistake is logging in in the first place? Or visiting in the first place? After all, even if I click "log out," I'm only trusting Google that they unlinked the browser state from the account. If I trust them to do that, I don't see why I shouldn't trust them to ignore this experiment flag from Chrome, or at least not use it for tracking. If I don't trust them to avoid using the experiment state, I don't really see how you can trust them for anything.
Anyway, if you're not building Chrome from source, then you have to trust that they aren't putting anything bad in it. And if you are building Chrome from source, you can observe that they only send this experiment ID to certain domains, and they already know who you are on those domains anyway.
I think the argument is they have other methods like cookies they could also use. The fact you trust them not to use those methods extends to this form of tracking.