A lot of geolocation data on the market is anonymized, following medium-lived unique IDs that aren't able to be mapped to other identifiers. The problem with that is that if you have precise locations, or enough samples that you can apply statistics to find precise locations, in many cases you can de-anonymize the IDs. You can purchase address and resident listings from a number of different data vendors, and by checking where the device returns to at night you can figure out its home address. Then if you find information on the residents (work locations, schools, etc.), you see if said device goes where each resident of the home address is likely to go, and you now have a pretty good idea of exactly who the device belongs to.
Right, there's probably no other phone in the world that typically stops for hours within 1000 feet of my bed and typically stops on Monday-Friday within 1000 feet of my work-desk.
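To make the "where does it sleep, where does it work" idea concrete, here is a minimal sketch. The ping format, the night and office-hour windows, and the roughly 100 m grid are all assumptions for illustration, not how any particular broker or analyst does it.

```python
from collections import Counter
from datetime import datetime

def grid_cell(lat, lon, places=3):
    """Snap coordinates to a coarse grid (3 decimal places is on the order of 100 m)."""
    return (round(lat, places), round(lon, places))

def infer_home_and_work(pings):
    """pings: iterable of (iso_timestamp, lat, lon) for ONE pseudonymous device ID.

    Returns the most frequent night-time cell and the most frequent
    weekday office-hours cell. The time windows are assumed, not standard.
    """
    night, workday = Counter(), Counter()
    for ts, lat, lon in pings:
        t = datetime.fromisoformat(ts)
        cell = grid_cell(lat, lon)
        if t.hour >= 22 or t.hour < 6:
            night[cell] += 1
        elif t.weekday() < 5 and 9 <= t.hour < 17:
            workday[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = workday.most_common(1)[0][0] if workday else None
    return home, work

# Made-up pings for a single "anonymous" ID.
pings = [
    ("2024-03-04T23:10:00", 40.71230, -74.00610),
    ("2024-03-05T02:40:00", 40.71220, -74.00620),
    ("2024-03-05T10:15:00", 40.75810, -73.98560),
    ("2024-03-05T14:30:00", 40.75820, -73.98570),
    ("2024-03-05T19:00:00", 40.73000, -73.99000),
]
home, work = infer_home_and_work(pings)
```

The pair (home cell, work cell) is what then gets joined against purchased address and employer listings; the ID itself never has to contain a name.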
If all you've got is full political power and control over propaganda networks, you won't get the USSR. You'll get Hungary between 2010 and 2026. It works well, but in the critical moments when things start going wrong you need to kill people to maintain power, or else your nascent autocracy collapses as quickly as Orban's.
I'm no fun of Stalin, but this meme about 20+ million victims needs to be purged.
"The scholarly consensus affirms that archival materials declassified in 1991 contain irrefutable data far superior to sources used prior to 1991, such as statements from emigres and other informants.
Before the dissolution of the Soviet Union and the archival revelations, some historians estimated that the numbers killed by Stalin's regime were 20 million or higher. After the Soviet Union dissolved, evidence from the Soviet archives was declassified and researchers were allowed to study it. This contained official records of 799,455 executions (1921–1953), around 1.5 to 1.7 million deaths in the Gulag, some 390,000 deaths during the dekulakization forced resettlement, and up to 400,000 deaths of persons deported during the 1940s, with a total of about 3.3 million officially recorded victims in these categories. According to historian Stephen Wheatcroft, approximately 1 million of these deaths were "purposive" while the rest happened through neglect and irresponsibility. The deaths of at least 5.5 to 6.5 million persons in the Soviet famine of 1932–1933 are sometimes included with the victims of the Stalin era." [0]
I think this begs the question of what anonymous data means. Sure my visit to HN is "anonymous" in that it doesn't say "abustamam visited this site" but piece together all the other visits that have my "anonymous ID" then eventually it paints a pretty nice picture of who I am.
We should have learned this lesson 20 years ago when researchers were able to deanonymize a lot of the Netflix Prize dataset, which contained nothing except movie ratings and their associated dates.
If movie ratings are vulnerable to pattern-matching from noisy external sources, then it should be obvious that location data is enormously more vulnerable.
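A toy version of that kind of pattern-matching, assuming you know a handful of someone's ratings only approximately (rating off by one, date off by a couple of weeks). This is a deliberately simplified sketch, not the researchers' actual scoring method, and all IDs, titles and tolerances below are made up.

```python
from datetime import date

def match_score(known, record, max_days=14, max_rating_gap=1):
    """Count how many known title -> (rating, date) entries approximately appear in a record."""
    hits = 0
    for title, (rating, when) in known.items():
        if title in record:
            r, d = record[title]
            if abs(r - rating) <= max_rating_gap and abs((d - when).days) <= max_days:
                hits += 1
    return hits

def rank_candidates(known, dataset):
    """Score every pseudonymous record against the noisy side information, best first."""
    scores = {anon_id: match_score(known, rec) for anon_id, rec in dataset.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# "Anonymized" dataset: pseudonymous ID -> {title: (rating, date)}.
dataset = {
    "u1": {"Brazil": (5, date(2005, 3, 1)), "Heat": (3, date(2005, 4, 2))},
    "u2": {"Brazil": (4, date(2005, 6, 10)), "Solaris": (5, date(2005, 6, 12)),
           "Gattaca": (2, date(2005, 7, 1))},
    "u3": {"Heat": (4, date(2005, 6, 11)), "Gattaca": (5, date(2005, 1, 1))},
}
# Noisy outside knowledge about one person, e.g. from public reviews.
known = {
    "Brazil": (5, date(2005, 6, 8)),
    "Solaris": (5, date(2005, 6, 15)),
    "Gattaca": (3, date(2005, 7, 5)),
}
ranked = rank_candidates(known, dataset)
```

A real attack would presumably weight rare titles more heavily than popular ones; the point of the sketch is only that a few fuzzy matches can separate one record from the rest.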
exactly. calling it 'anonymized' is pure security theater once you have enough data points to map out someone's daily routine.
waiting for legislation or eulas to fix this is a lost cause since adtech always finds a loophole. the fix has to be architectural. moving toward stateless proxies that strip device identifiers at the edge before they even hit upstream servers. if the payload never touches a persistent db there is literally nothing to de-anonymize. stateless infra is the only sane way forward
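a rough sketch of what "strip identifiers at the edge" could look like. the field names here are hypothetical (real SDK payloads differ per vendor) and the handler is a plain function rather than any specific proxy framework:

```python
import json

# Hypothetical identifier field names; real payloads vary by vendor.
IDENTIFIER_FIELDS = {"device_id", "ad_id", "idfa", "gaid", "ip", "user_agent"}

def strip_identifiers(payload):
    """Return a copy of a JSON-like payload with identifier fields removed at any depth."""
    if isinstance(payload, dict):
        return {
            key: strip_identifiers(value)
            for key, value in payload.items()
            if key.lower() not in IDENTIFIER_FIELDS
        }
    if isinstance(payload, list):
        return [strip_identifiers(item) for item in payload]
    return payload

def handle_request(raw_body: bytes) -> bytes:
    """Stateless handler: parse, strip, re-serialize. Nothing is kept between calls."""
    cleaned = strip_identifiers(json.loads(raw_body))
    return json.dumps(cleaned, sort_keys=True).encode()

incoming = b'{"ad_id": "abc", "event": "open", "device": {"gaid": "x", "os": "android"}, "lat": 1.5}'
forwarded = handle_request(incoming)
```

the function holds no state and writes nothing to disk, so the only place the identifiers ever exist is the request that is being rewritten.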
Companies exist that de-anonymize other data brokers' data. That lets the other data brokers claim they have anonymized data while end users get everything.
> enough samples that you can apply statistics to find precise locations, in many cases you can de-anonymize the IDs
I think a lot of people don't realize the power of a big enough sample size. With enough samples even something pretty innocent looking like your daily step counter could make you identifiable.
As far as I know we don't have large enough databases to make this happen in practice, but I don't think this is impossible in the future.
Location and identity are inextricably linked. You can't destroy identity without also destroying location and location is critical for myriad purposes.
The analytic reconstruction of identity from location is far more sophisticated than the scenarios people imagine. You don't need to know where they live to figure out who they are. Every human leaves a fingerprint in space-time.
From what I've seen none of this is that complex, one could simply 'draw a circle around your house' and get all the "anonymized" device pings and just trace those.
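For a sense of how little is involved, here is a minimal sketch of the "circle around your house" query. The ping tuples, the 100 m radius and the coordinates are made up; the distance function is the standard haversine formula.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

def ids_inside_circle(pings, center, radius_m):
    """Return the pseudonymous IDs that ever pinged inside the circle."""
    lat0, lon0 = center
    return {dev for dev, lat, lon in pings
            if haversine_m(lat0, lon0, lat, lon) <= radius_m}

# Made-up pings: (pseudonymous_id, lat, lon).
pings = [
    ("a1", 40.7121, -74.0061),   # a few meters from the house
    ("b2", 40.7300, -74.0000),   # about 2 km away
    ("a1", 40.7580, -73.9860),   # same ID, seen elsewhere later
    ("c3", 40.7125, -74.0060),   # roughly 55 m from the house
]
house = (40.7120, -74.0060)
ids = ids_inside_circle(pings, house, radius_m=100)

# "Just trace those": pull every ping for the IDs found at the house.
traces = [p for p in pings if p[0] in ids]
```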
IMO we should ban gathering this data without a warrant or specific contractual agreement between the device owner and entity aggregating the data. As much as congress loves to claim the interstate commerce theory of everything, this seems like a slam dunk.
The problem with all these discussions about banning stuff is that privacy is always on the back foot. It's by design. People who want to surveil and manipulate us are actively investigating new ways of doing it, they get paid for it and they risk nothing in the long run. All of these discussions about specifics are just reactions. They aren't even reactions to the surveillance itself, but rather to a discovery by someone that a new surveillance machine has been constructed and launched.
So the current feedback process involves: construction → exploitation → reporting → public awareness → legislation. This is too slow. Moreover, operating in this environment is exhausting.
We need a different feedback loop altogether. I'm not sure which one would work best, but something different needs to be considered.
Once wealthy and powerful people realize how this can be used to track them they will start cracking down. Access to location data by unauthorized people is badly underrated as a risk; as one of many examples, it is a primary way that the military locates and kills targets in foreign countries. It is surprising that all of this data is so freely available from data brokers, or in some cases from the app companies themselves, if you're willing to make it worth the trouble for them.
Most people don't realize how bad geolocated data is for a free society. I can buy data from a broker, geo-fence your house address, and then I'm able to see all the places where you went and who you associate with, and identify everyone you associate with by tracking them to addresses. All of this happens with anonymized device identifiers. It is the wet dream of a company such as Palantir and all governments who desire absolute control over their populations.
Let’s just stretch copyright to cover movement/location as a protected creative expression. It’s somewhat ridiculous but we’ve already established case law and technology for handling/mishandling protected assets.
It's frustrating how much of the system is already a hassle for everyone who already does things right.
I make some apps that use precise location. We don't sell the data, we share it with the interested party who the app user is working for only. The user is fully aware when the data is collected and when not, they can even turn it off at will. Nobody even cares who has the device, just that they're delivering a package or something to that effect. It's all good.
Then I hit the app-stores and it's like pulling teeth with paperwork and rejections (depends on the reviewer, sometimes you fly through, sometimes very much not).
I have to attest that we don't do shit with an email that may or may not appear in a field, that we don't do porn (I have no idea where that reviewer got that idea). Some reviewer misread something and now I have to explain that no we don't use contacts, we never do... wtf. More delays.
I'm privacy minded so I don't so much mind the IDEA that I have to attest to all these things, but all the hoops I have to jump through exist because the bad guys steal all this info and sell it, and they're still doing their bad thing just fine ...
Meanwhile my app is stuck in review.
It feels like being pulled over randomly by the cops because "well there's a lot of speeders out there" when I'm not one of them. It's all hassle for everyone doing it right, and for the bad guys or the apps that clearly don't have to follow the rules that everyone else does it has no impact on them.
When I had the opportunity to peer into public records, I found some extremely intriguing stuff.
There was one person with a feminine name who showed up with a “home address” that would correspond to being my “neighbor” at home, at my clinic, at church, when I went to college, etc. All the years corresponded correctly, and the addresses were some residential place about a block or less away from the places where I went.
For all I know, this person was either fictional or an innocent bystander. She did appear to have a Facebook account or two. I was never able to directly contact her. But I found it very strange and I wondered what would be gained by doxxing me in this manner?
Of course this has nothing directly to do with GPS coordinates, but imagine if the GPS began to be part of your public record as well, or on your credit report. Imagine if it was entered into the public record what coffee house you visited every morning, or if there were errors in this record.
There needs to be a believable legal framework behind this.
Imagine an option on your iPhone that says “Enable this to allow geo-location tracking for organisations registered under the NOADSJUSTPUBLICGOOD Act” - then any wifi endpoint could locate you based on signal strength etc. and that data could only be made available to people registered under the act.
Would we see new understanding of how people move around in cities? Would we see better traffic information? I think so - as long as people believe that there are real teeth to the laws and that they are enforced loudly and publicly.
We should embrace the benefits of a society-wide epidemiology experiment - the benefits for public health are incredible. (Add to that supply chain logistics on open ledgers and many of the new things that just were not possible before, and the future of open, transparent but well regulated democracies is bright.)
I'm of the opinion now that posting videos online without the explicit permission of EVERYONE in the video should be illegal. It's one thing to take a video and keep it on your phone, but if you share it with anyone outside your family, then it needs the express consent of everyone whose face is in it; otherwise it should be a crime.
The previous views on privacy didn't take into account the fact that everyone now has video cameras and people are incentivized to violate privacy to make money as influencers. I think people's privacies need to be protected and I think that means making laws around it much, much stricter. This includes things like location data, it shouldn't be sold or exposed at all.
The examples show Android devices. How does Webloc track iOS devices given Apple doesn't allow unique IDs and allows the user to disable the ad ID? I wish these articles would go into a bit more detail for the technical reader.
Does anyone know of any groups that are organizing and lobbying to get things like this into law? I know about the EFF but they seem to be more focused on documenting and reporting instead of lobbying and getting things passed.
Perhaps you saw the news about GM reaching an FTC settlement because they were tracking the locations of cars they made at all times and selling the information to LexisNexis. You might have left the articles with a belief that GM agreed to stop.
Their settlement allows them to sell precise locations -- but with 'anonymous ids' instead of names, and also allows them to sell data with name/personal information but only zipcode resolution location information.
I had a theory that the way to solve this was a location intelligence data union which sold safely anonymised aggregates and shared the profits, while also litigating on behalf of members under available legislation to stop other people using their data.
Alas, I was stymied by not having any cash to work on it, and the unit economics were not very VC friendly (at least I assume that’s one of the reasons why I didn’t get any traction from VCs).
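For what "safely anonymised aggregates" might mean in practice, one common-sense starting point is to publish only counts of distinct devices per area and hour, and to drop any bucket with too few devices in it. The sketch below assumes that approach; the threshold `k`, the grid size and the ping format are illustrative choices, not the scheme I actually pitched, and small-count suppression alone is not a privacy guarantee.

```python
from collections import defaultdict

def aggregate_with_threshold(pings, k=5, places=2):
    """Count distinct devices per (grid cell, hour); suppress buckets with fewer than k devices.

    pings: iterable of (device_id, hour_of_day, lat, lon).
    """
    buckets = defaultdict(set)
    for dev, hour, lat, lon in pings:
        buckets[(round(lat, places), round(lon, places), hour)].add(dev)
    # Device IDs never leave this function; only counts above the threshold do.
    return {key: len(devs) for key, devs in buckets.items() if len(devs) >= k}

# Made-up pings: three devices share one cell at 09:00, one device is alone elsewhere.
pings = [
    ("a1", 9, 40.712, -74.006),
    ("b2", 9, 40.713, -74.007),
    ("c3", 9, 40.714, -74.008),
    ("a1", 9, 40.712, -74.006),   # repeat ping, same device, counted once
    ("d4", 9, 40.758, -73.986),   # lone device, bucket gets suppressed
]
published = aggregate_with_threshold(pings, k=3)
```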
Smartphones, mobile apps, mobile networks, and WiFi stopped being your friends around 2015-2016. Now it's just a matter of how much data can be harvested from device sensors in real time until reaching a pain point which doesn't exist.
Soon Geolocation will be tied to Age! Then you can meet locals and congratulate them on their birthday. The movie Minority Report was way too timid in its prediction here. Age up everything! \o/
I want geolocation to not be sold. Yet, I do not believe we have been successful in banning the sale of cocaine and elephant tusks. What makes us think this will be an easier problem to solve?
It's a rhetorical fiction the ad industry tells itself.
https://en.wikipedia.org/wiki/Excess_mortality_under_Joseph_...
> I'm no fun of Stalin
I would argue for the generality of this characterization
Edit: It's a rhetorical fiction the ad industry tells us.
https://arxiv.org/abs/cs/0610105
> A lot of geolocation data on the market is anonymized
A lot isn't good enough.
Until that changes you're going to be stuck.
Something as simple as the Data Protection Act 1998 (https://en.wikipedia.org/wiki/Data_Protection_Act_1998) would kneecap a lot of the shady shit that goes on in the USA.
The whole system is borked.
https://citizenlab.ca/research/analysis-of-penlinks-ad-based...
Let me know if you spot one.
What about: "If something bad happens because of the data your company shared or lost, it is criminally and financially liable?"
Missed opportunity by the EU when they wrote GDPR.