> When you publish a Notion page to the web, the webpage’s metadata may include the names, profile photos, and email addresses associated with any Notion users that have contributed to the page.
I'm guessing it's not trivial to fix without breaking other things? The weakness seems to be that anyone can turn UUIDs into details like email. But I assume this functionality is necessary for other flows so they can't just turn off all UUID->email/profile look ups. And similarly hiding author UUIDs on posts also isn't trivial.
Conceptually, I agree it should be easy, but I suspect they're stuck with legacy code and behaviors that rely on the current system. Not breaking anything else while fixing this is likely the time consuming part.
But a user's email isn't always forbidden. The API endpoint which turns UUIDs into a user email presumably also has use cases where you do want to expose the user email. For example, when seeing a list of people you've already invited via email to collaborate with, or listing users within your organization, etc. So a user's email isn't always forbidden PII, it depends on the context.
The trouble is the UUID->email endpoint has no idea what the context is and that endpoint alone can't decide if it should expose email or not. And then public Notion docs publicly expose author UUIDs.
Their mistake was architecting things this way. From day 1 they should have cleanly separated public identifiers from privileged ones. Or have more bespoke endpoints for looking up a UUID's email for each of the narrow contexts in which this is allowed. They didn't do this, and they certainly should have, but fixing this mess is likely a non-trivial amount of work. Though I bet it could be done immediately if they really cared and didn't mind other things breaking.
I'm absolutely not defending their choice to expose emails in this way. They should have addressed this years ago when it was first reported, and I want them shamed for failing to care. But just trying to say it's likely not a one line fix.
You literally don’t know that. Add this to the mammoth file titled “HN comments in which the author makes some completely unsubstantiated technical claim”
It literally is easy to fix. For example they could shut down the servers. Which is what they should do immediately if there is no faster fix for a privacy leak like that.
Recently I checked back on Notion after a year or so of not seeing it. I was going to recommend it to someone as an example of hypertext, but I see now it calls itself an "AI workplace that works for you" and "Your AI everything app". This company means nothing now, seriously what happened.
First: This is documented and we also warn users when they publish a page. But, that’s not good enough!
Second: We don’t like this and are looking at ways to fix this either by removing the PII from the public endpoints or by replacing it with an email proxy similar to GitHub’s equivalent functionality for public commits.
P.S: Some folks here have speculated that this should be a 1 minute fix. Unfortunately that is not the case. :(
Very timely. I literally ran a Claude prompt "compare and contrast Notion vs Obsidian" and flipped over to HN while it was thinking, and this comes up. Thanks HN!
I've been toying around an architecture that sets things up such that the data for each user is actually stored with each user and only materialized on demand, such that many data leaks would yield little since the server doesn't actually store most of the user data. I mention this since this sorts of leaks are inevitable as long as people are fallible. I feel the correct solution is to not store user data to begin with.
some problems I've identified:
1. suppose you have x users and y groups, of which require some subset of x. joining the data on demand can become expensive, O(x*y).
2. the main usefulness of such an architecture is if the data itself is stored with the user, but as group sizes y increase, a single user's data being offline makes aggregate usecases more difficult. this would lend itself to replicating the data server side, but that would defeat the purpose
3. assuming the previous two are solved, which is very difficult to say the least, how do you secure the data for the user such that someone who knows about this architecture can't just go to the clients and trivially scrape all of the data (per user)?
4. how do you allow for these features without allowing people to modify their data in ways you don't want to allow? encryption?
a concrete example of this would be if HN had it so that each user had a sqlite database that stored all of the posts made per user. then, HN server would actually go and fetch the data for each of the posters to then show the regular page. presumably here if a data of a given user is inaccessible then their data would be omitted.
I love Notion and use it extremely heavily. I've also built a few integrations with Notion. I think it's a great app that uses AI very well, and they continue improving. Hopefully they fix this though! Also, their API has recently been upgraded quite a bit and now supports database views as a first class object. I have a few other small requests regarding their public API.
Two things. One, surely I'm not the only one who knew this data was being stored? Two, calling it a "leak" feels like a stretch when the data was publicly accessible by design from the start.
Yes, some users probably didn't realize their edits to public pages were saved publicly, and that's a legitimate UX complaint. But some of the responsibility has to sit with the user. Otherwise we'd be running daily headlines about Meta "leaking" user data to every advertiser with a checkbook.
I reported this and several other issues with public pages almost six years ago. Some of them were fixed after many years - but they're very slow to handle it. I never received any bug bounty or anything.
I really dislike Notion. Its public API is full of bizarre arbitrary limitations, like a rich text database field can only contain max 100 “child blocks”, where each change in formatting consumes one child block-but its web UI doesn’t have this issue. Yes, I realise the undocumented private API that the web UI uses doesn’t have this issue either-but I shouldn’t have to, and I haven’t.
I don’t love Confluence, but at least it doesn’t do this to me.
154 comments
> When you publish a Notion page to the web, the webpage’s metadata may include the names, profile photos, and email addresses associated with any Notion users that have contributed to the page.
The flaw itself is absurd but then just accepting it as "by design" makes it even worse.
Conceptually, I agree it should be easy, but I suspect they're stuck with legacy code and behaviors that rely on the current system. Not breaking anything else while fixing this is likely the time consuming part.
The trouble is the UUID->email endpoint has no idea what the context is and that endpoint alone can't decide if it should expose email or not. And then public Notion docs publicly expose author UUIDs.
Their mistake was architecting things this way. From day 1 they should have cleanly separated public identifiers from privileged ones. Or have more bespoke endpoints for looking up a UUID's email for each of the narrow contexts in which this is allowed. They didn't do this, and they certainly should have, but fixing this mess is likely a non-trivial amount of work. Though I bet it could be done immediately if they really cared and didn't mind other things breaking.
I'm absolutely not defending their choice to expose emails in this way. They should have addressed this years ago when it was first reported, and I want them shamed for failing to care. But just trying to say it's likely not a one line fix.
It is not a public marker, it’s PII.
They can easily withold information they put out intenionally.
If you can't easily architect around it, then don't do what you're trying to do.
"Oh I needed to disclose user data in order to make more money" isn't an acceptable excuse.
> Oh I needed to disclose user data in order to make more money
hmm maybe they should've paywalled?
First: This is documented and we also warn users when they publish a page. But, that’s not good enough!
Second: We don’t like this and are looking at ways to fix this either by removing the PII from the public endpoints or by replacing it with an email proxy similar to GitHub’s equivalent functionality for public commits.
P.S: Some folks here have speculated that this should be a 1 minute fix. Unfortunately that is not the case. :(
some problems I've identified:
1. suppose you have x users and y groups, of which require some subset of x. joining the data on demand can become expensive, O(x*y).
2. the main usefulness of such an architecture is if the data itself is stored with the user, but as group sizes y increase, a single user's data being offline makes aggregate usecases more difficult. this would lend itself to replicating the data server side, but that would defeat the purpose
3. assuming the previous two are solved, which is very difficult to say the least, how do you secure the data for the user such that someone who knows about this architecture can't just go to the clients and trivially scrape all of the data (per user)?
4. how do you allow for these features without allowing people to modify their data in ways you don't want to allow? encryption?
a concrete example of this would be if HN had it so that each user had a sqlite database that stored all of the posts made per user. then, HN server would actually go and fetch the data for each of the posters to then show the regular page. presumably here if a data of a given user is inaccessible then their data would be omitted.
Yes, some users probably didn't realize their edits to public pages were saved publicly, and that's a legitimate UX complaint. But some of the responsibility has to sit with the user. Otherwise we'd be running daily headlines about Meta "leaking" user data to every advertiser with a checkbook.
Here's a Reddit post just as confirmation: https://www.reddit.com/r/Notion/comments/hqyxid/possible_sec.... I also reported it privately two months prior, of course.
I don’t love Confluence, but at least it doesn’t do this to me.
Tells me everything I need to know about this industry. No regard or seriousness to security at all.