> When you publish a Notion page to the web, the webpage’s metadata may include the names, profile photos, and email addresses associated with any Notion users that have contributed to the page.
I'm guessing it's not trivial to fix without breaking other things? The weakness seems to be that anyone can turn UUIDs into details like email. But I assume this functionality is necessary for other flows so they can't just turn off all UUID->email/profile look ups. And similarly hiding author UUIDs on posts also isn't trivial.
Conceptually, I agree it should be easy, but I suspect they're stuck with legacy code and behaviors that rely on the current system. Not breaking anything else while fixing this is likely the time consuming part.
But a user's email isn't always forbidden. The API endpoint which turns UUIDs into a user email presumably also has use cases where you do want to expose the user email. For example, when seeing a list of people you've already invited via email to collaborate with, or listing users within your organization, etc. So a user's email isn't always forbidden PII, it depends on the context.
The trouble is the UUID->email endpoint has no idea what the context is and that endpoint alone can't decide if it should expose email or not. And then public Notion docs publicly expose author UUIDs.
Their mistake was architecting things this way. From day 1 they should have cleanly separated public identifiers from privileged ones. Or have more bespoke endpoints for looking up a UUID's email for each of the narrow contexts in which this is allowed. They didn't do this, and they certainly should have, but fixing this mess is likely a non-trivial amount of work. Though I bet it could be done immediately if they really cared and didn't mind other things breaking.
I'm absolutely not defending their choice to expose emails in this way. They should have addressed this years ago when it was first reported, and I want them shamed for failing to care. But just trying to say it's likely not a one line fix.
You literally don’t know that. Add this to the mammoth file titled “HN comments in which the author makes some completely unsubstantiated technical claim”
It literally is easy to fix. For example they could shut down the servers. Which is what they should do immediately if there is no faster fix for a privacy leak like that.
Theres just a higher form of malicious stupidity, where the people who own these platforms can be selectively, maliciously stupid where it comes to security.
Recently I checked back on Notion after a year or so of not seeing it. I was going to recommend it to someone as an example of hypertext, but I see now it calls itself an "AI workplace that works for you" and "Your AI everything app". This company means nothing now, seriously what happened.
I haven't used Notion the last couple of years either, but there was a multi-year period where someone at each of the companies I was at would champion it, convince someone high enough to transition the team to it, and it would slow the team down so much. There was a joke at one point amongst coworkers that it might not be bad subterfuge to get someone hired at a rival in order to introduce Notion there.
Anyways, I think Notion has a learning curve that is a little longer than one expects. I can believe that with some dedicated learning time I could be turned into a believer. But I also distinctly had the impression that it was one of those things where it saved a ton of time for a few narrow-visioned people (the people who championed it), but added meaningful time to everyone else's. Those people were largely project managers or operations folks, and transitively the leaders they reported to. It heavily threw the switch towards "legibility" over reality.
It's like when someone new to a messy project, creates a spreadsheet, and says, "Let not overthink this, everybody just fill in your project details in your row". If your work, which you are the expert on, doesn't fit nicely into the person's columns, it's not easy for you to fill out. Meanwhile, the person who created the spreadsheet, gets what looks like a neat and orderly answer to everything. All the messy things—which are or at least have in them the correct status of the thing—will be masked under a clean and simple, but rather incorrect, thing. That spreadsheet will also travel far specifically because it's neat and therefore portable. There aren't a bunch of "it depends" in it.
They’ve basically positioned themselves as a workplace app for years now. A fully integrated project management and documentation really is just asking for AI to be part of it
I think it does all of this really well... Especially as someone coming from the dystopia of permissions management that is Atlassian, I really like notion.
It never meant anything. Motion has always tried to be everything, do everything and work for absolutely everyone and that has always meant it was just a jumbled mess of pure waste of computing cycles. Notion has always been a disgrace of an app and a service—shoving AI into it is just the natural next step for a “whatever” company such as this.
While I don't share your condemnation, I do share your critique of the design. When I tried Notion--really wanting to like it--I could find essentially zero documentation about how to do use those numerous features. After wasting a ton of time trying to get a document template I wanted, I gave up and went back to simpler tools.
After making one of the least worst rich editors out there on the web, they needed to keep their developers and designers busy (while not having time to fix privacy bugs).
Like every other AI tool it mainly seems to exist to produce productivity porn. Summarize the meetings nobody could be bothered to summarize. Write the docs nobody can be bothered to read or write. Communicate as an end, not a means, because the company your work for has transitioned into the dead-weight phase.
First: This is documented and we also warn users when they publish a page. But, that’s not good enough!
Second: We don’t like this and are looking at ways to fix this either by removing the PII from the public endpoints or by replacing it with an email proxy similar to GitHub’s equivalent functionality for public commits.
P.S: Some folks here have speculated that this should be a 1 minute fix. Unfortunately that is not the case. :(
Can you share the warning? I made a public page and would say it was not clear to me this was a consequence of doing that. The warning as I remember it (a month ago) makes it sound like the information on this page is going to be public -- not - oh yeah the email addresses of everyone who edited this page will also be leaked.
Considering it was reported in 2022, and it is obviously an error, I don't think it is unfair for people here to have expected it to be fixed by now since it was first reported.
You should explain WHY that is not the case, or else accept that everyone's takeaway about this is that you've KNOWN you've been leaking your users' data for FOUR YEARS and have done nothing about it by CHOICE.
> P.S: Some folks here have speculated that this should be a 1 minute fix. Unfortunately that is not the case. :(
Ignoring the “the bug was raised four years ago” part and assuming you just mean it isn't as easy as that and might break other things: what other things could resolving this potentially break? If the issue is that the PII needs to be present for private/authenticated views, would not making it unavailable everywhere including there, and fixing that later, be the better option over leaving the PII present for public views for a second longer?
For a PII leak like this, why do you think it's OK to wait for "looking at ways to fix this"? It you can't do anything better you shut those endpoints (or whole servers) down IMMEDIATELY and then deal with the fallout afterwards. Your attitude towards this is beyond unacceptable.
> P.S: Some folks here have speculated that this should be a 1 minute fix. Unfortunately that is not the case. :(
Nonsense! It is a 1 minute fix. You just don't want to take a $ hit from inconveniencing users by breaking another part of your app.
Pull your thumb out and do the right thing. Implement the 1 minute fix, and then spend the rest of the week or month fixing the other parts of your app that might break as a result of fixing this.
Very timely. I literally ran a Claude prompt "compare and contrast Notion vs Obsidian" and flipped over to HN while it was thinking, and this comes up. Thanks HN!
I've been toying around an architecture that sets things up such that the data for each user is actually stored with each user and only materialized on demand, such that many data leaks would yield little since the server doesn't actually store most of the user data. I mention this since this sorts of leaks are inevitable as long as people are fallible. I feel the correct solution is to not store user data to begin with.
some problems I've identified:
1. suppose you have x users and y groups, of which require some subset of x. joining the data on demand can become expensive, O(x*y).
2. the main usefulness of such an architecture is if the data itself is stored with the user, but as group sizes y increase, a single user's data being offline makes aggregate usecases more difficult. this would lend itself to replicating the data server side, but that would defeat the purpose
3. assuming the previous two are solved, which is very difficult to say the least, how do you secure the data for the user such that someone who knows about this architecture can't just go to the clients and trivially scrape all of the data (per user)?
4. how do you allow for these features without allowing people to modify their data in ways you don't want to allow? encryption?
a concrete example of this would be if HN had it so that each user had a sqlite database that stored all of the posts made per user. then, HN server would actually go and fetch the data for each of the posters to then show the regular page. presumably here if a data of a given user is inaccessible then their data would be omitted.
I love Notion and use it extremely heavily. I've also built a few integrations with Notion. I think it's a great app that uses AI very well, and they continue improving. Hopefully they fix this though! Also, their API has recently been upgraded quite a bit and now supports database views as a first class object. I have a few other small requests regarding their public API.
Two things. One, surely I'm not the only one who knew this data was being stored? Two, calling it a "leak" feels like a stretch when the data was publicly accessible by design from the start.
Yes, some users probably didn't realize their edits to public pages were saved publicly, and that's a legitimate UX complaint. But some of the responsibility has to sit with the user. Otherwise we'd be running daily headlines about Meta "leaking" user data to every advertiser with a checkbook.
I reported this and several other issues with public pages almost six years ago. Some of them were fixed after many years - but they're very slow to handle it. I never received any bug bounty or anything.
I really dislike Notion. Its public API is full of bizarre arbitrary limitations, like a rich text database field can only contain max 100 “child blocks”, where each change in formatting consumes one child block-but its web UI doesn’t have this issue. Yes, I realise the undocumented private API that the web UI uses doesn’t have this issue either-but I shouldn’t have to, and I haven’t.
I don’t love Confluence, but at least it doesn’t do this to me.
154 comments
> When you publish a Notion page to the web, the webpage’s metadata may include the names, profile photos, and email addresses associated with any Notion users that have contributed to the page.
The flaw itself is absurd but then just accepting it as "by design" makes it even worse.
Conceptually, I agree it should be easy, but I suspect they're stuck with legacy code and behaviors that rely on the current system. Not breaking anything else while fixing this is likely the time consuming part.
The trouble is the UUID->email endpoint has no idea what the context is and that endpoint alone can't decide if it should expose email or not. And then public Notion docs publicly expose author UUIDs.
Their mistake was architecting things this way. From day 1 they should have cleanly separated public identifiers from privileged ones. Or have more bespoke endpoints for looking up a UUID's email for each of the narrow contexts in which this is allowed. They didn't do this, and they certainly should have, but fixing this mess is likely a non-trivial amount of work. Though I bet it could be done immediately if they really cared and didn't mind other things breaking.
I'm absolutely not defending their choice to expose emails in this way. They should have addressed this years ago when it was first reported, and I want them shamed for failing to care. But just trying to say it's likely not a one line fix.
It is not a public marker, it’s PII.
They can easily withold information they put out intenionally.
If you can't easily architect around it, then don't do what you're trying to do.
"Oh I needed to disclose user data in order to make more money" isn't an acceptable excuse.
> Oh I needed to disclose user data in order to make more money
hmm maybe they should've paywalled?
Anyways, I think Notion has a learning curve that is a little longer than one expects. I can believe that with some dedicated learning time I could be turned into a believer. But I also distinctly had the impression that it was one of those things where it saved a ton of time for a few narrow-visioned people (the people who championed it), but added meaningful time to everyone else's. Those people were largely project managers or operations folks, and transitively the leaders they reported to. It heavily threw the switch towards "legibility" over reality.
It's like when someone new to a messy project, creates a spreadsheet, and says, "Let not overthink this, everybody just fill in your project details in your row". If your work, which you are the expert on, doesn't fit nicely into the person's columns, it's not easy for you to fill out. Meanwhile, the person who created the spreadsheet, gets what looks like a neat and orderly answer to everything. All the messy things—which are or at least have in them the correct status of the thing—will be masked under a clean and simple, but rather incorrect, thing. That spreadsheet will also travel far specifically because it's neat and therefore portable. There aren't a bunch of "it depends" in it.
It never meant anything. Motion has always tried to be everything, do everything and work for absolutely everyone and that has always meant it was just a jumbled mess of pure waste of computing cycles. Notion has always been a disgrace of an app and a service—shoving AI into it is just the natural next step for a “whatever” company such as this.
Like every other AI tool it mainly seems to exist to produce productivity porn. Summarize the meetings nobody could be bothered to summarize. Write the docs nobody can be bothered to read or write. Communicate as an end, not a means, because the company your work for has transitioned into the dead-weight phase.
> I was going to recommend it to someone as an example of hypertext
What does this mean?
First: This is documented and we also warn users when they publish a page. But, that’s not good enough!
Second: We don’t like this and are looking at ways to fix this either by removing the PII from the public endpoints or by replacing it with an email proxy similar to GitHub’s equivalent functionality for public commits.
P.S: Some folks here have speculated that this should be a 1 minute fix. Unfortunately that is not the case. :(
> P.S: Some folks here have speculated that this should be a 1 minute fix. Unfortunately that is not the case. :(
4 years.
Ignoring the “the bug was raised four years ago” part and assuming you just mean it isn't as easy as that and might break other things: what other things could resolving this potentially break? If the issue is that the PII needs to be present for private/authenticated views, would not making it unavailable everywhere including there, and fixing that later, be the better option over leaving the PII present for public views for a second longer?
What are you doing to address the support issues that allowed such a privacy issue to remain after being reported?
What are you doing to address the issues with the company's prioritisation framework that allowed such a privacy issue to remain for 4 years?
Which authorities are you reporting the privacy issue to in line with local requirements?
> P.S: Some folks here have speculated that this should be a 1 minute fix. Unfortunately that is not the case. :(
Nonsense! It is a 1 minute fix. You just don't want to take a $ hit from inconveniencing users by breaking another part of your app.
Pull your thumb out and do the right thing. Implement the 1 minute fix, and then spend the rest of the week or month fixing the other parts of your app that might break as a result of fixing this.
some problems I've identified:
1. suppose you have x users and y groups, of which require some subset of x. joining the data on demand can become expensive, O(x*y).
2. the main usefulness of such an architecture is if the data itself is stored with the user, but as group sizes y increase, a single user's data being offline makes aggregate usecases more difficult. this would lend itself to replicating the data server side, but that would defeat the purpose
3. assuming the previous two are solved, which is very difficult to say the least, how do you secure the data for the user such that someone who knows about this architecture can't just go to the clients and trivially scrape all of the data (per user)?
4. how do you allow for these features without allowing people to modify their data in ways you don't want to allow? encryption?
a concrete example of this would be if HN had it so that each user had a sqlite database that stored all of the posts made per user. then, HN server would actually go and fetch the data for each of the posters to then show the regular page. presumably here if a data of a given user is inaccessible then their data would be omitted.
Yes, some users probably didn't realize their edits to public pages were saved publicly, and that's a legitimate UX complaint. But some of the responsibility has to sit with the user. Otherwise we'd be running daily headlines about Meta "leaking" user data to every advertiser with a checkbook.
Here's a Reddit post just as confirmation: https://www.reddit.com/r/Notion/comments/hqyxid/possible_sec.... I also reported it privately two months prior, of course.
I don’t love Confluence, but at least it doesn’t do this to me.
Tells me everything I need to know about this industry. No regard or seriousness to security at all.