The recent announcement to reject review articles and position papers already smelled like a shift towards a more "opinionated" stance, and this move smells worse.
The vacuum that arXiv originally filled was one of a glorified PDF hosting service with just enough of a reputation to allow some preprints to be cited in a formally published paper, and with just enough moderation to not devolve into spam and chaos. It has also been instrumental in pushing publishers towards open access (i.e., to finally give up).
Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.
In my view, arXiv fulfills its function better the less power it has as an institution, and I thus have exactly zero trust that the split from Cornell is driven by that function. We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before, and it's now time for the usual routine of snapshotting the site to watch the inevitable amendments to the mission statement.
"What positive changes should users expect to see?" - I guess the negative ones we'll have to see for ourselves.
> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right.
This has been common practice in physics, especially the more theoretical branches, since the inception of arXiv. Senior researchers write a paper draft, send copies to some of their peers, get and incorporate feedback, and then just submit to arXiv.
I came here to say something similar. As someone who works in a field that applies machine learning but is not purely focused on it, I interact with people who think that arXiv is the only relevant platform and that they don't need to submit their work to any journal, as well as people who still think that preprints don't count at all and that data isn't published until it's printed in an academic journal. It can feel like a clash of worlds.
I think both sides could learn from the other. In the case of ML, I understand the desire to move fast, and that an average time to publication of 250-300 days in some of the top-tier journals can feel like an unnecessary burden. But having been on both sides of peer review, there is value to the system and it has made for better work.
Skipping it entirely follows the same spirit as benchmarking your approach against maybe one alternative, and even that only as an afterthought. Or benchmaxxing without exploring the actual real-world consequences, time and cost trade-offs, etc.
Now, is academic publishing perfect? Of course not, very very far from it. It desperately needs reform: to keep it economically accessible and time-efficient for authors, editors and peer reviewers; to prevent the "hot topic of the day" from dominating journals; and to make sure that peer review aligns with the needs of the community and actually improves the quality of the work, rather than allowing "malicious peer review" that only serves to get some citations or pet peeves in.
Given the power that the ML field holds and the interesting experiments with open review, I would wish for the field to engage more with the scientific system at large and perhaps try to drive reforms and improve it, rather than completely abandoning it and treating a PDF hosting service as a journal (ofc, preprints would still be desirable and are important, but they cannot carry the entire field alone).
> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, ...
In my experience as a publishing scientist, this is partly because publishing with "reputable" journals is an increasingly onerous process, with exorbitant fees, enshittified UIs, and useless reviews. The alternative is to upload to arXiv and move on with your life.
> and with just enough moderation to not devolve into spam and chaos
arXiv has become a target for grifters in other domains like health and supplements. I’ve seen several small-scale health influencers who ChatGPT some “papers” and then upload them to arXiv, then cite arXiv as proof of their “published research”. It’s not fooling anyone who knows how research works, but it’s very convincing to an average person who thinks that they’re doing the right thing when they follow sources that have done academic research.
I’ve been surprised at how bad and obviously grifty some of the documents I’ve seen on arXiv have become lately. Is there any moderation, or is it a free-for-all as long as you can get an invite?
> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.
This just isn't true. arXiv is not a venue. There's no place that gives you credit for arXiv papers. No one cares if you cite an arXiv paper or some random website. The vast vast majority of papers that have any kind of attention or citations are published in another venue.
> raised concerns about the proposed $300,000 salary for arXiv’s new CEO, saying it seemed high
Is a mid-to-high engineering salary outlandish for a CEO of what is likely to be a fairly major non-profit? Even non-profits have to be somewhat competitive when it comes to salary, and the ideal candidate is likely someone who would be balancing this against a tenured position at a major university.
I'm not sure why we're so focused on filtering what gets into arXiv (which is an uphill battle and DOA at this point) vs fixing the indexing, i.e. the PageRank of academia.
Google "sorted out" a messy web with PageRank. Academic papers link to each other. What prevents us from building a ranking from there?
I'm conscious I might be over-simplifying things, but curious to see what I am missing.
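To make the idea concrete, here is a minimal sketch of PageRank by power iteration over a citation graph. The paper IDs and the toy graph are hypothetical, the damping factor of 0.85 is just the conventional default, and this is an illustration of the technique rather than a description of how any existing academic index works.

```python
def pagerank(citations, damping=0.85, iterations=100):
    """Power-iteration PageRank over a citation graph.

    `citations` maps each paper ID to the list of paper IDs it cites.
    """
    # Collect every paper, including ones that are only ever cited.
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all papers.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in papers}
        # Every citing paper splits its damped rank among the papers it cites.
        for citing, cited_list in citations.items():
            if not cited_list:
                continue
            share = damping * rank[citing] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: each key cites the papers in its list.
toy_citations = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": ["paper_d"],
    "paper_d": [],
}

ranks = pagerank(toy_citations)
for paper, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{paper}: {score:.3f}")
```

In this toy graph the top score goes to paper_d, which is cited only once, but by the most-cited paper. That hints at one thing a real ranking would have to deal with: citations mostly point backwards in time, so rank tends to pile up on older work.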
It's not that hard to make a mirror of arXiv. Basically, anybody who can pay for hosting can do it (which, I suppose, isn't very cheap now that the whole world uses it). The problem is making users switch, because academia seems to have this weird tradition of resisting all practices that, god forbid, might improve global research capabilities and move scientific progress forward. But then, if arXiv actually becomes unusable, I suppose they won't really have much choice but to switch?
And, FWIW, I do think that arXiv truly has vast potential to be improved. It is currently in a position to change the whole process of how research results are shared, yet it is still, as others have said, only a PDF hosting service. And since the universities couldn't break out of the whole Elsevier & co. scam despite the internet existing for 30 years, to me, breaking free from the university affiliation sounds like a good thing.
But, of course, I am talking only about the possibilities being out there. I know nothing about the people in charge of the whole endeavor, and ultimately it depends only on them whether it sails or sinks.
I go back to xxx.lanl.gov days - that is, the beginning. Back then it was all physics, some math and a little quantitative finance (not bitcoin). And the quality was pretty good because it was a preprint archive. In fact, a headline from 2000:
The APS is establishing, in cooperation with Brookhaven National Laboratory, the first electronic mirror in the United States for the Los Alamos e-Print Archive.
Today, the landing page describes it like this: "arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of [long list]. Materials on this site are not peer-reviewed by arXiv."
Well, that's a large part of the problem. A lot of the stuff there now will never see a journal (even of dubious quality) and there is limited filtering of what new submissions will be stored. GIGO.
Best thing arXiv could do is go back to its roots - limit the fields and return to preprint only. Spin off the comp sci stuff for sure to someone else, along with all its headaches.
I am sure it’s a dumb idea, but what would be the problem with, say, the National Science Foundation or something running a website that replicates arXiv? If you are from an accredited university or whatever, you can publish papers, fulfilling the “pdf store” function.
Then getting peer reviewed is a harder process, but one can see some form of credit on the site coming from doing a decent reviewer's job.
With 300K for the CEO, its enshittification will commence imminently. It will now serve to maximize revenue. Just wait and watch while they issue a premium membership, payment requirements for authors, and other revenue generators to please their investors.
This is exactly what happened last time, when scientific publishing got cornered. Journals run by departments and research groups were spun out or sold off to publishers and independent orgs. And they continued to slowly boil the frog over 50 years with fees and gatekeeping.
It's especially problematic because while arXiv loves to claim to be working for open science, they don't default to open licensing. Many of the publications they host are not Open Access, and are read-access only. So there is definitely the potential to close things off at some point in the future, when some CEO needs to increase value.
The endorsement system is a real barrier for independent researchers. I've been trying to get endorsed for cs.NE for weeks — the work is published on aiXiv with video results, but without an institutional email or personal connection to an existing author, you're stuck. Glad to see arXiv thinking about independence — hope they also rethink access for non-institutional researchers.
[1] https://tech.cornell.edu/arxiv/
> arXiv fulfills its function better the less power it has as an institution
It is an interesting instance of the rule of least power, https://en.wikipedia.org/wiki/Rule_of_least_power.
APS and BNL Host XXX e-Print Archive Mirror Feb. 1, 2000
I suspect I am missing a lot of nuance …