ArXiv declares independence from Cornell (science.org)

by bookstore-romeo 277 comments 811 points

[−] frankling_ 57d ago
The recent announcement to reject review articles and position papers already smelled like a shift towards a more "opinionated" stance, and this move smells worse.

The vacuum that arXiv originally filled was one of a glorified PDF hosting service with just enough of a reputation to allow some preprints to be cited in a formally published paper, and with just enough moderation to not devolve into spam and chaos. It has also been instrumental in pushing publishers towards open access (i.e., to finally give up).

Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.

In my view, arXiv fulfills its function better the less power it has as an institution, and I thus have exactly zero trust that the split from Cornell is driven by that function. We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before, and it's now time for the usual routine of snapshotting the site to watch the inevitable amendments to the mission statement.

"What positive changes should users expect to see?" - I guess the negative ones we'll have to see for ourselves.

[1] https://tech.cornell.edu/arxiv/

[−] swiftcoder 57d ago

> raised concerns about the proposed $300,000 salary for arXiv’s new CEO, saying it seemed high

Is a mid-to-high engineering salary outlandish for a CEO of what is likely to be a fairly major non-profit? Even non-profits have to be somewhat competitive when it comes to salary, and the ideal candidate is likely someone who would be balancing this against a tenured position at a major university.

[−] halperter 57d ago
[−] whiplash451 57d ago
I'm not sure why we're so focused on filtering what gets into arxiv (which is an uphill battle and DOA at this point) vs fixing the indexing, i.e. the page rank of academia.

Google "sorted out" a messy web with pagerank. Academic papers link to each other. What prevents us from building a ranking from there?

I'm conscious I might be over-simplifying things, but curious to see what I am missing.
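
For concreteness, here is a minimal sketch of the idea above: PageRank by power iteration over a citation graph. The paper ids, the `pagerank` helper, and the toy graph are all hypothetical, and the damping factor of 0.85 is the conventional default rather than anything tuned for academic citations.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Rank papers by power iteration.

    citations maps a paper id to the list of paper ids it cites.
    """
    # Collect every paper that cites or is cited.
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    papers = sorted(papers)
    n = len(papers)

    # Start from a uniform score distribution.
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing spread their score evenly over all papers.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        # Each citing paper splits its damped score among the papers it cites.
        for citing, cited_list in citations.items():
            if not cited_list:
                continue
            share = damping * rank[citing] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        rank = new_rank
    return rank


# Hypothetical toy citation graph: a and b cite c, c cites d, d cites nothing.
toy_citations = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": ["paper_d"],
    "paper_d": [],
}
scores = pagerank(toy_citations)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph the oldest paper ends up on top despite having a single citation, because it is cited by the most-cited paper. Citations only point backward in time, so a plain PageRank like this tends to favor older work, which is one thing a real ranking would have to correct for.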

[−] psalminen 57d ago
I might be missing something, but I still don't get the why. I don't see any "problem" that needs to be solved.
[−] krick 57d ago
It's not that hard to make a mirror of arXiv. Basically, anybody who can pay for hosting can do it (which, I suppose, isn't very cheap now that the whole world uses it). The problem is making users switch, because academia seems to have this weird tradition of resisting all practices that, god forbid, might improve global research capabilities and move scientific progress forward. But then, if arXiv actually becomes unusable, I suppose they won't really have much choice but to switch?

And, FWIW, I do think that arXiv truly has vast potential to be improved. It is currently in a position to change the whole process of how research results are shared, yet it is still, as others have said, only PDF hosting. And since the universities couldn't break out of the whole Elsevier & co. scam despite the internet existing for 30 years, to me, breaking free from the university affiliation sounds like a good thing.

But, of course, I am talking only about the possibilities being out there. I know nothing about the people in charge of the whole endeavor, and ultimately it depends only on them whether it sails or sinks.

[−] beezle 56d ago
I go back to xxx.lanl.gov days - that is, the beginning. Back then it was all physics, some math and a little quantitative finance (not bitcoin). And the quality was pretty good because it was a preprint archive. In fact, a headline from 2000:

APS and BNL Host XXX e-Print Archive Mirror Feb. 1, 2000

The APS is establishing, in cooperation with Brookhaven National Laboratory, the first electronic mirror in the United States for the Los Alamos e-Print Archive.

Today, on the landing page, it describes itself as "arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of [long list]. Materials on this site are not peer-reviewed by arXiv."

Well, that's a large part of the problem. A lot of the stuff there now will never see a journal (even of dubious quality) and there is limited filtering of what new submissions will be stored. GIGO.

Best thing ArXiv could do is go back to their roots - limit the fields and return to preprint only. Spin off the comp sci stuff for sure to someone else along with all its headaches.

fixed: url

[−] lifeisstillgood 57d ago
I am sure it’s a dumb idea, but why would it be a problem for, say, the National Science Foundation or something to run a website that replicates ArXiv? If you are from an accredited university or whatever, you can publish papers, fulfilling the “pdf store” function.

Then getting peer reviewed is a harder process, but one can see some form of credit on the site coming from doing a decent reviewer's job.

I suspect I am missing a lot of nuance …

[−] taormina 57d ago
Given that Cornell charges what, $50k a year as an Ivy League, $300k feels like almost nothing.
[−] OutOfHere 57d ago
With 300K for the CEO, its enshittification will commence imminently. It will now serve to maximize revenue. Just wait and watch while they issue a premium membership, payment requirements for authors, and other revenue generators to please their investors.
[−] tokai 57d ago
This is exactly what happened last time when scientific publishing got cornered. Journals run by departments and research groups were spun out or sold off to publishers and independent orgs. And they continued to slowly boil the frog over 50 years with fees and gatekeeping.

It's especially problematic because while ArXiv loves to claim to be working for open science, they don't default to open licensing. Many of the publications they host are not Open Access, and are read access only. So there is definitely the potential to close things off at some point in the future, when some CEO needs to increase value.

[−] ide0666 57d ago
The endorsement system is a real barrier for independent researchers. I've been trying to get endorsed for cs.NE for weeks — the work is published on aiXiv with video results, but without an institutional email or personal connection to an existing author, you're stuck. Glad to see arXiv thinking about independence — hope they also rethink access for non-institutional researchers.
[−] dataflow 57d ago
This sounds terrible. Of course there's a huge risk of it being made for-profit. It almost makes you wonder if the academic publishers are behind this push somehow.

Could they not have made it into some legal structure that puts universities at the top? Say, with a bunch of universities owning shares that comprise the entirety of the ownership of arXiv, but that would allow arXiv to independently raise funds?

[−] tamimy 56d ago
It's quite interesting to see that a lot of opinions here think ArXiv will turn to shit because it will go "corporate". Are there any examples where this has not been the case?
[−] contubernio 57d ago
What is worrisome about this development, and corollary actions like the hiring of a CEO with a $300,000/year salary, is that the essentially independent and community-based platform will disappear. The ArXiv exists because mathematicians and physicists, and later computer scientists and engineers, posted their work there, freely, with minimal attention to licensing and other commercial aspects. It has thrived because it required no peer review and made interesting things accessible quickly to whoever cared to read them.

A setup as a US-based "non-profit" is worrisome, if only because 300K is an obscene salary even in a for-profit setting. That the US-based posters can't see this is evidence of the basic problem which is that the US, both left and right, has been taken over by a neoliberal feudal antidemocratic nativist mindset that is anathema to the sort of free interchange of ideas that underlay the ArXiv's development in the hands of mathematicians and physicists now swept aside and ignored by machine learning grifters and technicians who program computers.

[−] asimpleusecase 57d ago
I wonder if there are plans to licence the content for AI training
[−] hereme888 57d ago
From my limited experience, arXiv appears to include many low-quality, unreproducible papers, and some are straight-up self-marketing rather than serious scientific work.
[−] bonoboTP 57d ago
I fear their Mozilla-ification and Wikipedia-ification. Scope creep, various outreach feel-good programs, ballooning costs, lost focus etc. And other types of enshittification.

Any change to the basic premise will be a negative step.

They should just be boring, quiet, unopinionated, neutral background infrastructure.

[−] tornikeo 57d ago
Now the question is, will arxiv wage a decade-long bloody war with Cornell, using heavy infantry (PhD students), archers (reviewers) and field artillery (AI slop papers), or will the independence be mostly peaceful? Only time can tell.
[−] shevy-java 57d ago
"Recently arXiv’s growth has accelerated. Since 2022, it has expanded its staff to 27, in large part to deal with a 50% increase in submitted manuscripts."

I am wary of that. IMO the business model is damaged therein. You can say in 2022 we had 27; bankrupt in 2030.

[−] AccessScan 57d ago
Going independent makes sense for arXiv. But the more interesting part is what it tells us about how we fund the stuff that actually keeps research moving. arXiv runs on about seven million dollars a year and handles hundreds of thousands of papers. That's roughly twenty bucks a paper. This is the backbone of how physicists, computer scientists, and mathematicians share work. Traditional publishers charge thousands per article. The math is almost laughable.

arXiv has never had an efficiency problem. The problem is that we've just accepted that something this important should survive on voluntary contributions and the occasional donation saving the day.

Look at what happened with bioRxiv and medRxiv when they spun off into openRxiv. That only happened about a year ago. Nobody knows yet if it actually works long-term or if it just kicks the money problems down the road. But both platforms, totally separately, came to the same conclusion. We need to leave the university. That says something. Universities aren't built to fund outside infrastructure forever. Their budgets follow enrollment, grants, and endowment performance. That doesn't line up with the steady, predictable funding arXiv needs to keep the lights on.

Ginsparg calling it a "Perils of Pauline" situation is probably the most honest thing anyone said about this. Everyone treats arXiv like it will always be there. But it's been one bad year away from serious trouble for most of its life. The real test for the nonprofit won't be the first few years. Cornell and Simons have that covered. It'll be five or ten years from now when the excitement fades and they're competing for donor money against whatever the next crisis in academic publishing turns out to be.

The worry about AI-generated junk is actually where independence could help. A university-hosted arXiv could only spend so much on moderation tools. An independent org with a focused mission can make that a real budget priority. Whether they can keep up with the flood of low-quality submissions is a different question entirely.
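
The per-paper figure above can be checked with a quick back-of-the-envelope sketch. The budget is the "about seven million dollars a year" quoted in the comment. The paper volumes are assumptions chosen to span "hundreds of thousands", treated here as a yearly count, and `cost_per_paper` is a hypothetical helper, not an official arXiv figure.

```python
def cost_per_paper(annual_budget_usd, papers_per_year):
    """Average operating cost per paper handled in a year."""
    return annual_budget_usd / papers_per_year


# "About seven million dollars a year", as stated in the comment.
ANNUAL_BUDGET_USD = 7_000_000

# Assumed volumes spanning "hundreds of thousands of papers".
for papers in (200_000, 350_000, 500_000):
    cost = cost_per_paper(ANNUAL_BUDGET_USD, papers)
    print(f"{papers:>7,} papers -> ${cost:.2f} per paper")
```

"Roughly twenty bucks a paper" corresponds to about 350,000 papers a year under these assumptions. Across the whole assumed range the cost stays in the tens of dollars, well below the thousands per article attributed to traditional publishers.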