This is a very cool query tool that I haven't seen before, thanks! (Also the syntax drives me a little batty).
I tried modifying it to give me authors whose first publication (any publication at all) happened after 60 years old, but also who had at least one wdt:P800 work. I got people like Cato the Elder, Josephus, and William of Tyre.
I tried again for only people born in the 20th century, and I got some results (plus quite a few wrong answers, presumably due to something in the query or the data)! Oddly, quite a few of the results are from criminals who wrote an autobiography after their release, including Henri Charrière and the infamous Nazi, Albert Speer.
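For anyone who wants to try the variation described above, here is a minimal sketch of how such a query might be assembled. It is not the query the commenter actually ran. The property and item IDs (P31/Q5 for humans, P569 date of birth, P50 author, P577 publication date, P800 notable work) are real Wikidata identifiers, but the function name `build_late_debut_query`, its parameters, and the choice to approximate age by subtracting years are assumptions made for illustration.

```python
from string import Template

# Hypothetical template: authors with at least one notable work (P800)
# whose earliest recorded publication date falls at or after a given age.
_QUERY = Template("""\
SELECT ?author ?authorLabel ?birth (MIN(?pubDate) AS ?firstPub) WHERE {
  ?author wdt:P31 wd:Q5 ;          # instance of: human
          wdt:P569 ?birth .        # date of birth
  ?work wdt:P50 ?author ;          # work whose author is ?author
        wdt:P577 ?pubDate .        # publication date of that work
  FILTER EXISTS { ?author wdt:P800 ?notable }   # has a notable work
  FILTER(YEAR(?birth) >= $born_from && YEAR(?birth) <= $born_to)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?author ?authorLabel ?birth
HAVING (YEAR(MIN(?pubDate)) - YEAR(?birth) >= $min_age)
LIMIT $limit
""")


def build_late_debut_query(min_age: int = 60, born_from: int = 1901,
                           born_to: int = 2000, limit: int = 100) -> str:
    """Return SPARQL text only; nothing is sent over the network."""
    return _QUERY.substitute(min_age=int(min_age), born_from=int(born_from),
                             born_to=int(born_to), limit=int(limit))


if __name__ == "__main__":
    print(build_late_debut_query())
```

The printed text can be pasted into the Wikidata Query Service. Year subtraction can be off by one, and P577 may hold the date of a later edition rather than the first publication, which is one plausible (unverified) source of the wrong answers. A query this broad may also need narrowing to finish within the service's time limit.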
That's kind of what P800 (notable work) is doing, but you can try some approximations to "major work" with "has both an English Wikipedia page and a Goodreads link":
> asked LLMs to compile list of 10-20 writers considered canon in each decade since 1800, then identify all their notable works and years of publication. After some iterations with coding agents I got over 2,000 works by 200 authors.
Wait, so the source data is just LLM hallucinations? It makes sense to use an LLM to build the data collection, but not to build your source data.
This is, in my opinion, a better use of tech that has an error rate (hallucination): you just assume that it's a fuzzy search, and sample the results to see how you did. I'd like to see a few from the results for sure!
It feels a lot like storing your data as an essay in a Word doc instead of a spreadsheet. It can work and all of the math is probably correct, but it's very much the wrong tool when the structured data was right there to be used instead.
The structured data is scattered all over the place. This does the very important thing of aggregating it and bringing it together. If you had to do that manually, it could take weeks.
What do you mean by due diligence here? Manually checking 2000 citations sounds a lot harder to me than just pulling the data from a reliable source to start with.
I think this is pretty common across different creative forms albeit with different age ranges but constrained at the higher end.
So the greatest physics, maths, poetry and pop music are done by people in their 20s.
Literature (esp novels) seems to occupy an older range, perhaps 30s to 50s. Perhaps classical music and philosophy also? I don't know about the visual arts.
I interpret it as the former requiring the creative fireworks of youthful neural elasticity and the latter the depth we associate with lived experience and wisdom.
Naturally there are outliers (general relativity in Einstein's early 30s, Shakespeare word play till his late 40s) but I think in general these rules of thumb seem to be a good guide for the very highest achievers and for the most creative periods for us mere mortals.
I think this graph is a great illustration about how anonymising data is hard. It's very easy to isolate individual authors from this list, because there are clear diagonal lines because the year and age are increasing in lockstep. This also suggests there aren't actually that many authors in this collection, because of these strong diagonals everywhere.
There's probably also some erroneous data here with a bunch of points representing material written by people at age 34 between about 1920 and 1940 (an obvious horizontal line) when most of the rest of the graph doesn't show any strong horizontal bias for a specific age.
Opened it just to check if Saramago was there, and indeed, he is.
For most of his professional life he was a journalist. He published his second novel at 55, only found his narrative style at almost 60, then wrote 15 novels (and won a Nobel) after that. What an amazing career.
It’s difficult to be a truly interesting person with a unique perspective on life, and have the skills to transmute that experience into a work of art, when you’re young. You simply haven’t logged the hours in the world, and I kind of don’t trust your opinion on something if you haven’t.
Not sure if I’d call him a major writer, but Raymond Chandler is one of my favorites and I think he’s a good example. To me there is a fundamental difference between his crime stories, which show the results of corporate life, alcoholism, personal tragedy, war, etc. and a more modern crime writer that’s just writing a genre piece with all the right pieces, but no actual personal experience.
Well, the canonical example is Diana Athill, who had a long and distinguished career as a literary editor for people like Philip Roth, John Updike, Margaret Atwood, Jack Kerouac and others, then retired at the age of 75 and started writing her own novels and memoirs, and is considered one of the greatest writers in English of the 20th century. “After a Funeral” is, I think, the one of hers I read, and it’s amazing.
"The accepted notion is that age confers a spirit of reconciliation and serenity on late works, often expressed in terms of a miraculous transfiguration of reality....But what of artistic lateness not as harmony and resolution, but as intransigence, difficulty, and contradiction? What if age and ill health don’t produce serenity at all? "
This is a disappointing statistical modelling technique.
The author asked LLMs to produce lists of data which are readily available on the likes of wikipedia. Author date of birth, list of publications, and publication release date are all fairly easy to get hold of. They just need formatted appropriately. The LLMs produced a few false positives, and missed out some prominent works.
I get that this is just the author working in public & writing about what they're up to, but the number of avoidable errors introduced by the methodology makes reading it a poor use of time.
> In trying to come up with some good examples I asked LLMs. (…)
> So I tried to cast the net more broadly and asked LLMs (…)
> EDIT: also hunted down several mistakes, as one would expect from LLMs; thanks to commenters.
This is a slop post. You can’t trust any of the data. It’s baffling and worrying the author apparently understands mistakes from LLMs are to be expected but still decided to publish without doing due diligence.
For me my 60's was the best time to start writing fiction, before then I always had excuses why I would not write, now with much more free time, experience and no money worries, I can think back on all those thousands of novels I read, knowing I could write a better one. Writing is also one of the cheapest retirement hobbies you can have and you are also more likely to experiment across different genres as you are not pandering to an audience.
It feels like a natural result of two things: life expectancy rose above 70 (worldwide average) only in 2021, and a number of years must pass after publication before something is deemed a major work, so it is natural that there are few today. Something like 100%, 110%, and 120% of life expectancy at the author’s time of birth might be a more useful measure today.
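A rough sketch of that relative measure, assuming you already have a life-expectancy figure for the author's birth year from some source. The function names and the numbers in the usage lines are hypothetical placeholders, not real data.

```python
def relative_age(age_at_publication: float, life_expectancy_at_birth: float) -> float:
    """Age at publication as a fraction of life expectancy at the author's birth."""
    if life_expectancy_at_birth <= 0:
        raise ValueError("life expectancy must be positive")
    return age_at_publication / life_expectancy_at_birth


def highest_band(ratio: float, thresholds=(1.0, 1.1, 1.2)):
    """Return the largest threshold the ratio reaches, or None if it reaches none."""
    reached = [t for t in thresholds if ratio >= t]
    return max(reached) if reached else None


if __name__ == "__main__":
    # Placeholder inputs: published at 70, life expectancy at birth of 60.
    ratio = relative_age(70, 60)
    print(round(ratio, 2), highest_band(ratio))
```

With this framing, a work published at 70 counts as later in life for an author born when life expectancy was 60 than for one born when it was 75.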
> Also interestingly, the trend in that graph keeps going up in recent years… but it looks to me like this is driven by lack of major works from young authors. It may be how my sample is constructed.
Isn't that because older authors have had more time to gain notoriety, their earlier works to be deemed 'major' in retrospect?
Douglas Southall Freeman wrote the definitive biography of Robert E Lee over twenty years, publishing it when he was 49; he then went on to publish his seven volume biography on George Washington when he was 62 (he finished the sixth volume on the day he died; the seventh was completed by his research assistants).
There are a suspiciously large number of very straight diagonal lines on those graphs with identical slopes. I might predict that they are individual famous authors that released a lot of works, but the slopes are all identical. What's going on there?
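One likely explanation, assuming the chart plots age at publication against publication year: for any single author, age = publication year - birth year, so every author ages exactly one year per calendar year and their works fall on a line of slope 1. Only the intercept (the birth year) differs between authors. A small sketch with a made-up author (the birth year and publication years are hypothetical):

```python
def ages_at_publication(birth_year, publication_years):
    """Return (year, age) points for one author, sorted by year."""
    return [(year, year - birth_year) for year in sorted(set(publication_years))]


def slopes(points):
    """Slope between each pair of consecutive (year, age) points."""
    return [(a2 - a1) / (y2 - y1)
            for (y1, a1), (y2, a2) in zip(points, points[1:])]


if __name__ == "__main__":
    # Hypothetical author born in 1900 with four publications.
    points = ages_at_publication(1900, [1930, 1934, 1941, 1955])
    print(points)
    print(slopes(points))  # every slope is 1.0
```

So identical slopes are what you would expect from prolific individual authors, not necessarily a sign of bad data.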
Beyond the data science interest, isn't this sort of charting powered by the "my time's running out and I still haven't left my mark in history" intrusive thought? Purely from a fitting perspective I'd wager the correlation is close to zero, because "major works" will be different in a century, and different again in two. Shakespeare was not very popular in the 17th century, per Wikipedia. As George Orwell put it, it's much easier to write when you do it for a purpose that matters to you. Hugo wrote Notre-Dame mostly to rant about architecture; creating a major work for the purpose of staving off fears of being forgotten is, I feel, not enough in itself.
Most of the literature by Srila Prabhupada used in most universities around the world was written well over the age of 75: https://prabhupadabooks.com/books
This seems like the kind of thing that should be more widely known, and have some good tutorials written for it :)
https://www.wikidata.org/wiki/Wikidata:Introduction
And you can find lots of SPARQL examples here:
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/...
Mediocrity of course is unconstrained by age.
https://en.wikipedia.org/wiki/Diana_Athill
Thoughts on Late Style by Edward Said https://www.edwardsaid.org/articles/thoughts-on-late-style/