The revenge of the data scientist (hamel.dev)

by hamelsmu 37 comments 174 points

[−] Flashtoo 44d ago
These are good practices to keep in mind when setting up GenAI solutions, but I'm not convinced that this part of the job will allow "data scientist" as a profession to thrive. Here's my pessimistic take.

Data scientists were appreciated largely because of their ability to create models that unlock business value. Model creation was a dark magic that you needed strong mathematical skills to perform - or at least that's the image, even if in reality you just slap XGBoost on a problem and call it a day. Data scientists were enablers and value creators.

With GenAI, value creation is apparently done by the LLM provider and whoever in your company calls the API, which could really be any engineering team. Coaxing the right behavior out of the LLM is a bit of black magic in itself, but it's not something that requires deep mathematical knowledge. Knowing how gradients are calculated in a decoder-only transformer doesn't really help you make the LLM follow instructions. In fact, all your business stakeholders are constantly prompting chatbots themselves, so even if you provide some expertise here they will just see you as someone doing the same thing they do when they summarize an email.

So that leaves the part the OP discusses: evaluation and monitoring. These are not sexy tasks, and from the point of view of business stakeholders they are not the primary value add. In fact, they are barriers that get in the way of taking the POC someone slapped together in Copilot (it works!) and putting that solution in production. They're not even strictly necessary if you just want to move fast and break things. Appreciation for this kind of work is most present in large risk-averse companies, but even there it can be tricky to convince management that this is a job that needs to be done by a highly paid statistician with a graduate degree.

What's the way forward? Convince management that people with the job title "data scientist" should be allowed to gatekeep building LLM solutions? Maybe I'm overestimating how good the average AI-aware software engineer is at this stuff, but I don't see the professional moat.

[−] redhale 44d ago
I agree with your take.

I don't really see why evals are assumed to be exclusively in the domain of data scientists. In my experience, SWEs-turned-AI-Engineers are much better suited to building agents. Some struggle more than others, but "evals as automated tests" is, imo, such an obvious mental model, and one that good SWEs adapt to so readily, that data scientists have no real role on many "agent" projects.
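
To make that concrete, here's a minimal sketch of the "evals as automated tests" framing, assuming Python and pytest; run_agent() and the cases are hypothetical placeholders, not anything from the article:

    # Sketch: each eval case becomes a parametrized test.
    import pytest

    def run_agent(prompt: str) -> str:
        # Hypothetical stand-in for the real agent call (LLM API, tools, ...).
        return prompt.lower()

    CASES = [
        ("Summarize: the meeting moved to 3pm", "3pm"),
        ("Summarize: invoice #42 is overdue", "overdue"),
    ]

    @pytest.mark.parametrize("prompt,expected", CASES)
    def test_agent_output_contains_expected(prompt, expected):
        output = run_agent(prompt)
        assert expected in output.lower()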

I'm not saying this is good or bad, just that it's what I'm observing in practice.

For context, I'm a SWE-turned-AI Engineer, so I may be biased :)

[−] samusiam 44d ago
I think there's a lot of methodological expertise that goes into collecting good eval data. For example, in many cases you need human labelers with the right expertise, well-designed tasks, and well-defined constructs, and you need to hit inter-rater agreement targets and troubleshoot when you don't. Good labeled data is a prerequisite to the stuff that can probably be automated by the AI agent (improving the system to optimize a metric measured against ground-truth labels). Data scientists and research scientists are more likely to have this skillset, and it takes time to pick up and learn the nuances.
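
As a concrete example of one of those checks, measuring inter-rater agreement might look something like this (a sketch using scikit-learn's cohen_kappa_score; the labels and the 0.7 target are illustrative, not from the comment above):

    # Sketch: check agreement between two labelers before trusting the labels.
    from sklearn.metrics import cohen_kappa_score

    rater_a = ["pass", "fail", "pass", "pass", "fail", "pass"]
    rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]

    kappa = cohen_kappa_score(rater_a, rater_b)
    print(f"Cohen's kappa: {kappa:.2f}")

    if kappa < 0.7:  # rule-of-thumb threshold; pick your own target
        print("Agreement too low: revisit the labeling guide or task design.")
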
[−] jamesblonde 44d ago
I say this quite a lot to data scientists who are now building agents:

1. think of the context data as training data for your requests (the LLM performs in-context learning based on your provided context data)

2. think of evals as test data to evaluate the performance of your agents. Collect them from agent traces and label them manually. If you want to "train" an LLM to act as a judge to label traces, then again, you will need lots of good-quality examples (training data), as the LLM-as-a-Judge does in-context learning as well.
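
A rough sketch of that second point, using the OpenAI Python client (the model name, rubric, and few-shot traces are all made-up placeholders):

    # Sketch: LLM-as-a-judge whose few-shot examples are labeled traces,
    # i.e. the "training data" it learns from in context.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    FEW_SHOT = (
        "Trace: user asked for a refund; agent issued it without checking policy.\n"
        "Label: fail\n\n"
        "Trace: user asked for order status; agent quoted the correct tracking ID.\n"
        "Label: pass"
    )

    def judge(trace: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": "Label each trace 'pass' or 'fail'.\n" + FEW_SHOT},
                {"role": "user", "content": f"Trace: {trace}\nLabel:"},
            ],
        )
        return resp.choices[0].message.content.strip()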

From my book - https://www.amazon.com/Building-Machine-Learning-Systems-Fea...

[−] pbronez 44d ago
Yup, agree. “Evaluations” = Tests

Gets pretty meta when you’re evaluating a model which needs to evaluate the output of another agent… gotta pin things down to ground truth somewhere.

[−] daemonk 44d ago
I have a data science/engineering background. From my perspective, using AI is like mining the solution space for optimality. The solution space is the combinatorics of the billions of parameters and their cardinalities. You try to narrow down the search space with your prompt and hopefully guide your mining toward your optimal solution with more semantics-based heuristics.

You might hit a local maximum or go down a blind path. I tend to restart my code base from scratch every week: make things more generic, remove unnecessary complexity, or add new features, and hope that moves me past the local maximum.

[−] kokken 44d ago
I don't understand the framing of the assumption.

Was the data scientist role only about building NLP models? Are LLMs gonna build churn prediction models? Tell the PM why stopping the A/B test halfway through is a bad idea? Push back on loony ideas like applying ML to predict sales from user horoscopes?

Maybe the role is a bit smaller in scope than 10 years ago, but I see that as a good thing. If you looked at DS positions on job search sites, the role descriptions were all over the place; maybe now at least we'll see them consolidate.

[−] schnitzelstoat 44d ago
Exactly - in my company we had some NLP models in Customer Service (bag-of-words for classifying tickets), but everywhere else it was just classification or regression problems.

So yeah, the bag-of-words model got replaced with a chatbot several years ago (when chatbots were all the rage back in like 2017) and will probably get replaced again with an LLM-enhanced chatbot soon. But the meat and potatoes are those classification and regression models, and they aren't going anywhere.
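
For reference, that kind of bag-of-words ticket classifier is only a few lines of scikit-learn (toy data and labels, just to illustrate the pattern):

    # Sketch: the sort of bag-of-words ticket classifier being replaced.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    tickets = [
        "I was charged twice this month",
        "The app crashes when I open settings",
        "Please cancel my subscription",
        "Login page shows a 500 error",
    ]
    labels = ["billing", "bug", "billing", "bug"]

    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(tickets, labels)

    print(model.predict(["I was billed twice"]))  # expected: ['billing']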

[−] efavdb 44d ago
I'm a data scientist now, and a fan of Claude Code for implementing things. But I have to say, I'm constantly surprised by how "dumb" ChatGPT is as a math research partner. I will ask it a math question I'm thinking about, get a confident answer back, only to realize hours or days later that it was 180 degrees backwards. I'm so frustrated right now that I'm almost ready to stop asking it such questions at all. I'm aware this seems to contrast strongly with other math people's enthusiasm, e.g., Terence Tao's. Unclear why my mileage varies.

Much of my work takes the form above -- in other words, figuring out what to do. Once I've decided, it can of course spit out the boilerplate code much faster than I could, and I appreciate that. But for the moment I think I still have some job security thanks to the first issue.