Everything old is new again: memory optimization

muskstinks 50d ago

I'm always confused as hell by how little insight we have into memory consumption.

I look at memory profiles of normal apps and often think "what is burning that memory?"

Modern compression works so well, so what's happening? Open your task manager, look through the apps, and you might ask yourself this.

For example (let's ignore Chrome, MS Teams and all the other bloat), Sublime consumes 200 MB. I have 4 text files open. What is it doing?

It took Chrome YEARS just to implement tab suspension, despite everyone being aware of the issue, and add-ons that could already do it existed.

I bought more RAM just for Chrome...

1vuio0pswjnm7 50d ago

Been waiting for online commentary about programming to start acknowledging this situation as it pertains to writing programs

Memory and storage are not "cheap" anymore. Power may also rise in cost

Under these conditions, memory usage and binary size are irrefutably relevant^1

To some, this might feel like going backwards in time toward the mainframe era. Another current HN item with over 100 points, "Hold on to your hardware", reflects on how consumer hardware may change as a result

To me, the past was a time of greater software efficiency; arguably this was necessitated by cost. Perhaps higher costs in the present and future could lead to better software quality. But whether today's programmers are up for the challenge is debatable. It's like young people in finance whose only experience is in a world with "zero" interest rates. It's easier to whine about lowering rates than to adapt

With the money and political support available to "AI" companies, the incentive for efficiency of any kind is lacking. Perhaps their "no limits" operations, e.g., their effects on supply, may provide an incentive for others' efficiency

1. As an underpowered computer user who compiles my own OS and writes my own simple programs, I've always rejected large binary size and excessive memory use, even in times of "abundance"

canpan 50d ago

String views were a solid addition to C++. Still underutilized. It does not matter which language you are using when you make thousands of tiny memory allocations during parsing. https://en.cppreference.com/w/cpp/string/basic_string_view.h...
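
For comparison, Python has no `std::string_view`, but `memoryview` gives a similar zero-copy view over bytes. A minimal sketch (the function name is made up for illustration): a slice of a `memoryview` sees a later write to the underlying buffer, while a `bytes` slice is an independent copy.

```python
def slice_is_zero_copy() -> tuple[bytes, bytes]:
    buf = bytearray(b"hello world")
    view_slice = memoryview(buf)[0:5]  # a view into buf; no bytes are copied
    copied_slice = bytes(buf[0:5])     # an independent copy
    buf[0] = ord("j")                  # mutate the underlying buffer in place
    # The view reflects the write; the copy does not.
    return view_slice.tobytes(), copied_slice
```

Here the view reads back `b"jello"` while the copy is still `b"hello"`. Every word you key a hash table on this way is one allocation you didn't make.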

baud9600 49d ago

Strange days we live in. Python and C++? What about a line of bash:

  tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -rn

I’d like to know the memory profile of this. The bottleneck is obviously sort, which has to hold every word (GNU sort spills to temporary files once its buffer fills). So if we replace it with awk using a hash map to count unique words, the data set kept in memory is much smaller:

  tr -s '[:space:]' '\n' < file.txt | awk '{c[$0]++} END{for(w in c) print c[w], w}' | sort -rn

I’m guessing this will beat Python and C++?
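
One way to answer the memory-profile question, sketched in Python and assuming a Unix system (the `resource` module isn't available on Windows): run the pipeline as a child and read `ru_maxrss` for reaped children. On Linux this is the peak RSS of the single largest descendant (here most likely `sort`), not a sum over the pipeline, and it is a high-water mark across every child reaped so far, so use a fresh interpreter per measurement. `child_peak_rss` is a hypothetical helper name.

```python
import resource
import subprocess


def child_peak_rss(cmd: str) -> int:
    """Run `cmd` through the shell and return ru_maxrss for reaped children.

    Units differ by platform: kilobytes on Linux, bytes on macOS.
    """
    # Discard the pipeline's output; we only care about its memory use.
    subprocess.run(cmd, shell=True, check=True, stdout=subprocess.DEVNULL)
    # Peak RSS of the largest terminated descendant reaped so far
    # (the shell waits for tr/sort/awk, so their usage rolls up to us).
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```

Usage with the first pipeline: `child_peak_rss(r"tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -rn")`, then the same in a new interpreter for the awk version.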

griffindor 50d ago

Nice!

> Peak memory consumption is 1.3 MB. At this point you might want to stop reading and make a guess on how much memory a native code version of the same functionality would use.

I wish I had known the input size when attempting to estimate, but I suppose part of the challenge is also estimating the runtime's startup memory usage.

> Compute the result into a hash table whose keys are string views, not strings

If the file is mmap'd, and the string view points into that, presumably decent performance depends on the page cache having those strings in RAM. Is that included in the memory usage figures?

Nonetheless, it's a nice optimization that the kernel chooses which hash table keys to keep hot.

The other perspective on this is that we sought out languages like Python/Ruby because the development cost was high, relative to the hardware. Hardware is now more expensive, but development costs are cheaper too.

The takeaway: expect more push towards efficiency!

gwbas1c 50d ago

A lot of frameworks that use variants of "mark and sweep" garbage collection instead of automatic reference counting are built with the assumption that RAM is cheap and CPU cycles aren't, so they are highly optimized CPU-wise, but otherwise are RAM inefficient.

I wonder if frameworks like dotnet or JVM will introduce reference counting as a way to lower the RAM footprint?
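
For a feel of the trade-off, CPython itself uses reference counting plus a backup cycle collector. A small sketch (CPython-specific behavior; the class and function names are illustrative): an object with no cycles is reclaimed the moment its last reference goes away, while two objects pointing at each other stay allocated until the cycle collector runs. That second case is what a pure reference-counting scheme can't reclaim on its own.

```python
import gc
import weakref


class Node:
    """Illustrative object that can point at another Node."""

    def __init__(self) -> None:
        self.other = None


def freed_immediately_without_cycle() -> bool:
    freed = []
    node = Node()
    # The callback runs when `node` is actually deallocated.
    weakref.finalize(node, freed.append, "node")
    del node  # refcount drops to zero, so CPython frees it right here
    return bool(freed)


def freed_only_after_collect_with_cycle() -> tuple[bool, bool]:
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector from running on its own
    try:
        freed = []
        a, b = Node(), Node()
        a.other, b.other = b, a  # reference cycle: neither refcount reaches zero
        weakref.finalize(a, freed.append, "a")
        del a, b
        before_collect = bool(freed)
        gc.collect()  # the cycle collector finds and frees the pair
        after_collect = bool(freed)
        return before_collect, after_collect
    finally:
        if was_enabled:
            gc.enable()
```

On CPython the first function returns `True` and the second `(False, True)`.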

tzot 50d ago

Well, we can use memoryview to build the dict, avoiding the creation of string objects until output time:

    import re, operator
    def count_words(filename):
        with open(filename, 'rb') as fp:
            data = memoryview(fp.read())
        word_counts = {}
        for match in re.finditer(br'\S+', data):
            word = data[match.start():match.end()]
            try:
                word_counts[word] += 1
            except KeyError:
                word_counts[word] = 1
        word_counts = sorted(word_counts.items(), key=operator.itemgetter(1), reverse=True)
        for word, count in word_counts:
            print(word.tobytes().decode(), count)

We could also use mmap.mmap.
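
A sketch of the `mmap.mmap` variant, assuming CPython: it relies on reference counting to drop the slices before the mapping closes, since otherwise `mmap.close()` raises `BufferError` while views still export its buffer. Read-only byte `memoryview`s are hashable and compare by content, so slices of the mapping can serve as dict keys directly; only the unique words get copied out at the end. Function names are hypothetical.

```python
import mmap
import os
import re


def _count_views(view: memoryview) -> list[tuple[bytes, int]]:
    counts: dict[memoryview, int] = {}
    for match in re.finditer(rb"\S+", view):
        # A read-only slice of the mapping; no bytes are copied here.
        key = view[match.start():match.end()]
        counts[key] = counts.get(key, 0) + 1
    # Copy only the unique words out; the slices die when this returns.
    return [(bytes(k), c) for k, c in counts.items()]


def count_words_mmap(filename: str) -> list[tuple[bytes, int]]:
    with open(filename, "rb") as fp:
        if os.fstat(fp.fileno()).st_size == 0:
            return []  # mmap refuses to map an empty file
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Release the base view before the mapping itself is closed.
            with memoryview(mm) as view:
                items = _count_views(view)
    return sorted(items, key=lambda item: item[1], reverse=True)
```

Usage: `for word, count in count_words_mmap("file.txt"): print(word.decode(), count)`. The mapped pages live in the page cache, so how they show up in RSS depends on which pages the scan has touched.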

fix4fun 50d ago

Digression: now that RAM is expensive, good old zram is gaining popularity ;) Check trends.google.com: since 2025-09, searches for it have doubled ;)

bcjdjsndon 50d ago

A few things

- since GC languages became prevalent, and maybe high-level programming in general, coders aren't as economical with their designs. Memory isn't something a coder should worry about, apparently.

- far more people code apps in web languages because they don't know anything else. These are anywhere from 5-10 levels of abstraction away from the metal, naturally inefficient.

- increasing scope... I can only describe this one by example: web browsers must implement so many standards that building one has become a mammoth task, especially compared to the 90s. Same for compilers and OSes; heck, even computers themselves were all one-man jobs at some point, because things were simpler cos we knew less.

6510 48d ago

Someone on YouTube[0] suggested we use (something like) KolibriOS[1] in a VM as a kind of web browser. Then we could have snappy, effective desktop apps for everything, cross-platform(!)

The 12 MB OS looks surprisingly mature. We are so conditioned for bloat that with each click I'm surprised by how fast it responds. I don't remember ever being surprised by the same thing twice in a row, but here it stays surprising how everything opens in the next frame, even on very poor hardware.

Besides running it in a VM, I read there is a Synergy[3] client for it[4]. I've used crappy PCs as extra screens, and having a dedicated machine for a single application is fun, useful, and makes old stuff useful again. You can run heavy applications on the main computer; it doesn't matter that the extra one can't.

[0] - https://www.youtube.com/watch?v=v3NVKOsWkQs

[1] - https://www.kolibrios.org/en

[3] - https://www.youtube.com/watch?v=tlt7X0H5GJw

[4] - https://board.kolibrios.org/viewtopic.php?t=2544

dgb23 50d ago

I'm not a C++ programmer, but I think the solution is neat.

But it's not necessarily an apples-to-apples comparison. It's not unfair to Python because of the runtime overhead. It's unfair because it's a different algorithm with fundamentally different memory characteristics.

A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count. I think for most people that would also be the first/naive approach when programming something like this. And it would showcase the actual overhead of the Python version.
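
The streaming approach, sketched in Python to match the other code in the thread (the comment is about C++, where the same structure applies). It reads fixed-size chunks, so memory is bounded by one chunk plus the table of unique words; the one subtlety is a word split across a chunk boundary, which gets carried into the next chunk. `count_words_streaming` and the 64 KiB default are arbitrary choices for illustration.

```python
def count_words_streaming(fp, chunk_size: int = 64 * 1024) -> dict[bytes, int]:
    """Count whitespace-separated words from a binary file object, chunk by chunk."""
    counts: dict[bytes, int] = {}
    carry = b""  # partial word left over from the previous chunk
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        chunk = carry + chunk
        words = chunk.split()
        # If the chunk doesn't end in whitespace, its last word may continue
        # in the next chunk, so hold it back instead of counting it now.
        if words and not chunk[-1:].isspace():
            carry = words.pop()
        else:
            carry = b""
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    if carry:  # the file ended mid-word
        counts[carry] = counts.get(carry, 0) + 1
    return counts
```

Usage: `with open("file.txt", "rb") as fp: counts = count_words_streaming(fp)`, then sort `counts.items()` by value for output.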

zahlman 49d ago

> This sounds like a job for Python. Indeed, an implementation takes fewer than 30 lines of code.

I don't know if the implementation is written in a "low-level" way to be more accessible to users of other programming languages, but it can certainly be done more simply leveraging the standard library:

  from collections import Counter
  import sys

  with open(sys.argv[1]) as f:
      words = Counter(word for line in f for word in line.split())

  for word, count in words.most_common():
      print(count, word)

At the very least, manually creating a (count, word) list from the dict items and then sorting and reversing it in-place is ignoring common idioms. sorted creates a copy already, and it can be passed a sort key and an option to sort in reverse order. A pure dict version could be:

  import sys

  with open(sys.argv[1]) as f:
      counts = {}
      for line in f:
          for word in line.split():
              counts[word] = counts.get(word, 0) + 1

  stats = sorted(counts.items(), key=lambda item: item[1], reverse=True)

  for word, count in stats:
      print(count, word)

(No, of course none of this is going to improve memory consumption meaningfully; maybe it's even worse, although intuitively I expect it to make very little difference either way. But I really feel like if you're going to pay the price for Python, you should get this kind of convenience out of it.)

Anyway, none of this is exactly revelatory. I was hoping we'd see some deeper investigation of what is actually being allocated. (Although I guess really the author's goal is to promote this Pystd project. It does look pretty neat.)
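
For the "what is actually being allocated" question, the standard library's `tracemalloc` can attribute live allocations to source lines. A minimal sketch (the wrapper name is made up); note that it only sees memory requested through Python's allocators, so it won't match the process RSS or include interpreter startup:

```python
import tracemalloc
from collections import Counter


def profile_word_count(lines, top_n: int = 5):
    tracemalloc.start()
    try:
        counts = Counter(word for line in lines for word in line.split())
        snapshot = tracemalloc.take_snapshot()
        _current, peak = tracemalloc.get_traced_memory()  # bytes
    finally:
        tracemalloc.stop()
    # Group still-live allocations by the source line that made them, largest first.
    top_stats = snapshot.statistics("lineno")[:top_n]
    return counts, peak, top_stats
```

Usage: `counts, peak, top = profile_word_count(open("file.txt"))`, then `for stat in top: print(stat)` prints each line's total size, allocation count and average.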

veunes 50d ago

Not "C++ everywhere again" but maybe "understanding memory again"

tombert 50d ago

I've been rewriting a lot of my stuff in Rust to save memory.

Rust is high-level enough to still be fun for me (tokio gives me most of the concurrency goodies I like), but the memory usage is often like 1/10th or less compared to what I would write in Clojure.

Even though I love me some lisp, pretty much all my Clojure utilities are in Rust land now.

kristianp 49d ago

How much memory does the C++ compiler use when compiling the program? I wonder how that compares to the python program? Not a completely unrelated metric.

Would the rust compiler use much more memory compiling a comparable program to the C++ version?

yakkomajuri 49d ago

The abrupt ending was funny and then I realized the author is Finnish and it all made sense.

Nice post.

(P.S. I'm also Finnish)

wbsun 49d ago

It’s less about 'old vs. new' and more about the evolving trade-offs dictated by the constraints of the era. There have always been engineers trying to squeeze every last drop of performance out of the bits available to them.

perching_aix 49d ago

Since we're doing this, I do wonder then: is going from a 1.3K input file to 21K peak memory usage (16x) really optimal?

It's certainly a lot better than 1000x, sure, but still surprised me.

90d 50d ago

Speaking about optimization, is Windows just too far gone at this point? It is comical the amount of resources it uses at "idle".

est 50d ago

I think the py version can be shortened to:

  import sys
  from collections import Counter

  stats = Counter(word for line in open(sys.argv[1]) for word in line.split())

callamdelaney 50d ago

I shove everything in memory, it's a design decision. Memory is still cheap, relatively.

biorach 50d ago

"copyright infringement factories"

gostsamo 50d ago

> how much memory a native code version of the same functionality would use.

Native to what? How is C++ more native than Python?

amelius 50d ago

> AI sociopaths have purchased all the world's RAM in order to run their copyright infringement factories at full blast

The ultimate bittersweet revenge would be to run our algorithms inside the RAM owned by these cloud companies. Should be possible using free accounts.

yieldcrv 50d ago

as long as you know what architecture questions to ask, agentic coding can help with this next phase of optimization really quickly

delaying comp sci differentiation for a few months

I wonder if assembly-based solutions will come into vogue

Everything old is new again: memory optimization (nibblestew.blogspot.com)