Queueing Requests Queues Your Capacity Problems, Too (pushtoprod.substack.com)

by mhawthorne 25 comments 35 points

[−] mrngm 46d ago
That reminds me of this talk[0] by Gil Tene called "How NOT to Measure Latency" at the Strange Loop conference in 2015 (or read this blog post[1] that contains the most important points).

[0] https://www.youtube.com/watch?v=lJ8ydIuPFeU

[1] https://bravenewgeek.com/everything-you-know-about-latency-i...

[−] avidiax 43d ago
When I give system design interviews, candidates who reflexively add queues to the design always do poorly.

Queueing is only useful for a few cases, IMO:

* The request is expensive to reject. For example, the inputs to the rejected request also came from expensive requests or operations (like a file upload). So rejecting the request because of load will multiply the load on other parts of the system. You still need backpressure or forwardpressure (autoscaling).

* Losing a request is expensive, delaying the result is not. Usually you want a suitably configured durable queueing system (e.g. Kafka) if you have this scenario.

* A very short queue is acceptable if it's necessary that downstream resources are kept 100% busy. A good example of this is in a router, the output to a slower link might queue 1-2 packets so that there is always something to send, which maximizes throughput.

* If you have very bursty traffic, you can smooth the bursts to fit in your capacity. But this runs the danger of having the queue always full, which you have to manage with load shedding (either automated or manual).
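To make that last point concrete: one common way to admit bursts up to a bound and shed the rest is a token bucket. A minimal sketch (the class and parameter names are mine, not from the thread):

```python
import time

class TokenBucket:
    """Admits bursts up to `burst` requests, sustains `rate` requests/sec,
    and sheds anything beyond that (the load shedding mentioned above)."""

    def __init__(self, rate, burst):
        self.rate = rate              # tokens refilled per second
        self.burst = burst            # maximum bucket size (burst allowance)
        self.tokens = burst           # start full
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True               # admit the request
        return False                  # shed it
```

A smoothing queue sits behind something like this; the bucket bounds how much burst the queue is ever asked to absorb.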

----

An underappreciated queue type is LIFO (last-in, first-out). It sounds unfair, but it holds the median response time steady at the cost of the maximum, and it behaves well when full: it degrades into either responding quickly or rejecting requests outright, so it works well for bursty traffic.
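A sketch of the idea (my own toy implementation, not from the thread): serve the newest request first, and when the queue is full, drop the oldest entry, which is the one most likely to be stale anyway.

```python
from collections import deque

class LifoQueueWithShedding:
    """Bounded LIFO: newest requests are served first; when full,
    the oldest (already-stale) request is shed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def push(self, req):
        if len(self.items) >= self.capacity:
            self.items.popleft()          # shed the oldest request
        self.items.append(req)

    def pop(self):
        # Newest first: fresh requests see low latency even under backlog.
        return self.items.pop() if self.items else None
```

Under a burst, the front of the line keeps getting answered quickly while the tail ages out, instead of every request waiting behind the whole backlog.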

[−] andrewstuart 43d ago
The author speculates about ways to deal with an overloaded queue.

Kingman's formula says that as you approach 100% utilization, waiting times explode.

The correct way to deal with this is bounded queue lengths and backpressure. I.e., don't deal with an overloaded queue; don't allow an overloaded queue in the first place.
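Kingman's (G/G/1) approximation makes the explosion easy to see, since the waiting time scales with ρ/(1−ρ). A quick back-of-the-envelope calculation (function name mine):

```python
def kingman_wait(rho, ca2, cs2, service_time):
    """Kingman's approximation for mean queueing delay in a G/G/1 queue:
    W ~= (rho / (1 - rho)) * ((ca^2 + cs^2) / 2) * service_time,
    where ca2/cs2 are the squared coefficients of variation of
    inter-arrival and service times, and rho is utilization."""
    return (rho / (1.0 - rho)) * ((ca2 + cs2) / 2.0) * service_time

# With a 10 ms mean service time and ca2 = cs2 = 1 (Poisson-like traffic):
#   rho = 0.50 -> ~10 ms of queueing delay
#   rho = 0.90 -> ~90 ms
#   rho = 0.99 -> ~990 ms
```

Going from 50% to 99% utilization costs you two orders of magnitude in waiting time, which is why bounding the queue (and pushing back) beats letting it grow.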

[−] zamalek 42d ago

> Here’s an exchange I had on twitter a few months ago:

The purple account is just plain wrong. Classically, the full architecture is this (keeping in mind that all rules are sometimes broken):

* CQRS is the linchpin.

* You generally only queue commands (writes). A few hundreds of ms of latency on those typically won't be noticed by users.

* Reads happen from either a read replica or cache.

The problems the author faces are caused by cherry-picking bits of the full picture.

A queue is a load-smoothing operator. Things are going to go bad one way or another if you exceed capacity; a queue at least guarantees progress (up to a point). Queue depth is also a great metric for scaling your worker count.

> What will you do when your queue is full

If your queue fills up, you need to start rejecting requests. If you have a public-facing API, there's a good chance some badly behaved clients won't back off correctly, so you'll need a way to IP-ban them until things calm down. AWS API Gateway and Azure API Management can help with this.
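The reject-when-full policy is a few lines with a bounded queue. A sketch (the handler and status codes are my illustration, assuming an HTTP-style service that processes work asynchronously):

```python
import queue

# Tiny bound for illustration; a real service sizes this from its
# latency budget and worker throughput.
work = queue.Queue(maxsize=2)

def handle(request):
    """Admission control: accept work only while the bounded queue has room."""
    try:
        work.put_nowait(request)  # enqueue without blocking
        return 202                # accepted for asynchronous processing
    except queue.Full:
        return 429                # shed load; a well-behaved client backs off
```

The 429 is the backpressure signal; misbehaving clients that ignore it are what the IP-ban layer is for.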

If you're separating commands and queries you should _typically_ see more headroom.

[−] mankyd 43d ago
Use a stack? LIFO.

As long as you have the capacity to keep it mostly empty, it's fine. When requests back up, at least some people will still get quick responses, instead of making everyone suffer.

[−] throwaway290 43d ago
Hit with machine-generated art, so awful. Is the rest of it also generated?