It’s Time for Some Queueing Theory (kottke.org)
282 points by sogen on June 27, 2019 | 62 comments


Another fun surprising result from queuing theory is that if you have a buffer of pending RPCs that have some deadline, it is better to service them in an "unfair" LIFO order, as opposed to a fairer FIFO order. [0] goes into the gory details.

In the bank example from the article, LIFO would dramatically shorten the median wait time (~5 minutes), at the cost of really upsetting customers [1]. In the case of a support queue, where the waiters are blind to each other, this cost could be reconsidered ...

[0] https://arxiv.org/pdf/1008.4895.pdf

[1] from the article, "We really, really hate it when someone shows up after us but gets served before us."
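To make the effect concrete, here's a rough single-server simulation (my own sketch, not from [0], with arbitrary parameters) that pushes the same Poisson traffic through FIFO and LIFO and compares the median and tail waits:

    import random
    import statistics

    def simulate(discipline, n=50000, lam=0.95, mu=1.0, seed=42):
        # Single-server queue fed by Poisson arrivals with exponential service.
        # discipline is "fifo" or "lifo"; returns the list of waiting times.
        rng = random.Random(seed)
        t, arrivals = 0.0, []
        for _ in range(n):
            t += rng.expovariate(lam)
            arrivals.append(t)
        services = [rng.expovariate(mu) for _ in range(n)]

        waits, queue = [], []
        next_arrival, server_free_at = 0, 0.0
        while len(waits) < n:
            # Admit every job that arrived before the server frees up.
            while next_arrival < n and arrivals[next_arrival] <= server_free_at:
                queue.append(next_arrival)
                next_arrival += 1
            if not queue:
                # Server is idle; jump forward to the next arrival.
                queue.append(next_arrival)
                server_free_at = arrivals[next_arrival]
                next_arrival += 1
            job = queue.pop(0) if discipline == "fifo" else queue.pop()
            waits.append(server_free_at - arrivals[job])
            server_free_at += services[job]
        return waits

    for d in ("fifo", "lifo"):
        w = sorted(simulate(d))
        print(d, "median:", round(statistics.median(w), 1),
              "99th percentile:", round(w[int(0.99 * len(w))], 1))

At high utilization the comparison shows the trade described above: LIFO gets a much better median by accepting a much worse tail.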


After a while people would grow wise to this tactic and start redialing after waiting for some time. We could build a whole lot of efficient and beautiful systems, if only not for the people...


Exactly. I've never had the time to dig into it, but these LIFO-optimality results always strike me as being non-physical or introducing some implicit hard-to-model effects.[1]

Like you say, people can just re-enter the queue, or (in meatspace) form a meta-queue that vies to enter the main queue the moment it has an open slot, reproducing FIFO all over again.

Furthermore, you get a lot of pro-cooperation effects (necessary for queues to work at all) by giving people "skin in the game" in the form of valuing their place in line. Once people lose nothing by inventing a new name and re-entering the line, that's all gone, and they no longer have an incentive not to be disruptive, which throws off disproportionate negative utility onto the rest of the system.

One day, I promise, I will unpack this result.

[1] My comment from 2015 expressing similar reservations: https://news.ycombinator.com/item?id=10182781


What's wrong with degrading from LIFO to a random queue due to client retry, when you're starting from a FIFO that is already failing client expectations?


Nothing, if it's just used for that (narrow, least-bad, degraded) situation. I was addressing the more general point that LIFO queues have some actionable superiority over FIFO in the general case because of the theoretical results cited in the literature.


Spherical customers in a vacuum.


What's the problem with this? Redialing essentially converts the FIFO/LIFO queue to a random queue, which isn't particularly worse.

Ultimately, the utility of "lower 50%ile" vs "lower 99.9%ile" is a subjective choice.

The value in LIFO is the observation that real people would rather their 10-minute request fail and require a retry due to LIFO (since they are already dissatisfied with the original service*) than have their 1-second request take 1 minute due to FIFO.


People feel cheated when you ask them to wait but they could get a better outcome by redialing. It violates the trust between the call center and the dialer. Leaving people to wait forever is a bad, bad, bad move.

Alternative scenario: someone figures out the best way of dealing with LIFO is spamming requests until one gets through, and you end up with many times the queueing work.


On the contrary, a system is only beautiful if it successfully serves the people.


That's just a result of your loss function. If you consider going over the deadline by a lot to be just as bad as going over it a little, then of course. This doesn't apply to people waiting.


The paper also suggests that they should turn some customers away altogether to achieve lower MWT, which makes this method even less appealing ("We must drop a small fraction of packets in order to dramatically improve delay for the remaining ones. <...> [Our] experiments, combined with those of [10] suggest strongly that the drop rate scaling may be inconsequential in many real world applications.")


This isn't really surprising, though. The problem is that your outliers are going to rise significantly.

You're basically trading higher tail latency for lower median latency.


This explains a lot when applied to replying to e-mails.


I thought the queuing algorithm for banks was to line up transactions so the largest go through first to increase the opportunity for overdraft fees.


I heard from a customer service rep at a bank that they did that based on customer feedback. The general reasoning being that large amounts were most likely to be things like housing payments and car payments, where missing payments would incur larger charges from those companies, and/or put housing or transportation at risk.


Debits first, credits last—just as the lord commanded.


By the way "debit" means "given" -- as in you "give" the money to the bank. "credit" means the money is being taken from them.

Tells you a lot about their attitude to your money!


Debit comes from a Latin verb that means to owe, and credit from a verb that means to trust.

So a debit card is for the money that the bank owes you, and a credit card is for money that the bank trusts you will pay them back.


Sure, which implies the money I store in the bank is debit they owe me. But worded that way I might be induced to think I deserve some kind of interest payment...


It doesn’t just imply it, that is exactly what it is. The bank is loaning your money out and then some (multiplied by whatever the current reserve ratios are). They quite literally don’t have the money you deposited, but just owe it back to you if you ask for it. That’s one of the ways a bank makes money.


It's easy to forget in an era where checking and savings accounts have interest rates so low the average person earns no interest on their cash.


That's the etymological fallacy. Debit and credit just mean left and right side of the ledger.


I'd definitely appreciate some 'next steps for the working programmer' that aren't a deep dive into math. What we need to be able to do is realize what sort of problem we're facing, how to describe it, and maybe what solutions might look like or where to learn more about them.

At a certain company, we were dealing with a limited number of phone numbers that could be used in on-line ads to get people to call for more information. Reuse the numbers too quickly and you wouldn't be able to track what people were calling about. Having too many idle numbers cost money.

Seemed like an interesting problem to me and it looked like "queuing theory" even if I didn't know the details. I suggested we could probably figure out a way to optimize our usage of the numbers. Blank looks and "let's just use 10 numbers per client and see how that goes".

I'm definitely someone who's wary of coding up a big, complex solution as the first iteration, so I could live with "let's start with a number", but I think that needed to feed into a further iteration using the knowledge we gained and a bit more thinking about the problem.
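For what it's worth, that phone-number problem looks a lot like the classic Erlang B (loss) model: the numbers are the "servers", each number is held for some tracking window after it's used, and the question is how many numbers keep the probability of having no free number below a target. A rough sketch with entirely made-up rates (the 3 calls/day and 2-day hold are hypothetical):

    def erlang_b(offered_load, servers):
        # Blocking probability for an Erlang loss system (M/M/c/c),
        # using the standard recurrence to avoid huge factorials.
        b = 1.0
        for m in range(1, servers + 1):
            b = offered_load * b / (m + offered_load * b)
        return b

    # Hypothetical: 3 tracked calls per day, each number held for 2 days
    # before it can be reused, so the offered load is 6 Erlangs.
    offered = 3 * 2
    for n in range(1, 16):
        print(n, "numbers -> P(no free number) =", round(erlang_b(offered, n), 3))

That at least turns "let's just use 10 numbers per client" into a number you can argue about.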


Maybe the next steps for the working programmer are, in fact, a deep dive into math.


Adam Smith's words about specialization apply to programming too.

One of the best places I ever worked, I sat right next to a guy doing computer vision algorithms. He had a PhD, and was extremely smart and knowledgeable and fun to talk with.

He also wasn't the most... practical programmer, and I was able to give him a variety of suggestions about how to improve his code, and of course I had ideas about data modelling and web programming that he didn't know much about.

Neither one of us knew much about the details of the firmware, or the optics (mirrors and lenses and such) that another colleague was working on.

I guess the point is that some of us are probably not good candidates to become domain experts, but knowing enough to find help is probably a good move.


Not learning math because it's not what you want to specialize in is self-defeating. It's like not learning to read and write, or save money, or communicate with other people, because it's not what you want to specialize in.

Math isn't a domain. It's the technique of abstraction we apply to any domain to make it tractable for computers, and it's the technique of reasoning we apply to any domain to figure out where our thinking has gone wrong. At least, any objective domain — except in rare cases, math isn't that useful for figuring out why your wife feels you don't love her anymore. But for you to program something on a computer, someone needs to mathematize it first. This can be done well or badly.


Great article! I'm adding it to the queueing theory intro repo: https://github.com/joelparkerhenderson/queueing_theory


You should not stare blindly at the average. It's in the 1% slowest requests that the dragons are hidden.


Not when you're looking at high utilization, as presented in the example in the article.

For that particular case (an M/M/1 queue with an arrival rate of 5.8 customers per hour and a service rate of 6 per hour), even your median response time is 3.5 hours.

Unbounded FIFOs are _really_ bad once you get to high utilization.
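For anyone who wants to check those numbers, the standard M/M/1 formulas give them directly (using the 5.8/hour and 6/hour figures above):

    import math

    lam, mu = 5.8, 6.0           # arrival and service rates, per hour
    rho = lam / mu               # utilization, about 0.97
    W = 1 / (mu - lam)           # mean time in system: 5 hours
    L = lam * W                  # mean number in system (Little's law): 29
    median = W * math.log(2)     # response time is exponential, so the median is ~3.47 hours
    print(rho, W, L, median)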


If anyone is interested in diving into queuing theory as it pertains to vehicular traffic, check out chapters 11 and 12 of Traffic Flow Fundamentals by Adolf May. [1] Chapter 11 discusses shockwaves and moving queues. Chapter 12 discusses both deterministic and stochastic queuing.

[1] https://www.scribd.com/document/253416450/Traffic-Flow-Funda...


From the article:

> We assume customer arrivals and customer service times are random (details later).

Where are those details? I expected some math about the random distribution and how that adds up, but I can't even find the corresponding text passage to explain anything related to that.


They probably intend to discuss the Poisson distribution which is commonly used to characterize random customer inter-arrival times, and similar or other distributions for how long it takes to process the customer's request.

https://www.wikiwand.com/en/Poisson_distribution

Queueing theory is a pretty common topic in Electrical Engineering and Computer Science curricula related to scheduling or computer networks. It's also part of the Operations Research curriculum.

Probably worth reading more about Leonard Kleinrock's work if you are interested in this topic and the context where it was applied:

https://www.wikiwand.com/en/Leonard_Kleinrock


Reading that, years ago, was what led me to figure out how to approach congestion control. Everybody was reading Kleinrock and assuming random arrival rates. Kleinrock had done a study of Western Union Plan 55-A, which was Sendmail for telegrams, built with paper tape for buffering and telephone switches for routing. That led to the original analyses of how the Arpanet should work.

The trouble with that approach is that it assumes packets just arrive randomly no matter what the network is doing. That was true for Plan 55-A; delays in the network had little influence on the submission rate for new telegrams. But it's not true once you put a protocol with backpressure, like TCP, on top of the raw packets. Now congestion becomes a feedback control system. I figured that out in the early 1980s and wrote RFC 970.[1]

The idea of having one big line was introduced by banks some time in the 1970s. It helped some. But the big breakthrough came from Walter Wriston [2], CEO of what is now Citibank, who pushed his people to develop in-bank terminals and then ATMs, to get rid of the lines entirely.

A classic piece of advice from retail consultants is "never place an obstacle in front of a customer who is ready to buy". Amazon's "one-click ordering" is the classic on-line example. Gap stores used to be noted for getting this. They had big, empty counters and more than enough people ready to check customers out. All that crap retailers put at checkouts? Few customers buy it, and it limits how much merchandise the customer can put on the counter.

The big queuing theory insight for line management is that if you don't have some idle time, in an open loop situation the line length will go to infinity. In the real world, you start to lose customers. That's how closed loop feedback reacts to retailer incompetence.

[1] https://tools.ietf.org/html/rfc970

[2] https://en.wikipedia.org/wiki/Walter_Wriston


"All that crap retailers put at checkouts? Few customers buy it, and it limits how much merchandise the customer can put on the counter."

People select less stuff when they're shopping because they're concerned it won't fit on the counter, or they take a bunch of stuff there but end up not paying for it because it won't fit, or what?


I think the issue is that it slows down processing time, reducing overall throughput, for little to no gain

And the bigger the line, the more likely customers drop out (at least, I do, especially if the purchase is small/insignificant)


People select less stuff when they're shopping because they're concerned it won't fit on the counter.

Yes. This is an issue for stores that don't use shopping carts.


> They probably intend to discuss the Poisson distribution which is commonly used to characterize random customer inter-arrival times

Wouldn't that be the exponential distribution? The Poisson distribution would tell you how many customers arrive in any given time interval.


Arrival processes are often characterized using Poisson distributions

Service processes are often characterized using exponential distributions

Some wikipedia rabbit holes to dive down:

https://en.wikipedia.org/wiki/Queueing_theory

https://en.wikipedia.org/wiki/Kendall%27s_notation

https://en.wikipedia.org/wiki/Jackson_network
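The two descriptions are the same process viewed two ways: exponential gaps between arrivals, and Poisson counts per fixed window. A quick numerical sanity check (arbitrary rate of 2 arrivals per unit time):

    import random

    rng = random.Random(0)
    rate, windows = 2.0, 100000          # arrivals per unit time, number of unit windows

    # Draw arrival times from exponential gaps, then count arrivals per unit window.
    counts = [0] * windows
    t = rng.expovariate(rate)
    while t < windows:
        counts[int(t)] += 1
        t += rng.expovariate(rate)

    mean = sum(counts) / windows
    var = sum((c - mean) ** 2 for c in counts) / windows
    print(mean, var)   # both come out near 2.0, the Poisson signature (mean == variance)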


That is actually a quote from the John Cook article[0] and so you can find the details of the random process there.

[0]: https://www.johndcook.com/blog/2008/10/21/what-happens-when-...


Indeed some stuff is missing... Poisson distributions model independently random events that occur at a given average rate. The Poisson variable counts the number of events occurring before time t. The waiting time between events follows an exponential distribution (derived by computing the probability that zero events occur before t).

Fun application: suppose you want to take a bus, and on average buses arrive independently 10 minutes apart (rate 1/10). When you arrive at a uniformly random time, the next bus will arrive, on average, in 10 minutes (and likewise the last one was, on average, 10 minutes ago). This can be understood intuitively because you are more likely to arrive during a gap when buses are further apart. The mean waiting time is 1/rate and not 1/(2*rate) as we might imagine.
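That inspection paradox is easy to check numerically: generate a long Poisson stream of buses, drop observers at uniformly random times, and measure how long they wait (a quick sketch with an arbitrary horizon):

    import bisect
    import random

    rng = random.Random(0)
    rate, horizon = 1 / 10, 1_000_000      # one bus every 10 minutes on average

    # Generate bus arrival times over a long horizon.
    buses, t = [], 0.0
    while t < horizon:
        t += rng.expovariate(rate)
        buses.append(t)

    # Drop random observers into the timeline and measure the wait for the next bus.
    waits = []
    for _ in range(100000):
        now = rng.uniform(0, horizon * 0.99)   # stay clear of the end of the stream
        next_bus = buses[bisect.bisect_right(buses, now)]
        waits.append(next_bus - now)

    print(sum(waits) / len(waits))   # comes out near 10, not 5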


Surprisingly, the distribution is not important. See Little's Law.

https://en.m.wikipedia.org/wiki/Little%27s_law


Little's Law is not directly applicable here. LL tells us the average wait time given the arrival rate and average size of the queue. Only one of those three things is known, and the distribution is relevant for calculating a second (letting us use LL to find the third).

There's a really obvious counterexample to "the distribution doesn't matter": with perfectly regular (deterministic) arrivals and service, the average wait time will be zero.
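A quick way to see that the shape of the arrival process matters and not just its rate (my own sketch, with fixed unit service times and 80% utilization): the same average load gives zero wait with evenly spaced arrivals and a clearly nonzero wait with Poisson arrivals.

    import random

    def avg_wait(gaps, service=1.0):
        # Average wait in a single-server FIFO queue with a fixed service time.
        t, free_at, total = 0.0, 0.0, 0.0
        for gap in gaps:
            t += gap                       # this customer's arrival time
            start = max(t, free_at)        # served immediately or when the server frees
            total += start - t
            free_at = start + service
        return total / len(gaps)

    rng = random.Random(0)
    n, mean_gap = 100000, 1.25             # utilization 0.8 with unit service time

    regular = [mean_gap] * n                                    # evenly spaced arrivals
    poisson = [rng.expovariate(1 / mean_gap) for _ in range(n)]

    print("regular arrivals:", avg_wait(regular))               # 0.0
    print("Poisson arrivals:", avg_wait(poisson))               # roughly 2 time units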


There’s a great Kubecon talk on Queueing Theory if you want a bit more:

https://youtube.com/watch?v=yf6wSsOFqdI


About a year ago, I was waiting in line at Dollar Tree (guilty pleasure), and I had to wait almost a half-hour. Even though there were two cashiers, they were constantly communicating back and forth, introducing a constant blocking factor and effectively reducing it to one queue, and I was trying to think about how to model this with pi-calculus while I was waiting.

Obviously there's math for basically anything, but reading this makes me glad I'm not the only one who has thought about this. This article has given me some reading fodder for this weekend... thanks!


Not really any theory, but some interesting anecdotes and research results.

I wish the author had at least explained the 5-hour average wait. I mean, if the queue started at 30 people, then I can see it, but not if we start at 0.


In that setting, the queue is about 30 people long, on average. Sometimes more, sometimes less. Because 5.8 people arrive per hour against a maximum of 6 that can be served per hour, the average number in the system is 5.8/(6-5.8) = 29, or roughly 30. And by Little's law, 29 people at an arrival rate of 5.8 per hour works out to the 5-hour average wait.


Relevant to queueing theory:

"Stop Rate Limiting! Capacity Management Done Right" by Jon Moore

https://youtube.com/watch?v=m64SWl9bfvk


My favorite bit from queuing theory is the observation that if you execute the shortest tasks first, the average wait time for any one task is minimized.
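A tiny worked example of that claim, with made-up task lengths: for a fixed batch of jobs on one server, shortest-first minimizes the sum, and hence the average, of the waits.

    def avg_wait(order):
        # Average time each job waits before it starts, given a processing order.
        t, waits = 0.0, []
        for length in order:
            waits.append(t)
            t += length
        return sum(waits) / len(waits)

    jobs = [8.0, 1.0, 3.0, 2.0, 5.0]
    print("submission order:", avg_wait(jobs))            # 8.6
    print("shortest first:  ", avg_wait(sorted(jobs)))    # 4.2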


You can go further. There's a result that if you know how much processing time is left for a job, and can pre-empt it, then the most effective scheduling rule is "Shortest Remaining Processing Time" or SRPT.

Harchol-Balter calls this the All-Can-Win Theorem, because it counterintuitively shows that jobs with more time to go being interrupted by jobs with less time to go will be better off.

https://www.cs.cmu.edu/~harchol/all-can-win.pdf


Are you sure? Aren't you talking about the average wait time for all tasks combined (sum of wait times) instead of the wait time for any one task? Obviously the optimal wait time for any one task is 0, if you start it right away. Intuitively I'd say if you sort the queue by size and always process the shortest task, you may indefinitely stall some long tasks.


I think that depends on whether your queue is a fixed data set or a stream of new entries.

I learned queuing theory in a class about multimedia, and I came away with a good theoretical understanding but my practical understanding was rubbish. When synchronizing two streams of tasks, the queue is not finite, but the pattern is fixed. For A/V it might be 22 frames per second, best effort, and exactly one second of audio per second.

But any queue with even a vague notion of deadlines will only let so many tasks 'cut in line'. At some point the undesirable task gets scheduled, even in the face of a constant stream of other work. Or it gets dropped. From what I understand, in a realtime system a low priority, long task might just be aborted. Because it's never a good time to run it.


You're right, it's the average wait time over all tasks. However, processing the shortest tasks first doesn't delay the long tasks much, if length is reasonably variable, because the short tasks add almost negligible delay to the long tasks, and the long tasks don't need to wait on even longer tasks.

If long tasks are stalling indefinitely under this policy, then under FCFS the queue length will grow without bound, and that's not really much better.


Yes, the average wait time, full stop.

Now that I'm rereading, I'm not even sure what 'average wait time for one task' was supposed to mean. Averages are aggregate data about multiple samples.


Curious, does Linux take this into account when it schedules processes?

(Of course, it can't know the execution time of a process beforehand, but it can deprioritize long-running processes, since the ETA for a long-running process is larger than for a short-running process).


Of a sort. It down-prioritizes processes which were previously descheduled by timer, and up-prioritizes those which are waiting for I/O. This generally has the desired effect of prioritizing interactive work.

If you mean prioritization based on total CPU time then no, I don't think it does that, but the above does approximate prioritization based on a (very small) sliding window of CPU use.


I wonder if that policy is purely empirical or if the people who decided to favor I/O had a more sophisticated understanding of the system.

Since a lot of algorithms have a log(n) component to them, either expressly or due to CPU physics, calling a function twice as often with half as much data may reduce total time a little bit, even if only a hair. I could see someone making a design decision based on a 2% disparity between two alternatives.


I'm always annoyed that whenever someone purports to have an article or book on queueing theory, no one actually has any math. So I'm just left with a bunch of anecdotes and no understanding.



Does it get more light-weight than this? Grrr. It's time for fluff.


I had never heard of queueing theory until I read about it in a supply chain management textbook, and would have appreciated being at least aware of its existence much earlier than that. This is light, but it's a great way to introduce the topic to people who have never been exposed to it.

By the way, I also worked in a supply chain management department of about two thousand people for three years and I never heard anyone else mention queueing theory while I was there, so now I really wish at least this blog post were required reading for everyone in that department.


Thought I'd hear about some AMQP... :)


What is there to hear about AMQP? Make exchanges, bind queues and send to routing keys. Add cluster and load balance for bonus points.

Now fast-food queues, that is an interesting problem.



