Discourse gives spinny icon and requires restart to fix

bmbouter · September 7, 2022, 3:28pm

Periodically, roughly every 2-3 weeks, the Discourse site becomes unusable. I believe asking the admins to “restart” the service fixes it.

To help figure this out, I’ve opened this community question in the upstream Discourse. It has questions that only the infra folks can answer. Could one of the infra folks join in there, please?

Thanks!

misc · September 7, 2022, 8:57pm

So, I had a theory, which was that the spinning wheel is caused by a failure to get past some intermediate proxy. And as I wanted to post about it, the spinning wheel appeared.

So I tried playing with the timeout on the proxy, with no luck (my theory being “too many people are connected and polling, and our proxy is using some default config with a limit that could be too low”). Then I looked at the graph and saw nothing egregious (as I really think haproxy should handle 5 clients…). Then I tried restarting as I did before, and again no luck.

My last attempt was to switch from 2 puma workers to 1 (puma being, if I am not wrong, the application server we use for Ruby?), assuming some weird deadlock that gets triggered only when lots of people use the forum (e.g., something we wouldn’t have seen during testing, but would notice as more and more people use the forum, as @bmbouter noted elsewhere).

And it seems to have fixed the issue, somehow?

I do not like magic fixes I can’t explain, so for now I am not touching anything (also because it is late), but please ping me if it breaks again.

Maybe it got fixed because the pod needs to be restarted, but the restart only takes effect after some time because there is a lock somewhere that has to expire, or something like that (in which case my puma worker change is not why it got fixed).

bmbouter · September 15, 2022, 6:14pm

I haven’t seen the issue since then, but we’ve also gone longer periods without it occurring. I’ll post here the next time I do see a problem, and hopefully your change did resolve it. Thank you!

x9c4 · October 11, 2022, 7:17am

It’s been a month and I haven’t seen it happen again.
Thanks for fixing it.