
[S] Site shutdowns

Posted: Wed Apr 18, 2018 6:15 pm
by DWaM
(Moved from the Help and Support section)

I'm not sure how productive this will be, but I feel like this is something that should at least be addressed at this point.

Initially, after the host change, we understood that the new host wasn't quite what the previous one was -- the early shutdowns were noticeable, but never terribly long and never terribly frequent. However, we've now reached a point where the site goes down several times within a ridiculously short timespan. It's obvious that none of this is Unas' fault; the issue is evidently with the host. But it feels like something should be said on the matter publicly so that trial makers are aware of what's going on. I fear that if there's no communication about this, people who aren't used to this new state of things will just bail on the site if they find hours of work lost because the site happened to go down at the wrong moment.

Basically, it would be nice if some of these things were addressed:
  • Could we get information on why the site is potentially going down on this new host? Is it an issue that can be fixed?
  • Is there some sort of frequency or regularity to the times the site goes down so that we can at least prepare ourselves for it, and trial makers can properly know not to make any major progress in that time?
  • If not, is there some way we can communicate to people just now finding the site that the site going down is a potential issue?
  • Is it possible to make someone else, in addition to Unas, be able to bring the site back online, so that Unas doesn't have to be the one pinged every time this happens? Especially given the frequency, it's safe to say Unas will often find himself busy doing... well, more important things.
  • Are there any plans for perhaps changing the host and eliminating this issue altogether?
I admit some of these are probably a bit tough to answer, and for some I can already sort of guess the answer, but I feel like it's important to at least talk about it publicly and address the issue head-on at this point.

Especially given the fact that we may be looking at an influx of cases this summer due to a game jam being (potentially) hosted on the AA/Court Records discord. If, by that time, the site ends up being perceived as unreliable due to shutdowns, people might end up not using AAO at all, which would be a damn shame.

Thank you in advance.

Re: [S] Site shutdowns

Posted: Thu Apr 19, 2018 12:45 am
by Enthalpy
DWaM wrote:It's important to at least talk about it publicly and address the issue head-on at this point.

Especially given the fact that we may be looking at an influx of cases this summer due to a game jam being (potentially) hosted on the AA/Court Records discord. If, by that time, the site ends up being perceived as unreliable due to shutdowns, people might end up not using AAO at all, which would be a damn shame.
Absolutely agreed. It's crossed my mind a time or two that it will be really bad if AAO doesn't recover from one of these outages, or if something worse than normal happens.

I don't have authority over anything server or hosting related, so I can only pass along what Unas has already told me.
DWaM wrote:Could we get information on why the site is potentially going down on this new host? Is it an issue that can be fixed?
We'd like information ourselves. Here's what we know:
  • The webhosting platform, Scaleway, has not been helpful in identifying the problem.
  • Unas has checked the physical machine that does the hosting and hasn't found a problem.
  • The outage in November gave a different error message than the other ones and likely has a different cause. This is the only shutdown that produced server logs. mysqld ran out of memory, then the OS did, and in a last-ditch effort to prevent a server crash, another program killed mysqld. Shortly before this happened, there was a large number of something called "InnoDB semaphores" and apache2 threads. Unas has logs for this and shared some with me, but I can't read them well enough to know if sharing them is safe to do.
  • The latest outage had a different cause and was due to a problem with Scaleway that affected multiple websites.
DWaM wrote:Is there some sort of frequency or regularity to the times the site goes down so that we can at least prepare ourselves for it, and trial makers can properly know not to make any major progress in that time?
No. We still don't know why it happens, let alone when.
DWaM wrote:If not, is there some way we can communicate to people just now finding the site that the site going down is a potential issue?
Yes. I can put up an Announcement explaining things easily enough, though I'd prefer to wait to give Unas a chance to chime in, since he knows more about the outages than I do. (I've sent him an e-mail about this topic and will give him a week before I put up an announcement.)
DWaM wrote:Is it possible to make someone else, in addition to Unas, be able to bring the site back online, so that Unas doesn't have to be the one pinged every time this happens? Especially given the frequency, it's safe to say Unas will often find himself busy doing... well, more important things.
I don't know if this is technically possible, but I'd be willing to take on this role.
DWaM wrote:Are there any plans for perhaps changing the host and eliminating this issue altogether?
None that I know of, but Unas may have some.

Let me know if I can do anything else.

Re: [S] Site shutdowns

Posted: Sun Apr 22, 2018 5:13 pm
by Unas
Enth summed it up quite well.

Basically, the server tends to go into a state called "kernel panic" and stop responding to anything. When this happens, unfortunately, the crash is so severe that no log is written, so there is no proper way to investigate the cause after forcing a reboot.
Once, however, we were lucky, and the server only "half-crashed" (i.e. the database was killed, but the kernel and Apache servers were still up), which allowed me to get some meaningful logs highlighting RAM issues. Not quite sure whether it was the same issue, but I tend to think that it was.
Back then, I tried to tweak the server's performance limits so it would not use up all the memory, but apparently without much effect, judging from continued occurrences of the problem - and unfortunately I didn't take the time to look at it again until today...
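
For readers wondering what that kind of limit tuning involves, here is a minimal sketch of the arithmetic, assuming a process-per-worker Apache setup where each worker holds its own memory. Every figure below (total RAM, OS reservation, MySQL share, per-worker size) is a made-up placeholder, not AAO's actual configuration, and the function name is hypothetical.

```python
def max_apache_workers(total_ram_mb, reserved_mb, mysql_mb, per_worker_mb):
    """Return how many Apache workers fit in RAM alongside the database.

    All inputs are hypothetical estimates in megabytes:
    reserved_mb covers the OS and other daemons, mysql_mb the database.
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    available = total_ram_mb - reserved_mb - mysql_mb
    # Never return a negative count if the reservations exceed total RAM.
    return max(available // per_worker_mb, 0)


# Placeholder numbers for a small 2 GB server.
workers = max_apache_workers(total_ram_mb=2048, reserved_mb=256,
                             mysql_mb=768, per_worker_mb=40)
print(workers)
```

The result would serve as a rough upper bound for a worker cap such as Apache's MaxRequestWorkers directive; real per-worker memory varies by request, so a safety margin below this number is usually sensible.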

As for last week's issue, it was basically the same, except that at the same time there was also a wider issue on Scaleway's network that was preventing reboot, so I had to wait until they fixed it before I was able to trigger the reboot that brought the site back online...

As far as what can be done about this, well...
  • I just added additional restrictions to the server's performance settings. We'll see if it behaves better from now on.
  • Years ago, I developed support for serving the AAO static files from a separate server, to decrease the load on the main one - but I never took the time to actually set it up.
    If the issue still occurs, I guess I could set up an additional server and use it for that. Thankfully, these servers are cheap enough, it wouldn't ruin me to set up a second one, even though I'd rather avoid it if not necessary...
  • Unfortunately, as far as I know, I can't give anyone else access to reboot the machine without giving them my personal credentials to the host's admin console.
    Given that these credentials are also linked to my payment information, I obviously won't give them to anyone.
When is the competition you're talking about supposed to take place?

Re: [S] Site shutdowns

Posted: Wed May 23, 2018 12:52 am
by Exedeb
The Game Jam is going to run from July 1 to August 11.

For more info, see this webpage.

Re: [S] Site shutdowns

Posted: Fri May 25, 2018 9:41 pm
by Unas
Thanks. If we experience new site shutdowns by then, I'll set up an additional server.

Re: [S] Site shutdowns

Posted: Fri Jun 08, 2018 4:09 pm
by energizerspark
I'm assuming I'm not the only person that couldn't connect yesterday?

Re: [S] Site shutdowns

Posted: Fri Jun 08, 2018 5:13 pm
by Southern Corn
Nope. Wasn't able to for most of the day either.

Re: [S] Site shutdowns

Posted: Fri Jun 08, 2018 10:43 pm
by Gosicrystal
Me too.

Re: [S] Site shutdowns

Posted: Sun Jun 10, 2018 12:25 am
by Enthalpy
It was, as you surmised, another site shutdown. I wouldn't be surprised to hear word from Unas on this.

Re: [S] Site shutdowns

Posted: Fri Jun 22, 2018 12:36 am
by Unas
Hi there,

Sorry for taking care of that a bit late... Anyway, as promised, since another shutdown occurred in June, I've decided to set up an additional server to serve all static files of AAO (this includes trial pictures, sounds and music from the default AAO asset-base).

I've just finalised the setup to use this new server: you may notice that, from now on, all these files will be served from http://asuras.aaonline.fr/ (don't try to access this URL directly though - there is nothing to see).

In theory, this should:
  • Greatly reduce the number of queries to AAO's main Apache server. That should hopefully let webpages on the site load faster and, more importantly, avoid future kernel panics (since I suspect those were caused by intense Apache traffic - even though my settings should prevent this).
  • Load all these static files through a much lighter and faster stack: the new server uses nginx instead of Apache, which should be a bit faster for serving static content. So hopefully trials should load a bit faster as well.
It may require some fine tuning though, so don't hesitate to let me know your impressions - if things are faster, slower, buggy, etc.
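
To illustrate the idea of pointing asset links at a separate static host, here is a minimal sketch. It is not AAO's actual code (which is written in PHP); the function name and the example asset path are hypothetical, and only the host name comes from the post above.

```python
from urllib.parse import quote, urljoin

STATIC_BASE = "http://asuras.aaonline.fr/"  # static host named in the post


def static_url(relative_path, base=STATIC_BASE):
    """Build the URL of a static asset on the separate static host.

    relative_path is a hypothetical asset path such as "sounds/beep.mp3".
    """
    # Drop leading slashes so urljoin appends to base instead of resetting it,
    # and percent-encode characters that are not URL-safe (e.g. spaces).
    cleaned = quote(relative_path.lstrip("/"))
    return urljoin(base, cleaned)


print(static_url("/music/trial theme.mp3"))
```

With links generated this way, the main server only handles the dynamic pages, while every picture, sound and music request goes to the static host.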

Re: [S] Site shutdowns

Posted: Fri Jun 22, 2018 1:14 am
by kwando1313
Is there a reason we don't use nginx for everything? Since (afaik) that's the standard thing used for website servicing nowadays...

Re: [S] Site shutdowns

Posted: Fri Jun 22, 2018 11:01 am
by Unas
The reason is PHP - in which all the server-side code of AAO is written.

Apache has a rather deeply integrated PHP module which has no real equivalent in nginx.
It's possible to execute PHP on nginx (through a system called php-fpm), but in my experience it always involved a significant overhead in processing time. I experimented with it a few years ago, and it was adding around 0.3 to 0.5s to each single php request.
In fact, I actually use it on the new server (I have a few admin and monitoring tools based on PHP as well), and have the same experience again - the tool feels significantly slower than the same running on Apache on the main server.

This may not be true with cutting edge versions of php or nginx, but for now I'm staying on old stable releases.
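
For anyone who wants to check a per-request overhead like the one described above on their own setup, here is a minimal timing sketch. The helper name is hypothetical, and the workload shown is a stand-in: in practice the action would fetch the same PHP page from each server.

```python
import statistics
import time


def median_seconds(action, repeats=5):
    """Run action() several times and return the median wall-clock duration."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Stand-in workload: sleeps for 10 ms instead of making a real HTTP request.
baseline = median_seconds(lambda: time.sleep(0.01))
print(f"{baseline:.3f}s")
```

Comparing the medians for the two servers, rather than single requests, smooths out one-off network hiccups.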

Re: [S] Site shutdowns

Posted: Wed Jul 11, 2018 11:24 pm
by Super legenda
It happened again.

Re: [S] Site shutdowns

Posted: Thu Jul 12, 2018 4:05 am
by Southern Corn
It happened for a whole day too.

Re: [S] Site shutdowns

Posted: Thu Jul 12, 2018 6:49 am
by drvonkitty
A couple questions:

One, is the cost of the server a problem? I don't know about anyone else, but I wouldn't mind chipping in a couple bucks a month to help with server running costs, if that'd help lighten the load.

And two, how severe are these crashes? Could trial data be at risk in a worst-case scenario crash?