[S] Site shutdowns


DWaM
Posts: 1763
Joined: Fri Jun 01, 2012 9:23 am
Gender: Male
Spoken languages: English
Location: The Kingdom of Ellipses

[S] Site shutdowns

Post by DWaM »

(Moved from the Help and Support section)

I'm not sure how productive this will be, but at this point, I feel like this is something that should at least be addressed at some point.

Initially, after the host change, we understood that the new host wasn't quite what the previous one was -- the shutdowns were noticeable at first, but never terribly long and never terribly frequent. However, we've now reached a point where the site appears to be going down several times within a ridiculously short timespan. It's obvious that none of this is Unas' fault; the issue is evidently with the host. But it feels like something should be said on the matter publicly so that trial makers are aware of what's going on. I fear that without communication about this, people who aren't used to this new state of things will just bail on the site if they find hours of work lost because the site happened to go down at the wrong moment.

Basically, it would be nice if some of these things are addressed:
  • Could we get information on why the site is potentially going down on this new host? Is it an issue that can be fixed?
  • Is there some sort of frequency or regularity to the times the site goes down so that we can at least prepare ourselves for it, and trial makers can properly know not to make any major progress in that time?
  • If not, is there some way we can communicate to people just now finding the site that the site going down is a potential issue?
  • Is it possible to make someone else, in addition to Unas, be able to bring the site back online, so that Unas doesn't have to be the one pinged every time this happens? Especially given the frequency, it's safe to say Unas will often find himself busy doing... well, more important things.
  • Are there any plans for perhaps changing the host and eliminating this issue altogether?
I admit some of these are probably a bit tough to answer, and for some I can already sort of guess the answer, but I feel like it's important to at least talk about it publicly and address the issue head-on at this point.

Especially given the fact that we may be looking at an influx of cases this summer due to a game jam being (potentially) hosted on the AA/Court Records discord. If, by that time, the site ends up being perceived as unreliable due to shutdowns, people might end up not using AAO at all, which would be a damn shame.

Thank you in advance.
Enthalpy
Community Manager
Posts: 5169
Joined: Wed Jan 04, 2012 4:40 am
Gender: Male
Spoken languages: English, limited Spanish

Re: [S] Site shutdowns

Post by Enthalpy »

DWaM wrote:It's important to at least talk about it publicly and address the issue head-on at this point.

Especially given the fact that we may be looking at an influx of cases this summer due to a game jam being (potentially) hosted on the AA/Court Records discord. If, by that time, the site ends up being perceived as unreliable due to shutdowns, people might end up not using AAO at all, which would be a damn shame.
Absolutely agreed. It's crossed my mind a time or two that it will be really bad if AAO doesn't recover from one of these outages, or if something worse than normal happens.

I don't have authority over anything server- or hosting-related, so I can only relate what Unas has already told me.
DWaM wrote:Could we get information on why the site is potentially going down on this new host? Is it an issue that can be fixed?
We'd like information ourselves. Here's what we know:
  • The webhosting platform, Scaleway, has not been helpful in identifying the problem.
  • Unas has checked the physical machine that does the hosting and hasn't found a problem.
  • The outage in November gave a different error message than the other ones and likely has a different cause. This is the only shutdown that gave server logs. mysqld ran out of memory, then the OS did, and in a last-ditch effort to prevent a server crash, another program killed mysqld. Shortly before this happened, there was a large number of something called "InnoDB semaphores" and apache2 threads. Unas has logs for this and shared some with me, but I can't read them well enough to know if sharing them is safe to do.
  • The latest outage had a different cause and was due to a problem with Scaleway that affected multiple websites.
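For readers curious what looking for such an event in a log might involve, here is a minimal sketch. It assumes the kill came from the Linux kernel's OOM killer, which the thread does not confirm, and that the messages follow the usual kernel wording. The sample lines are made up for illustration and are not AAO's logs; `find_oom_kills` is a hypothetical helper name.

```python
import re

# Matches both the older ("Kill process") and newer ("Killed process")
# kernel wordings of an OOM-killer message.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a (pid, process_name) tuple for every OOM-kill line found."""
    kills = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Made-up sample lines for illustration only; these are NOT AAO's logs.
sample = [
    "kernel: Out of memory: Kill process 1234 (mysqld) score 512 or sacrifice child",
    "kernel: apache2 invoked oom-killer: gfp_mask=0x0, order=0",
    "kernel: Out of memory: Killed process 5678 (apache2)",
]

for pid, name in find_oom_kills(sample):
    print(f"process {name} (pid {pid}) was killed for lack of memory")
```

A scan like this only tells you which process was killed, not why memory ran out in the first place.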
DWaM wrote:Is there some sort of frequency or regularity to the times the site goes down so that we can at least prepare ourselves for it, and trial makers can properly know not to make any major progress in that time?
No. We still don't know why it happens, let alone when.
DWaM wrote:If not, is there some way we can communicate to people just now finding the site that the site going down is a potential issue?
Yes. I can put up an Announcement explaining things easily enough, though I'd prefer to wait to give Unas a chance to chime in, since he knows more about the outages than I do. (I've sent him an e-mail about this topic and will give him a week before I put up an announcement.)
DWaM wrote:Is it possible to make someone else, in addition to Unas, be able to bring the site back online, so that Unas doesn't have to be the one pinged every time this happens? Especially given the frequency, it's safe to say Unas will often find himself busy doing... well, more important things.
I don't know if this is technically possible, but I'd be willing to take on this role.
DWaM wrote:Are there any plans for perhaps changing the host and eliminating this issue altogether?
None that I know of, but Unas may have some.

Let me know if I can do anything else.
[D]isordered speech is not so much injury to the lips that give it forth, as to the disproportion and incoherence of things in themselves, so negligently expressed. ~ Ben Jonson
Unas
Admin / Site programmer
Posts: 8850
Joined: Tue Jul 10, 2007 4:43 pm
Gender: Male
Spoken languages: Français, English, Español
Contact:

Re: [S] Site shutdowns

Post by Unas »

Enth summed it up quite well.

Basically, the server tends to go into a state called "kernel panic" and not respond to anything. When this happens, unfortunately, the crash is so severe that there is no log written, so there is no proper way to investigate the cause after forcing a reboot.
Once, however, we were lucky, and the server only "half-crashed" (i.e. the database was killed, but the kernel and Apache server were still up), which allowed me to get some meaningful logs highlighting RAM issues. Not quite sure whether it was the same issue, but I tend to think that it was.
Back then, I tried to tweak the server's performance limits so it would not use up all the memory, but apparently without much effect, judging from continued occurrences of the problem - and unfortunately I didn't take the time to look at it again until today...
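The post does not say which limits were changed. As a rough illustration of the general idea, one common approach is to budget RAM: reserve memory for the database and the OS, then cap the number of concurrent web server workers so they fit in what is left. The sketch below uses entirely hypothetical numbers and a hypothetical helper name, not AAO's actual settings.

```python
def max_apache_workers(total_ram_mb, reserved_mb, per_worker_mb):
    """Largest worker count whose combined memory fits in the RAM left over
    after reserving reserved_mb for the database and the OS."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    available = total_ram_mb - reserved_mb
    # Never return a negative count when the reservation exceeds total RAM.
    return max(available // per_worker_mb, 0)


# Hypothetical figures: a 2048 MB machine, 1024 MB reserved for the database
# and OS, and roughly 40 MB per web server worker process.
print(max_apache_workers(2048, 1024, 40))
```

The per-worker figure has to be measured on the real machine; guessing it too low lets the server run out of memory again.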

As for last week's issue, it was basically the same, except that at the same time there was also a wider issue on Scaleway's network that was preventing reboot, so I had to wait until they fixed it before I was able to trigger the reboot that brought the site back online...

As far as what can be done about this, well...
  • I just added additional restrictions to the server's performance settings. We'll see if it behaves better from now on.
  • Years ago, I developed support for serving the AAO static files from a separate server, to decrease the load on the main one - but I never took the time to actually set it up.
    If the issue still occurs, I guess I could set up an additional server and use it for that. Thankfully, these servers are cheap enough, it wouldn't ruin me to set up a second one, even though I'd rather avoid it if not necessary...
  • Unfortunately, as far as I know, I can't give anyone else access to reboot the machine without giving them my personal credentials to the host's admin console.
    Given these credentials are also linked to my payment information, I obviously won't give them to anyone.
When is the competition you're talking about supposed to take place?
If knowledge can create problems, it is not through ignorance that we can solve them.
Si le savoir peut créer des problèmes, ce n'est pas l'ignorance qui les résoudra. ( Isaac Asimov )
Exedeb
Posts: 115
Joined: Mon Jun 29, 2009 8:33 pm
Gender: Male
Spoken languages: Italian, English

Re: [S] Site shutdowns

Post by Exedeb »

The Game Jam will run from July 1 to August 11.

For more info, see this webpage.
Unas
Admin / Site programmer
Posts: 8850
Joined: Tue Jul 10, 2007 4:43 pm
Gender: Male
Spoken languages: Français, English, Español
Contact:

Re: [S] Site shutdowns

Post by Unas »

Thanks. If we experience new site shutdowns by then, I'll set up an additional server.
If knowledge can create problems, it is not through ignorance that we can solve them.
Si le savoir peut créer des problèmes, ce n'est pas l'ignorance qui les résoudra. ( Isaac Asimov )
energizerspark
Posts: 4130
Joined: Thu Jan 21, 2010 5:41 pm
Gender: Male
Spoken languages: English
Location: the Whole Sort of General Mish Mash

Re: [S] Site shutdowns

Post by energizerspark »

I'm assuming I'm not the only person that couldn't connect yesterday?
this signature has been left as it was when I left the forum for archival purposes
Currently watching:
Steven Universe
Currently playing:
Currently reading:
the avatar is from Urusei Yatsura in case you were wondering
Southern Corn
Posts: 171
Joined: Sat May 19, 2018 6:05 pm
Gender: Male
Spoken languages: English, Bad Jokes

Re: [S] Site shutdowns

Post by Southern Corn »

Nope. Wasn't able to for most of the day either.
Gosicrystal
Posts: 39
Joined: Mon Apr 24, 2017 7:54 pm
Gender: Male
Spoken languages: Español, English

Re: [S] Site shutdowns

Post by Gosicrystal »

Me too.
Enthalpy
Community Manager
Posts: 5169
Joined: Wed Jan 04, 2012 4:40 am
Gender: Male
Spoken languages: English, limited Spanish

Re: [S] Site shutdowns

Post by Enthalpy »

It was, as you surmised, another site shutdown. I wouldn't be surprised to hear word from Unas on this.
[D]isordered speech is not so much injury to the lips that give it forth, as to the disproportion and incoherence of things in themselves, so negligently expressed. ~ Ben Jonson
Unas
Admin / Site programmer
Posts: 8850
Joined: Tue Jul 10, 2007 4:43 pm
Gender: Male
Spoken languages: Français, English, Español
Contact:

Re: [S] Site shutdowns

Post by Unas »

Hi there,

Sorry for taking care of that a bit late... Anyway, as promised, since another shutdown occurred in June, I've decided to set up an additional server to serve all static files of AAO (this includes trial pictures, sounds and music from the default AAO asset-base).

I've just finalised the setup to use this new server: you may notice that, from now on, all these files will be served from http://asuras.aaonline.fr/ (Don't try to access this URL directly though - there is nothing to see)

In theory, this should:
  • Greatly reduce the number of queries to AAO's main Apache server. This should hopefully let webpages on the site load faster and, more importantly, avoid future kernel panics (I suspect those were caused by intense Apache traffic, even though my settings should prevent this).
  • Load all these static files through a much lighter and faster stack: the new server uses nginx instead of Apache, which should be a bit faster for serving static content. So hopefully trials should load a bit faster as well.
It may require some fine tuning though, so don't hesitate to let me know your impressions - if things are faster, slower, buggy, etc.
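To illustrate the split described above, here is a minimal sketch of routing asset URLs to a separate static host. Only the static host name comes from the post. The extension list, the `asset_url` helper, the example paths and the `example.org` main host are assumptions for illustration, and AAO's real code is written in PHP, not Python.

```python
from urllib.parse import urljoin, urlsplit

# Host named in the post above; everything else here is hypothetical.
STATIC_BASE = "http://asuras.aaonline.fr/"
STATIC_EXTENSIONS = (".png", ".gif", ".jpg", ".mp3", ".ogg")


def asset_url(path, main_base="http://example.org/"):
    """Send static assets to the static host and everything else to the main one."""
    # Look only at the path component, so a query string cannot fool the check.
    is_static = urlsplit(path).path.lower().endswith(STATIC_EXTENSIONS)
    base = STATIC_BASE if is_static else main_base
    return urljoin(base, path.lstrip("/"))


print(asset_url("/img/background.png"))  # goes to the static host
print(asset_url("trial.php?id=1"))       # stays on the main (placeholder) host
```

With a split like this, the main server only handles dynamic pages, which is where the reduction in Apache queries would come from.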
If knowledge can create problems, it is not through ignorance that we can solve them.
Si le savoir peut créer des problèmes, ce n'est pas l'ignorance qui les résoudra. ( Isaac Asimov )
kwando1313
Posts: 7684
Joined: Tue Jul 22, 2008 6:33 pm
Gender: Male
Spoken languages: English, Français (un peu), Ancient Belkan
Location: Uminari City

Re: [S] Site shutdowns

Post by kwando1313 »

Is there a reason we don't use nginx for everything? Since (afaik) that's the standard thing used for website servicing nowadays...
Avatar made by Rimuu~


"The Knight of the Iron Hammer, Vita, and the Steel Count, Graf Eisen. There's nothing in this world we can't destroy."
Unas
Admin / Site programmer
Posts: 8850
Joined: Tue Jul 10, 2007 4:43 pm
Gender: Male
Spoken languages: Français, English, Español
Contact:

Re: [S] Site shutdowns

Post by Unas »

The reason is PHP - in which all the server-side code of AAO is written.

Apache has a rather deeply integrated PHP module which has no real equivalent in nginx.
It's possible to execute PHP on nginx (through a system called php-fpm), but in my experience it always involved a significant overhead in processing time. I experimented with it a few years ago, and it was adding around 0.3 to 0.5s to every single PHP request.
In fact, I actually use it on the new server (I have a few admin and monitoring tools based on PHP as well), and have the same experience again - the tool feels significantly slower than the same running on Apache on the main server.

This may not be true with cutting edge versions of php or nginx, but for now I'm staying on old stable releases.
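A per-request overhead like the 0.3 to 0.5s mentioned above can be checked by timing the same request many times against each setup and comparing medians. The sketch below is one hedged way to do that; `median_latency` is a hypothetical helper, and the `time.sleep` call is only a stand-in for a real request to a server you control.

```python
import statistics
import time


def median_latency(fetch, runs=20):
    """Call fetch() `runs` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload. For a real comparison, pass a function that requests the
# same PHP page once through each server setup.
print(median_latency(lambda: time.sleep(0.01), runs=5))
```

The median is used rather than the mean so that one slow outlier does not dominate the comparison.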
If knowledge can create problems, it is not through ignorance that we can solve them.
Si le savoir peut créer des problèmes, ce n'est pas l'ignorance qui les résoudra. ( Isaac Asimov )
Super legenda
Posts: 860
Joined: Mon Sep 11, 2017 8:10 pm
Gender: Male
Spoken languages: Español e Ingles

Re: [S] Site shutdowns

Post by Super legenda »

It happened again.
Southern Corn
Posts: 171
Joined: Sat May 19, 2018 6:05 pm
Gender: Male
Spoken languages: English, Bad Jokes

Re: [S] Site shutdowns

Post by Southern Corn »

It happened for a whole day too.
drvonkitty
Posts: 567
Joined: Sat Apr 14, 2012 12:25 am
Spoken languages: English

Re: [S] Site shutdowns

Post by drvonkitty »

A couple questions:

One, is the cost of the server a problem? I don't know about anyone else, but I wouldn't mind chipping in a couple bucks a month to help with server running costs, if that'd help lighten the load.

And two, how severe are these crashes? Could trial data be at risk in a worst-case scenario crash?