Intermittent 500 Internal Server Error


#1

Hi there,

Every few months, I intermittently get a stream of ERROR 500 responses on my instance (lgbt.io) until I clear the remote file cache (despite still having more than 50% free disk space), clear Sidekiq’s Retries and Dead queues, and restart the Mastodon services. Sometimes I even need a full server reboot to get my instance back to a stable state.

I tried to correlate the ERROR 500s with server RAM. Although my server’s RAM (8GB) is constantly maxed out, the errors don’t happen constantly (only every few months). So the errors may or may not be caused by memory?

I also wonder which files or error logs I should look into to find the underlying issue?

[My instance has a total of 750 members, 44000 toots and 1653 connections to other instances]


#2

Where the logs are depends on your deployment choice, Docker or standalone. You would want to look at the logs of the web process. With Docker it’s: docker logs mastodon_web_1 (the container name may vary; add -f if you want the log to update in real time). With systemd it’s: journalctl -u mastodon-web (likewise for -f).

It’s important to understand what specifically goes wrong; normally you shouldn’t need to clear anything manually.


#3

@Gargron using the command from your reply, all I see are lines like the following. I assume they are okay?

May 27 13:01:42 host.lgbt.io bundle[7320]: [10d54a2e-fa84-4332-bfa5-752fd27432a0] method=GET path=/api/v1/statuses/99902894372540113/card format=html controller=Api::V1
May 27 13:01:42 host.lgbt.io bundle[7320]: [5ca71e99-65fa-4f52-9add-c42a8e711189] [active_model_serializers] Rendered ActiveModel::Serializer::Null with Hash (0.05ms)
May 27 13:01:42 host.lgbt.io bundle[7320]: [5ca71e99-65fa-4f52-9add-c42a8e711189] method=GET path=/api/v1/statuses/99902971888653498/card format=html controller=Api::V1
May 27 13:01:43 host.lgbt.io bundle[7320]: [04611196-b5c0-418e-b1d6-3f1c873dd31d] [active_model_serializers] Rendered ActiveModel::Serializer::Null with Hash (0.05ms)
May 27 13:01:43 host.lgbt.io bundle[7320]: [04611196-b5c0-418e-b1d6-3f1c873dd31d] method=GET path=/api/v1/statuses/99903009073853845/card format=html controller=Api::V1

When I check NGINX’s error log, I see many connect() failed (111: Connection refused) errors, such as the following:

root@host:~# tail -10 /var/log/nginx/error.log
2018/05/27 14:33:19 [error] 7297#7297: *8367 connect() failed (111: Connection refused) while connecting to upstream, client: 2001:b011:8011:19f6:6890:ea00:bd41:bd0, server: lgbt.io, request: "GET /api/v1/streaming/?stream=public:local&access_token=55724370ac7e74ef97d94d94a0a21e75f47ebfaa04d8e9acd1b79e51362b9e5a HTTP/1.1", upstream: "http://127.0.0.1:4004/api/v1/streaming/?stream=public:local&access_token=55724370ac7e74ef97d94d94a0a21e75f47ebfaa04d8e9acd1b79e51362b9e5a", host: "lgbt.io"
2018/05/27 14:33:19 [error] 7297#7297: *8367 connect() failed (111: Connection refused) while connecting to upstream, client: 2001:b011:8011:19f6:6890:ea00:bd41:bd0, server: lgbt.io, request: "GET /api/v1/streaming/?stream=public:local&access_token=55724370ac7e74ef97d94d94a0a21e75f47ebfaa04d8e9acd1b79e51362b9e5a HTTP/1.1", upstream: "http://127.0.0.1:4003/api/v1/streaming/?stream=public:local&access_token=55724370ac7e74ef97d94d94a0a21e75f47ebfaa04d8e9acd1b79e51362b9e5a", host: "lgbt.io"
2018/05/27 14:33:53 [error] 7297#7297: *8416 connect() failed (111: Connection refused) while connecting to upstream, client: 95.154.215.48, server: lgbt.io, request: "GET /api/v1/streaming/?stream=user&access_token=c853ef61779db1c1665c7d4054b72305c3df72798d64f534daf621c43157c2c9 HTTP/1.1", upstream: "http://127.0.0.1:4002/api/v1/streaming/?stream=user&access_token=c853ef61779db1c1665c7d4054b72305c3df72798d64f534daf621c43157c2c9", host: "lgbt.io"
2018/05/27 14:33:53 [error] 7297#7297: *8416 connect() failed (111: Connection refused) while connecting to upstream, client: 95.154.215.48, server: lgbt.io, request: "GET /api/v1/streaming/?stream=user&access_token=c853ef61779db1c1665c7d4054b72305c3df72798d64f534daf621c43157c2c9 HTTP/1.1", upstream: "http://127.0.0.1:4003/api/v1/streaming/?stream=user&access_token=c853ef61779db1c1665c7d4054b72305c3df72798d64f534daf621c43157c2c9", host: "lgbt.io"
2018/05/27 14:33:53 [error] 7297#7297: *8416 connect() failed (111: Connection refused) while connecting to upstream, client: 95.154.215.48, server: lgbt.io, request: "GET /api/v1/streaming/?stream=user&access_token=c853ef61779db1c1665c7d4054b72305c3df72798d64f534daf621c43157c2c9 HTTP/1.1", upstream: "http://127.0.0.1:4004/api/v1/streaming/?stream=user&access_token=c853ef61779db1c1665c7d4054b72305c3df72798d64f534daf621c43157c2c9", host: "lgbt.io"
2018/05/27 14:39:06 [error] 7298#7298: *8703 connect() failed (111: Connection refused) while connecting to upstream, client: 92.40.249.96, server: lgbt.io, request: "GET /api/v1/streaming/user HTTP/1.1", upstream: "http://127.0.0.1:4003/api/v1/streaming/user", host: "lgbt.io"
2018/05/27 14:39:06 [error] 7298#7298: *8703 connect() failed (111: Connection refused) while connecting to upstream, client: 92.40.249.96, server: lgbt.io, request: "GET /api/v1/streaming/user HTTP/1.1", upstream: "http://127.0.0.1:4004/api/v1/streaming/user", host: "lgbt.io"
2018/05/27 14:39:06 [error] 7298#7298: *8703 connect() failed (111: Connection refused) while connecting to upstream, client: 92.40.249.96, server: lgbt.io, request: "GET /api/v1/streaming/user HTTP/1.1", upstream: "http://127.0.0.1:4002/api/v1/streaming/user", host: "lgbt.io"
2018/05/27 14:39:22 [error] 7296#7296: *8735 connect() failed (111: Connection refused) while connecting to upstream, client: 95.154.215.48, server: lgbt.io, request: "GET /api/v1/streaming/?stream=user&access_token=c853ef61779db1c1665c7d4054b72305c3df72798d64f534daf621c43157c2c9 HTTP/1.1", upstream: "http://127.0.0.1:4004/api/v1/streaming/?stream=user&access_token=c853ef61779db1c1665c7d4054b72305c3df72798d64f534daf621c43157c2c9", host: "lgbt.io"
2018/05/27 14:39:22 [error] 7296#7296: *8735 connect() failed (111: Connection refused) while connecting to upstream, client: 95.154.215.48, server: lgbt.io, request: "GET /api/v1/streaming/?stream=user&access_token=c853ef61779db1c1665c7d4054b72305c3df72798d64f534daf621c43157c2c9 HTTP/1.1", upstream: "http://127.0.0.1:4003/api/v1/streaming/?stream=user&access_token=c853ef61779db1c1665c7d4054b72305c3df72798d64f534daf621c43157c2c9", host: "lgbt.io"
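To see which backend ports nginx is failing to reach, and how often, the error log can be tallied per upstream port. A small helper sketch (the log path and upstream address are assumed to match the excerpt above):

```shell
# tally_refused: read an nginx error log on stdin and print
# "count port" pairs for connect() failures, most frequent first.
tally_refused() {
  grep 'connect() failed' \
    | grep -o 'upstream: "http://127\.0\.0\.1:[0-9]*' \
    | grep -o '[0-9]*$' \
    | sort | uniq -c | sort -rn
}

# Usage: tally_refused < /var/log/nginx/error.log
```

In the log excerpt above, this would show that all the refused connections go to ports 4002–4004 while 4000 never appears, which points at dead upstreams rather than an overloaded one.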

I have the following setup in /etc/nginx/sites-enabled/mastodon

upstream mastodon-streaming {
    least_conn;
    server 127.0.0.1:4000 weight=10 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:4001 weight=10 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:4002 weight=10 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:4003 weight=10 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:4004 weight=10 max_fails=3 fail_timeout=30s;
}

location /api/v1/streaming {
   #proxy_pass http://localhost:4000;
   proxy_pass http://mastodon-streaming;
}
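For an upstream block like that to work, each listed port needs a streaming process actually bound to it. One common way to run several instances (a sketch, not from this thread: the unit name, paths and user are assumptions about a typical non-Docker install) is a templated systemd unit where the instance name supplies the port:

```ini
# /etc/systemd/system/mastodon-streaming@.service (sketch; adjust paths/user)
[Unit]
Description=mastodon-streaming on port %i
After=network.target

[Service]
Type=simple
User=mastodon
WorkingDirectory=/home/mastodon/live
Environment="NODE_ENV=production"
Environment="PORT=%i"
ExecStart=/usr/bin/node ./streaming
Restart=always

[Install]
WantedBy=multi-user.target
```

Then something like systemctl enable --now mastodon-streaming@4000 (and @4001 … @4004) would start one listener per upstream entry.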

I also have the following mastodon-web.service variables:

Environment="WEB_CONCURRENCY=10"
Environment="MAX_THREADS=20"

My server has 8GB RAM, and RAM usage is always above 95%. The Error 500 issue only appears some time after RAM usage hits the maximum, not right away, so this makes me wonder whether Error 500 is related to RAM or to something else.

I am still trying to see if I missed anything.

May I know if you see any issues with the setup above?


#4

Are those services on 4001, 4002, 4003 really working? Can you check the output of netstat with the right options to see the listening ports?
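For the record, on a Linux host this can be checked with ss (or the older netstat -tlnp). A quick sketch, assuming the upstream ports from the config above:

```shell
# -l listening sockets only, -t TCP, -n numeric addresses/ports
ss -ltn

# Keep only the streaming upstream ports 4000-4004
# ($4 is the local address column in ss output)
ss -ltn | awk '$4 ~ /:400[0-4]$/ {print $4}'
```

Any port present in the nginx upstream block but missing from this output will produce exactly the connect() failed (111: Connection refused) errors shown earlier.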


#5

@saper you were right, those ports weren’t listening. I have removed 4002, 4003 and 4004 since I was just testing something; 4001 is listening. After playing around with the streaming and web services, as well as WEB_CONCURRENCY and MAX_THREADS, my server does seem to stabilise, but I am looking at moving to a cloud server with much more RAM. :slight_smile:


#6

While you are here, maybe it is worth checking what is taking so much memory (a common complaint about Mastodon). There are guides on how to do that with Rails; on #FreeBSD I’d start by checking procstat(1).
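On a Linux host like the one in this thread, a quick first look is just ps sorted by resident memory (a sketch; the Puma and Sidekiq processes usually dominate on a Mastodon box):

```shell
# Top 10 processes by resident set size (RSS, in KiB), largest first
ps axo rss,comm --sort=-rss | head -n 10
```

From there you can decide whether to lower WEB_CONCURRENCY/MAX_THREADS or move to a bigger machine.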