High number of Sidekiq retries, toots not appearing

Hi there,

I’m the admin over at mmorpg.social.

I’m experiencing a problem with toots from other instances not appearing in my federated timeline. I’m getting a high number of retries and failures in Sidekiq (over 80,000 failures in 3 days).

When I look at the detailed sidekiq logs, I get a group of entries similar to the below:

sidekiq_1    | 2018-12-16T20:50:26.753Z 6 TID-gnzb91ika ActivityPub::ProcessingWorker JID-3090432b2f9b673dd6e4f0ca INFO: start
sidekiq_1    | 2018-12-16T20:50:27.799Z 6 TID-gnzb91ika ActivityPub::ProcessingWorker JID-3090432b2f9b673dd6e4f0ca INFO: fail: 1.046 sec
sidekiq_1    | 2018-12-16T20:50:27.800Z 6 TID-gnzb91ika WARN: {"context":"Job raised exception","job":{"class":"ActivityPub::ProcessingWorker","args":[9,"{\"type\": \"Announce\", \"to\": [\"https://relay.mastodon.host/actor/followers\"], \"object\": \"https://mastodon.host/users/anexcursion/statuses/101252513258197361\", \"actor\": \"https://relay.mastodon.host/actor\", \"id\": \"https://relay.mastodon.host/activities/283a62e7-75a4-4bca-8bdb-c435b0dfaf20\", \"@context\": \"https://www.w3.org/ns/activitystreams\"}",null],"retry":true,"queue":"default","backtrace":true,"jid":"3090432b2f9b673dd6e4f0ca","created_at":1544990747.2852695,"enqueued_at":1544993426.7524924,"error_message":"failed to connect: No address for mastodon.host on https://mastodon.host/users/anexcursion/statuses/101252513258197361","error_class":"HTTP::ConnectionError","failed_at":1544990748.3137655,"retry_count":6,"error_backtrace":["/mastodon/app/lib/request.rb:183:in `open'"],"retried_at":1544992064.727663},"jobstr":"{\"class\":\"ActivityPub::ProcessingWorker\",\"args\":[9,\"{\\\"type\\\": \\\"Announce\\\", \\\"to\\\": [\\\"https://relay.mastodon.host/actor/followers\\\"], \\\"object\\\": \\\"https://mastodon.host/users/anexcursion/statuses/101252513258197361\\\", \\\"actor\\\": \\\"https://relay.mastodon.host/actor\\\", \\\"id\\\": \\\"https://relay.mastodon.host/activities/283a62e7-75a4-4bca-8bdb-c435b0dfaf20\\\", \\\"@context\\\": \\\"https://www.w3.org/ns/activitystreams\\\"}\",null],\"retry\":true,\"queue\":\"default\",\"backtrace\":true,\"jid\":\"3090432b2f9b673dd6e4f0ca\",\"created_at\":1544990747.2852695,\"enqueued_at\":1544993426.7524924,\"error_message\":\"failed to connect: No address for mastodon.host on https://mastodon.host/users/anexcursion/statuses/101252513258197361\",\"error_class\":\"HTTP::ConnectionError\",\"failed_at\":1544990748.3137655,\"retry_count\":6,\"error_backtrace\":[\"/mastodon/app/lib/request.rb:183:in `open'\"],\"retried_at\":1544992064.727663}"}
sidekiq_1    | 2018-12-16T20:50:27.800Z 6 TID-gnzb91ika WARN: HTTP::ConnectionError: failed to connect: No address for mastodon.host on https://mastodon.host/users/anexcursion/statuses/101252513258197361
sidekiq_1    | 2018-12-16T20:50:27.800Z 6 TID-gnzb91ika WARN: /mastodon/app/lib/request.rb:183:in `open'

Any clue what might be the issue? I’ve run some curl checks from the server, and the hostnames resolve over both IPv4 and IPv6, so I’m a little lost.
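To see whether the failures are spread across the fediverse or concentrated on a few hosts, the WARN lines can be tallied by error class and remote host. This is only a rough sketch: the function name `failing_hosts` is made up for illustration, and it assumes every "Job raised exception" line has the same shape as the one in the log above (a JSON payload after `WARN: `, with the activity passed as a JSON string in `args[1]`). Lines that don't fit that shape are skipped.

```python
import json
import re
from collections import Counter
from urllib.parse import urlparse

# Matches the JSON payload that follows "WARN: " on "Job raised exception" lines.
WARN_JSON = re.compile(r'WARN: (\{"context":"Job raised exception".*\})\s*$')


def failing_hosts(lines):
    """Count failed jobs per (error_class, host of the activity's object URL)."""
    counts = Counter()
    for line in lines:
        match = WARN_JSON.search(line)
        if not match:
            continue
        try:
            job = json.loads(match.group(1))["job"]
            # Assumption: args[1] is the ActivityPub activity as a JSON string,
            # as in the ActivityPub::ProcessingWorker entry shown above.
            activity = json.loads(job["args"][1])
        except (ValueError, KeyError, IndexError, TypeError):
            continue  # not the shape we expect; ignore this line
        obj = activity.get("object")
        host = urlparse(obj).hostname if isinstance(obj, str) else None
        counts[(job.get("error_class"), host)] += 1
    return counts
```

Something like `failing_hosts(open("sidekiq.log")).most_common(10)` would then list the worst offenders, where `sidekiq.log` stands for wherever the container output was saved.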

Thanks in advance!


Tell me about it…

It’s a waste of server CPU time, and it means instances that are actually working are not getting content.

I think I’m getting close to the issue.

Turns out, trying to run a federated Mastodon instance on a single-CPU VPS is a no-no. Even though the CPU was barely hitting above 30% utilisation, tasks would time out waiting for DNS resolution. As soon as I resized to a two-CPU VPS, the problem vanished.

This makes me think that there might be a timeout issue on some of the task steps (like DNS lookups), which can result in a queue building up as jobs get re-queued. I’ll have to do some digging to find it in the code, but it feels like some tuning might help.
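One way to test that hunch from the server itself is to time lookups against a deadline and see whether they come back slow rather than wrong. The sketch below is not Mastodon's code: `timed_lookup` is a hypothetical helper, the 1-second default deadline is an assumption rather than a value taken from the Mastodon source, and it uses the system resolver through Python's `socket.getaddrinfo`, which may behave differently from whatever the Ruby side uses.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def timed_lookup(host, timeout=1.0, resolver=socket.getaddrinfo):
    """Resolve host in a worker thread; return (status, seconds, addresses).

    status is "ok", "timeout" (no answer within `timeout` seconds) or
    "no address" (the resolver reported a failure).
    """
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(resolver, host, 443, type=socket.SOCK_STREAM)
    try:
        infos = future.result(timeout=timeout)
        # Each getaddrinfo entry ends with a sockaddr tuple; item 0 is the IP.
        status, addrs = "ok", sorted({info[4][0] for info in infos})
    except FutureTimeout:
        status, addrs = "timeout", []
    except socket.gaierror:
        status, addrs = "no address", []
    finally:
        # Don't block on a lookup that is still running in the background.
        pool.shutdown(wait=False)
    return status, time.monotonic() - start, addrs
```

Running `timed_lookup("mastodon.host")` in a loop while Sidekiq is busy, and again while it is idle, should show whether lookups only miss the deadline under load. If they do, that would point at contention on the box rather than at the remote instance's DNS.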

Hey Gazimoff, I’m having the exact same issue. Did you ever figure out what was going on?

Unfortunately I didn’t get to the bottom of it, but moving to a multi-core virtual machine fixed it for my load of 150 users. I’ve put it on the back burner for now (I still have a few retries, but they’re limited to certain instances as @humblr pointed out.)

150 users online at a time, or just 150 registered users?