Images not downloading after storage failure

Hello,

I am running Mastodon on my Kubernetes cluster, and it was working great until I decided to upgrade Kubernetes and somehow managed to wipe all my storage, meaning I lost all images and my Redis instance's persistence file.

I've re-created the storage for /public/system, but for some reason Mastodon won't download any images.

Permissions are correct and the mastodon user can write to the directory:

mastodon@mastodon-rails-64486fb845-v6m2g:~$ id
uid=991(mastodon) gid=991(mastodon) groups=991(mastodon),2000

mastodon@mastodon-rails-64486fb845-v6m2g:~$ ls -ld public/system/
drwxrwsr-x. 7 mastodon mastodon 4096 Jun 12 01:22 public/system/

mastodon@mastodon-rails-64486fb845-v6m2g:~$ echo "Hello!" > public/system/testfile.txt

mastodon@mastodon-rails-64486fb845-v6m2g:~$ ls -l public/system/testfile.txt
-rw-r--r--. 1 mastodon mastodon 7 Jun 12 10:25 public/system/testfile.txt

mastodon@mastodon-rails-64486fb845-v6m2g:~$ cat public/system/testfile.txt
Hello!

I'm not seeing any errors in Rails or in Sidekiq that could give a clue about what the issue is.
Would someone be able to point me in the right direction on how to debug this?
Is it Sidekiq that downloads images, or the Rails instance?

Sidekiq¹ downloads images in almost all cases, and you should see any errors in the STDOUT log for the Sidekiq container. Since I don't know much about Kubernetes and it's not a supported Mastodon configuration, I'm not sure I can give a lot of help beyond that. Is it possible the images are getting written to the container but not propagated to persistent storage? Are you sure the public/system volume is mounted relative to the app's working directory? In the Docker container I would expect that to be /mastodon/public/system.

¹ To be precise, Sidekiq is also an instance of Rails, just not a web server.
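
I'm not a Kubernetes user, so take this as a rough sketch, but I believe something along these lines would show the Sidekiq log and where the volume actually ends up inside the container (the deployment name mastodon-sidekiq is just a guess, adjust for your own manifests):

kubectl logs deploy/mastodon-sidekiq --tail=200
kubectl exec deploy/mastodon-sidekiq -- sh -c 'pwd && ls -ld public/system'

If the second command shows an empty or freshly created directory rather than your re-created persistent volume, the mount path probably doesn't match the app's working directory.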

Doh. I figured it out.

I was foolishly focusing on permission errors in the Sidekiq logs and didn't pay attention to the DNS resolution errors.

I figured they were just instances that had shut down, as I have seen that happen quite a lot… but upon closer inspection, Sidekiq wasn't able to resolve ANY hostname.

For some reason, Kubernetes sets ndots:5 in /etc/resolv.conf, which means pretty much every instance hostname with fewer than five dots gets my search path appended before it is tried as-is, and that was causing DNS resolution to fail.
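
For illustration, a pod's /etc/resolv.conf typically looks something like this (the namespace and nameserver address here are just placeholders):

search mastodon.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5

With ndots:5, a name like files.example.com (only two dots) is first looked up as files.example.com.mastodon.svc.cluster.local, files.example.com.svc.cluster.local, and so on, before the bare name is tried.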

I had to manually set the ndots option to 2 in my deployment definition. This blog post explains it: Kubernetes pods /etc/resolv.conf ndots:5 option and why it may negatively affect your application performances
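
In case anyone else runs into this, here is a minimal sketch of what that change looks like in the Deployment's pod template (under spec.template.spec):

dnsConfig:
  options:
    - name: ndots
      value: "2"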