Elasticsearch: searching all public toots, and cluster setup


#1

Just got Elasticsearch set up, and it works very well for searching your own toots, which, going off the GitHub threads, is by design for “privacy”.

Is it possible to allow it to search the entire instance?
Humblr uses Mastodon a bit differently, which is why we are okay with users searching other users' toots. In fact, the reason we want instance-wide search is so that users can find people they are not already connected with; we see the search function as a great way to do that.

As for setting it up on a Mastodon installation with multiple servers: you only need one instance of ES, if I am not mistaken?
What does ES work with? I am presuming just Redis.

Thank you


#2

Didn’t know about Elasticsearch. What’s its purpose?


#3

It allows users to search their own timeline, toots they are mentioned in, etc.


#4

Elasticsearch is its own database; it doesn’t use Redis for anything. Mastodon uses Redis for background tasks. Different tools for different jobs.

As for instance-wide full-text search, that’s never going to be implemented (it’s abused on commercial social media to find targets for harassment, and we don’t want that here), so you’d have to write and maintain your own patch to add it on your own instance.

There are other ways to discover users on your instance, and a pretty big one will be releasing soon, so I’d recommend keeping an eye open for that.


#5

Thanks for the info.

How does it build its database? What does it connect to in order to populate it?

Hopefully it is not too hard to patch on our end. It would help a lot of our users.
I am aware of the new feature, but I still think full instance search will be required.


#6

The chewy:deploy rake task imports all toots into Elasticsearch; once ES_ENABLED=true is set, new toots are continuously indexed. We use searchable_by (an array of account IDs) to query Elasticsearch for the toots the user is allowed to find. There is an additional step in Ruby where each toot is checked again to confirm the user is allowed to see it.

To modify this behaviour, you would edit app/services/search_service.rb to remove the filter from the Elasticsearch query. Because of the additional Ruby check, this should not expose anyone’s private or direct toots to other people, but it does complicate things and I would recommend against it (e.g. a whole page of results might be filtered out).

Alternatively, you could make more modifications to both that file and app/chewy/statuses_index.rb to change how and which toots are indexed (exclude all private and direct ones, skip searchable_by, etc.), then drop the previous index and re-run chewy:deploy.
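To make the two-stage filtering concrete, here is a rough, self-contained sketch in plain Ruby. It is not Mastodon's code: `Toot`, `INDEX`, `index_query`, `visible_to?` and `search` are all hypothetical stand-ins, and the visibility rule (public, or the account is listed in searchable_by) is an assumption made for illustration. It only shows why dropping the index-side filter can leave a page of results empty.

```ruby
# Hypothetical stand-ins; none of these names come from Mastodon itself.
Toot = Struct.new(:id, :text, :visibility, :searchable_by, keyword_init: true)

# A tiny in-memory "index", ordered the way the index would return hits.
INDEX = [
  Toot.new(id: 1, text: "hello again", visibility: :private, searchable_by: [2]),
  Toot.new(id: 2, text: "hello there", visibility: :direct,  searchable_by: [2, 3]),
  Toot.new(id: 3, text: "hello all",   visibility: :public,  searchable_by: [2]),
  Toot.new(id: 4, text: "hello world", visibility: :public,  searchable_by: [1]),
].freeze

# Stage 1: the index query. With filter_by_account, only toots whose
# searchable_by list contains the searching account are returned.
# The page limit is applied here, before the second check runs.
def index_query(term, account_id, filter_by_account:, limit:)
  hits = INDEX.select { |t| t.text.include?(term) }
  hits = hits.select { |t| t.searchable_by.include?(account_id) } if filter_by_account
  hits.first(limit)
end

# Stage 2: the application-side check (assumed rule, for illustration only).
def visible_to?(toot, account_id)
  toot.visibility == :public || toot.searchable_by.include?(account_id)
end

# Runs both stages and returns the IDs of the toots that survive.
def search(term, account_id, filter_by_account: true, limit: 2)
  index_query(term, account_id, filter_by_account: filter_by_account, limit: limit)
    .select { |t| visible_to?(t, account_id) }
    .map(&:id)
end
```

In this sketch, `search("hello", 1)` finds account 1's own toot, while `search("hello", 1, filter_by_account: false)` returns an empty page: the first two hits are private or direct toots that the second check removes, even though public matches exist further down the index. Indexing only public toots in the first place avoids that problem, at the cost of reindexing.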


#7

Thanks for the info. That will allow us to start in the right place.