sapient_cogbag (@sapient_cogbag@infosec.pub)
Common Interest Algorithm

The weighting system indicates how much interest (or avoidance) an instance has for a topic, as specified by the subject tree. The weight for each subject tree should be a value from -1 to 1 (inclusive), and applies to the deepest component of the tree. We’ll call this the sentiment of the instance towards that specific level of the tree.

The common interest algorithm gives a rough way to estimate how “aligned” in sentiment a given pair of entities is, using an incomplete collection of nested topic paths ^.^ and then heuristics to fill in the “gaps” needed for direct comparison. It takes the partially specified trees from federated instances, along with their estimated polarisabilities, and merges them together. It then uses the merged data to “complete” the sentiment weights specified by users and instances, so that they can be compared directly to determine their common interests and help direct users to the instances that suit them.

By default, users should be assumed to want “general sentiment/general topic/root topic” instances (i.e. with path /). They can then specify much more refined interests in various ways: for example, a user-friendly search function that takes search terms, looks up the collected known topics for them in various languages, and ranks instances using the common interest heuristic; or, for more advanced users, direct specification of interests ^.^.
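To make the idea of “completing” trees and comparing them concrete, here is a deliberately simplified, hypothetical sketch. It is not the algorithm from the GitLab snippet, and it ignores polarisabilities entirely. It stores a sentiment tree as a BTreeMap from topic path to weight, fills gaps with an assumed heuristic (an unspecified path inherits the weight of its nearest specified ancestor, or 0.0 as neutral if none is specified), and swaps in cosine similarity over every path either side specifies as the alignment score. SentimentTree, parent, completed_weight and alignment are illustrative names only.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical representation: topic path (e.g. "/tech/rust") -> sentiment in [-1, 1],
/// applying to the deepest component of that path.
type SentimentTree = BTreeMap<String, f64>;

/// "/tech/rust" -> "/tech" -> "/" -> None.
fn parent(path: &str) -> Option<&str> {
    if path == "/" {
        return None;
    }
    match path.rfind('/') {
        Some(0) => Some("/"),
        Some(i) => Some(&path[..i]),
        None => None,
    }
}

/// Assumed gap-filling heuristic: an unspecified path inherits the weight of its
/// nearest specified ancestor, or 0.0 (neutral) if no ancestor is specified.
fn completed_weight(tree: &SentimentTree, path: &str) -> f64 {
    let mut current = Some(path);
    while let Some(p) = current {
        if let Some(&w) = tree.get(p) {
            return w.clamp(-1.0, 1.0);
        }
        current = parent(p);
    }
    0.0
}

/// Cosine similarity of the two completed trees over every path either side specifies:
/// 1.0 = fully aligned, -1.0 = opposite, 0.0 if either side is entirely neutral.
fn alignment(a: &SentimentTree, b: &SentimentTree) -> f64 {
    let paths: BTreeSet<&String> = a.keys().chain(b.keys()).collect();
    let (mut dot, mut norm_a, mut norm_b) = (0.0_f64, 0.0_f64, 0.0_f64);
    for p in paths {
        let (wa, wb) = (completed_weight(a, p), completed_weight(b, p));
        dot += wa * wb;
        norm_a += wa * wa;
        norm_b += wb * wb;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a.sqrt() * norm_b.sqrt())
    }
}
```

For example, if one side weights /tech at 1.0 and the other weights only /tech/rust at 0.8, the first side’s weight is inherited by /tech/rust while the second side is neutral on /tech, giving an alignment of about 0.71.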

The full (if slightly incomplete) details of my proposed approximate Common Interest Algorithm are in this GitLab snippet, written in poorly-organised Rust code.

Tagging the Willingness for New Users

Different instances have different levels of desire for (and gatekeeping of) new users.

Some don’t allow any new users at all. Others require filling out a form and waiting for approval. Many require an email or captcha, and some don’t require anything whatsoever.

Some don’t want any new users, some accept new users but can only handle a small number, and others have free-for-all open registration.

Many users will want the ability to create communities without needing to seek approval. When an instance presents other instances to a user, it makes sense for the default “maximum” level of “inconvenience” it shows to be its own level of “inconvenience”.

nodeinfo2 (also see here for all keys) already exists to provide some basic information, but it’s not enough for this feature ;p

As such, I suggest we instead add a property to the main server actor, provisionally called instance_onboarding_meta. This is an object of the form:

```
{
    // If this is false, no other keys need be present.
    "accepting_new_users": bool,
    // Must be present. 1 minus the remaining number of users the instance can take,
    // as a fraction of its total estimated capacity (alternatively, an approximate
    // fraction of resource usage). A value > 1 implies the server is over capacity.
    "capacity_used": float (>= 0),
    // If present, the approximate maximum number of users this instance wants to host.
    // If unset, assume unlimited, but perform estimates based on capacity_used.
    "preferred_max_users": integer (>= 0),
    // Must be present: the list of signup requirements, any subset of the values below.
    // May need more options as new authentication and validation mechanisms are
    // added to the various Fedi servers ^.^
    "signup_requirements": ["captcha", "email", "approval"],
    // The "final" signup page, rather than one providing alternate instance
    // suggestions. Should take e.g. a `?username=<new username>` parameter.
    "signup_uri": "https://example.com/signup/finalized"
}
```
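To pin down the constraints in the schema comments, here is a hypothetical Rust mirror of instance_onboarding_meta. The type and method names are illustrative, JSON parsing is left out entirely, and is_valid only checks the rules stated in the comments: capacity_used must be present and non-negative whenever accepting_new_users is true, while a closed instance needs nothing else.

```rust
/// Possible entries of `signup_requirements` (more may be added later).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SignupRequirement {
    Captcha,
    Email,
    Approval,
}

/// Hypothetical in-memory form of `instance_onboarding_meta`.
#[derive(Debug, Clone)]
struct OnboardingMeta {
    accepting_new_users: bool,
    /// Required (and >= 0) when accepting_new_users is true.
    capacity_used: Option<f64>,
    /// Optional; u64 already rules out negative values.
    preferred_max_users: Option<u64>,
    signup_requirements: Vec<SignupRequirement>,
    signup_uri: Option<String>,
}

impl OnboardingMeta {
    /// Checks the constraints stated in the schema comments.
    fn is_valid(&self) -> bool {
        if !self.accepting_new_users {
            // No other keys need be present.
            return true;
        }
        // Also rejects NaN, since NaN >= 0.0 is false.
        matches!(self.capacity_used, Some(c) if c >= 0.0)
    }

    /// Fraction of estimated capacity still free; negative means over capacity.
    fn remaining_fraction(&self) -> Option<f64> {
        self.capacity_used.map(|c| 1.0 - c)
    }
}
```

An instance reporting capacity_used = 1.25 is valid but has a remaining fraction of -0.25, i.e. it is over capacity.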

Instance Signup Redirection Algorithm

Now that we have proposed a way for instances to describe how much effort it takes to sign up, how many new users they can really take, and what kind of community they’re interested in, we can use this data to construct a method for spreading signups across the fediverse.

We’ll describe this in terms of what happens as the polled instance values change, and finally what happens when a user actually looks for an instance ^.^. Many of these ideas are also mentioned in the Common Interest Algorithm snippet, which at least partially discusses some other things too.

Step 1 - Candidate Instance Collation

The first step is to collate information about potential candidate instances by requesting the endpoints described above from every instance the current instance is federated with - including itself! (It might be useful to combine all the metadata into one endpoint as well, but that’s all bikeshedding.) The information to collect is:

  • instance_software - the software of each instance
  • instance_focus - the list of weighted subject-trees that indicate what the community is oriented around. See the algorithm snippet for how to efficiently merge in information from instances without recalculating the full weights every time, using BTrees/BTreeMap.
  • instance_onboarding_meta - information about how the instance accepts new users, and its resources for doing so.

Instances shouldn’t poll this very frequently - certainly not on every attempted user signup! - and instead should cache it and poll periodically (say, every hour or so ^.^). This avoids slamming large portions of the network.
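A minimal sketch of that caching, assuming a periodic background task asks which federated instances are due for a re-poll, while signups only ever read from the cache. MetadataCache, Cached and POLL_INTERVAL are hypothetical names, the one-hour interval is just the suggestion above, and the actual fetching is left out.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Roughly once an hour, as suggested above (an adjustable assumption).
const POLL_INTERVAL: Duration = Duration::from_secs(60 * 60);

struct Cached<T> {
    value: T,
    fetched_at: Instant,
}

/// Hypothetical cache of polled metadata, keyed by domain. `T` would bundle
/// instance_software, instance_focus and instance_onboarding_meta.
struct MetadataCache<T> {
    entries: BTreeMap<String, Cached<T>>,
}

impl<T> MetadataCache<T> {
    fn new() -> Self {
        MetadataCache { entries: BTreeMap::new() }
    }

    /// Record a freshly polled value.
    fn store(&mut self, domain: &str, value: T, now: Instant) {
        self.entries.insert(domain.to_string(), Cached { value, fetched_at: now });
    }

    /// Signup-time lookups only read the cache; they never trigger a poll.
    fn get(&self, domain: &str) -> Option<&T> {
        self.entries.get(domain).map(|c| &c.value)
    }

    /// Which federated domains (including our own) the background task should
    /// re-poll: those never fetched, or fetched at least POLL_INTERVAL ago.
    fn due_for_refresh<'a>(&self, known: &[&'a str], now: Instant) -> Vec<&'a str> {
        known
            .iter()
            .copied()
            .filter(|d| match self.entries.get(*d) {
                None => true,
                Some(c) => now.saturating_duration_since(c.fetched_at) >= POLL_INTERVAL,
            })
            .collect()
    }
}
```

Keeping the staleness check separate from lookups means a burst of signups costs no extra network requests; only the timer-driven task ever polls.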

Step 2 - Software Filtering

The next step is to filter out candidate instances running different fediverse software from our own.

Step 3 - User Acceptance Filtering & Weighting

Our instance should then filter out instances that aren’t accepting new users, and perform the following steps to assign weights to the rest. These steps may be configurable, in case a user is OK with more signup effort than our instance requires; since most users are likely to use the default settings, the default result should be cached too:

  • For each instance, if it requires more things to sign up (email when we don’t need it, etc.), then remove it from the list.

    For captcha, if we don’t also require it, mark that instance with a 0.5 weight multiplier rather than eliminating it.

    From a user-configurability perspective, each possible signup requirement can either:

    • Eliminate the instance from the list (e.g. a user doesn’t want to deal with forms) - this is the default for things required by another instance that aren’t required by ours, except captcha.
    • Reduce its chance of selection (as with captcha) - this is the default when the other instance requires captcha but the current instance doesn’t.
    • Have no effect - this is the default if we also have that requirement.
  • For each instance, if it has a preferred max user count, calculate its current approximate user count by multiplying that count by its capacity_used fraction.

    Then, calculate the approximate available user slots by subtracting the approximate user count from the preferred maximum. Note that this value may be negative in the case of an overloaded server.

  • Find the largest preferred max user count among the instances (if no instance specifies one, use the current server’s user count instead, though remember that if your server does have such a preferred max count, it should be in the list). If any server’s estimated current user count is greater than this maximum, use that estimate instead.

    Then, assume that servers with no specified maximum have a preferred maximum of approximately 2x that value, and use this estimate, together with their capacity_used fractions, to calculate the approximate available user slots of those instances.

  • For any instance with available user slots < 0 (that is, an overloaded server), divide those (negative) available user slots by some value such as 4.

    If any instance has a negative number of available user slots, subtract the most-negative value from every instance’s count of available user slots (i.e. add its magnitude back on), so that the smallest value is zero.

    The division by 4 (or some other number) keeps that shift small, so all overloaded servers are avoided more than they would be if we shifted by the undivided most-negative value directly.

  • Assign each instance a weight equal to its proportion of the total available user slots. If the instance has already been tagged with a weight multiplier (e.g. from requiring captcha), multiply by that weight.
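Steps 2 and 3 can be sketched together as one pass, shown below in Rust. This is a hypothetical illustration rather than a reference implementation: Candidate, Req, Policy and assign_weights are made-up names, only the default requirement policy is shown (a user-configurable policy would replace default_policy), OVERLOAD_DIVISOR is the “some value such as 4”, and the fallback to equal shares when no instance has any free slots at all is an extra assumption the steps above don’t cover.

```rust
use std::collections::BTreeSet;

/// Signup requirements an instance may advertise.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Req { Captcha, Email, Approval }

/// Effect of a signup requirement on a candidate.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Policy { Eliminate, Multiply(f64), NoEffect }

/// Hypothetical candidate record built from the cached metadata.
struct Candidate {
    domain: String,
    software: String,
    accepting_new_users: bool,
    capacity_used: f64,
    preferred_max_users: Option<f64>,
    requirements: BTreeSet<Req>,
}

/// Default policy from the text: no effect if we share the requirement,
/// a 0.5 multiplier for captcha, otherwise elimination.
fn default_policy(req: Req, ours: &BTreeSet<Req>) -> Policy {
    if ours.contains(&req) {
        Policy::NoEffect
    } else if req == Req::Captcha {
        Policy::Multiply(0.5)
    } else {
        Policy::Eliminate
    }
}

/// The "some value such as 4" used to shrink overloaded servers' negative slots.
const OVERLOAD_DIVISOR: f64 = 4.0;

fn assign_weights(candidates: &[Candidate], our_software: &str,
                  our_reqs: &BTreeSet<Req>, our_user_count: f64) -> Vec<(String, f64)> {
    // Steps 2-3: same software, accepting users, and not eliminated by a requirement.
    let mut kept: Vec<(&Candidate, f64)> = Vec::new();
    'candidates: for c in candidates {
        if c.software != our_software || !c.accepting_new_users {
            continue;
        }
        let mut multiplier = 1.0;
        for &req in &c.requirements {
            match default_policy(req, our_reqs) {
                Policy::Eliminate => continue 'candidates,
                Policy::Multiply(m) => multiplier *= m,
                Policy::NoEffect => {}
            }
        }
        kept.push((c, multiplier));
    }

    // Reference size: the largest preferred max, raised to the largest estimated
    // current user count if that is bigger; our own user count if nobody sets a max.
    let mut reference: Option<f64> = None;
    for (c, _) in &kept {
        if let Some(max) = c.preferred_max_users {
            let estimate = max.max(max * c.capacity_used);
            reference = Some(reference.map_or(estimate, |r| r.max(estimate)));
        }
    }
    let reference = reference.unwrap_or(our_user_count);

    // Available slots = max - max * capacity_used, assuming max = 2x reference if unset.
    // Negative (overloaded) values are divided by OVERLOAD_DIVISOR.
    let mut slots: Vec<f64> = kept
        .iter()
        .map(|(c, _)| {
            let max = c.preferred_max_users.unwrap_or(2.0 * reference);
            let available = max * (1.0 - c.capacity_used);
            if available < 0.0 { available / OVERLOAD_DIVISOR } else { available }
        })
        .collect();

    // Shift everything up so the most-negative value becomes zero.
    let min = slots.iter().copied().fold(f64::INFINITY, f64::min);
    if min < 0.0 {
        for s in &mut slots {
            *s -= min;
        }
    }

    // Weight = share of total slots, times any multiplier (e.g. 0.5 for captcha).
    // Assumption: equal shares if no instance has any free slots at all.
    let total: f64 = slots.iter().sum();
    kept.iter()
        .zip(&slots)
        .map(|((c, multiplier), s)| {
            let share = if total > 0.0 { s / total } else { 1.0 / slots.len() as f64 };
            (c.domain.clone(), share * multiplier)
        })
        .collect()
}
```

As a worked example of the overload handling: hypothetical available slots of 500, 400, -100 and 500 become 500, 400, -25 and 500 after dividing the negative value by 4, then 525, 425, 0 and 525 after the shift. Out of a total of 1475, the shares are about 0.356, 0.288 (halved to about 0.144 if that instance requires captcha and we don’t), 0 and 0.356.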

https://infosec.pub/comment/696577
