For ML engineers: why can't you simply exclude the word "fuck"?

So, I’ve heard that ML models manipulate tokens, and that for English corpora tokens take the place of words. If we want a model to be polite and not use uncomfortable language, we could remove certain words, for example “fuck”, from the internal array where all tokens and their associated data are stored.

TheInsane42, 9 months ago

Why? There are so many snowflakes around that you’d get texts like this:

**’ ***** **** ***** ****!

I’d say get a life and don’t be offended so much. On the other hand, the human plague would be solved fast if nobody fucked anymore. You’re on to something. ;)

swordsmanluke, 9 months ago

As others have mentioned, it’s not quite that simple.

For starters, you can absolutely remove the word “fuck” from all the training data. Now it’s literally impossible for the AI to “know” the word. But what do you do with the training data? Do you replace “fuck” with a different token? “****” perhaps? Or do you just drop the data entirely?

Giving “offense” is much more complex than just a single word. See, if we just replace the token, the AI may still decide that “Go **** yourself” is a perfectly valid response to a query. On the other hand, if you drop all instances of “fuck” from the data, your AI will just learn offensive euphemisms instead: “You can shove your request where the sun don’t shine.”

Worse, there are plenty of sexual / offensive phrases that are built up from perfectly innocuous tokens. “Prone bone”, for instance.

The goal with these (and really almost all) AI models is for them to be “helpful, honest, and harmless”. Simply altering or replacing a single token (or even a combination of tokens) doesn’t really help, because the AI is modeling concepts, not just individual words.

All of this to say that the problem being solved is not to stop an AI from saying “fuck” - it’s to build an AI that doesn’t want to.

NeoNachtwaechter, 9 months ago

Bending language has never fixed anything. It just feeds hypocrites.

LazaroFilm, 9 months ago

That won’t fucking work you dumb fuck. — not an ai.

TheHobbyist, 9 months ago

You can.

With OpenAI, for instance, you can change the probability of a token being output by setting its logit bias, as described here: platform.openai.com/docs/api-reference/…/create#l…

By setting it to -100 or +100 you can effectively ban or force it.
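
To illustrate why a bias of -100 or +100 acts like a ban or a force, here is a minimal sketch of what a logit bias does to next-token probabilities. It does not call any API: the three-word vocabulary and its scores are made up, and `softmax` and `apply_logit_bias` are hypothetical helpers. In the real API the bias is keyed by token ID rather than by word, and one word can map to several tokens (capitalized, with a leading space, split into pieces), so each variant would likely need its own entry.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def apply_logit_bias(logits, bias):
    """Add a per-token bias to the raw scores before the softmax."""
    return {tok: score + bias.get(tok, 0.0) for tok, score in logits.items()}

# Hypothetical next-token scores; a real model scores tens of thousands of tokens.
logits = {"fuck": 3.0, "heck": 1.0, "darn": 0.5}

# A bias of -100 pushes the token's probability to practically zero.
banned = softmax(apply_logit_bias(logits, {"fuck": -100.0}))

# A bias of +100 makes the token practically certain.
forced = softmax(apply_logit_bias(logits, {"heck": 100.0}))

print(banned)
print(forced)
```

Because the bias is added before the softmax, the remaining probability is redistributed over the other tokens, so the model can still pick a euphemism.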

theKalash, 9 months ago

Just run the output through a simple string replacement function before returning it to the user. No need to mess with the model itself.
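
A rough sketch of such a post-processing filter, assuming the model output is available as a plain string. The function names are hypothetical. The naive version also rewrites the banned string inside longer, harmless words, while the word-boundary version only touches whole words.

```python
import re

def naive_filter(text, banned, replacement):
    """Replace every occurrence, even inside longer words."""
    return text.replace(banned, replacement)

def word_filter(text, banned, replacement):
    """Replace only whole-word matches, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(banned) + r"\b", re.IGNORECASE)
    return pattern.sub(replacement, text)

sample = "a classic ass"
print(naive_filter(sample, "ass", "butt"))  # mangles "classic" as well
print(word_filter(sample, "ass", "butt"))   # leaves "classic" alone
print(word_filter("What the fuck", "fuck", "****"))
```

Even the word-boundary version only catches the exact spellings it is given, so misspellings, inflected forms and euphemisms still get through.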

aard, 9 months ago

A well-proven clbuttic solution.

azurefirefly, 9 months ago

Fuck

toofpic, 9 months ago

Goooood AI!

Crackhappy, 9 months ago

What the fuck

BURN, 9 months ago

ML/generative AI doesn’t “store” an internal array of specifics. Instead, it’s a statistical model based on the next (or in ChatGPT’s case, 3rd most likely) word in a sentence.

To avoid swearing, or really anything else, it needs to be excluded at the training level, before the model is trained.

As it stands, we have very little to no visibility into why these models work. Even the researchers are still trying to open the black box, but there’s so much in there that it’s nearly impossible to isolate a node that would or would not contain the word “fuck”.

xerox, 9 months ago

(or in ChatGPT’s case, 3rd most likely)

Why 3rd?

BURN, 9 months ago

I believe the 3rd, or nth, word is used because it sounds more human. The statistically most likely word ends up sounding very robotic and forced, whereas the 3rd is still very likely correct but leads to variation in responses.

This is all from what I remember of a mini-paper I read about it, so I could be wrong.

BetaDoggo_, 9 months ago

ChatGPT’s sampling parameters are unknown, and it definitely doesn’t choose the 3rd most likely token. More sophisticated sampling methods are probably used, such as temperature, top-p and top-k.
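
For readers who have not met these terms, here is a minimal sketch of how temperature, top-k and top-p are commonly combined when picking the next token. It is a generic illustration, not ChatGPT's actual sampler, whose settings are unknown. The function name, the toy scores and the order of the filtering steps are all assumptions.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick a next token from raw scores using temperature, top-k and top-p."""
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose probability mass reaches p.
    if top_p is not None:
        kept, mass = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            mass += p
            if mass >= top_p:
                break
        ranked = kept

    # Draw from the surviving tokens in proportion to their probabilities.
    tokens = [tok for tok, _ in ranked]
    weights = [p for _, p in ranked]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical scores for three candidate next words.
logits = {"the": 2.0, "a": 1.5, "this": 0.2}
print(sample_next_token(logits, temperature=0.8, top_k=2, top_p=0.95))
```

The variation in responses comes from this random draw over a trimmed distribution, not from always taking a fixed rank such as the 3rd.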

BURN, 9 months ago

Correct, but also way over the level of the average reader.

I probably should have used an example other than ChatGPT, tbh.

wispydust, 9 months ago

That’s alright. You did good simplifying an unrelated idea for the sake of explaining another concept.
