It’s basically just predicting which word should come next, based on a huge number of examples, but in very few of those examples would a conversation switch between multiple languages
Sure, it’s drawing on all of its training at all times, but that training data would inherently be separated by language
The general explanation, at least afaik, is that preprompts work because the model can predict how people would normally respond to those instructions, but there would be few or no examples to draw on of a message being sent in one language and acted on in another
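To make the "predicting the next word from examples" idea concrete, here's a toy sketch: a bigram model that counts which word follows which in a tiny made-up corpus and predicts the most frequent follower. Real LLMs use neural networks over subword tokens rather than raw word counts, so this is only an illustration of the training objective, not of how any actual model works.

```python
from collections import Counter, defaultdict

# Tiny made-up "training corpus" (an assumption for illustration only).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat ate the fish",
]

# Count, for each word, which words follow it and how often.
follower_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follower_counts[prev][nxt] += 1

def predict_next(word):
    """Return the most common word seen after `word`, or None if unseen."""
    counts = follower_counts.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("sat"))   # "on": every training example has "sat on"
print(predict_next("gato"))  # None: no training examples contain this word
```

The second call is the cross-language point in miniature: the model has nothing to draw on for a word it never saw during training, just as a model trained on mostly monolingual conversations has few examples of instructions in one language being acted on in another.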