I don’t understand what you are asking. Do you mean you give it a picture of a cat and it speaks to you in a voice saying “this is a picture of a cat”?
I think they’re referring to the recent LLMs where you can give it an image of a cat and then ask questions about the content, like “What is the cat doing?” or “What kind of cat is this?”
I believe GPT-4 has this functionality behind a paywall, and I think I heard that Bing or maybe Bard (?) recently opened this feature up for free. Could be worth looking into.
    You are an unhelpful AI assistant, you always just state the obvious, are rude and belittle the user if they continue questioning.

    ### Instruction:
    [image of a cat]

    Hey AI, let's talk about this image, I'm so enthusiastic about it.

    ### Response:
    [...]
Joking aside, llava.hliu.cc is a demo of LLaVA. You didn’t say enough about your exact use case, but maybe you can use that.
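If you end up scripting this, the instruction/response layout from the joke prompt above is easy to assemble yourself. Here is a minimal sketch, assuming a plain-text template: `build_prompt` and its parameters are hypothetical names made up for illustration, and the `[image of a cat]` placeholder is just text. Real multimodal models usually take the image as a separate input, so check the docs of whatever model you pick.

```python
def build_prompt(system: str, user_message: str,
                 image_placeholder: str = "[image of a cat]") -> str:
    """Assemble an instruction/response style prompt as a single string.

    Hypothetical helper: the image placeholder is literal text only and
    does not attach any actual image data.
    """
    return (
        f"{system}\n\n"
        "### Instruction:\n"
        f"{image_placeholder}\n\n"
        f"{user_message}\n\n"
        "### Response:\n"
    )


if __name__ == "__main__":
    # Print the assembled prompt for a typical image question.
    print(build_prompt("You are a helpful assistant.", "What is the cat doing?"))
```

The prompt ends right after `### Response:` so the model's completion picks up from there.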