Akasazh,
@Akasazh@feddit.nl avatar

Wow, the face paint one looks like domestic violence…

fiat_lux, (edited )

it utilizes the power of attention mechanisms to weigh the relevance of input data

By applying a technique called supervised fine-tuning across modalities, Meta was able to significantly boost CM3leon's performance at image captioning, visual QA, and text-based editing. Despite being trained on just 3 billion text tokens, CM3leon matches or exceeds the results of other models trained on up to 100 billion tokens.

That's a very fancy way to say they deliberately focussed it on a small set of information they chose, and that they also heavily configure the implementation. Isn't it?

This sounds like hard-wiring in bias by accident, and I look forward to seeing comparisons with other models on that...

From the Meta paper:

"The ethical implications of image data sourcing in the domain of text-to-image generation have been a topic of considerable debate. In this study, we use only licensed images from Shutterstock.
As a result, we can avoid concerns related to images ownership and attribution, without sacrificing performance."

Oh no. That... that was the only ethical concern they considered? They didn't even do a language accuracy comparison? Data ethics got a whole 3 sentences?

For all the self-praise in the paper about state of the art accuracy on low input, and insisting I pronounce "CM3Leon" as Chameleon (no), it would have been interesting to see how well it describes people, not streetlights and pretzels. And how it evaluates text/images generated from outside the cultural context of its pre-defined dataset.

  • All
  • Subscribed
  • Moderated
  • Favorites
  • random
  • [email protected]
  • uselessserver093
  • Food
  • aaaaaaacccccccce
  • test
  • CafeMeta
  • testmag
  • MUD
  • RhythmGameZone
  • RSS
  • dabs
  • oklahoma
  • feritale
  • KamenRider
  • Ask_kbincafe
  • TheResearchGuardian
  • KbinCafe
  • Socialism
  • SuperSentai
  • All magazines