nandeEbisu,

I assume you are referring to transformers, which came out in the literature around 2017. Attention on its own is significantly older, but wasn't really used in anything resembling a large language model until the early-to-mid 2010s.

While attention is fairly simple, a trait that helps it parallelize and scale well, there has been a lot of recent research around how text is presented to the model and around model size. There is also a lot of sophistication in instruction tuning and alignment, which is how you get from simple text continuation to something that can answer questions. I don't think you could build something like ChatGPT using just the 2017 "Attention Is All You Need" paper.
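To give a sense of how simple the core operation is: scaled dot-product attention from that paper boils down to a few matrix multiplies and a softmax, which is why it maps so well onto parallel hardware. A minimal NumPy sketch (not any particular library's implementation, shapes and names are just illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores for every query/key pair come from one matrix multiply,
    # so all positions are processed in parallel.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis turns scores into weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is a weighted mix of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 query positions, dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

That's essentially the whole mechanism; the rest of a modern LLM is stacking, data, and tuning.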

I suspect that publicly released models lag whatever Google or OpenAI has figured out internally by six months to a year, especially now that there is a lot of shareholder pressure to release LLM-based products. Advancements developed in the open-source community, like applying LoRA and quantization in various contexts, have a significantly shorter gap between development and release.
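For anyone unfamiliar with LoRA: the idea is to freeze the pretrained weight matrix and train only a low-rank correction on top of it, which is a big part of why hobbyists can fine-tune large models at all. A toy NumPy sketch with made-up dimensions, just to show the parameter savings:

```python
import numpy as np

# Toy LoRA: keep the pretrained weight W frozen and learn a low-rank
# update B @ A, so only r * (d_in + d_out) parameters are trainable.
d_in, d_out, r = 64, 64, 4
rng = np.random.default_rng(1)
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, rank r
B = np.zeros((d_out, r))               # trainable, initialized to zero

x = rng.normal(size=(d_in,))
y = W @ x + B @ (A @ x)  # forward pass: base output plus low-rank correction

full = d_out * d_in       # parameters in a full fine-tune of W
lora = r * (d_in + d_out) # trainable parameters with LoRA
print(full, lora)         # 4096 vs 512
```

Because B starts at zero, the adapted model initially behaves exactly like the base model, and training only ever touches the small A and B matrices.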
