supert, 8 months ago I can run 4-bit quantised Llama 70B on a pair of 3090s, or rent GPU server time. It’s expensive but not prohibitive.
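For anyone wondering why that fits, here is a rough back-of-envelope sketch. It assumes exactly 70e9 parameters, a flat 4 bits per weight, and 24 GB of VRAM per 3090. Real quantisation formats carry some overhead for scales, and the KV cache and activations need room too, so treat the headroom figure as an upper bound. The function name is mine, not from any library.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed figures: 70e9 parameters, a flat 4 bits per weight.
weights_gb = weight_memory_gb(70e9, 4)

# Two RTX 3090s at 24 GB of VRAM each.
total_vram_gb = 2 * 24

# What is left for KV cache, activations and quantisation overhead.
headroom_gb = total_vram_gb - weights_gb

print(f"weights: {weights_gb:.0f} GB, VRAM: {total_vram_gb} GB, headroom: {headroom_gb:.0f} GB")
```

At 16 bits per weight the same function gives about 140 GB, which is why the unquantised model does not fit on this setup.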