https://cerebras.ai/blog/introducing-cerebras-inference-ai-at-instant-speed
Cerebras inference delivers 450 tokens per second for Llama 3.1 70B, which is 20x faster than NVIDIA GPU-based hyperscale clouds.