What to expect in 2025 for running big LLMs
I want to buy hardware by the end of this year for running local LLMs. Since DeepSeek R1 spoiled me and raised my expectations, I've been thinking about bigger models (32-70B, or maybe a heavily quantized R1).
Is there any hardware coming soon, or a super-efficient model, new architecture, etc., in 2025 that would enable running these models for <3k Euro at 10+ tokens/s?
What I am watching:
- Nvidia Digits
- AMD AI Max Pro 395
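For the 10+ tokens/s target, a rough sanity check is possible: single-stream decoding is usually memory-bandwidth-bound, so an upper bound on speed is bandwidth divided by model size (every generated token reads all weights once). A minimal sketch, where the ~256 GB/s figure is just an assumed placeholder for a unified-memory machine in this class, not a confirmed spec:

```python
def decode_tokens_per_s(params_b: float, bits_per_weight: int, bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed for a memory-bound LLM.

    params_b: parameter count in billions
    bits_per_weight: quantization width (e.g. 4 for Q4)
    bandwidth_gb_s: sustained memory bandwidth in GB/s (assumed)
    """
    model_gb = params_b * bits_per_weight / 8  # weight bytes read per token
    return bandwidth_gb_s / model_gb

# Example: a 70B model at 4-bit is ~35 GB of weights.
# At an assumed ~256 GB/s, the ceiling is ~7.3 tok/s -- just under target.
print(round(decode_tokens_per_s(70, 4, 256), 1))  # → 7.3
```

By this estimate, hitting 10+ tok/s on a dense 70B at Q4 needs ~350 GB/s or more, which is why the unified-memory boxes above (and their real-world bandwidth numbers) matter so much.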