Best LLMs for consumer-grade hardware

In the ongoing discussion about the best LLMs that can be run on consumer-grade hardware, several contenders come up, including phi-4. However, there is broad agreement that there is no definitive "best" model, because each comes with its own advantages and disadvantages. The **Locallama** community has been highlighted as a resource for running LLMs locally.

Several recommended models are:

- **DeepSeek-R1-0528-Qwen3-8B**: Recently released and touted as the best reasoning model at the 8B size.
- **Qwen3**: Available in multiple sizes, including **Qwen3-30B-A3B**, which performs admirably on CPUs and has adjustable reasoning capabilities for various tasks. Users with 8GB of VRAM (e.g., an RTX 3070) report that Qwen3-30B-A3B runs well, though it is not the fastest. Its *Mixture of Experts* (MoE) architecture contributes significantly to this performance.

Consumers are encouraged to experiment with different models to find the best fit for their own tasks; **Ollama** and **OpenWebUI** have been suggested as user-friendly platforms for trying out multiple models. The community seems to favor models that approach the 16GB memory mark for reasonable performance in chat-based applications.
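As a minimal sketch of that kind of experimentation, the snippet below sends the same prompt to a couple of models through Ollama's local REST API (`/api/generate`). It assumes an Ollama server is already running on its default port `11434` and that the models have been pulled; the model tags `qwen3:30b-a3b` and `phi4` are assumptions and may differ from the names in your install (check `ollama list`).

```python
import json
import urllib.request

# Model tags are assumptions; run `ollama list` to see what is actually installed.
MODELS = ["qwen3:30b-a3b", "phi4"]
PROMPT = "Summarize the trade-offs of an 8B dense model vs a 30B MoE model on 8GB of VRAM."

def generate(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send a single non-streaming generation request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    for model in MODELS:
        print(f"=== {model} ===")
        try:
            print(generate(model, PROMPT))
        except Exception as exc:  # e.g. model not pulled or server not running
            print(f"Skipping {model}: {exc}")
```

Setting `stream` to `False` keeps the example simple; for interactive chat-style comparison no code is needed at all, since OpenWebUI can be pointed at the same local Ollama server and lets you switch between the pulled models from its UI.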
0 Answers