I created this project to learn LLM inference and to address gaps that llama.cpp leaves open. The focus is inference for MoE-style (and other new-architecture) LLMs, leveraging both CPU and GPU. I will record all the materials I read in the docs folder so that anyone interested in learning LLM inference can refer to them.

Another important goal is making it easier to integrate new open-weight models. Many open weights already exist, and more will keep emerging; integrating a new model should not be difficult. I believe proper codebase organization and predefined prompts can help people integrate new models. Since this project is about LLM inference, it is natural to leverage LLMs themselves: if this project cannot benefit from LLMs, why use them in the first place?
Currently working on DeepSeek-V2...