Intermediate · Vocabulary · #llama.cpp · #GGUF · #quantization · #local LLM

llama.cpp Inference Exercises

llama.cpp enables efficient local inference of large language models on consumer hardware. These exercises cover the GGUF file format, quantization levels and tradeoffs, GPU layer offloading with -ngl, context size configuration, and platform-specific backends including Metal for Apple Silicon.
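To make the GGUF topic a little more concrete, here is a minimal sketch of reading the fixed-size start of a GGUF file. It assumes the version 2 or later header layout (a 4-byte `GGUF` magic, then a little-endian 32-bit version, a 64-bit tensor count, and a 64-bit metadata key/value count). The function name `parse_gguf_header` and the sample values are hypothetical, chosen only for illustration; the bytes are synthetic rather than taken from a real model.

```python
import struct

GGUF_MAGIC = b"GGUF"   # first four bytes of a GGUF file
HEADER_SIZE = 24       # 4 (magic) + 4 (version) + 8 (tensors) + 8 (metadata KVs)


def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header (assumes the v2+ layout)."""
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic is {data[:4]!r}")
    # "<" selects little-endian with no padding: uint32, uint64, uint64.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header bytes for illustration only (not from a real model file).
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
header = parse_gguf_header(sample)
print(header)
```

On a real model, the same function could be applied to the first 24 bytes of the file, for example `open(path, "rb").read(24)` with `path` pointing at a local `.gguf` file. The metadata key/value pairs and tensor descriptions that follow the header are not parsed here.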

1 / 5
What is the GGUF file format used by llama.cpp?