Pinned · Benjamin Marie in Towards Data Science · Mistral 7B: Recipes for Fine-tuning and Quantization on Your Computer · Cheap supervised fine-tuning with an impressive LLM · 9 min read · Oct 26, 2023
Pinned · Benjamin Marie in Towards Data Science · Run Mixtral-8x7B on Consumer Hardware with Expert Offloading · Finding the right trade-off between memory usage and inference speed · 8 min read · Jan 11, 2024
Benjamin Marie · Fine-tune Tiny Chat Models with Apple OpenELM and ORPO · Can we make a good chat model with a 270M LLM? · 9 min read · 1 day ago
Benjamin Marie in Towards Data Science · Turn Llama 3 into an Embedding Model with LLM2Vec · RAG with Llama 3 for both generation and retrieval · 7 min read · May 3, 2024
Benjamin Marie in Towards Data Science · Jamba: The New Hybrid Transformer/Mamba · Faster and better than the transformer but more difficult to train · 8 min read · Apr 30, 2024
Benjamin Marie · Estimate the Memory Consumption of LLMs for Inference and Fine-tuning · A close look at the memory consumption of Command-R+, Mixtral-8x22B, and Llama 3 70B · 9 min read · Apr 27, 2024
Benjamin Marie · Megalodon: Yet Another Method for Transformers with Unlimited Context · A complex but interesting method · 2 min read · Apr 23, 2024
Benjamin Marie · Training, Loading, and Merging QDoRA, QLoRA, and LoftQ Adapters · And how to quantize LLMs after a merge · 7 min read · Apr 20, 2024
Benjamin Marie in Towards Data Science · Neural Speed: Fast Inference on CPU for 4-bit Large Language Models · Up to 40x faster than llama.cpp? · 5 min read · Apr 18, 2024
Benjamin Marie · Turn LLMs into Text Embeddings for RAG Systems with LLM2Vec · A very simple method yielding accurate semantic representations · 2 min read · Apr 17, 2024