Pinned · Benjamin Marie in Towards Data Science · Mistral 7B: Recipes for Fine-tuning and Quantization on Your Computer · Cheap supervised fine-tuning with an impressive LLM · 9 min read · Oct 26, 2023
Pinned · Benjamin Marie in Towards Data Science · Run Mixtral-8x7B on Consumer Hardware with Expert Offloading · Finding the right trade-off between memory usage and inference speed · 8 min read · Jan 11, 2024
Benjamin Marie · Fine-tune Tiny Chat Models with Apple OpenELM and ORPO · Can we make a good chat model with a 270M LLM? · 9 min read · 1 day ago
Benjamin Marie in Towards Data Science · Turn Llama 3 into an Embedding Model with LLM2Vec · RAG with Llama 3 for both generation and retrieval · 7 min read · May 3, 2024
Benjamin Marie in Towards Data Science · Jamba: The New Hybrid Transformer/Mamba · Faster and better than the transformer but more difficult to train · 8 min read · Apr 30, 2024
Benjamin Marie · Estimate the Memory Consumption of LLMs for Inference and Fine-tuning · A close look at the memory consumption of Command-R+, Mixtral-8x22B, and Llama 3 70B · 9 min read · Apr 27, 2024
Benjamin Marie · Megalodon: Yet Another Method for Transformers with Unlimited Context · A complex but interesting method · 2 min read · Apr 23, 2024
Benjamin Marie · Training, Loading, and Merging QDoRA, QLoRA, and LoftQ Adapters · And how to quantize LLMs after a merge · 7 min read · Apr 20, 2024
Benjamin Marie in Towards Data Science · Neural Speed: Fast Inference on CPU for 4-bit Large Language Models · Up to 40x faster than llama.cpp? · 5 min read · Apr 18, 2024
Benjamin Marie · Turn LLMs into Text Embeddings for RAG Systems with LLM2Vec · A very simple method yielding accurate semantic representations · 2 min read · Apr 17, 2024