A simple library for modular model merging 🚀

Model merging has recently gained a lot of attention in the Deep Learning field, allowing data scientists to combine the strengths of multiple fine-tuned models without the burden of retraining. This technique addresses the rising cost of massive model fine-tuning and offers an efficient alternative to running expensive ensemble methods.

The concept is straightforward: since a single training run rarely lands on the best possible parameters, merging fine-tuned models aims to get closer to that sweet spot. This method pushes performance further without the additional inference cost of traditional model ensembles 🎯.

There’s no one-size-fits-all when it comes to model merging—different methods shine in different scenarios. That’s why a variety of merging libraries exist. In this blog post I introduce you to Mergecraft, a newly developed library with a sharp focus on both research and flexibility 🔧.

Built by researchers, for researchers, Mergecraft makes it easy to explore, experiment with, and even deploy novel merging techniques. Whether you want to try the latest state-of-the-art methods or design your own, Mergecraft has your back 🙌. Ready to dive in and see how it works?

Quick Start 🚀

At the heart of Mergecraft is a simple but powerful idea: it treats model parameters as if they were dictionaries of tensors. To make this work seamlessly, we have the StateDict class, which extends Python’s dictionary with extra goodies like arithmetic operations and utilities to make merging easier.
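To get a feel for the idea, here is a tiny, illustrative sketch of what a dict-of-tensors with element-wise arithmetic looks like. The ToyStateDict class below is just a toy for intuition, not Mergecraft’s actual implementation:

import torch

class ToyStateDict(dict):
    # A toy dict of tensors supporting element-wise arithmetic (illustration only)
    def __add__(self, other):
        # Add matching parameters key by key
        return ToyStateDict({k: v + other[k] for k, v in self.items()})

    def __truediv__(self, scalar):
        # Divide every parameter by a scalar
        return ToyStateDict({k: v / scalar for k, v in self.items()})

a = ToyStateDict({'w': torch.ones(2, 2)})
b = ToyStateDict({'w': torch.full((2, 2), 3.0)})
print(((a + b) / 2)['w'])  # element-wise average: a tensor full of 2.0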

Oh, and here’s something cool: Mergecraft plays perfectly with Hugging Face 🤗 models! Let’s see that in action:

import mergecraft
from mergecraft import StateDict

# Load two sets of weights straight from the Hugging Face Hub
gpt2 = StateDict.from_hf('openai-community/gpt2')
recipe = StateDict.from_hf('mrm8488/gpt2-finetuned-recipes-cooking')
print('Is StateDict just another dictionary?', isinstance(gpt2, dict))
Is StateDict just another dictionary? True

In this snippet, we’ve loaded two models straight from Hugging Face: the base GPT-2 model and a fine-tuned version focused on recipes. Now that we have these models, the big question is—what cool things can we do with them? 🔥
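And because StateDict really is a dictionary underneath, all the usual dict tooling keeps working. For example, assuming the values are standard PyTorch tensors, you can peek at a few parameter names and shapes:

# Inspect the first few parameter names and their shapes
for name, tensor in list(gpt2.items())[:3]:
    print(name, tuple(tensor.shape))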

Thanks to the magic of the StateDict abstraction, we can now implement one of the simplest model merging techniques: Isotropic Merging [1] — which basically means averaging the model parameters. It’s as easy as:

iso = (gpt2 + recipe) / 2

Boom 💥—you’ve just merged two models! Now, let’s take it a step further and turn this merged model back into a Hugging Face pipeline:

iso_pipe = iso.to_model('openai-community/gpt2')
iso_pipe('In order to make a great carbonara you\'ll need to ')

How easy was that?! You’ve just combined GPT-2 with a recipe-tuned version and transformed it into a text generator. What will it suggest next for your carbonara? 🍝✨
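And nothing limits this to two models: the merge only relies on the + and / operators you just saw, so averaging a longer list of checkpoints is just as short. A quick sketch, reusing the two StateDicts from earlier:

checkpoints = [gpt2, recipe]  # any number of StateDicts works the same way
iso_n = sum(checkpoints[1:], checkpoints[0]) / len(checkpoints)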

Using Ready-Made Techniques 🌟

You don’t always have to merge models from scratch. With Mergecraft, you can easily tap into a range of pre-built merging techniques with just a single command. For example, if you’re working with several fine-tuned BERT models and want to combine them to boost performance on a specific task, you might want to use the DARE (Drop And REscale) [2] technique. Here’s how you can do it:

import mergecraft

# The base model comes first (base_index=0), followed by several RTE fine-tunes
models = [
    'google-bert/bert-base-uncased',
    'textattack/bert-base-uncased-RTE',
    'yoshitomo-matsubara/bert-base-uncased-rte',
    'Ruizhou/bert-base-uncased-finetuned-rte',
    'howey/bert-base-uncased-rte',
    'anirudh21/bert-base-uncased-finetuned-rte',
]
# Layers where DARE should be skipped because they are
# randomly initialized during fine-tuning
rnd_layers = ['classifier.weight', 'classifier.bias']
merged = mergecraft.dare(models, passthrough=rnd_layers, base_index=0)

Just pass your list of models to the mergecraft.dare function, and you’re good to go! You can also specify which layers should bypass DARE and fall back to a more traditional isotropic merge. This is handy because DARE operates on the difference between each fine-tuned model and the base, and that difference is meaningless for layers that are initialized randomly during fine-tuning, such as the freshly added classification head.
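If you’re curious what DARE actually does under the hood: it works on the delta between each fine-tuned model and the base, randomly drops a fraction of the delta entries, and rescales the survivors so their expected contribution stays the same. Here is a minimal, illustrative sketch of that core step in plain PyTorch, with a hypothetical helper name and an arbitrary drop rate (this is not Mergecraft’s implementation):

import torch

def dare_delta(base, finetuned, drop_rate=0.9):
    # Task vector: what fine-tuning changed relative to the base weights
    delta = finetuned - base
    # Keep each entry with probability (1 - drop_rate), zero out the rest
    mask = (torch.rand_like(delta) >= drop_rate).float()
    # Rescale the survivors so the expected value of the delta is preserved
    return delta * mask / (1.0 - drop_rate)

The rescaled deltas from all the fine-tuned models are then combined and added back onto the base weights, which is exactly why layers with no meaningful base counterpart, like a randomly initialized classification head, have to be passed through untouched.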