Merging fine-tuned models#
After fine-tuning, the trained adapter must be merged with the base model to produce a standalone model that others can simply download and try.
base_model = "google/gemma-2b-it"  # the base model that was fine-tuned
new_model = "gemma-2b-it-bia-proof-of-concept2"  # name of the merged model and of the hub repository
user_org = "haesleinhuepf"  # Hugging Face user / organization the model will be uploaded to
For the merging step, we reload the base model and its tokenizer and then load the fine-tuned adapter on top of them.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
import torch
from trl import setup_chat_format
# Reload tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(base_model)
base_model_reload = AutoModelForCausalLM.from_pretrained(
base_model,
return_dict=True,
low_cpu_mem_usage=True,
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True,
)
# Apply the same ChatML chat template that was used during fine-tuning;
# this adds the special tokens and resizes the embeddings accordingly
base_model_reload, tokenizer = setup_chat_format(base_model_reload, tokenizer)
# Merge adapter with base model
model = PeftModel.from_pretrained(base_model_reload, new_model + "_temp")
merged_model = model.merge_and_unload()
`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.
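This warning is informational: Gemma falls back to the gelu_pytorch_tanh activation. If you want to silence it, one option (a sketch, assuming a transformers version that supports the hidden_activation attribute) is to set the activation explicitly in the config before loading:
```python
from transformers import AutoConfig, AutoModelForCausalLM

# Set the activation explicitly so transformers does not need to fall back with a warning
config = AutoConfig.from_pretrained(base_model)
config.hidden_activation = "gelu_pytorch_tanh"  # Gemma's default activation
base_model_reload = AutoModelForCausalLM.from_pretrained(
    base_model,
    config=config,
    torch_dtype=torch.float16,
    device_map="auto",
)
```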
Test model#
After merging, we can test the model.
messages = [{"role": "user", "content": """
Write Python code to load the image ../11a_prompt_engineering/data/blobs.tif,
segment the nuclei in it and
show the result
"""}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipe = pipeline(
"text-generation",
model=merged_model,
tokenizer=tokenizer,
torch_dtype=torch.float16,
device_map="auto",
)
outputs = pipe(prompt, max_new_tokens=120, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
C:\Users\haase\miniconda3\envs\genai-gpu\Lib\site-packages\transformers\models\gemma\modeling_gemma.py:482: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
<|im_start|>user
Write Python code to load the image ../11a_prompt_engineering/data/blobs.tif,
segment the nuclei in it and
show the result
<|im_end|>
<|im_start|>assistant
The code imports the module imread_label_image from the skimage.io module.
It then loads the image file called "blobs.tif" from the same directory as the script.
The image is loaded as an array of unsigned 32-bit integers.
The code then segments the nuclei in the image using the pyclesperanto_prototype library.
The segmented image is then displayed.
```python
from skimage.io import imread_label_image
image = imread_label_image("../../11a_prompt_engineering/data/
```
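The answer breaks off because generation stops after max_new_tokens=120 tokens. A small tweak, simply raising that limit, lets the model finish the code it writes:
```python
# Allow a longer completion so the generated code block is not cut off
outputs = pipe(prompt, max_new_tokens=512, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```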
We also save the merged model and the tokenizer to a local folder.
merged_model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)
('gemma-2b-it-bia-proof-of-concept2\\tokenizer_config.json',
'gemma-2b-it-bia-proof-of-concept2\\special_tokens_map.json',
'gemma-2b-it-bia-proof-of-concept2\\tokenizer.json')
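The merged model and tokenizer are now stored in a local folder named after new_model. As a quick sanity check (a sketch; it assumes enough memory is available for a second copy of the weights), they can be reloaded from that folder:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged model and tokenizer from the local folder to verify the export
reloaded_model = AutoModelForCausalLM.from_pretrained(new_model, torch_dtype=torch.float16, device_map="auto")
reloaded_tokenizer = AutoTokenizer.from_pretrained(new_model)
```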
We can also inspect the architecture of the merged model. Note the embedding size of 256002: Gemma's vocabulary of 256000 tokens plus the two ChatML special tokens added by setup_chat_format.
merged_model
GemmaForCausalLM(
(model): GemmaModel(
(embed_tokens): Embedding(256002, 2048, padding_idx=0)
(layers): ModuleList(
(0-17): 18 x GemmaDecoderLayer(
(self_attn): GemmaSdpaAttention(
(q_proj): Linear(in_features=2048, out_features=2048, bias=False)
(k_proj): Linear(in_features=2048, out_features=256, bias=False)
(v_proj): Linear(in_features=2048, out_features=256, bias=False)
(o_proj): Linear(in_features=2048, out_features=2048, bias=False)
(rotary_emb): GemmaRotaryEmbedding()
)
(mlp): GemmaMLP(
(gate_proj): Linear(in_features=2048, out_features=16384, bias=False)
(up_proj): Linear(in_features=2048, out_features=16384, bias=False)
(down_proj): Linear(in_features=16384, out_features=2048, bias=False)
(act_fn): PytorchGELUTanh()
)
(input_layernorm): GemmaRMSNorm()
(post_attention_layernorm): GemmaRMSNorm()
)
)
(norm): GemmaRMSNorm()
)
(lm_head): Linear(in_features=2048, out_features=256002, bias=False)
)
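For orientation, a quick check (num_parameters() is available on all transformers models) prints how many parameters the merged model contains:
```python
# Print the total number of parameters of the merged model
print(f"The merged model has {merged_model.num_parameters():,} parameters")
```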
Uploading the model to the Hugging Face Hub#
Next, we upload the merged model and tokenizer to the Hugging Face Hub. Afterwards, anyone can download and use the model via its repository name.
merged_model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)
CommitInfo(commit_url='https://huggingface.co/haesleinhuepf/gemma-2b-it-bia-proof-of-concept2/commit/2eb0497b02cbb0fdc7fa31d974a61a1b43540b17', commit_message='Upload tokenizer', commit_description='', oid='2eb0497b02cbb0fdc7fa31d974a61a1b43540b17', pr_url=None, pr_revision=None, pr_num=None)
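Once the upload has finished, the model can be loaded directly from the Hub by its repository name. A minimal sketch (assuming the repository under user_org remains publicly accessible):
```python
from transformers import pipeline

# Load the fine-tuned model directly from the Hugging Face Hub
hub_pipe = pipeline("text-generation", model=user_org + "/" + new_model, device_map="auto")
```
From here, prompts can be formatted with the same chat template as shown in the test above.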