Merging fine-tuned models#
After fine-tuning, the trained adapter must be merged with the base model to produce a standalone model that others can simply download and try.
base_model = "google/gemma-2b-it"  # the base model that was fine-tuned
new_model = "gemma-2b-it-bia-proof-of-concept2"  # name of the merged model and of the hub repository
user_org = "haesleinhuepf"  # Hugging Face user / organization the model will be uploaded to
For the merging step, we reload the base model and its tokenizer and then load the fine-tuned adapter on top of them.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
import torch
from trl import setup_chat_format
# Reload tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(base_model)
base_model_reload = AutoModelForCausalLM.from_pretrained(
base_model,
return_dict=True,
low_cpu_mem_usage=True,
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True,
)
# Apply the same ChatML chat template that was used during fine-tuning;
# this adds the special tokens and resizes the embeddings accordingly
base_model_reload, tokenizer = setup_chat_format(base_model_reload, tokenizer)
# Merge adapter with base model
model = PeftModel.from_pretrained(base_model_reload, new_model + "_temp")
merged_model = model.merge_and_unload()
`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.
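This warning is informational: Gemma falls back to the gelu_pytorch_tanh activation. If you want to silence it, one option (a sketch, assuming a transformers version that supports the hidden_activation attribute) is to set the activation explicitly in the config before loading:
```python
from transformers import AutoConfig, AutoModelForCausalLM

# Set the activation explicitly so transformers does not need to fall back with a warning
config = AutoConfig.from_pretrained(base_model)
config.hidden_activation = "gelu_pytorch_tanh"  # Gemma's default activation
base_model_reload = AutoModelForCausalLM.from_pretrained(
    base_model,
    config=config,
    torch_dtype=torch.float16,
    device_map="auto",
)
```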
Test model#
After merging, we can test the model.
messages = [{"role": "user", "content": """
Write Python code to load the image ../11a_prompt_engineering/data/blobs.tif,
segment the nuclei in it and
show the result
"""}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipe = pipeline(
"text-generation",
model=merged_model,
tokenizer=tokenizer,
torch_dtype=torch.float16,
device_map="auto",
)
outputs = pipe(prompt, max_new_tokens=120, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
C:\Users\haase\miniconda3\envs\genai-gpu\Lib\site-packages\transformers\models\gemma\modeling_gemma.py:482: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
<|im_start|>user
Write Python code to load the image ../11a_prompt_engineering/data/blobs.tif,
segment the nuclei in it and
show the result
<|im_end|>
<|im_start|>assistant
The code imports the module imread_label_image from the skimage.io module.
It then loads the image file called "blobs.tif" from the same directory as the script.
The image is loaded as an array of unsigned 32-bit integers.
The code then segments the nuclei in the image using the pyclesperanto_prototype library.
The segmented image is then displayed.
```python
from skimage.io import imread_label_image
image = imread_label_image("../../11a_prompt_engineering/data/
```
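The answer breaks off because generation stops after max_new_tokens=120 tokens. A small tweak, simply raising that limit, lets the model finish the code it writes:
```python
# Allow a longer completion so the generated code block is not cut off
outputs = pipe(prompt, max_new_tokens=512, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```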
We also save the merged model and the tokenizer to a local folder.
merged_model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)
('gemma-2b-it-bia-proof-of-concept2\\tokenizer_config.json',
'gemma-2b-it-bia-proof-of-concept2\\special_tokens_map.json',
'gemma-2b-it-bia-proof-of-concept2\\tokenizer.json')
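The merged model and tokenizer are now stored in a local folder named after new_model. As a quick sanity check (a sketch; it assumes enough memory is available for a second copy of the weights), they can be reloaded from that folder:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged model and tokenizer from the local folder to verify the export
reloaded_model = AutoModelForCausalLM.from_pretrained(new_model, torch_dtype=torch.float16, device_map="auto")
reloaded_tokenizer = AutoTokenizer.from_pretrained(new_model)
```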
We can also inspect the architecture of the merged model. Note the embedding size of 256002: Gemma's vocabulary of 256000 tokens plus the two ChatML special tokens added by setup_chat_format.
merged_model
GemmaForCausalLM(
(model): GemmaModel(
(embed_tokens): Embedding(256002, 2048, padding_idx=0)
(layers): ModuleList(
(0-17): 18 x GemmaDecoderLayer(
(self_attn): GemmaSdpaAttention(
(q_proj): Linear(in_features=2048, out_features=2048, bias=False)
(k_proj): Linear(in_features=2048, out_features=256, bias=False)
(v_proj): Linear(in_features=2048, out_features=256, bias=False)
(o_proj): Linear(in_features=2048, out_features=2048, bias=False)
(rotary_emb): GemmaRotaryEmbedding()
)
(mlp): GemmaMLP(
(gate_proj): Linear(in_features=2048, out_features=16384, bias=False)
(up_proj): Linear(in_features=2048, out_features=16384, bias=False)
(down_proj): Linear(in_features=16384, out_features=2048, bias=False)
(act_fn): PytorchGELUTanh()
)
(input_layernorm): GemmaRMSNorm()
(post_attention_layernorm): GemmaRMSNorm()
)
)
(norm): GemmaRMSNorm()
)
(lm_head): Linear(in_features=2048, out_features=256002, bias=False)
)
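For orientation, a quick check (num_parameters() is available on all transformers models) prints how many parameters the merged model contains:
```python
# Print the total number of parameters of the merged model
print(f"The merged model has {merged_model.num_parameters():,} parameters")
```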
Uploading the model to the Hugging Face Hub#
Next, we upload the merged model and tokenizer to the Hugging Face Hub. Afterwards, anyone can download and use the model via its repository name.
merged_model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)
CommitInfo(commit_url='https://huggingface.co/haesleinhuepf/gemma-2b-it-bia-proof-of-concept2/commit/2eb0497b02cbb0fdc7fa31d974a61a1b43540b17', commit_message='Upload tokenizer', commit_description='', oid='2eb0497b02cbb0fdc7fa31d974a61a1b43540b17', pr_url=None, pr_revision=None, pr_num=None)
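Once the upload has finished, the model can be loaded directly from the Hub by its repository name. A minimal sketch (assuming the repository under user_org remains publicly accessible):
```python
from transformers import pipeline

# Load the fine-tuned model directly from the Hugging Face Hub
hub_pipe = pipeline("text-generation", model=user_org + "/" + new_model, device_map="auto")
```
From here, prompts can be formatted with the same chat template as shown in the test above.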