Running the following code raises an error:

from llama import tokenizer, Llama, Dialog
checkpoint_dir = "/training-data/pakcages/llama/llama-2-7b-chat"
tokenizer_path = "/training-data/pakcages/llama/tokenizer.model"
temperature = 0.75
top_p = 0.9
max_seq_len = 128
max_gen_len = 64
max_batch_size = 4

generator = Llama.build(
    ckpt_dir=checkpoint_dir,
    tokenizer_path=tokenizer_path,
    max_seq_len=max_seq_len,
    max_batch_size=max_batch_size)

ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

It's caused by this block in the source:

if not torch.distributed.is_initialized():
    torch.distributed.init_process_group("nccl")
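
For reference, one way around this error is to supply the env:// rendezvous variables yourself before calling Llama.build, which is what torchrun would export for a single process. A sketch (the port is an arbitrary choice, and I didn't go this route):

import os

# Pretend to be rank 0 of a single-process "cluster" so that
# torch.distributed.init_process_group("nccl") can rendezvous via env://.
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")  # any free port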

So it fails to start because of the distributed setup. I tried to work around the distributed initialization, starting from its build function:

checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
ckpt_path = checkpoints[get_model_parallel_rank()]
checkpoint = torch.load(ckpt_path, map_location="cpu")

It checks whether the given checkpoint_dir contains .pth files. The 7B model ships as a single .pth file, so one process is enough. As I understand it, get_model_parallel_rank() roughly means launching one process per checkpoint file; the code comes from fairscale, the parallel-training package developed by the Facebook team:

def get_model_parallel_rank() -> int:
    """Return my rank for the model parallel group."""
    return torch.distributed.get_rank(group=get_model_parallel_group())

def get_model_parallel_group() -> torch.distributed.ProcessGroup:
    """Get the model parallel group the caller rank belongs to."""
    assert _MODEL_PARALLEL_GROUP is not None, "model parallel group is not initialized"
    return _MODEL_PARALLEL_GROUP
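
For context, _MODEL_PARALLEL_GROUP only exists after fairscale's model-parallel initialization has run; in llama's build() that happens roughly like this (paraphrased from memory, treat it as a sketch rather than a verbatim quote):

import os
import torch
from fairscale.nn.model_parallel.initialize import initialize_model_parallel

if not torch.distributed.is_initialized():
    torch.distributed.init_process_group("nccl")
# One model-parallel rank per checkpoint shard; for the 7B model WORLD_SIZE is 1,
# so get_model_parallel_rank() would simply return 0 and pick the only .pth file.
model_parallel_size = int(os.environ.get("WORLD_SIZE", 1))
initialize_model_parallel(model_parallel_size)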

Never mind all that; the 7B model has just one weight file anyway, so load it straight from the directory:

import torch
from pathlib import Path

checkpoints = sorted(Path(checkpoint_dir).glob("*.pth"))
# Llama-2-7b model weights are distributed in a single file.
ckpt_path = checkpoints[0]
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
checkpoint = torch.load(ckpt_path, map_location=device)

Loading the 7B weights straight onto the GPU this way takes 13671 MB of VRAM.
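
If you want to read that figure from PyTorch itself rather than nvidia-smi, something like this works (the allocator's number will be a bit lower than what nvidia-smi reports, since it excludes the CUDA context):

print(f"{torch.cuda.memory_allocated() / 1024**2:.0f} MB allocated on the GPU")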

Converting with huggingface

Actually, forget rewriting it for now; let's convert the weights with huggingface instead. Install transformers; its conversion script lives at src/transformers/models/llama/convert_llama_weights_to_hf.py, where you can read the source.

The module is already there once transformers is installed, so a small wrapper script is enough:

# Thin wrapper around the CLI entry point of the conversion script.
from transformers.models.llama.convert_llama_weights_to_hf import main

if __name__ == "__main__":
    main()

That script is all it takes. Run it:

# python convert.py --help
usage: convert.py [-h] [--input_dir INPUT_DIR] [--model_size {7B,7Bf,13B,13Bf,30B,34B,65B,70B,70Bf,tokenizer_only}] [--output_dir OUTPUT_DIR] [--safe_serialization SAFE_SERIALIZATION]

options:
  -h, --help            show this help message and exit
  --input_dir INPUT_DIR
                        Location of LLaMA weights, which contains tokenizer.model and model folders
  --model_size {7B,7Bf,13B,13Bf,30B,34B,65B,70B,70Bf,tokenizer_only}
                        'f' models correspond to the finetuned versions, and are specific to the Llama2 official release. For more details on Llama2, checkout the original repo: https://huggingface.co/meta-llama
  --output_dir OUTPUT_DIR
                        Location to write HF model and tokenizer
  --safe_serialization SAFE_SERIALIZATION
                        Whether or not to save using `safetensors`.

--input_dir is the llama root directory (the one containing tokenizer.model and the model folders).

--model_size selects which model size to convert. There's a catch here: you can only pass one of the names listed above, but the model folders under the llama root are named like "llama-2-*b", while the conversion script looks for the weights under "input_dir/*B". So rename "llama-2-*b" to "*B" before running the script, as in the sketch below.
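
For the chat weights used at the top of this post, that means something like this (7Bf is the name the script expects for the fine-tuned/chat 7B release; the output path matches what I load below):

from pathlib import Path

root = Path("/training-data/pakcages/llama")
# The conversion script looks for the weights under <input_dir>/7Bf,
# so rename the "llama-2-7b-chat" folder accordingly.
(root / "llama-2-7b-chat").rename(root / "7Bf")

# Then run:
#   python convert.py --input_dir /training-data/pakcages/llama \
#                     --model_size 7Bf --output_dir llama_hf/7Bf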

The conversion itself doesn't need a GPU. Once it's done, just load the result with transformers; the 7B model takes about 26 GB of VRAM after loading.
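
The default-precision (float32) load is presumably where that 26 GB comes from; a sketch using the converted files from above:

import torch
from transformers import LlamaTokenizer, LlamaModel

tokenizer = LlamaTokenizer.from_pretrained('llama_hf/7Bf')
model = LlamaModel.from_pretrained('llama_hf/7Bf')  # default dtype is float32
model = model.to('cuda')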

Run:

total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of parameters: {total_params}")
>>> Total number of parameters: 6607343616

So the model has 6,607,343,616 parameters in total.

A rough calculation: storing these parameters in single precision (32-bit floating point) takes 6607343616 * 32 / 8 / 1024 / 1024 / 1024 ≈ 24.6143 GB. Let's see what half precision looks like:

tokenizer = LlamaTokenizer.from_pretrained('llama_hf/7Bf')
model = LlamaModel.from_pretrained('llama_hf/7Bf', torch_dtype=torch.float16)

Putting that on the GPU takes about 13 GB of VRAM, which lines up with 6607343616 * 16 / 8 / 1024 / 1024 / 1024 ≈ 12.31 GB plus some overhead.

To push VRAM usage down further you'd need quantization; I'll write that up in a later post.

LlamaForCausalLM can be used to generate responses; the plain LlamaModel can't.
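
A minimal generation sketch with LlamaForCausalLM (the prompt is a placeholder, and the sampling parameters are just the ones from the top of this post):

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained('llama_hf/7Bf')
model = LlamaForCausalLM.from_pretrained('llama_hf/7Bf', torch_dtype=torch.float16).to('cuda')

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
with torch.no_grad():
    output = model.generate(**inputs, do_sample=True,
                            temperature=0.75, top_p=0.9, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))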
