王川 Michael Wang

Multi-Query-Attention

"Time is gold"

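Background: ChatGLM2 mentions speeding up training and inference, mainly through the Flash Attention technique in X-transformer and the Multi-Query-Attention technique. This post records how Multi-Query-Attention works. MQA: a standard Transformer uses k attention heads, so the query, key, and value tensors for each token all have shape [k, h], where h is the attention head size. In Pa...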
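To make the shape argument concrete, here is a minimal NumPy sketch contrasting standard multi-head attention with multi-query attention, following the PaLM-style convention of k query heads each of size h. The function names, the use of NumPy, and the omission of masking and output projections are simplifications for illustration; this is not ChatGLM2's implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def split_heads(t: np.ndarray, k_heads: int) -> np.ndarray:
    # [n, k*h] -> [k, n, h]
    n, kh = t.shape
    return t.reshape(n, k_heads, kh // k_heads).transpose(1, 0, 2)


def multi_head_attention(x, wq, wk, wv, k_heads):
    # Standard MHA: wq, wk, wv are all [d, k*h]; each head has its own K and V.
    q, k, v = (split_heads(x @ w, k_heads) for w in (wq, wk, wv))  # each [k, n, h]
    h = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(h)                  # [k, n, n]
    return softmax(scores) @ v                                      # [k, n, h]


def multi_query_attention(x, wq, wk, wv, k_heads):
    # MQA: wq is [d, k*h], but wk and wv are only [d, h]; one K/V head is shared.
    q = split_heads(x @ wq, k_heads)                                # [k, n, h]
    k, v = x @ wk, x @ wv                                           # each [n, h]
    h = k.shape[-1]
    scores = q @ k.T / np.sqrt(h)                                   # broadcasts to [k, n, n]
    return softmax(scores) @ v                                      # [k, n, h]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k_heads, h = 6, 32, 4, 8
    x = rng.standard_normal((n, d))
    wq = rng.standard_normal((d, k_heads * h))
    wk = rng.standard_normal((d, h))
    wv = rng.standard_normal((d, h))
    out = multi_query_attention(x, wq, wk, wv, k_heads)
    print(out.shape)  # (4, 6, 8)
    # Per-token KV-cache size (number of values): MHA stores k heads, MQA stores one.
    print("MHA:", 2 * k_heads * h, "MQA:", 2 * h)  # MHA: 64 MQA: 16
```

MQA is exactly MHA with every head's K and V projection tied to the same weights, so feeding MHA the shared wk and wv tiled k times gives identical outputs. The practical gain is that the per-token KV cache shrinks from 2·k·h values to 2·h, which cuts memory traffic during incremental decoding.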

Posted by 王川 on June 27, 2023

【LLM】LLAMA-sft notes

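1. Background and goals: LLaMA is the main model studied in the open-source community. For the Chinese community, LLaMA's tokenizer covers only about 700 Chinese tokens, so its Chinese pretraining is weak; the usual approach is therefore to extend the vocabulary with Chinese tokens on top of the original LLaMA model and then continue pretraining. The model obtained from Chinese continued pretraining can then go through the subsequent SFT/RM/RL training stages. Existing Chinese continued-pretraining efforts include Cui Yiming's Chinese-LLaMA and Fengshenbang's (封神榜) Ziy...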
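The vocabulary-extension step can be illustrated with a small, library-free sketch: append only the Chinese tokens the base vocabulary lacks, so every existing token ID (and therefore every pretrained embedding row) stays unchanged. The toy vocabularies and the function name `extend_vocab` are hypothetical; real projects such as Chinese-LLaMA merge SentencePiece tokenizer models, which this sketch does not reproduce.

```python
from typing import Dict, Iterable


def extend_vocab(base: Dict[str, int], new_tokens: Iterable[str]) -> Dict[str, int]:
    """Return a new vocab with unseen tokens appended after the existing IDs."""
    merged = dict(base)                      # keep every original token -> id mapping
    next_id = max(base.values(), default=-1) + 1
    for tok in new_tokens:
        if tok not in merged:                # skip tokens the base vocab already covers
            merged[tok] = next_id
            next_id += 1
    return merged


if __name__ == "__main__":
    base_vocab = {"<s>": 0, "</s>": 1, "hello": 2, "中": 3}   # toy stand-in for LLaMA's vocab
    chinese_tokens = ["中", "中文", "模型", "预训练"]            # toy Chinese tokens
    merged = extend_vocab(base_vocab, chinese_tokens)
    print(len(base_vocab), "->", len(merged))                   # 4 -> 7
    print(merged["中文"], merged["模型"], merged["预训练"])       # 4 5 6
```

Because new IDs only ever go past the old maximum, the embedding and LM-head matrices can be grown by appending rows before continued pretraining.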

Posted by 王川 on June 6, 2023

【LLM】LLAMA-lora notes

Posted by 王川 on May 30, 2023

【LLM】chatGLM-SFT-LAMP notes

Hugging Face bin to LAMP checkpoint: chatGLM/chatglm-hope/hug2lamp-chatglm-6b

Posted by 王川 on May 25, 2023

Downloading Hugging Face files with git lfs

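Background: corporate networks are usually isolated from the external internet, so downloading Hugging Face files requires setting up a proxy first and then downloading with git. Hugging Face datasets and models are very large, and this post covers how to download them cleanly.
SOP:
/usr/bin/sudo yum install git-lfs # install git-lfs
cd BAAI/glm...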
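A minimal Python sketch of that SOP, assuming the proxy is supplied through the standard http_proxy/https_proxy environment variables (which git honors) and that "downloading cleanly" means cloning only the small LFS pointer files first (GIT_LFS_SKIP_SMUDGE=1) and then pulling just the large files you need with git lfs pull. The proxy URL, repository ID, and file pattern are hypothetical placeholders, and by default the commands are only printed.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple

Step = Tuple[List[str], Dict[str, str]]


def build_plan(repo_id: str, proxy: Optional[str], include: str) -> List[Step]:
    """Build (command, environment) pairs for a selective Hugging Face download."""
    base_env = dict(os.environ)
    if proxy:
        base_env["http_proxy"] = proxy
        base_env["https_proxy"] = proxy
    # Only the clone skips LFS smudging, so it fetches pointer files, not large blobs.
    clone_env = dict(base_env, GIT_LFS_SKIP_SMUDGE="1")
    target = repo_id.split("/")[-1]
    return [
        (["git", "lfs", "install"], base_env),
        (["git", "clone", f"https://huggingface.co/{repo_id}", target], clone_env),
        # Download and check out only the large files matching the pattern.
        (["git", "-C", target, "lfs", "pull", f"--include={include}"], base_env),
    ]


def run_plan(plan: List[Step], dry_run: bool = True) -> None:
    for cmd, env in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    plan = build_plan("some-org/some-model", "http://proxy.example.com:8080", "*.bin")
    run_plan(plan, dry_run=True)  # set dry_run=False to actually execute
```

Splitting the clone from the LFS pull means an interrupted download of a multi-gigabyte weight file can be resumed by rerunning only the last step.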

Posted by 王川 on May 23, 2023

How Chinese Llama works

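Llama: LLaMA is a pretrained model released by Meta, but its vocabulary covers only 700-odd Chinese tokens and its huge pretraining corpus contains little Chinese text, so without pretraining on Chinese data it cannot be applied well to Chinese-language tasks. Chinese Llama: the common approach for building a Chinese LLaMA is to extend the LLaMA embedding layer with a Chinese vocabulary and then continue pretraining on a corresponding Chinese corpus. How to add the Chinese corpus: ...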
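The embedding-layer expansion can be sketched in NumPy: keep the pretrained rows for the original vocabulary and append rows for the new Chinese tokens. Initializing the new rows at the mean of the existing embeddings is one common heuristic and is an assumption here, not necessarily what Chinese-LLaMA does; in Hugging Face transformers the corresponding call is `model.resize_token_embeddings(new_vocab_size)`. The function name and sizes below are illustrative.

```python
import numpy as np


def expand_embeddings(weight: np.ndarray, num_new: int, rng=None) -> np.ndarray:
    """Append num_new rows to a [vocab, dim] embedding matrix.

    Existing rows are copied unchanged, so old token IDs keep their pretrained vectors.
    New rows start at the mean of the old rows plus small noise (an assumed heuristic).
    """
    if num_new < 0:
        raise ValueError("num_new must be non-negative")
    rng = np.random.default_rng() if rng is None else rng
    mean = weight.mean(axis=0, keepdims=True)                        # [1, dim]
    noise = 1e-3 * rng.standard_normal((num_new, weight.shape[1]))   # [num_new, dim]
    new_rows = (mean + noise).astype(weight.dtype)
    return np.concatenate([weight, new_rows], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    old_vocab, dim, added = 32000, 8, 20000  # 32000 is LLaMA's original vocab; 20000 is illustrative
    emb = rng.standard_normal((old_vocab, dim)).astype(np.float32)
    lm_head = rng.standard_normal((old_vocab, dim)).astype(np.float32)
    emb2 = expand_embeddings(emb, added, rng)
    lm_head2 = expand_embeddings(lm_head, added, rng)  # an untied output layer grows the same way
    print(emb2.shape, lm_head2.shape)  # (52000, 8) (52000, 8)
```

Starting new rows near the mean keeps their initial logits close to those of typical existing tokens, so continued pretraining does not begin with extreme outputs for the new IDs.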

Posted by 王川 on May 18, 2023

Evaluating LLM multi-turn capability

Leaderboard: multi-turn interaction score table. Columns: average score, #1, #2, #3, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13, #14, #15 ...

Posted by 王川 on May 17, 2023

chatGLM P-Tuning V2 SFT notes

BUGS: wandb bug: PermissionError: [Errno 13] Permission denied: '/var/run/secrets/kubernetes.io/serviceaccount/token'. Fix: in ptuning/main.py, add os.environ["WANDB_DISABLED"] = "tr...
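The fix works because the environment variable must be in place before the Trainer sets up its reporting integrations. A minimal sketch, assuming a transformers version whose wandb integration still honors WANDB_DISABLED (it treats strings such as "1" or "true" as truthy); the value in the post is cut off, so "1" is used here. Setting WANDB_MODE="disabled", wandb's own switch, is an extra precaution added as an assumption; the helper name `disable_wandb` is hypothetical.

```python
import os


def disable_wandb() -> None:
    """Turn off Weights & Biases logging; call before the Trainer is created."""
    # Checked by (older) transformers releases before enabling the wandb callback.
    os.environ["WANDB_DISABLED"] = "1"
    # Checked by the wandb client itself: API calls become no-ops.
    os.environ["WANDB_MODE"] = "disabled"


if __name__ == "__main__":
    disable_wandb()  # e.g. near the top of ptuning/main.py, before building the Trainer
    print(os.environ["WANDB_DISABLED"], os.environ["WANDB_MODE"])
```

On newer transformers versions, passing report_to="none" to TrainingArguments is the documented way to disable all logging integrations without relying on environment variables.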

Posted by 王川 on May 15, 2023

SFT based on BLOOM

BLOOM training environment: Megatron-deepspeed, torch, transformers. Key BLOOM references: code adapted from the repository https://github.com/bigscience-workshop/xmtf; training scripts: https://github.com/bigscience-workshop/bigscience/tree/master/tra...

Posted by 王川 on May 15, 2023

【LLM】chatGLM-SFT notes

Installation notes
Installing torch:
# step1 proxy
# The Meituan jump server must connect to the external network first
export http_proxy=http://10.22.139.49:6666
export https_proxy=http://10.22.139.49:6666
# step2 install
# torch 1.13.1 cuda11.7 ...
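After step 2, a quick check that the installed wheel matches the intended torch 1.13.1 / CUDA 11.7 build can save a failed training launch. This sketch assumes the usual PyTorch conventions: torch.__version__ looks like "1.13.1+cu117" for CUDA wheels, and torch.version.cuda reports the CUDA version the wheel was built with (None for CPU-only builds). The helper name and its default targets are illustrative.

```python
from typing import Optional


def matches_build(torch_version: str, cuda_version: Optional[str],
                  want_torch: str = "1.13.1", want_cuda: str = "11.7") -> bool:
    """Return True if a torch version string and CUDA version match the targets."""
    base = torch_version.split("+")[0]  # drop a local suffix such as "+cu117"
    return base == want_torch and cuda_version == want_cuda


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        ok = matches_build(torch.__version__, torch.version.cuda)
        print(torch.__version__, torch.version.cuda, "OK" if ok else "MISMATCH")
        print("GPU visible:", torch.cuda.is_available())
```

A matching build string only confirms what was installed; torch.cuda.is_available() additionally confirms that the driver on the machine can actually run it.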

Posted by 王川 on May 11, 2023

