30 articles in total
2025
presence_penalty, frequency_penalty, and repetition_penalty
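[Reading the DeepSeek-V3 Technical Report] Complementary Sequence-Wise Auxiliary Loss
A brief introduction to NVIDIA GPU architecture and getting started with CUDA programming
The various multiplications in PyTorch: mm, matmul, dot, @, *, mul, multiply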
2024
Activation functions and GLU
Optimizers: from SGD to Adam to AdamW
Likelihood and NLLLoss
Error when training a model with DeepSpeed: cpu_adam.so: cannot open shared object file: No such file or directory
A review of linear algebra
PyTorch distributed training: notes and pitfalls (continuously updated)