The Technical Approach to Extending Gemma 2B's Context Window to 10 Million Tokens

by 小互
1 year ago

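The Gemma-10M model uses a technique called Infini-Attention to extend Gemma 2B's context window to 10M tokens. Its core approach combines recurrent local attention with a compressive memory, so that long-range dependencies are retained.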
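The idea of pairing segment-wise local attention with a fixed-size compressive memory can be sketched roughly as follows. This is a minimal single-head illustration that follows the published Infini-attention formulation as I understand it, not the Gemma-10M code itself. The function names, the fixed scalar `gate` (learned in a real model), and the small epsilon in the denominator are assumptions made for illustration.

```python
import numpy as np


def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1: a strictly positive feature map for memory keys/queries.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def causal_softmax_attention(q, k, v):
    # Ordinary scaled dot-product attention with a causal mask, applied inside one segment.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def infini_attention(q, k, v, segment_len, gate=0.0):
    """Hypothetical single-head sketch. q, k: (seq_len, d_k); v: (seq_len, d_v)."""
    d_k, d_v = q.shape[-1], v.shape[-1]
    memory = np.zeros((d_k, d_v))  # compressive memory: size is independent of seq_len
    norm = np.zeros(d_k)           # running normalization term for memory reads
    mix = 1.0 / (1.0 + np.exp(-gate))  # sigmoid gate (assumed fixed here, learned in practice)
    outputs = []
    for start in range(0, q.shape[0], segment_len):
        qs = q[start:start + segment_len]
        ks = k[start:start + segment_len]
        vs = v[start:start + segment_len]

        # 1) Local attention restricted to the current segment.
        local = causal_softmax_attention(qs, ks, vs)

        # 2) Read what earlier segments wrote into the compressive memory.
        sq = elu_plus_one(qs)
        retrieved = (sq @ memory) / ((sq @ norm)[:, None] + 1e-6)

        # 3) Blend the long-range (memory) and local views.
        outputs.append(mix * retrieved + (1.0 - mix) * local)

        # 4) Recurrent step: fold this segment's keys/values into the memory.
        sk = elu_plus_one(ks)
        memory = memory + sk.T @ vs
        norm = norm + sk.sum(axis=0)
    return np.concatenate(outputs, axis=0), memory
```

In this sketch the memory stays at shape `(d_k, d_v)` however long the input is, which is what allows the state carried between segments to stay constant in size while the attention cost per segment depends only on `segment_len`.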

Features:


Categories: AI Papers, AI Projects
