NeuralNovel committed
Commit cf7184f • 1 Parent(s): 15d92e0
Update README.md
README.md CHANGED
@@ -119,6 +119,12 @@ In the boundless sands ..
 A model to test how MoE will route without square expansion.
 
 ## "[What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)"
+
+[Join our Discord!](https://discord.gg/rJXGjmxqzS)
+
+<a href='https://ko-fi.com/S6S2UH2TC' target='_blank'><img height='36' style='border:0px;height:36px;' src='https://storage.ko-fi.com/cdn/kofi1.png?v=3' border='0' alt='Buy Me a Coffee at ko-fi.com' /></a>
+
+
 ### (from the MistralAI papers...click the quoted question above to navigate to it directly.)
 
 The scale of a model is one of the most important axes for better model quality. Given a fixed computing budget, training a larger model for fewer steps is better than training a smaller model for more steps.
@@ -136,10 +142,6 @@ At every layer, for every token, a router network chooses two of these groups (t
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6589d7e6586088fd2784a12c/up_I0R2TQGjqTShZp_1Sz.png)
 
 
-[Join our Discord!](https://discord.gg/rJXGjmxqzS)
-
-<a href='https://ko-fi.com/S6S2UH2TC' target='_blank'><img height='36' style='border:0px;height:36px;' src='https://storage.ko-fi.com/cdn/kofi1.png?v=3' border='0' alt='Buy Me a Coffee at ko-fi.com' /></a>
-
 Switch Layer
 MoE layer from the [Switch Transformers paper](https://arxiv.org/abs/2101.03961)
 
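For readers skimming the diff: the routing idea the README refers to (per the hunk context above, at every layer, for every token, a router network chooses two of the expert groups) can be sketched in a few lines. The block below is a minimal, hypothetical PyTorch sketch of top-2 expert routing, not the code behind this model; the names used here (Top2MoELayer, d_model, d_ff, num_experts) are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoELayer(nn.Module):
    """Hypothetical sketch of top-2 expert routing; not this repository's implementation."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8):
        super().__init__()
        # Router (gating) network: scores every expert for each token.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); route each token independently.
        tokens = x.reshape(-1, x.size(-1))
        gate_logits = self.router(tokens)                 # (n_tokens, num_experts)
        top_vals, top_idx = gate_logits.topk(2, dim=-1)   # choose two experts per token
        gates = F.softmax(top_vals, dim=-1)               # normalise the two gate weights

        out = torch.zeros_like(tokens)
        for slot in range(2):                             # first and second choice
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e              # tokens whose slot routes to expert e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)


# Example usage (shapes only):
layer = Top2MoELayer(d_model=512, d_ff=2048, num_experts=8)
y = layer(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```

A Switch layer, as in the Switch Transformers paper linked above, routes each token to a single expert rather than two; the same sketch covers that case by taking the top-1 gate instead of top-2.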