The V4 is here

#11
by TheYuriLover - opened

Hey!

https://huggingface.co/datasets/gozfarb/ShareGPT_Vicuna_unfiltered

@gozfarb decided to make a V4 version of the unfiltered dataset. I think it's the one that removes almost all the woke content, so you should make a new finetune with the 1.1 version! :D

deleted

Yep, it's got all the stuff we found in discussion 7 on the main dataset repo.

Hit me up if there's anything else you want trimmed or if there are any problems with the dataset otherwise.

Thanks! I've already updated to 1.1 and queued up the jobs to train with the V4 from @gozfarb. It will probably take a few days before the results come though, the GPU cluster is quite busy at the moment. But in the meantime, if there are further updates to the dataset or an official version from @anon8231489123, I can still swap the dataset before the jobs start. Personally I think everyone already did a great job preparing the V4, and I couldn't really find anything to add myself.

I know I'm asking a lot, but there's this SuperCOT LoRA that gives the model some really high quality outputs, it's like even better than the best finetunes.
If there was a finetune of this, I really believe we could take the quality of the llama model to another level, and for the moment you seem to be the only dude who can do this task.
https://huggingface.co/kaiokendev/SuperCOT-LoRA
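For anyone who wants to try the LoRA in the meantime, here's a minimal sketch of stacking it on a LLaMA base with the peft library. This is just an illustration, not anyone's actual setup; the base model id, the 8-bit loading and the assumption that the adapter files sit at the repo root are all mine.

```python
# Rough sketch only: apply the SuperCOT LoRA adapter on top of a LLaMA 13B base.
# The base repo id and quantization settings below are assumptions, not the
# configuration used by kaiokendev or anyone in this thread.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "huggyllama/llama-13b"  # any LLaMA 13B weights you have access to

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",     # spread layers across available GPUs/CPU
    load_in_8bit=True,     # requires bitsandbytes; drop if you have the VRAM
)

# Attach the LoRA weights on top of the frozen base model
# (assumes the adapter files are at the root of the linked repo).
model = PeftModel.from_pretrained(base, "kaiokendev/SuperCOT-LoRA")
model.eval()
```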

Tbh you could even mix those datasets with the V4 vicuna and make like the ultimate finetune idk lmao

Could add it to the list :). Maybe if someone would be willing to prepare a dataset for it (or we do it collaboratively). Yeah, one could combine some interesting things and we could see for fun what sort of finetune we can put together. Let's complete the surgery on vicuna first; who knows, that could then act as a base for improvements.

That's a great idea actually. If we come to the conclusion that the V4 vicuna model is good enough, then we can train on top of it with more stuff in the future, but all of it has to be in the same instruction format, or else the poor model will be confused... and us too lmao

deleted

I agree with getting base Vicuna right first. We shouldn't change too many variables at once.

As to cleaning up the datasets, I think @kaiokendev might have said he did some cleaning to the datasets he linked. If he did, hopefully he can share them or clarify that he didn't edit them.

Assuming he shares them (or the raw ones get used), the datasets in question would be pretty easy to convert to Vicuna format. I could write a script to do that, with the caveat that they would all be single question/answer conversations. I don't know what Vicuna would do with that or whether it would affect output quality for longer conversations, since I'm not super familiar with how much the conversation structure itself matters for weighting the finetune. Maybe it's just concerned with lowering weights on duplicate answers in the same conversation? Dunno.
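To make the idea concrete, here's a rough sketch of the kind of conversion script described above, assuming an Alpaca-style instruction/input/output JSON as the source. The file names and field names are assumptions about the source data, not a finished tool.

```python
# Sketch: convert Alpaca-style instruction records into single-turn
# ShareGPT/Vicuna-format conversations. Field and file names are assumed.
import json

with open("instruct_dataset.json", "r", encoding="utf-8") as f:
    records = json.load(f)

converted = []
for i, rec in enumerate(records):
    prompt = rec["instruction"]
    if rec.get("input"):
        prompt += "\n" + rec["input"]
    converted.append({
        "id": f"single_turn_{i}",
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": rec["output"]},
        ],
    })

with open("sharegpt_single_turn.json", "w", encoding="utf-8") as f:
    json.dump(converted, f, ensure_ascii=False, indent=2)
```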

The best advantage of @kaiokendev 's dataset is that it's totally unwoke; when I tried the SuperCOT LoRA I had no instance of refusals or moralizing, and that's a good thing! And yeah, if he did clean the original datasets to make a big one in the end, it would be cool if he could share it with us.

@reeducator Are there any open source projects where you can install a client and grant the power of your machine to solve some problem, like training models? Maybe instead of doing the training solo, you could get some additional computing power that way? I think it's the only way to compete with big corporations and their hardware resources in the future when it comes to training better models.

deleted

There are a few projects; the most notable are probably Petals and the base project it uses, Hivemind. However, there doesn't seem to be much interest in them at the moment and no real push to adapt them to community projects. It wouldn't be too bad if KoboldAI rolled in something like Hivemind, so the same people who dedicate GPUs to hosted models could dedicate them to training.
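For reference, setting up a collaborative run with Hivemind looks roughly like the sketch below, following its public quickstart. The run id, batch sizes and the stand-in model are placeholders, not a proposal for the actual vicuna training.

```python
# Rough sketch of collaborative training with hivemind (placeholders throughout).
import torch
import hivemind

model = torch.nn.Linear(512, 512)  # stand-in; a real run would load the LLM here

# The first peer starts its own DHT; later peers pass its multiaddrs as initial_peers.
dht = hivemind.DHT(start=True)
print(dht.get_visible_maddrs())

opt = hivemind.Optimizer(
    dht=dht,
    run_id="vicuna_free_collab",   # every peer in the same run must use the same id
    optimizer=torch.optim.AdamW(model.parameters(), lr=1e-5),
    batch_size_per_step=4,         # samples this peer contributes per local step
    target_batch_size=4096,        # global batch gathered before peers average
    use_local_updates=True,
    verbose=True,
)

# In the training loop, `opt` is then used like a regular torch optimizer:
# loss.backward(); opt.step(); opt.zero_grad()
```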

I know there's Hivemind at least (https://github.com/learning-at-home/hivemind), but I don't know how well it works in practice. Given enough 24GB consumer GPUs, achieving something with it might be pretty plausible. I noticed that with a certain configuration of training batch size vs. gradient accumulation steps, the memory used per GPU was around that much. We'd need a lot of people with 24GB VRAM GPUs though, and I'm not sure where we'd find that many who are willing to contribute.
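To illustrate the trade-off being described (with made-up numbers, not the actual training configuration): the effective batch size is the product of the per-GPU micro-batch, the gradient accumulation steps and the number of GPUs, while peak VRAM is driven mostly by the micro-batch.

```python
# Illustration of batch size vs. gradient accumulation; numbers are placeholders.
micro_batch_size = 4    # sequences held in VRAM per forward/backward pass
grad_accum_steps = 32   # micro-batches accumulated before each optimizer step
num_gpus = 8            # data-parallel workers

effective_batch = micro_batch_size * grad_accum_steps * num_gpus
print(f"effective global batch size: {effective_batch}")  # 1024

# Peak memory per GPU scales mainly with micro_batch_size (plus model and
# optimizer state), so shrinking it and raising grad_accum_steps keeps the
# effective batch the same while fitting on ~24GB consumer cards.
```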

Awesome uncensored 1T model when? You could probably train the first uncensored 1T model. I hear it can run on around 800GB of VRAM if it's 3-4 bit. It takes A LOT more to train it though, that's the only downside.
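As a rough sanity check of that ~800GB figure (back-of-envelope arithmetic only, nothing measured):

```python
# Back-of-envelope: VRAM for a hypothetical 1T-parameter model at 4-bit.
params = 1e12            # 1 trillion parameters
bits_per_weight = 4      # 4-bit quantized weights

weights_gb = params * bits_per_weight / 8 / 1e9
print(f"{weights_gb:.0f} GB for the weights alone")  # ~500 GB

# KV cache, activations and runtime overhead come on top of that, which is
# roughly how you end up in the ~800GB ballpark for inference. Training needs
# far more: full-precision weights, gradients and optimizer states.
```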

When it rains H or A100s, then we can consider it \o/


@reeducator I know you said that you have no plans to make a 7B version, but still, if it doesn't take too long to train, maybe it's possible to add such a version later, once all the other versions are done? I managed to run ggml vicuna 7b q4_0 on a Samsung Galaxy S23 Ultra. It's not very fast, but still usable. Having such a model running in your pocket is pretty cool and might be useful in some cases (like when internet is not available).

@Kelheor yeah, I think the idea has been that once we have something that we're more or less happy with, we do the other model sizes 7B and 30B (if possible). But it's true that it probably doesn't take too long to train compared to 13B.

How it works here is that whenever we train something, I have to queue up for GPU time on the cluster - it's actually mostly that part that takes a long time, often longer than the training itself. Whenever there's a training slot, one has to decide whether to iterate on something we've been working to improve, or to create something entirely new. So far we've figured that we can probably get the most out of it by iterating on the datasets and the 13B to create a more useful vicuna, which is then easy to compare with our previous results.

But because the 7B most likely wouldn't take too long to train, one might be able to chain it together with some other 13B model training and fit both within the same timeframe. Bluemoon 13B for example takes only a few hours to train, much less than the typical allotted GPU time. What I could possibly do is, next time we see fit to train the next Bluemoon, try to squeeze in the Vicuna 7B within that same slot. Let's see!
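The chaining itself can be as simple as running the jobs back to back inside one allocation, something like the sketch below. The script and config names are made up for illustration, not the actual training setup.

```python
# Illustrative only: run two trainings sequentially within one GPU allocation.
# Script and config file names are placeholders.
import subprocess

jobs = [
    ["python", "train.py", "--config", "configs/bluemoon_13b.yaml"],   # short job first
    ["python", "train.py", "--config", "configs/vicuna_7b_free.yaml"], # uses the remaining slot time
]

for cmd in jobs:
    subprocess.run(cmd, check=True)  # stop the chain if a job fails
```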
