If anyone in LocalLLaMA still trains models, I've put together a collection of nice, interesting datasets: https://github.com/Green0-0/llm_datasets/tree/main