Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks. Nevertheless, achieving optimal performance often requires careful tuning. One crucial aspect is data selection. LLMs are trained on massive datasets, and the relevance of this data directly affects model output. Furthermore, hyperparameter