AI language models show promise in reducing shampoo stability testing
A study has shown that large language models (LLMs) can predict the phase stability of shampoo formulations with comparable accuracy to traditional machine learning methods. The findings suggest that artificial intelligence (AI) could help cosmetic formulators reduce the number of physical experiments needed to reach a stable product.
LLMs are a type of AI originally developed to understand and generate human language. Recent methods have adapted them to work with structured data, such as ingredient compositions, by converting numerical information into text.
These models are trained on large volumes of publicly available online content, which allows them to incorporate general domain knowledge when making predictions.
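As a rough illustration of this numbers-to-text idea, the sketch below serializes a numeric ingredient composition into a plain-English description that an LLM could read. The serialization format and the ingredient names are illustrative assumptions, not the study's actual encoding.

```python
# Minimal sketch: turning a numeric composition into text an LLM can read.
# The sentence format and ingredient names are illustrative, not the study's own.

def composition_to_text(composition: dict[str, float]) -> str:
    """Serialize ingredient weight percentages into a plain-English sentence."""
    parts = [f"{name} at {pct:.1f}%" for name, pct in composition.items()]
    return "A formulation containing " + ", ".join(parts) + "."

example = {
    "water": 78.0,
    "sodium laureth sulfate": 12.0,
    "cocamidopropyl betaine": 6.0,
    "polyquaternium-10": 0.5,
    "sodium chloride": 3.5,
}
print(composition_to_text(example))
# -> "A formulation containing water at 78.0%, sodium laureth sulfate at 12.0%, ..."
```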
The study, published in Cosmetics and conducted by researchers at French predictive modeling company TinyPred, tested whether LLMs could classify shampoo formulations as either phase stable or unstable based on their ingredient compositions.
The researchers found that LLMs were able to make accurate predictions with fewer examples than traditional algorithms.
“The LLM-based approach requires approximately two times fewer training samples to achieve the same predictive strength as conventional machine learning,” the study states. Such efficiency could allow formulators to accelerate product development and reduce the cost of high-throughput experimentation.

The findings also point to opportunities for brands to create proprietary LLMs trained on their internal formulation history. This history could include not only successful products but also failed attempts, which contain valuable data that rarely enters the public domain.
Such a model could act as a formulation assistant, helping chemists test ideas virtually before committing to lab work.
Predictions beyond datasets
The study used a publicly available dataset of 812 shampoo formulations, of which 294 were labeled as phase stable. Each formulation contained a water base and four key ingredients: two surfactants, one polyelectrolyte, and one thickener. The aim was to develop a predictive model that could assess whether a new combination of ingredients would be stable.
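In code, a record in such a dataset might be represented roughly as follows; the field names and example values are illustrative assumptions rather than the paper's actual schema.

```python
from dataclasses import dataclass

# Hypothetical record layout mirroring the dataset's structure:
# a water base plus four key ingredients, with a binary stability label.
@dataclass
class ShampooFormulation:
    surfactant_1_pct: float    # primary surfactant, weight %
    surfactant_2_pct: float    # co-surfactant, weight %
    polyelectrolyte_pct: float
    thickener_pct: float
    # water makes up the remainder of the formulation
    phase_stable: bool         # label: True for the 294 stable samples

sample = ShampooFormulation(12.0, 4.0, 0.4, 1.2, phase_stable=True)
```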
Three LLMs from Meta’s open-source Llama family were selected, each varying in size and complexity (3, 8, and 70 billion parameters). These were compared against three standard machine learning models: logistic regression, random forest, and gradient-boosted decision trees (LGBM).
These traditional models rely on mathematical rules to find patterns in numerical data, but they do not integrate external knowledge beyond the dataset.
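For a concrete picture of these baselines, the sketch below instantiates the three conventional models using scikit-learn and LightGBM on synthetic stand-in data; the feature layout and hyperparameters are assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMClassifier

# X: ingredient concentrations (one column per key ingredient).
# y: 1 = phase stable, 0 = unstable. Random data stands in for real formulations.
rng = np.random.default_rng(0)
X = rng.uniform(0, 15, size=(100, 4))
y = rng.integers(0, 2, size=100)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "lgbm": LGBMClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X, y)
```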
To ensure fair comparison, the researchers tested each model’s performance at multiple training sizes, using between 10 and 100 sample formulations. They evaluated accuracy using the Area Under the Receiver Operating Characteristic Curve (AUC), a metric commonly used to measure a classification model’s ability to distinguish between categories.
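A rough sketch of that evaluation protocol, using synthetic stand-in data and a single baseline model, might look like this; the split sizes and model choice are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the 812-formulation dataset; AUC values on random
# labels are meaningless and only demonstrate the evaluation loop.
rng = np.random.default_rng(1)
X = rng.uniform(0, 15, size=(812, 4))
y = rng.integers(0, 2, size=812)

X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Measure AUC as the training budget grows from 10 to 100 samples.
for n_train in (10, 20, 50, 100):
    model = RandomForestClassifier(n_estimators=200, random_state=1)
    model.fit(X_pool[:n_train], y_pool[:n_train])
    scores = model.predict_proba(X_test)[:, 1]
    print(n_train, round(roc_auc_score(y_test, scores), 3))
```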
Instead of retraining the LLMs, the researchers used a method called in-context learning, where the model is shown example formulations and outcomes in a single prompt.
The model then predicts the outcome for a new, unseen formulation based on the examples it was given.
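A few-shot prompt of this kind might be assembled along the following lines; the wording and the example formulations are illustrative assumptions, not the authors' actual prompt template.

```python
# Sketch of few-shot prompt construction for in-context learning.
# Prompt wording, ingredient names, and values are illustrative assumptions.

examples = [
    ({"surfactant A": 12.0, "surfactant B": 4.0,
      "polyelectrolyte": 0.4, "thickener": 1.2}, "stable"),
    ({"surfactant A": 18.0, "surfactant B": 1.0,
      "polyelectrolyte": 2.0, "thickener": 0.2}, "unstable"),
]
query = {"surfactant A": 10.0, "surfactant B": 5.0,
         "polyelectrolyte": 0.6, "thickener": 1.0}

def describe(comp: dict[str, float]) -> str:
    return ", ".join(f"{name} {pct}%" for name, pct in comp.items())

lines = ["Classify each shampoo formulation as stable or unstable.\n"]
for comp, label in examples:
    lines.append(f"Formulation: {describe(comp)}\nStability: {label}\n")
lines.append(f"Formulation: {describe(query)}\nStability:")
prompt = "\n".join(lines)
print(prompt)  # this single string would be sent to the LLM in one call
```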
Fewer samples, better results
The largest model, Llama 3 70B, performed better than conventional machine learning methods at smaller training sizes. When trained on 20 samples, it produced AUC results comparable to those of traditional models trained on 50 samples.
The smaller LLMs also showed improved performance over conventional models at low data volumes, though to a lesser degree.
To investigate what was driving the LLMs’ performance, the researchers stripped away the ingredient names and context, replacing them with abstract codes. Accuracy then dropped below that of the conventional models.
“This suggests that the LLM advantage does not arise from being a better statistical learner, but rather from prior expert knowledge as context is key to performance,” the study states.
The embedded knowledge may come from the model’s pre-training on text sources that include ingredient descriptions, formulation guides, and open-access formulation databases.
Since the dataset used in this study was published after the model’s pre-training cutoff date, the LLM had not seen the specific formulations before.
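The name-stripping ablation described above can be sketched as a simple renaming step; the ingredient names and placeholder codes below are illustrative assumptions.

```python
# Sketch of the name-ablation idea: replace chemically meaningful ingredient
# names with opaque codes, leaving the model only the numbers to work with.

def anonymize(composition: dict[str, float]) -> dict[str, float]:
    """Replace each ingredient name with an uninformative placeholder code."""
    return {f"ingredient_{i}": pct
            for i, (_, pct) in enumerate(composition.items(), start=1)}

named = {"sodium laureth sulfate": 12.0, "cocamidopropyl betaine": 6.0,
         "polyquaternium-10": 0.5, "xanthan gum": 1.0}
print(anonymize(named))
# {'ingredient_1': 12.0, 'ingredient_2': 6.0, 'ingredient_3': 0.5, 'ingredient_4': 1.0}
```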
Streamlining R&D
The study adds to a growing body of research exploring how AI can support cosmetic innovation. Reducing the number of physical tests required to assess formulation stability could shorten development cycles and lower resource demands. According to the study, this is particularly relevant for companies working with new or upcycled ingredients that lack historical testing data.
As regulatory and sustainability pressures push formulators to explore unfamiliar ingredient combinations, AI-based prediction tools could help identify viable options faster.
“The performance gain already illustrates the potential of this new LLM-based approach to reduce time and cost in the development of new cosmetic formulations,” the authors write.
This research follows major players in the personal care industry, such as Unilever, which are incorporating AI to improve formulation efficiency.
As companies continue to digitize R&D, integrating language-based AI into predictive formulation may become a competitive advantage.