Back to all blogs
Introducing new Multilingual AI Safety Guardrails for 100 Languages
Introducing new Multilingual AI Safety Guardrails for 100 Languages
Dec 2, 2025
|
5 min read




LLM safety research has made rapid progress in English and a few high-resource languages, but global deployment exposes a hard limitation: safety guardrails do not generalize uniformly across languages. Empirically, multilingual jailbreak success rates increase sharply as we move from English → mid-resource → low-resource languages. This is consistent across multilingual toxicity datasets, multilingual jailbreak benchmarks, and real-world user interactions. Low-resource communities face significantly higher risks, not because their languages are inherently harder, but because current systems were never architected to scale linguistically.
In our new work, “CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer” (short for CRoss-lingual Efficient Safety Transfer), we set out to address this gap from first principles, designing a scalable system that can support 100 languages, run 10× faster than existing LLM-based guardrails, and be trained using data from only 13 high-resource languages. This blog provides a high-level walkthrough of how CREST works, what makes it different, and why it matters. The full details, analysis, and experiments are in the paper.
The Multilingual Safety Gap Problem
Most existing guardrails are impressive engineering feats. But they all share the same bottleneck: They are built and evaluated mostly on English and a handful of high-resource languages. These approaches attempt to fill the gap either using machine translation, larger multilingual LLMs, or fine-tuning on multilingual datasets for every new language. The multilingual coverage scalability is treated as an additive problem. A simple harmful prompt in a high-resource language (eg: English, Spanish, or Hindi) is blocked by these guardrails, but easily bypassed in a low-resource language (eg: Hausa, Javanese, or Pashto). This is not dataset noise, but a systemic failure mode.

Multilingual safety gap: strong filtering in high-resource languages, failures in low-resource ones.
Building a universal multilingual guardrail is a challenge in itself, mainly because of the possible reasons. Due to the scarcity of data for low-resource languages, they are underrepresented in pretraining, causing degraded embedding quality. Most supervised safety datasets exist almost exclusively in English. Although translation tries to fill the gap, but fails to preserve cultural and contextual cues. Additionally, large multilingual guardrails (2.5B–8B) are relatively more accurate but impractical for on-device or real-time use.
The CREST Framework: A Scalable Way to Build Multilingual Safety Guardrails
CREST provides a parameter-efficient alternative to large multilingual guardrails. It uses a cluster-guided cross-lingual transfer strategy built on XLM-RoBERTa (XLM-R) representations to support 100 target languages using training data from only 13 high-resource languages.
Language Embedding Space and Clustering
XLM-R, pretrained on 100 languages, encodes structural and statistical properties of languages in its hidden layers. By translating a multilingual safety dataset into 100 languages, embedding each sample with XLM-R, and aggregating sentence embeddings per language, the paper constructs a language-level semantic representation. Clustering these language embeddings reveals 8 dense clusters capturing unique linguistic characteristics.

Languages are clustered based on representational similarity derived from XLM-R embeddings.
Languages inside a cluster share representational neighborhoods, enabling transfer of safety alignment signals from high-resource languages to low-resource neighbors.
Selecting Representative Training Languages
From these 8 clusters, CREST selects one or two high-resource “anchor” languages per cluster, with a total of 13 languages. These anchors have ample supervised data that cover the morphological and syntactic diversity of the cluster. These anchors reduce redundancy in training and maximize safety signal coverage.
This design choice mainly provides two benefits. Firstly, the fine-tuning distribution approximates the structure of the 100-language space. Secondly, the high-resource anchors act as “representative proxies” for entire clusters. Detailed analysis is provided in the paper that demonstrates that training on these 13 languages is sufficient to obtain strong generalization across all clusters.
Lightweight Safety Model Architecture
CREST guardrail models are developed over two variants. XLM-R-Base (279M) and XLM-R-Large (560M) with a simple binary classification head. The entire model is fine-tuned end-to-end on the aggregated multilingual safety dataset. This architecture is ideal because the vocabulary already supports 100 languages, and the encoder’s cross-lingual alignment is pretrained. Due to a small model size, the fine-tuning cost is low relative to LLM guardrails, and it enables deployment in real-time and on-device environments.
How Well Does CREST Actually Perform?
The paper evaluates CREST on major safety and robustness benchmarks from different aspects. They are High-resource languages, Low-resource languages, Code-switched, and culturally grounded datasets.
English Safety Benchmarks
Despite being 5-15× smaller than large guardrail LLMs, CREST-Large exceeds LlamaGuard3-8B and Aegis-Defensive-7B on average across 6 safety benchmarks. It surpasses smaller baselines like PolyGuard-Smol, DuoGuard, and WalledGuard. This validates that the cluster-guided multilingual training does not degrade performance on English.

F1 score comparison of CREST with baselines on six English safety benchmarks and CSRT (code-switch). Baselines are grouped by scale: Large (≥2.5B) and Small (≤0.5B) models.
High-Resource Languages
Across French, Italian, German, Portuguese, and Spanish, CREST-Large consistently achieves top-tier scores and outperforms other small-scale multilingual guardrails. CREST generalizes strongly to languages like French, Italian, and Portuguese, which weren’t seen in training. This confirms that cluster-level alignment enables high-resource transfer even outside the training set.

Average F1 Score performance across the 6 safety datasets on 5 commonly supported high-resource languages
Low-Resource Languages
CREST is evaluated across 11 low-resource languages from all clusters, including Galician, Icelandic, Afrikaans, Sinhala, Pashto, Javanese, Hausa, and Georgian. On evaluating, we observed that CREST’s zero-shot performance on unseen out-of-distribution (OOD) Low languages is nearly identical to its performance on in-domain (ID) languages. This is the strongest evidence that the cluster-guided transfer strategy works.

Average F1 scores of the CREST-BASE and CREST-LARGE across six safety benchmarks. Scores are reported on both ID and OOD Low languages.
Code-Switching and Cultural Robustness
The guardrails are evaluated on the Code-Switching Red-Teaming(CSRT) dataset and the IndicSafe and Cultural Kaleidoscope datasets, evaluating robustness to code-switching and cultural safety, respectively. CREST-Base and CREST-Large show the best performance among small-scale guardrails on CSRT, which tests for multilingual mixing. It achieves decent robust behavior on IndicSafe-En and Cultural Kaleidoscope. The guardrails are resilient against multilingual mixing and cultural nuances that break other guardrails. This suggests that the representational clusters capture not just linguistic proximity but also semantic alignment patterns of each language, useful for safety.
Why CREST Works Across 100 Languages
The paper’s analysis section presents a few core research questions that target the mechanisms driving CREST’s cross-lingual generalization and the conditions under which transfer between languages is most effective.
High-Resource Languages Provide Maximum Intra-Cluster Transfer
High-resource languages with large corpora show stronger intra-cluster transfer than low-resource ones because they provide richer lexical coverage, more generalizable representations, and stable semantic gradients. In the Indic cluster, for example, Hindi produces significantly better transfer to neighboring Indic languages than medium-resource Kannada or low-resource Sindhi.

Average F1 performance of models across 15 Indic languages trained on one representative Indic language from each resource category.
Cross-Cluster Transfer Is Insufficient
Cross-cluster transfer is weak: a Hindi-trained model performs poorly on the Chinese cluster, and a Chinese-trained model performs poorly on the Indic cluster. However, combining both languages during training improves performance on both clusters. This shows that each cluster still requires its own representative language, since cross-cluster transfer alone is insufficient for broad generalization.
The full paper presents the clustering methodology, embedding visualizations, cross-cluster experiments, low-resource evaluations, and ablation studies. Read the full CREST research paper here.
We release the CREST-Base model checkpoint publicly on HuggingFace, ready for integration.
What CREST Unlocks for AI Safety
The findings from CREST demonstrate that multilingual safety does not require massive multilingual datasets, billions of parameters, or language-specific fine-tuning. It showed that language-space structure can be exploited to build scalable guardrails from limited supervision.
This opens the path for:
real-time multilingual safety for agents
on-device moderation for mobile and embedded systems
inclusive protection for underrepresented languages
efficient pipelines for products with strict latency budgets
deployable multilingual guardrails for enterprise applications
For technical inquiries about this research or to discuss enterprise AI security solutions,
Reach out to our team at contact@repello.ai - we’re here to help you secure your AI systems.
LLM safety research has made rapid progress in English and a few high-resource languages, but global deployment exposes a hard limitation: safety guardrails do not generalize uniformly across languages. Empirically, multilingual jailbreak success rates increase sharply as we move from English → mid-resource → low-resource languages. This is consistent across multilingual toxicity datasets, multilingual jailbreak benchmarks, and real-world user interactions. Low-resource communities face significantly higher risks, not because their languages are inherently harder, but because current systems were never architected to scale linguistically.
In our new work, “CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer” (short for CRoss-lingual Efficient Safety Transfer), we set out to address this gap from first principles, designing a scalable system that can support 100 languages, run 10× faster than existing LLM-based guardrails, and be trained using data from only 13 high-resource languages. This blog provides a high-level walkthrough of how CREST works, what makes it different, and why it matters. The full details, analysis, and experiments are in the paper.
The Multilingual Safety Gap Problem
Most existing guardrails are impressive engineering feats. But they all share the same bottleneck: They are built and evaluated mostly on English and a handful of high-resource languages. These approaches attempt to fill the gap either using machine translation, larger multilingual LLMs, or fine-tuning on multilingual datasets for every new language. The multilingual coverage scalability is treated as an additive problem. A simple harmful prompt in a high-resource language (eg: English, Spanish, or Hindi) is blocked by these guardrails, but easily bypassed in a low-resource language (eg: Hausa, Javanese, or Pashto). This is not dataset noise, but a systemic failure mode.

Multilingual safety gap: strong filtering in high-resource languages, failures in low-resource ones.
Building a universal multilingual guardrail is a challenge in itself, mainly because of the possible reasons. Due to the scarcity of data for low-resource languages, they are underrepresented in pretraining, causing degraded embedding quality. Most supervised safety datasets exist almost exclusively in English. Although translation tries to fill the gap, but fails to preserve cultural and contextual cues. Additionally, large multilingual guardrails (2.5B–8B) are relatively more accurate but impractical for on-device or real-time use.
The CREST Framework: A Scalable Way to Build Multilingual Safety Guardrails
CREST provides a parameter-efficient alternative to large multilingual guardrails. It uses a cluster-guided cross-lingual transfer strategy built on XLM-RoBERTa (XLM-R) representations to support 100 target languages using training data from only 13 high-resource languages.
Language Embedding Space and Clustering
XLM-R, pretrained on 100 languages, encodes structural and statistical properties of languages in its hidden layers. By translating a multilingual safety dataset into 100 languages, embedding each sample with XLM-R, and aggregating sentence embeddings per language, the paper constructs a language-level semantic representation. Clustering these language embeddings reveals 8 dense clusters capturing unique linguistic characteristics.

Languages are clustered based on representational similarity derived from XLM-R embeddings.
Languages inside a cluster share representational neighborhoods, enabling transfer of safety alignment signals from high-resource languages to low-resource neighbors.
Selecting Representative Training Languages
From these 8 clusters, CREST selects one or two high-resource “anchor” languages per cluster, with a total of 13 languages. These anchors have ample supervised data that cover the morphological and syntactic diversity of the cluster. These anchors reduce redundancy in training and maximize safety signal coverage.
This design choice mainly provides two benefits. Firstly, the fine-tuning distribution approximates the structure of the 100-language space. Secondly, the high-resource anchors act as “representative proxies” for entire clusters. Detailed analysis is provided in the paper that demonstrates that training on these 13 languages is sufficient to obtain strong generalization across all clusters.
Lightweight Safety Model Architecture
CREST guardrail models are developed over two variants. XLM-R-Base (279M) and XLM-R-Large (560M) with a simple binary classification head. The entire model is fine-tuned end-to-end on the aggregated multilingual safety dataset. This architecture is ideal because the vocabulary already supports 100 languages, and the encoder’s cross-lingual alignment is pretrained. Due to a small model size, the fine-tuning cost is low relative to LLM guardrails, and it enables deployment in real-time and on-device environments.
How Well Does CREST Actually Perform?
The paper evaluates CREST on major safety and robustness benchmarks from different aspects. They are High-resource languages, Low-resource languages, Code-switched, and culturally grounded datasets.
English Safety Benchmarks
Despite being 5-15× smaller than large guardrail LLMs, CREST-Large exceeds LlamaGuard3-8B and Aegis-Defensive-7B on average across 6 safety benchmarks. It surpasses smaller baselines like PolyGuard-Smol, DuoGuard, and WalledGuard. This validates that the cluster-guided multilingual training does not degrade performance on English.

F1 score comparison of CREST with baselines on six English safety benchmarks and CSRT (code-switch). Baselines are grouped by scale: Large (≥2.5B) and Small (≤0.5B) models.
High-Resource Languages
Across French, Italian, German, Portuguese, and Spanish, CREST-Large consistently achieves top-tier scores and outperforms other small-scale multilingual guardrails. CREST generalizes strongly to languages like French, Italian, and Portuguese, which weren’t seen in training. This confirms that cluster-level alignment enables high-resource transfer even outside the training set.

Average F1 Score performance across the 6 safety datasets on 5 commonly supported high-resource languages
Low-Resource Languages
CREST is evaluated across 11 low-resource languages from all clusters, including Galician, Icelandic, Afrikaans, Sinhala, Pashto, Javanese, Hausa, and Georgian. On evaluating, we observed that CREST’s zero-shot performance on unseen out-of-distribution (OOD) Low languages is nearly identical to its performance on in-domain (ID) languages. This is the strongest evidence that the cluster-guided transfer strategy works.

Average F1 scores of the CREST-BASE and CREST-LARGE across six safety benchmarks. Scores are reported on both ID and OOD Low languages.
Code-Switching and Cultural Robustness
The guardrails are evaluated on the Code-Switching Red-Teaming(CSRT) dataset and the IndicSafe and Cultural Kaleidoscope datasets, evaluating robustness to code-switching and cultural safety, respectively. CREST-Base and CREST-Large show the best performance among small-scale guardrails on CSRT, which tests for multilingual mixing. It achieves decent robust behavior on IndicSafe-En and Cultural Kaleidoscope. The guardrails are resilient against multilingual mixing and cultural nuances that break other guardrails. This suggests that the representational clusters capture not just linguistic proximity but also semantic alignment patterns of each language, useful for safety.
Why CREST Works Across 100 Languages
The paper’s analysis section presents a few core research questions that target the mechanisms driving CREST’s cross-lingual generalization and the conditions under which transfer between languages is most effective.
High-Resource Languages Provide Maximum Intra-Cluster Transfer
High-resource languages with large corpora show stronger intra-cluster transfer than low-resource ones because they provide richer lexical coverage, more generalizable representations, and stable semantic gradients. In the Indic cluster, for example, Hindi produces significantly better transfer to neighboring Indic languages than medium-resource Kannada or low-resource Sindhi.

Average F1 performance of models across 15 Indic languages trained on one representative Indic language from each resource category.
Cross-Cluster Transfer Is Insufficient
Cross-cluster transfer is weak: a Hindi-trained model performs poorly on the Chinese cluster, and a Chinese-trained model performs poorly on the Indic cluster. However, combining both languages during training improves performance on both clusters. This shows that each cluster still requires its own representative language, since cross-cluster transfer alone is insufficient for broad generalization.
The full paper presents the clustering methodology, embedding visualizations, cross-cluster experiments, low-resource evaluations, and ablation studies. Read the full CREST research paper here.
We release the CREST-Base model checkpoint publicly on HuggingFace, ready for integration.
What CREST Unlocks for AI Safety
The findings from CREST demonstrate that multilingual safety does not require massive multilingual datasets, billions of parameters, or language-specific fine-tuning. It showed that language-space structure can be exploited to build scalable guardrails from limited supervision.
This opens the path for:
real-time multilingual safety for agents
on-device moderation for mobile and embedded systems
inclusive protection for underrepresented languages
efficient pipelines for products with strict latency budgets
deployable multilingual guardrails for enterprise applications
For technical inquiries about this research or to discuss enterprise AI security solutions,
Reach out to our team at contact@repello.ai - we’re here to help you secure your AI systems.

You might also like

8 The Green, Ste A
Dover, DE 19901, United States of America

8 The Green, Ste A
Dover, DE 19901, United States of America

8 The Green, Ste A
Dover, DE 19901, United States of America







