NeoBERT

Code is coming soon!

Description

NeoBERT is a next-generation encoder model for English text representation, pre-trained from scratch on the RefinedWeb dataset. NeoBERT integrates state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. It is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an optimal depth-to-width ratio, and leverages an extended context length of 4,096 tokens. Despite its compact 250M parameter footprint, it is the most efficient model of its kind and achieves state-of-the-art results on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions.

Get started

Ensure you have the following dependencies installed:

pip install transformers torch xformers==0.0.28.post3

If you would like to use sequence packing (un-padding), you will also need to install flash-attention:

pip install transformers torch xformers==0.0.28.post3 flash_attn
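
You can quickly check which of these dependencies are available in your environment. The snippet below is a minimal sketch: it only probes for the installed packages and does not load or configure NeoBERT itself.

import importlib.util

# Probe the attention backends; flash_attn is only required for sequence packing.
for pkg in ("xformers", "flash_attn"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'missing'}")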

How to use

Load the model using Hugging Face Transformers:

from transformers import AutoModel, AutoTokenizer

model_name = "chandar-lab/NeoBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

# Tokenize input text
text = "NeoBERT is the most efficient model of its kind!"
inputs = tokenizer(text, return_tensors="pt")

# Generate embeddings (the [CLS] token representation is used as the sentence embedding)
outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)
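
For several inputs at once, the standard Transformers tokenizer options for padding and truncation apply. The sketch below batches two illustrative sentences, truncates at the 4,096-token context length, and keeps the same [CLS] pooling as above; the example sentences and the no_grad wrapper are our additions.

import torch

# Batch two sentences, pad to the longest one, and truncate at the
# 4,096-token context length.
texts = [
    "NeoBERT is the most efficient model of its kind!",
    "It supports a context length of 4,096 tokens.",
]
batch = tokenizer(texts, padding=True, truncation=True, max_length=4096, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# One [CLS] embedding per sentence.
embeddings = outputs.last_hidden_state[:, 0, :]
print(embeddings.shape)  # expected: torch.Size([2, 768])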

Features

| Feature | NeoBERT |
| --- | --- |
| Depth-to-width | 28 × 768 |
| Parameter count | 250M |
| Activation | SwiGLU |
| Positional embeddings | RoPE |
| Normalization | Pre-RMSNorm |
| Data Source | RefinedWeb |
| Data Size | 2.8 TB |
| Tokenizer | google/bert |
| Context length | 4,096 |
| MLM Masking Rate | 20% |
| Optimizer | AdamW |
| Scheduler | CosineDecay |
| Training Tokens | 2.1 T |
| Efficiency | FlashAttention |
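
The architectural values above can be cross-checked against the configuration shipped with the checkpoint. The sketch below simply prints it; the attribute names are defined by the model's custom code, so no specific field is assumed here.

# Print the configuration bundled with the checkpoint to verify the values
# listed in the table (hidden size, number of layers, context length, ...).
print(model.config)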

License

The model weights and the code repository are licensed under the permissive MIT license.

Citation

If you use this model in your research, please cite:

@misc{breton2025neobertnextgenerationbert,
      title={NeoBERT: A Next-Generation BERT}, 
      author={Lola Le Breton and Quentin Fournier and Mariam El Mezouar and Sarath Chandar},
      year={2025},
      eprint={2502.19587},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.19587}, 
}

Contact

For questions, do not hesitate to reach out and open an issue here or on our GitHub.

