About
EvolutionaryScale is an AI research company building frontier models for the life sciences. Its flagship model, ESM3, is a multimodal biological language model trained on 2.78 billion natural proteins and 771 billion unique tokens of evolutionary data, with up to 98 billion parameters and over 10^24 FLOPs of training compute. ESM3 treats biology as a programmable system: it works over the 20 amino acids that form life's alphabet and supports generative design of entirely novel proteins.
Unlike traditional protein models, ESM3 reasons jointly over three fundamental properties (sequence, structure, and function), so scientists can provide mixed inputs and explore large design spaces. EvolutionaryScale demonstrated this capability by designing esmGFP, a novel green fluorescent protein estimated to be about 500 million years of evolution removed from any natural counterpart. ESM Cambrian, a parallel model family, targets protein sequence representation learning, where it sets a new state of the art, and complements ESM3's generative capabilities. Use cases include designing enzymes that break down plastics, proteins that capture carbon, and novel antibody therapeutics.
ESM3 is available in small, medium, and large sizes through EvolutionaryScale's Forge API (closed beta), AWS SageMaker, AWS Bedrock, and NVIDIA BioNeMo. ESM3-open, a smaller model released with full weights and source code, is freely available on GitHub under a non-commercial license, which makes it accessible to academic researchers worldwide.
Key Features
- Multimodal Protein Reasoning: ESM3 simultaneously reasons over protein sequence, structure, and function, allowing mixed-modality inputs to explore vast protein design spaces.
- Generative Protein Design: Generate entirely novel proteins using chain-of-thought prompting, as demonstrated by esmGFP, a fluorescent protein estimated to be about 500 million years of evolution removed from natural fluorescent proteins.
- ESM Cambrian Representation Model: A parallel model family to ESM3 that sets a new state of the art in protein sequence representation learning, ideal for classification and analysis tasks.
- Massive Evolutionary Training: Trained on 2.78 billion natural proteins and 771 billion unique tokens, with up to 98 billion parameters, giving the model broad coverage of natural protein diversity.
- Flexible Deployment Options: Available via Forge API, AWS SageMaker and Bedrock, NVIDIA BioNeMo, and as the openly released ESM3-open model on GitHub for non-commercial research use.
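To make the idea of a mixed-input prompt more concrete, here is a minimal Python sketch of a partially specified protein: some residues are fixed, a region is left open for a model to fill in, and optional function hints ride along. Everything in it is an illustrative assumption rather than the ESM3 SDK: the `ProteinPrompt` class, the `mask_region` helper, and the use of `_` as a mask character are hypothetical names chosen for this example, and no model is called.

```python
from dataclasses import dataclass
from typing import Optional

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# Hypothetical placeholder meaning "position left open for generation".
MASK = "_"


@dataclass
class ProteinPrompt:
    """Hypothetical container for a mixed-input prompt (not an ESM3 SDK type)."""

    sequence: str                             # residues, with MASK at open positions
    function_keywords: Optional[list] = None  # free-text function hints, if any

    def __post_init__(self):
        # Reject anything that is neither a standard residue nor the mask.
        invalid = {c for c in self.sequence if c not in AMINO_ACIDS and c != MASK}
        if invalid:
            raise ValueError(f"unexpected residue codes: {sorted(invalid)}")

    def masked_fraction(self) -> float:
        """Share of positions left open for generation."""
        if not self.sequence:
            return 0.0
        return self.sequence.count(MASK) / len(self.sequence)


def mask_region(sequence: str, start: int, end: int) -> str:
    """Replace sequence[start:end] with mask characters, keeping the length fixed."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("region out of bounds")
    return sequence[:start] + MASK * (end - start) + sequence[end:]


if __name__ == "__main__":
    # Keep both ends of a short fragment and open up four positions in the middle.
    prompt = ProteinPrompt(
        sequence=mask_region("MSKGEELFTG", 3, 7),
        function_keywords=["fluorescent protein"],
    )
    print(prompt.sequence, prompt.masked_fraction())
```

A real workflow would hand a prompt like this to a generative model through whichever access route applies (Forge, a cloud platform, or the ESM3-open weights); the sketch only shows how fixed and open positions can be represented and validated before that step.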
Use Cases
- Designing novel therapeutic antibodies and drug candidates by exploring vast protein sequence and structure spaces beyond natural evolution.
- Engineering enzymes for environmental applications, such as proteins that degrade plastic waste or capture atmospheric carbon.
- Academic research into protein folding, function prediction, and evolutionary biology using the open-source ESM3-open model.
- Developing new fluorescent proteins and biosensors for use as research tools in cell biology and diagnostics.
- Enterprise biotech and pharmaceutical R&D pipelines requiring scalable, cloud-deployable protein design AI via AWS or NVIDIA BioNeMo.
Pros
- State-of-the-Art Protein AI: ESM3 is among the largest generative models for protein design, trained on 2.78 billion natural proteins with over 10^24 FLOPs of compute.
- Open-Source Option: ESM3-open provides free access to weights and source code on GitHub, making cutting-edge protein AI accessible to academic researchers.
- Multi-Cloud Availability: Integration with AWS and NVIDIA platforms enables enterprise-grade deployment within existing scientific computing infrastructure.
- Broad Scientific Applications: Applicable to drug discovery, enzyme engineering, carbon capture, plastic degradation, and any domain requiring novel protein design.
Cons
- API Access Is in Closed Beta: Full API access through Forge is not yet publicly available, so users must apply for access and wait for approval.
- Non-Commercial License for Open Model: ESM3-open is only freely available for non-commercial use; commercial applications require a paid API or enterprise agreement.
- Highly Specialized Domain: ESM3 is purpose-built for protein science and life sciences research, making it unsuitable for general-purpose AI tasks.
Frequently Asked Questions
What is ESM3?
ESM3 is a generative biological language model developed by EvolutionaryScale that reasons over protein sequence, structure, and function simultaneously, enabling scientists to design novel proteins with properties not found in nature.
How can I access ESM3?
ESM3 is accessible via the Forge API (closed beta; apply for access on the EvolutionaryScale website), AWS SageMaker and Bedrock, and NVIDIA BioNeMo. The ESM3-open model, including its weights, is freely available on GitHub for non-commercial use.
What is the difference between ESM3 and ESM Cambrian?
ESM3 is EvolutionaryScale's generative flagship model for designing novel proteins. ESM Cambrian is a parallel model family optimized for protein sequence representation learning, aimed at analysis and classification tasks.
What can ESM3 be used for?
ESM3 can be used to design enzymes for breaking down plastics (PETase), proteins that capture carbon (carbonic anhydrase), novel antibodies for drug discovery, and entirely new fluorescent proteins, among many other life science applications.
Can ESM3 be used commercially?
The ESM3-open model is available under a non-commercial license. Commercial use requires access through the Forge API or supported cloud platforms such as AWS and NVIDIA BioNeMo. Contact EvolutionaryScale for enterprise licensing details.
