
Attention Mechanism

Models & Architectures

Dynamically weights the most relevant parts of the input.


Attention computes weighted combinations of input representations, letting the model focus on the information most relevant to each position rather than treating all inputs equally.
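In the standard scaled dot-product formulation, the output for queries Q, keys K, and values V is softmax(QKᵀ / √d_k) V, where d_k is the key dimension; the softmax scores act as the weights of that combination.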

  • Types: Self-attention (within one sequence), cross-attention (between two sequences, e.g., encoder–decoder), and causal attention (masked so each position attends only to earlier positions).
  • Benefits: Better context handling and potential interpretability (attention maps).
  • Costs: Time and memory scale quadratically with sequence length in full self-attention, so long contexts call for optimizations (e.g., sparse, windowed, or linear attention variants).
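
As a rough illustration, here is a minimal NumPy sketch of scaled dot-product self-attention with an optional causal mask; the function name, toy data, and dimensions are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, causal=False):
    """Return softmax(q @ k.T / sqrt(d_k)) @ v, optionally with a causal mask."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)              # one score per query-key pair
    if causal:
        # Causal attention: position i may only attend to positions <= i.
        mask = np.triu(np.ones_like(scores, dtype=bool), 1)
        scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ v

# Self-attention: queries, keys, and values all come from the same sequence.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                      # 4 tokens, 8-dimensional embeddings
out = scaled_dot_product_attention(x, x, x, causal=True)
print(out.shape)                                 # (4, 8)
```

The scores matrix holds one entry per query–key pair, which is where the quadratic cost in sequence length comes from.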