Lionel lig 5994: Update the BYOL examples #1794

Merged 8 commits into from Feb 4, 2025
docs/source/examples/byol.rst: 29 additions & 2 deletions
BYOL
====

BYOL (Bootstrap Your Own Latent) [0]_ is a self-supervised learning framework for visual
representation learning without negative samples. Unlike contrastive learning methods
such as MoCo [1]_ and SimCLR [2]_, which compare positive and negative pairs, BYOL uses
two neural networks, an "online" and a "target" network, where the online network is
trained to predict the target's representation of the same image under different
augmentations, resulting in an iterative bootstrapping of the latent representations.
The target's weights are updated as an exponential moving average (EMA) of the online
network's weights, and the authors show that this is sufficient to prevent collapse
to trivial solutions. They also show that, because no negative samples are used, BYOL
is less sensitive to the training batch size and achieves state-of-the-art performance
on several semi-supervised and transfer learning benchmarks.
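
To make the EMA update concrete, here is a minimal PyTorch sketch (an illustration,
not this repository's implementation). It assumes ``online`` and ``target`` are
architecturally identical modules; the momentum ``m = 0.996`` is the paper's base
value, which is annealed towards 1.0 over the course of training.

.. code-block:: python

    import torch

    @torch.no_grad()
    def update_target_network(
        online: torch.nn.Module, target: torch.nn.Module, m: float = 0.996
    ) -> None:
        """Update the target network as an EMA of the online network."""
        for online_p, target_p in zip(online.parameters(), target.parameters()):
            # target <- m * target + (1 - m) * online
            target_p.data.mul_(m).add_((1.0 - m) * online_p.data)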

Key Components
--------------

- **Data Augmentations**: BYOL [0]_ uses the same augmentations as SimCLR [2]_, namely random resized crop, random horizontal flip, color distortion, Gaussian blur, and solarization. The color distortion consists of a random sequence of brightness, contrast, saturation, and hue adjustments, plus an optional grayscale conversion. However, the augmentation hyperparameters differ from those of SimCLR [2]_.
- **Backbone**: BYOL [0]_ uses ResNet-type convolutional backbones as the online and target networks. They do not evaluate the performance of other architectures.
- **Projection & Prediction Head**: A projection head maps the output of the backbone to a lower-dimensional space. For this, the target network once again relies on an EMA of the online network. A notable architectural choice is the additional prediction head, a secondary MLP appended only to the online network's projection head.
- **Loss Function**: BYOL [0]_ uses a negative cosine similarity loss between the online network's prediction output and the target network's projection output (see the sketch after this list).
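
The following is a minimal sketch of how these components fit together, not the code
from this repository. It assumes ``backbone`` returns flat features of size
``feature_dim``; the hidden width of 4096 and output dimension of 256 follow the
paper's ResNet-50 setup.

.. code-block:: python

    import copy

    import torch
    import torch.nn.functional as F
    from torch import nn


    class BYOL(nn.Module):
        """Online backbone + projection + prediction; target backbone + projection (EMA)."""

        def __init__(self, backbone: nn.Module, feature_dim: int,
                     hidden_dim: int = 4096, out_dim: int = 256):
            super().__init__()

            def mlp(in_dim: int) -> nn.Sequential:
                return nn.Sequential(
                    nn.Linear(in_dim, hidden_dim),
                    nn.BatchNorm1d(hidden_dim),
                    nn.ReLU(inplace=True),
                    nn.Linear(hidden_dim, out_dim),
                )

            self.online_backbone = backbone
            self.online_projection = mlp(feature_dim)
            self.prediction = mlp(out_dim)  # appended to the online network only
            # Target network: an EMA copy of the online network, never backpropagated.
            self.target_backbone = copy.deepcopy(backbone)
            self.target_projection = copy.deepcopy(self.online_projection)
            for p in self.target_backbone.parameters():
                p.requires_grad = False
            for p in self.target_projection.parameters():
                p.requires_grad = False

        def loss(self, view_a: torch.Tensor, view_b: torch.Tensor) -> torch.Tensor:
            # Online branch: backbone -> projection -> prediction.
            p_a = self.prediction(self.online_projection(self.online_backbone(view_a)))
            p_b = self.prediction(self.online_projection(self.online_backbone(view_b)))
            # Target branch: backbone -> projection, under stop-gradient.
            with torch.no_grad():
                z_a = self.target_projection(self.target_backbone(view_a))
                z_b = self.target_projection(self.target_backbone(view_b))

            def neg_cosine(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
                return -F.cosine_similarity(p, z, dim=-1).mean()

            # Symmetrize the loss: each view's prediction targets the other view.
            return 0.5 * (neg_cosine(p_a, z_b) + neg_cosine(p_b, z_a))

A training step would compute ``loss`` on two augmented views of the same image, step
an optimizer over the online and prediction parameters only, and then apply the EMA
update sketched above.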

Good to Know
-------------

- **Backbone Networks**: BYOL is specifically optimized for convolutional neural networks, with a focus on ResNet architectures. We do not recommend using it with transformer-based models and instead suggest using :doc:`DINO <dino>` [3]_.


.. [0] `Bootstrap your own latent: A new approach to self-supervised Learning, 2020 <https://arxiv.org/abs/2006.07733>`_
.. [1] `Momentum Contrast for Unsupervised Visual Representation Learning, 2019 <https://arxiv.org/abs/1911.05722>`_
.. [2] `A Simple Framework for Contrastive Learning of Visual Representations, 2020 <https://arxiv.org/abs/2002.05709>`_
.. [3] `Emerging Properties in Self-Supervised Vision Transformers, 2021 <https://arxiv.org/abs/2104.14294>`_


.. tabs::