_posts/2025-03-23-DynamicBatching.md
To make the most of GPU memory when training large models, we want to pack tokens so that sequence padding is minimised and memory utilisation is maximised.

PyTorch's data pipeline gives us three relevant building blocks (a short sketch of how they fit together follows the list):

1. `torch.utils.data.DataLoader` is a Python iterable over a PyTorch dataset.
2. `torch.utils.data.Dataset` implements `__getitem__()`, which maps keys to data samples.
3. `torch.utils.data.Sampler` specifies the sequence of keys used in data loading.
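
As a quick orientation, here is a minimal sketch of how the three pieces fit together; the toy `ToyTextDataset` and the token ids are made up for illustration and are not from this post.

{% highlight python %}
import torch
from torch.utils.data import DataLoader, Dataset, SequentialSampler

class ToyTextDataset(Dataset):
    """A map-style dataset: __getitem__ maps an integer key to one tokenised sample."""
    def __init__(self, sequences):
        self.sequences = sequences            # list of lists of token ids

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return torch.tensor(self.sequences[idx])

dataset = ToyTextDataset([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
sampler = SequentialSampler(dataset)          # yields the keys 0, 1, 2, ... in order
loader = DataLoader(dataset, sampler=sampler, batch_size=2,
                    collate_fn=lambda batch: batch)  # keep variable-length samples as a plain list

for batch in loader:
    print([seq.tolist() for seq in batch])    # [[1, 2, 3], [4, 5]] then [[6, 7, 8, 9]]
{% endhighlight %}
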
#### **Default Approach**
By default, the `DataLoader` collates individually fetched samples into batches using the arguments `batch_size`, `drop_last`, `batch_sampler`, and `collate_fn`. Alternatively, if `batch_size` is None, we can construct a `BatchSampler` which yields a list of keys at a time.
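
As a minimal sketch of that route (again with a toy dataset, not the post's code), a `BatchSampler` wraps an ordinary sampler and is handed to the `DataLoader` via `batch_sampler`:

{% highlight python %}
import torch
from torch.utils.data import BatchSampler, DataLoader, SequentialSampler

# Any object with __len__ and __getitem__ works as a map-style dataset, so a plain list will do here.
dataset = [torch.tensor(ids) for ids in ([1, 2, 3], [4, 5], [6, 7, 8, 9])]

# The BatchSampler yields a list of keys at a time; the DataLoader then fetches
# dataset[key] for each key in that list and hands the samples to collate_fn.
batch_sampler = BatchSampler(SequentialSampler(dataset), batch_size=2, drop_last=False)

loader = DataLoader(dataset,
                    batch_sampler=batch_sampler,     # batch_size, shuffle, and drop_last must stay unset
                    collate_fn=lambda batch: batch)  # in practice: pad and stack the batch here

for batch in loader:
    print([seq.tolist() for seq in batch])
{% endhighlight %}
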
We have several options, starting with the default.
The most basic approach is to pad every sequence to the maximum context window and return a fixed batch size. However, this is incredibly wasteful. Imagine a batch size of 2, where sequence X1 of length 10 and sequence X2 of length 1000 land in the same batch: X1 will be padded for 990 token positions, which is nearly 50% wasted GPU memory.
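
To put rough numbers on the waste, here is a toy calculation (not from the post) comparing batches of mismatched lengths against batches that group similar lengths together:

{% highlight python %}
def padding_waste(batches_of_lengths):
    """Fraction of token positions spent on padding when each batch is padded to its longest sequence."""
    real = sum(sum(batch) for batch in batches_of_lengths)
    total = sum(len(batch) * max(batch) for batch in batches_of_lengths)
    return 1 - real / total

# Two ways of splitting sequences of length 10, 1000, 12, and 998 into batches of size 2.
naive = [[10, 1000], [12, 998]]     # mixed lengths in each batch
grouped = [[10, 12], [1000, 998]]   # similar lengths batched together

print(f"naive batching:    {padding_waste(naive):.1%} padding")    # ~49.4%
print(f"grouped by length: {padding_waste(grouped):.1%} padding")  # ~0.2%
{% endhighlight %}
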
<br>
Then we can simply do `trainer = MyTrainer(..); trainer.train()`. Because we used a `BatchSampler`, the `batch_size` argument given to the trainer should be left unset, or an error will be thrown about a conflicting `batch_size`.
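
The `MyTrainer` definition itself is not shown in this excerpt; as a rough sketch of the general pattern only, an override of `get_train_dataloader` could look like the following, assuming samples are dicts with an `input_ids` field and using a simple length-sorted `BatchSampler` in place of the post's actual sampler:

{% highlight python %}
from torch.utils.data import BatchSampler, DataLoader
from transformers import Trainer

class MyTrainer(Trainer):
    def get_train_dataloader(self):
        # Sort dataset keys by sequence length so each batch holds similar lengths
        # (a stand-in for the post's sampler; assumes samples expose "input_ids").
        keys = sorted(range(len(self.train_dataset)),
                      key=lambda i: len(self.train_dataset[i]["input_ids"]))
        batch_sampler = BatchSampler(keys, batch_size=8, drop_last=False)
        return DataLoader(self.train_dataset,
                          batch_sampler=batch_sampler,    # hence batch_size must stay unset on the trainer
                          collate_fn=self.data_collator)  # pads each batch to its own max length
{% endhighlight %}
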
<br>
#### **References**
[PyTorch Data Utils Reference](https://pytorch.org/docs/stable/data.html)