
Commit aec4463 ("update")
Parent: c5933ad

File tree: 5 files changed (+18 -14 lines)

_layout.scss

Whitespace-only changes.

_posts/2025-03-23-DynamicBatching.md

Lines changed: 13 additions & 9 deletions
@@ -12,19 +12,18 @@ categories: [Code]
 
 To maximise GPU memory when training large models, we want to pack tokens such that sequence padding is minimised and GPU memory is maximised.
 
-We have several options, starting with the default.
+1. `torch.utils.data.dataloader` is a Python iterable over a PyTorch dataset.
+2. `torch.utils.data.dataset` implements `__getitem__()`, which maps keys to data samples.
+3. `torch.utils.data.sampler` specifies the sequence of keys used in data loading.
 
-#### **Default Approach**
+By default, the `DataLoader` will collate individual fetched samples into batches using the arguments `batch_size`, `drop_last`, `batch_sampler`, and `collate_fn`. Alternatively, if `batch_size` is None, we can construct a `BatchSampler` which yields a list of keys at a time.
 
-The most default thing to do is to pad every sequence to the maximimum context window, and return a fixed batch size.
 
-{% highlight python%}
-from transformers import AutoTokenizer
-tokenizer = AutoTokenizer.from_pretrained(<model_id>, use_fast=True)
-tokenizer(examples['text'], truncation=True, max_length=max_seq_length, padding='longest')
-{% endhighlight %}
+#### **Default Approach**
 
-However, this is incredibly wasteful. Imagine a batch size of 2, where we have a sequence X1 of length 10 and sequence X2 of length 1000 in the same batch. Sequence X1 will be padded for 990 token positions, which is nearly 50\% wasted GPU memory.
+We have several options, starting with the default.
+
+The most basic approach is to pad every sequence to the maximum context window and return a fixed batch size. However, this is incredibly wasteful. Imagine a batch size of 2, where we have a sequence X1 of length 10 and sequence X2 of length 1000 in the same batch. Sequence X1 will be padded for 990 token positions, which is nearly 50% wasted GPU memory.
 
 <br>
 
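The paragraph added in this hunk describes the `batch_size=None` / `BatchSampler` path. As a rough illustration (not part of the commit), here is a minimal sketch of wiring a `BatchSampler` into a `DataLoader` so that each batch is padded only to its own longest sequence; the toy dataset and the `pad_sequence` collate function are assumptions made for the example.

{% highlight python %}
import torch
from torch.utils.data import DataLoader, Dataset, SequentialSampler, BatchSampler
from torch.nn.utils.rnn import pad_sequence

# Toy dataset: maps an integer key to a 1-D tensor of token ids.
class ToySequences(Dataset):
    def __init__(self, sequences):
        self.sequences = sequences

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return torch.tensor(self.sequences[idx])

dataset = ToySequences([[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]])

# A BatchSampler wraps a Sampler and yields one list of keys per batch.
batch_sampler = BatchSampler(SequentialSampler(dataset), batch_size=2, drop_last=False)

# Passing batch_sampler= means the usual batch_size argument is left unset, and
# collate_fn pads each fetched batch only to its own longest sequence.
loader = DataLoader(
    dataset,
    batch_sampler=batch_sampler,
    collate_fn=lambda batch: pad_sequence(batch, batch_first=True),
)

for batch in loader:
    print(batch.shape)  # e.g. torch.Size([2, 3]) then torch.Size([2, 4])
{% endhighlight %}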

@@ -132,3 +131,8 @@ class MyTrainer(Trainer):
 {% endhighlight %}
 
 Then we can easily do `trainer = MyTrainer(..); trainer.train()`. Because we used BatchSampler, the `batch_size` argument given to trainer should be empty or there will be an error thrown regarding a conflict in `batch_size` number.
+
+<br>
+#### **References**
+
+[PyTorch Data Utils Reference](https://pytorch.org/docs/stable/data.html)
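The body of `MyTrainer` referenced in this hunk's header sits outside the diff, so the following is only a hypothetical sketch of the kind of override the surrounding text implies: a `Trainer` subclass whose training dataloader is built from a `BatchSampler`, which is why the usual batch-size argument has to stay unset. The `RandomSampler` and the batch size of 8 are placeholders, not values taken from the post.

{% highlight python %}
# Hypothetical sketch -- not the MyTrainer defined in the post.
from torch.utils.data import DataLoader, RandomSampler, BatchSampler
from transformers import Trainer

class MyTrainer(Trainer):
    def get_train_dataloader(self):
        # Build batches as lists of indices instead of relying on the
        # Trainer's default fixed batch_size handling.
        sampler = RandomSampler(self.train_dataset)
        batch_sampler = BatchSampler(sampler, batch_size=8, drop_last=False)
        return DataLoader(
            self.train_dataset,
            batch_sampler=batch_sampler,
            collate_fn=self.data_collator,
        )
{% endhighlight %}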

_sass/minima.scss

Lines changed: 2 additions & 2 deletions
@@ -3,11 +3,11 @@
 // Define defaults for each variable.
 
 //$base-font-family: "Helvetica Neue", Helvetica, Arial, sans-serif !default;
-$base-font-family: Garamond, serif !default;
+$base-font-family: Sabon, serif !default;
 $base-font-size: 16px !default;
 $base-font-weight: 400 !default;
 $small-font-size: $base-font-size * 0.875 !default;
-$base-line-height: 1.5 !default;
+$base-line-height: 1.8 !default;
 
 $spacing-unit: 30px !default;
 

_sass/minima/._base.scss.swp

12 KB
Binary file not shown.

_sass/minima/_base.scss

Lines changed: 3 additions & 3 deletions
@@ -139,14 +139,14 @@ blockquote {
 */
 pre,
 code {
-  @include relative-font-size(0.8);
+  @include relative-font-size(0.7);
   border: 2px solid $grey-color-light;
   border-radius: 0.5px;
-  background-color: #ccf9f4;
+  background-color: light-grey;
 }
 
 code {
-  padding: 0.5px 3px;
+  padding: 0.5px 1px;
 }
 
 pre {
