)`."
- ]
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "0e2bbfb7283745f2a93d8436c99e5e8b",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Map: 0%| | 0/12 [00:00, ? examples/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
}
],
"source": [
"# 准备数据\n",
- "# datapath = '../../data/alpaca_gpt4_data_dev.json' # dev数据只有少量数据,用于开发,实际训练时请使用full数据集\n",
- "datapath = '../../data/alpaca_gpt4_data.json' # full数据集\n",
+ "datapath = 'data/alpaca_gpt4_data_dev.json' # dev数据只有少量数据,用于开发,实际训练时请使用full数据集\n",
+ "# datapath = 'data/alpaca_gpt4_data.json' # full数据集\n",
"\n",
"# 定义tokenizer\n",
"from transformers import GPT2Tokenizer\n",
@@ -104,20 +148,426 @@
},
{
"cell_type": "code",
- "execution_count": 26,
+ "execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
- "epoch: 0, step: 0, loss: 8.52311897277832\n",
- "val step: 0, loss: 5.500304698944092\n",
- "epoch: 1, step: 0, loss: 5.841464042663574\n",
- "val step: 0, loss: 4.260898113250732\n",
- "epoch: 2, step: 0, loss: 4.249709129333496\n",
- "val step: 0, loss: 3.278724193572998\n"
+ "epoch: 0, step: 0, loss: 7.793501853942871\n"
]
+ },
+ {
+ "data": {
+ "text/html": [
+ "╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮\n",
+ "│ in <module>:39 │\n",
+ "│ │\n",
+ "│ 36 │ │ torch.save(model.state_dict(), f\"model_{epoch}.pt\") │\n",
+ "│ 37 │ │ val() │\n",
+ "│ 38 │\n",
+ "│ ❱ 39 train() │\n",
+ "│ 40 │\n",
+ "│ │\n",
+ "│ in train:30 │\n",
+ "│ │\n",
+ "│ 27 │ optimizer = torch.optim.Adam(model.parameters(), lr=1e-5) │\n",
+ "│ 28 │ for epoch in range(3): │\n",
+ "│ 29 │ │ for step, batch in enumerate(dataloader): │\n",
+ "│ ❱ 30 │ │ │ loss = train_step(batch) │\n",
+ "│ 31 │ │ │ loss.backward() │\n",
+ "│ 32 │ │ │ optimizer.step() │\n",
+ "│ 33 │ │ │ optimizer.zero_grad() │\n",
+ "│ │\n",
+ "│ in train_step:13 │\n",
+ "│ │\n",
+ "│ 10 │ │ \"attention_mask\": batch[\"attention_mask\"], │\n",
+ "│ 11 │ │ \"labels\": batch[\"labels\"], │\n",
+ "│ 12 │ } │\n",
+ "│ ❱ 13 │ res = model(**kwargs)[\"loss\"] │\n",
+ "│ 14 │ return res │\n",
+ "│ 15 │\n",
+ "│ 16 def val(): │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py:1518 in _wrapped_call_impl │\n",
+ "│ │\n",
+ "│ 1515 │ │ if self._compiled_call_impl is not None: │\n",
+ "│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │\n",
+ "│ 1517 │ │ else: │\n",
+ "│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │\n",
+ "│ 1519 │ │\n",
+ "│ 1520 │ def _call_impl(self, *args, **kwargs): │\n",
+ "│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py:1527 in _call_impl │\n",
+ "│ │\n",
+ "│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │\n",
+ "│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │\n",
+ "│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │\n",
+ "│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │\n",
+ "│ 1528 │ │ │\n",
+ "│ 1529 │ │ try: │\n",
+ "│ 1530 │ │ │ result = None │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\transformers\\models\\gpt2\\modeling_gpt2.py:1075 in forward │\n",
+ "│ │\n",
+ "│ 1072 │ │ \"\"\" │\n",
+ "│ 1073 │ │ return_dict = return_dict if return_dict is not None else self.config.use_return │\n",
+ "│ 1074 │ │ │\n",
+ "│ ❱ 1075 │ │ transformer_outputs = self.transformer( │\n",
+ "│ 1076 │ │ │ input_ids, │\n",
+ "│ 1077 │ │ │ past_key_values=past_key_values, │\n",
+ "│ 1078 │ │ │ attention_mask=attention_mask, │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py:1518 in _wrapped_call_impl │\n",
+ "│ │\n",
+ "│ 1515 │ │ if self._compiled_call_impl is not None: │\n",
+ "│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │\n",
+ "│ 1517 │ │ else: │\n",
+ "│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │\n",
+ "│ 1519 │ │\n",
+ "│ 1520 │ def _call_impl(self, *args, **kwargs): │\n",
+ "│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py:1527 in _call_impl │\n",
+ "│ │\n",
+ "│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │\n",
+ "│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │\n",
+ "│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │\n",
+ "│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │\n",
+ "│ 1528 │ │ │\n",
+ "│ 1529 │ │ try: │\n",
+ "│ 1530 │ │ │ result = None │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\transformers\\models\\gpt2\\modeling_gpt2.py:899 in forward │\n",
+ "│ │\n",
+ "│ 896 │ │ │ │ │ encoder_attention_mask, │\n",
+ "│ 897 │ │ │ │ ) │\n",
+ "│ 898 │ │ │ else: │\n",
+ "│ ❱ 899 │ │ │ │ outputs = block( │\n",
+ "│ 900 │ │ │ │ │ hidden_states, │\n",
+ "│ 901 │ │ │ │ │ layer_past=layer_past, │\n",
+ "│ 902 │ │ │ │ │ attention_mask=attention_mask, │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py:1518 in _wrapped_call_impl │\n",
+ "│ │\n",
+ "│ 1515 │ │ if self._compiled_call_impl is not None: │\n",
+ "│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │\n",
+ "│ 1517 │ │ else: │\n",
+ "│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │\n",
+ "│ 1519 │ │\n",
+ "│ 1520 │ def _call_impl(self, *args, **kwargs): │\n",
+ "│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py:1527 in _call_impl │\n",
+ "│ │\n",
+ "│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │\n",
+ "│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │\n",
+ "│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │\n",
+ "│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │\n",
+ "│ 1528 │ │ │\n",
+ "│ 1529 │ │ try: │\n",
+ "│ 1530 │ │ │ result = None │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\transformers\\models\\gpt2\\modeling_gpt2.py:389 in forward │\n",
+ "│ │\n",
+ "│ 386 │ ) -> Union[Tuple[torch.Tensor], Optional[Tuple[torch.Tensor, Tuple[torch.FloatTensor │\n",
+ "│ 387 │ │ residual = hidden_states │\n",
+ "│ 388 │ │ hidden_states = self.ln_1(hidden_states) │\n",
+ "│ ❱ 389 │ │ attn_outputs = self.attn( │\n",
+ "│ 390 │ │ │ hidden_states, │\n",
+ "│ 391 │ │ │ layer_past=layer_past, │\n",
+ "│ 392 │ │ │ attention_mask=attention_mask, │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py:1518 in _wrapped_call_impl │\n",
+ "│ │\n",
+ "│ 1515 │ │ if self._compiled_call_impl is not None: │\n",
+ "│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │\n",
+ "│ 1517 │ │ else: │\n",
+ "│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │\n",
+ "│ 1519 │ │\n",
+ "│ 1520 │ def _call_impl(self, *args, **kwargs): │\n",
+ "│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py:1527 in _call_impl │\n",
+ "│ │\n",
+ "│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │\n",
+ "│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │\n",
+ "│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │\n",
+ "│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │\n",
+ "│ 1528 │ │ │\n",
+ "│ 1529 │ │ try: │\n",
+ "│ 1530 │ │ │ result = None │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\transformers\\models\\gpt2\\modeling_gpt2.py:330 in forward │\n",
+ "│ │\n",
+ "│ 327 │ │ if self.reorder_and_upcast_attn: │\n",
+ "│ 328 │ │ │ attn_output, attn_weights = self._upcast_and_reordered_attn(query, key, valu │\n",
+ "│ 329 │ │ else: │\n",
+ "│ ❱ 330 │ │ │ attn_output, attn_weights = self._attn(query, key, value, attention_mask, he │\n",
+ "│ 331 │ │ │\n",
+ "│ 332 │ │ attn_output = self._merge_heads(attn_output, self.num_heads, self.head_dim) │\n",
+ "│ 333 │ │ attn_output = self.c_proj(attn_output) │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\transformers\\models\\gpt2\\modeling_gpt2.py:211 in _attn │\n",
+ "│ │\n",
+ "│ 208 │ │ │\n",
+ "│ 209 │ │ # Downcast (if necessary) back to V's dtype (if in mixed-precision) -- No-Op oth │\n",
+ "│ 210 │ │ attn_weights = attn_weights.type(value.dtype) │\n",
+ "│ ❱ 211 │ │ attn_weights = self.attn_dropout(attn_weights) │\n",
+ "│ 212 │ │ │\n",
+ "│ 213 │ │ # Mask heads if we want to │\n",
+ "│ 214 │ │ if head_mask is not None: │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py:1518 in _wrapped_call_impl │\n",
+ "│ │\n",
+ "│ 1515 │ │ if self._compiled_call_impl is not None: │\n",
+ "│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │\n",
+ "│ 1517 │ │ else: │\n",
+ "│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │\n",
+ "│ 1519 │ │\n",
+ "│ 1520 │ def _call_impl(self, *args, **kwargs): │\n",
+ "│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py:1527 in _call_impl │\n",
+ "│ │\n",
+ "│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │\n",
+ "│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │\n",
+ "│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │\n",
+ "│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │\n",
+ "│ 1528 │ │ │\n",
+ "│ 1529 │ │ try: │\n",
+ "│ 1530 │ │ │ result = None │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\dropout.py:58 in forward │\n",
+ "│ │\n",
+ "│ 55 │ \"\"\" │\n",
+ "│ 56 │ │\n",
+ "│ 57 │ def forward(self, input: Tensor) -> Tensor: │\n",
+ "│ ❱ 58 │ │ return F.dropout(input, self.p, self.training, self.inplace) │\n",
+ "│ 59 │\n",
+ "│ 60 │\n",
+ "│ 61 class Dropout1d(_DropoutNd): │\n",
+ "│ │\n",
+ "│ e:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\functional.py:1266 in dropout │\n",
+ "│ │\n",
+ "│ 1263 │ │ return handle_torch_function(dropout, (input,), input, p=p, training=training, i │\n",
+ "│ 1264 │ if p < 0.0 or p > 1.0: │\n",
+ "│ 1265 │ │ raise ValueError(f\"dropout probability has to be between 0 and 1, but got {p}\") │\n",
+ "│ ❱ 1266 │ return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, traini │\n",
+ "│ 1267 │\n",
+ "│ 1268 │\n",
+ "│ 1269 def alpha_dropout(input: Tensor, p: float = 0.5, training: bool = False, inplace: bool = │\n",
+ "╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+ "KeyboardInterrupt\n",
+ "
\n"
+ ],
+ "text/plain": [
+ "\u001b[31m╭─\u001b[0m\u001b[31m──────────────────────────────\u001b[0m\u001b[31m \u001b[0m\u001b[1;31mTraceback \u001b[0m\u001b[1;2;31m(most recent call last)\u001b[0m\u001b[31m \u001b[0m\u001b[31m───────────────────────────────\u001b[0m\u001b[31m─╮\u001b[0m\n",
+ "\u001b[31m│\u001b[0m in \u001b[92m\u001b[0m:\u001b[94m39\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m36 \u001b[0m\u001b[2m│ │ \u001b[0mtorch.save(model.state_dict(), \u001b[33mf\u001b[0m\u001b[33m\"\u001b[0m\u001b[33mmodel_\u001b[0m\u001b[33m{\u001b[0mepoch\u001b[33m}\u001b[0m\u001b[33m.pt\u001b[0m\u001b[33m\"\u001b[0m) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m37 \u001b[0m\u001b[2m│ │ \u001b[0mval() \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m38 \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m39 train() \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m40 \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m in \u001b[92mtrain\u001b[0m:\u001b[94m30\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m27 \u001b[0m\u001b[2m│ \u001b[0moptimizer = torch.optim.Adam(model.parameters(), lr=\u001b[94m1e-5\u001b[0m) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m28 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mfor\u001b[0m epoch \u001b[95min\u001b[0m \u001b[96mrange\u001b[0m(\u001b[94m3\u001b[0m): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m29 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mfor\u001b[0m step, batch \u001b[95min\u001b[0m \u001b[96menumerate\u001b[0m(dataloader): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m30 \u001b[2m│ │ │ \u001b[0mloss = train_step(batch) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m31 \u001b[0m\u001b[2m│ │ │ \u001b[0mloss.backward() \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m32 \u001b[0m\u001b[2m│ │ │ \u001b[0moptimizer.step() \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m33 \u001b[0m\u001b[2m│ │ │ \u001b[0moptimizer.zero_grad() \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m in \u001b[92mtrain_step\u001b[0m:\u001b[94m13\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m10 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[33m\"\u001b[0m\u001b[33mattention_mask\u001b[0m\u001b[33m\"\u001b[0m: batch[\u001b[33m\"\u001b[0m\u001b[33mattention_mask\u001b[0m\u001b[33m\"\u001b[0m], \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m11 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[33m\"\u001b[0m\u001b[33mlabels\u001b[0m\u001b[33m\"\u001b[0m: batch[\u001b[33m\"\u001b[0m\u001b[33mlabels\u001b[0m\u001b[33m\"\u001b[0m], \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m12 \u001b[0m\u001b[2m│ \u001b[0m} \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m13 \u001b[2m│ \u001b[0mres = model(**kwargs)[\u001b[33m\"\u001b[0m\u001b[33mloss\u001b[0m\u001b[33m\"\u001b[0m] \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m14 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mreturn\u001b[0m res \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m15 \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m16 \u001b[0m\u001b[94mdef\u001b[0m \u001b[92mval\u001b[0m(): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m:\u001b[94m1518\u001b[0m in \u001b[92m_wrapped_call_impl\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1515 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m._compiled_call_impl \u001b[95mis\u001b[0m \u001b[95mnot\u001b[0m \u001b[94mNone\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1516 \u001b[0m\u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._compiled_call_impl(*args, **kwargs) \u001b[2m# type: ignore[misc]\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1517 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94melse\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1518 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._call_impl(*args, **kwargs) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1519 \u001b[0m\u001b[2m│ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1520 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mdef\u001b[0m \u001b[92m_call_impl\u001b[0m(\u001b[96mself\u001b[0m, *args, **kwargs): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1521 \u001b[0m\u001b[2m│ │ \u001b[0mforward_call = (\u001b[96mself\u001b[0m._slow_forward \u001b[94mif\u001b[0m torch._C._get_tracing_state() \u001b[94melse\u001b[0m \u001b[96mself\u001b[0m.fo \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m:\u001b[94m1527\u001b[0m in \u001b[92m_call_impl\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1524 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[95mnot\u001b[0m (\u001b[96mself\u001b[0m._backward_hooks \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m._backward_pre_hooks \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m._forward_hooks \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1525 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[95mor\u001b[0m _global_backward_pre_hooks \u001b[95mor\u001b[0m _global_backward_hooks \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1526 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[95mor\u001b[0m _global_forward_hooks \u001b[95mor\u001b[0m _global_forward_pre_hooks): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1527 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m forward_call(*args, **kwargs) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1528 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1529 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mtry\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1530 \u001b[0m\u001b[2m│ │ │ \u001b[0mresult = \u001b[94mNone\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\transformers\\models\\gpt2\\modeling_gpt2.py\u001b[0m:\u001b[94m1075\u001b[0m in \u001b[92mforward\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1072 \u001b[0m\u001b[2;33m│ │ \u001b[0m\u001b[33m\"\"\"\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1073 \u001b[0m\u001b[2m│ │ \u001b[0mreturn_dict = return_dict \u001b[94mif\u001b[0m return_dict \u001b[95mis\u001b[0m \u001b[95mnot\u001b[0m \u001b[94mNone\u001b[0m \u001b[94melse\u001b[0m \u001b[96mself\u001b[0m.config.use_return \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1074 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1075 \u001b[2m│ │ \u001b[0mtransformer_outputs = \u001b[96mself\u001b[0m.transformer( \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1076 \u001b[0m\u001b[2m│ │ │ \u001b[0minput_ids, \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1077 \u001b[0m\u001b[2m│ │ │ \u001b[0mpast_key_values=past_key_values, \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1078 \u001b[0m\u001b[2m│ │ │ \u001b[0mattention_mask=attention_mask, \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m:\u001b[94m1518\u001b[0m in \u001b[92m_wrapped_call_impl\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1515 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m._compiled_call_impl \u001b[95mis\u001b[0m \u001b[95mnot\u001b[0m \u001b[94mNone\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1516 \u001b[0m\u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._compiled_call_impl(*args, **kwargs) \u001b[2m# type: ignore[misc]\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1517 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94melse\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1518 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._call_impl(*args, **kwargs) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1519 \u001b[0m\u001b[2m│ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1520 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mdef\u001b[0m \u001b[92m_call_impl\u001b[0m(\u001b[96mself\u001b[0m, *args, **kwargs): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1521 \u001b[0m\u001b[2m│ │ \u001b[0mforward_call = (\u001b[96mself\u001b[0m._slow_forward \u001b[94mif\u001b[0m torch._C._get_tracing_state() \u001b[94melse\u001b[0m \u001b[96mself\u001b[0m.fo \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m:\u001b[94m1527\u001b[0m in \u001b[92m_call_impl\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1524 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[95mnot\u001b[0m (\u001b[96mself\u001b[0m._backward_hooks \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m._backward_pre_hooks \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m._forward_hooks \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1525 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[95mor\u001b[0m _global_backward_pre_hooks \u001b[95mor\u001b[0m _global_backward_hooks \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1526 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[95mor\u001b[0m _global_forward_hooks \u001b[95mor\u001b[0m _global_forward_pre_hooks): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1527 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m forward_call(*args, **kwargs) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1528 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1529 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mtry\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1530 \u001b[0m\u001b[2m│ │ │ \u001b[0mresult = \u001b[94mNone\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\transformers\\models\\gpt2\\modeling_gpt2.py\u001b[0m:\u001b[94m899\u001b[0m in \u001b[92mforward\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 896 \u001b[0m\u001b[2m│ │ │ │ │ \u001b[0mencoder_attention_mask, \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 897 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 898 \u001b[0m\u001b[2m│ │ │ \u001b[0m\u001b[94melse\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m 899 \u001b[2m│ │ │ │ \u001b[0moutputs = block( \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 900 \u001b[0m\u001b[2m│ │ │ │ │ \u001b[0mhidden_states, \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 901 \u001b[0m\u001b[2m│ │ │ │ │ \u001b[0mlayer_past=layer_past, \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 902 \u001b[0m\u001b[2m│ │ │ │ │ \u001b[0mattention_mask=attention_mask, \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m:\u001b[94m1518\u001b[0m in \u001b[92m_wrapped_call_impl\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1515 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m._compiled_call_impl \u001b[95mis\u001b[0m \u001b[95mnot\u001b[0m \u001b[94mNone\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1516 \u001b[0m\u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._compiled_call_impl(*args, **kwargs) \u001b[2m# type: ignore[misc]\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1517 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94melse\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1518 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._call_impl(*args, **kwargs) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1519 \u001b[0m\u001b[2m│ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1520 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mdef\u001b[0m \u001b[92m_call_impl\u001b[0m(\u001b[96mself\u001b[0m, *args, **kwargs): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1521 \u001b[0m\u001b[2m│ │ \u001b[0mforward_call = (\u001b[96mself\u001b[0m._slow_forward \u001b[94mif\u001b[0m torch._C._get_tracing_state() \u001b[94melse\u001b[0m \u001b[96mself\u001b[0m.fo \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m:\u001b[94m1527\u001b[0m in \u001b[92m_call_impl\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1524 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[95mnot\u001b[0m (\u001b[96mself\u001b[0m._backward_hooks \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m._backward_pre_hooks \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m._forward_hooks \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1525 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[95mor\u001b[0m _global_backward_pre_hooks \u001b[95mor\u001b[0m _global_backward_hooks \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1526 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[95mor\u001b[0m _global_forward_hooks \u001b[95mor\u001b[0m _global_forward_pre_hooks): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1527 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m forward_call(*args, **kwargs) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1528 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1529 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mtry\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1530 \u001b[0m\u001b[2m│ │ │ \u001b[0mresult = \u001b[94mNone\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\transformers\\models\\gpt2\\modeling_gpt2.py\u001b[0m:\u001b[94m389\u001b[0m in \u001b[92mforward\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 386 \u001b[0m\u001b[2m│ \u001b[0m) -> Union[Tuple[torch.Tensor], Optional[Tuple[torch.Tensor, Tuple[torch.FloatTensor \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 387 \u001b[0m\u001b[2m│ │ \u001b[0mresidual = hidden_states \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 388 \u001b[0m\u001b[2m│ │ \u001b[0mhidden_states = \u001b[96mself\u001b[0m.ln_1(hidden_states) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m 389 \u001b[2m│ │ \u001b[0mattn_outputs = \u001b[96mself\u001b[0m.attn( \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 390 \u001b[0m\u001b[2m│ │ │ \u001b[0mhidden_states, \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 391 \u001b[0m\u001b[2m│ │ │ \u001b[0mlayer_past=layer_past, \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 392 \u001b[0m\u001b[2m│ │ │ \u001b[0mattention_mask=attention_mask, \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m:\u001b[94m1518\u001b[0m in \u001b[92m_wrapped_call_impl\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1515 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m._compiled_call_impl \u001b[95mis\u001b[0m \u001b[95mnot\u001b[0m \u001b[94mNone\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1516 \u001b[0m\u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._compiled_call_impl(*args, **kwargs) \u001b[2m# type: ignore[misc]\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1517 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94melse\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1518 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._call_impl(*args, **kwargs) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1519 \u001b[0m\u001b[2m│ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1520 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mdef\u001b[0m \u001b[92m_call_impl\u001b[0m(\u001b[96mself\u001b[0m, *args, **kwargs): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1521 \u001b[0m\u001b[2m│ │ \u001b[0mforward_call = (\u001b[96mself\u001b[0m._slow_forward \u001b[94mif\u001b[0m torch._C._get_tracing_state() \u001b[94melse\u001b[0m \u001b[96mself\u001b[0m.fo \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m:\u001b[94m1527\u001b[0m in \u001b[92m_call_impl\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1524 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[95mnot\u001b[0m (\u001b[96mself\u001b[0m._backward_hooks \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m._backward_pre_hooks \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m._forward_hooks \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1525 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[95mor\u001b[0m _global_backward_pre_hooks \u001b[95mor\u001b[0m _global_backward_hooks \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1526 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[95mor\u001b[0m _global_forward_hooks \u001b[95mor\u001b[0m _global_forward_pre_hooks): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1527 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m forward_call(*args, **kwargs) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1528 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1529 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mtry\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1530 \u001b[0m\u001b[2m│ │ │ \u001b[0mresult = \u001b[94mNone\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\transformers\\models\\gpt2\\modeling_gpt2.py\u001b[0m:\u001b[94m330\u001b[0m in \u001b[92mforward\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 327 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m.reorder_and_upcast_attn: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 328 \u001b[0m\u001b[2m│ │ │ \u001b[0mattn_output, attn_weights = \u001b[96mself\u001b[0m._upcast_and_reordered_attn(query, key, valu \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 329 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94melse\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m 330 \u001b[2m│ │ │ \u001b[0mattn_output, attn_weights = \u001b[96mself\u001b[0m._attn(query, key, value, attention_mask, he \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 331 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 332 \u001b[0m\u001b[2m│ │ \u001b[0mattn_output = \u001b[96mself\u001b[0m._merge_heads(attn_output, \u001b[96mself\u001b[0m.num_heads, \u001b[96mself\u001b[0m.head_dim) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 333 \u001b[0m\u001b[2m│ │ \u001b[0mattn_output = \u001b[96mself\u001b[0m.c_proj(attn_output) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\transformers\\models\\gpt2\\modeling_gpt2.py\u001b[0m:\u001b[94m211\u001b[0m in \u001b[92m_attn\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 208 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 209 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[2m# Downcast (if necessary) back to V's dtype (if in mixed-precision) -- No-Op oth\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 210 \u001b[0m\u001b[2m│ │ \u001b[0mattn_weights = attn_weights.type(value.dtype) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m 211 \u001b[2m│ │ \u001b[0mattn_weights = \u001b[96mself\u001b[0m.attn_dropout(attn_weights) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 212 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 213 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[2m# Mask heads if we want to\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 214 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m head_mask \u001b[95mis\u001b[0m \u001b[95mnot\u001b[0m \u001b[94mNone\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m:\u001b[94m1518\u001b[0m in \u001b[92m_wrapped_call_impl\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1515 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m._compiled_call_impl \u001b[95mis\u001b[0m \u001b[95mnot\u001b[0m \u001b[94mNone\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1516 \u001b[0m\u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._compiled_call_impl(*args, **kwargs) \u001b[2m# type: ignore[misc]\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1517 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94melse\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1518 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m \u001b[96mself\u001b[0m._call_impl(*args, **kwargs) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1519 \u001b[0m\u001b[2m│ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1520 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mdef\u001b[0m \u001b[92m_call_impl\u001b[0m(\u001b[96mself\u001b[0m, *args, **kwargs): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1521 \u001b[0m\u001b[2m│ │ \u001b[0mforward_call = (\u001b[96mself\u001b[0m._slow_forward \u001b[94mif\u001b[0m torch._C._get_tracing_state() \u001b[94melse\u001b[0m \u001b[96mself\u001b[0m.fo \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\module.py\u001b[0m:\u001b[94m1527\u001b[0m in \u001b[92m_call_impl\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1524 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mif\u001b[0m \u001b[95mnot\u001b[0m (\u001b[96mself\u001b[0m._backward_hooks \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m._backward_pre_hooks \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m._forward_hooks \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1525 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[95mor\u001b[0m _global_backward_pre_hooks \u001b[95mor\u001b[0m _global_backward_hooks \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1526 \u001b[0m\u001b[2m│ │ │ │ \u001b[0m\u001b[95mor\u001b[0m _global_forward_hooks \u001b[95mor\u001b[0m _global_forward_pre_hooks): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1527 \u001b[2m│ │ │ \u001b[0m\u001b[94mreturn\u001b[0m forward_call(*args, **kwargs) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1528 \u001b[0m\u001b[2m│ │ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1529 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mtry\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1530 \u001b[0m\u001b[2m│ │ │ \u001b[0mresult = \u001b[94mNone\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\modules\\dropout.py\u001b[0m:\u001b[94m58\u001b[0m in \u001b[92mforward\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 55 \u001b[0m\u001b[2;33m│ \u001b[0m\u001b[33m\"\"\"\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 56 \u001b[0m\u001b[2m│ \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 57 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mdef\u001b[0m \u001b[92mforward\u001b[0m(\u001b[96mself\u001b[0m, \u001b[96minput\u001b[0m: Tensor) -> Tensor: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m 58 \u001b[2m│ │ \u001b[0m\u001b[94mreturn\u001b[0m F.dropout(\u001b[96minput\u001b[0m, \u001b[96mself\u001b[0m.p, \u001b[96mself\u001b[0m.training, \u001b[96mself\u001b[0m.inplace) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 59 \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 60 \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m 61 \u001b[0m\u001b[94mclass\u001b[0m \u001b[4;92mDropout1d\u001b[0m(_DropoutNd): \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[33me:\\anaconda\\envs\\dl2\\lib\\site-packages\\torch\\nn\\functional.py\u001b[0m:\u001b[94m1266\u001b[0m in \u001b[92mdropout\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1263 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mreturn\u001b[0m handle_torch_function(dropout, (\u001b[96minput\u001b[0m,), \u001b[96minput\u001b[0m, p=p, training=training, i \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1264 \u001b[0m\u001b[2m│ \u001b[0m\u001b[94mif\u001b[0m p < \u001b[94m0.0\u001b[0m \u001b[95mor\u001b[0m p > \u001b[94m1.0\u001b[0m: \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1265 \u001b[0m\u001b[2m│ │ \u001b[0m\u001b[94mraise\u001b[0m \u001b[96mValueError\u001b[0m(\u001b[33mf\u001b[0m\u001b[33m\"\u001b[0m\u001b[33mdropout probability has to be between 0 and 1, but got \u001b[0m\u001b[33m{\u001b[0mp\u001b[33m}\u001b[0m\u001b[33m\"\u001b[0m) \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1266 \u001b[2m│ \u001b[0m\u001b[94mreturn\u001b[0m _VF.dropout_(\u001b[96minput\u001b[0m, p, training) \u001b[94mif\u001b[0m inplace \u001b[94melse\u001b[0m _VF.dropout(\u001b[96minput\u001b[0m, p, traini \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1267 \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1268 \u001b[0m \u001b[31m│\u001b[0m\n",
+ "\u001b[31m│\u001b[0m \u001b[2m1269 \u001b[0m\u001b[94mdef\u001b[0m \u001b[92malpha_dropout\u001b[0m(\u001b[96minput\u001b[0m: Tensor, p: \u001b[96mfloat\u001b[0m = \u001b[94m0.5\u001b[0m, training: \u001b[96mbool\u001b[0m = \u001b[94mFalse\u001b[0m, inplace: \u001b[96mbool\u001b[0m = \u001b[31m│\u001b[0m\n",
+ "\u001b[31m╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n",
+ "\u001b[1;91mKeyboardInterrupt\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
}
],
"source": [
diff --git a/docs/chapter2/code/env.yml b/docs/chapter2/code/env.yml
new file mode 100644
index 0000000..26f5ba4
--- /dev/null
+++ b/docs/chapter2/code/env.yml
@@ -0,0 +1,180 @@
+name: distill
+channels:
+ - defaults
+dependencies:
+ - _libgcc_mutex=0.1=main
+ - _openmp_mutex=5.1=1_gnu
+ - ca-certificates=2024.11.26=h06a4308_0
+ - ld_impl_linux-64=2.40=h12ee557_0
+ - libffi=3.4.4=h6a678d5_1
+ - libgcc-ng=11.2.0=h1234567_1
+ - libgomp=11.2.0=h1234567_1
+ - libstdcxx-ng=11.2.0=h1234567_1
+ - ncurses=6.4=h6a678d5_0
+ - openssl=3.0.15=h5eee18b_0
+ - pip=24.2=py39h06a4308_0
+ - python=3.9.20=he870216_1
+ - readline=8.2=h5eee18b_0
+ - setuptools=75.1.0=py39h06a4308_0
+ - sqlite=3.45.3=h5eee18b_0
+ - tk=8.6.14=h39e8969_0
+ - wheel=0.44.0=py39h06a4308_0
+ - xz=5.4.6=h5eee18b_1
+ - zlib=1.2.13=h5eee18b_1
+ - pip:
+ - accelerate==1.2.0
+ - aiohappyeyeballs==2.4.4
+ - aiohttp==3.11.11
+ - aiosignal==1.3.2
+ - annotated-types==0.7.0
+ - anyio==4.7.0
+ - argon2-cffi==23.1.0
+ - argon2-cffi-bindings==21.2.0
+ - arrow==1.3.0
+ - asttokens==3.0.0
+ - async-lru==2.0.4
+ - async-timeout==5.0.1
+ - attrs==24.2.0
+ - babel==2.16.0
+ - beautifulsoup4==4.12.3
+ - bleach==6.2.0
+ - certifi==2024.8.30
+ - cffi==1.17.1
+ - charset-normalizer==3.4.0
+ - click==8.1.7
+ - comm==0.2.2
+ - datasets==3.2.0
+ - debugpy==1.8.9
+ - decorator==5.1.1
+ - defusedxml==0.7.1
+ - dill==0.3.8
+ - docker-pycreds==0.4.0
+ - eval-type-backport==0.2.0
+ - exceptiongroup==1.2.2
+ - executing==2.1.0
+ - fastjsonschema==2.21.1
+ - filelock==3.16.1
+ - fqdn==1.5.1
+ - frozenlist==1.5.0
+ - fsspec==2024.9.0
+ - gitdb==4.0.11
+ - gitpython==3.1.43
+ - h11==0.14.0
+ - httpcore==1.0.7
+ - httpx==0.28.1
+ - huggingface-hub==0.26.5
+ - idna==3.10
+ - importlib-metadata==8.5.0
+ - ipykernel==6.29.5
+ - ipython==8.18.1
+ - ipywidgets==8.1.5
+ - isoduration==20.11.0
+ - jedi==0.19.2
+ - jinja2==3.1.4
+ - json5==0.10.0
+ - jsonpointer==3.0.0
+ - jsonschema==4.23.0
+ - jsonschema-specifications==2024.10.1
+ - jupyter==1.1.1
+ - jupyter-client==8.6.3
+ - jupyter-console==6.6.3
+ - jupyter-core==5.7.2
+ - jupyter-events==0.10.0
+ - jupyter-lsp==2.2.5
+ - jupyter-server==2.14.2
+ - jupyter-server-terminals==0.5.3
+ - jupyterlab==4.3.3
+ - jupyterlab-pygments==0.3.0
+ - jupyterlab-server==2.27.3
+ - jupyterlab-widgets==3.0.13
+ - markupsafe==3.0.2
+ - matplotlib-inline==0.1.7
+ - mistune==3.0.2
+ - mpmath==1.3.0
+ - multidict==6.1.0
+ - multiprocess==0.70.16
+ - nbclient==0.10.1
+ - nbconvert==7.16.4
+ - nbformat==5.10.4
+ - nest-asyncio==1.6.0
+ - networkx==3.2.1
+ - notebook==7.3.1
+ - notebook-shim==0.2.4
+ - numpy==2.0.2
+ - nvidia-cublas-cu12==12.4.5.8
+ - nvidia-cuda-cupti-cu12==12.4.127
+ - nvidia-cuda-nvrtc-cu12==12.4.127
+ - nvidia-cuda-runtime-cu12==12.4.127
+ - nvidia-cudnn-cu12==9.1.0.70
+ - nvidia-cufft-cu12==11.2.1.3
+ - nvidia-curand-cu12==10.3.5.147
+ - nvidia-cusolver-cu12==11.6.1.9
+ - nvidia-cusparse-cu12==12.3.1.170
+ - nvidia-nccl-cu12==2.21.5
+ - nvidia-nvjitlink-cu12==12.4.127
+ - nvidia-nvtx-cu12==12.4.127
+ - overrides==7.7.0
+ - packaging==24.2
+ - pandas==2.2.3
+ - pandocfilters==1.5.1
+ - parso==0.8.4
+ - pathlib==1.0.1
+ - pexpect==4.9.0
+ - platformdirs==4.3.6
+ - prometheus-client==0.21.1
+ - prompt-toolkit==3.0.48
+ - propcache==0.2.1
+ - protobuf==5.29.1
+ - psutil==6.1.0
+ - ptyprocess==0.7.0
+ - pure-eval==0.2.3
+ - pyarrow==18.1.0
+ - pycparser==2.22
+ - pydantic==2.10.3
+ - pydantic-core==2.27.1
+ - pygments==2.18.0
+ - python-dateutil==2.9.0.post0
+ - python-json-logger==3.2.0
+ - pytz==2024.2
+ - pyyaml==6.0.2
+ - pyzmq==26.2.0
+ - referencing==0.35.1
+ - regex==2024.11.6
+ - requests==2.32.3
+ - rfc3339-validator==0.1.4
+ - rfc3986-validator==0.1.1
+ - rpds-py==0.22.3
+ - safetensors==0.4.5
+ - send2trash==1.8.3
+ - sentry-sdk==2.19.2
+ - setproctitle==1.3.4
+ - six==1.17.0
+ - smmap==5.0.1
+ - sniffio==1.3.1
+ - soupsieve==2.6
+ - stack-data==0.6.3
+ - sympy==1.13.1
+ - terminado==0.18.1
+ - tinycss2==1.4.0
+ - tokenizers==0.13.3
+ - tomli==2.2.1
+ - torch==2.5.1
+ - tornado==6.4.2
+ - tqdm==4.67.1
+ - traitlets==5.14.3
+ - transformers==4.28.1
+ - triton==3.1.0
+ - types-python-dateutil==2.9.0.20241206
+ - typing-extensions==4.12.2
+ - tzdata==2024.2
+ - uri-template==1.3.0
+ - urllib3==2.2.3
+ - wcwidth==0.2.13
+ - webcolors==24.11.1
+ - webencodings==0.5.1
+ - websocket-client==1.8.0
+ - widgetsnbextension==4.0.13
+ - xxhash==3.5.0
+ - yarl==1.18.3
+ - zipp==3.21.0
+prefix: /home/PJLAB/gaoyufei/anaconda3/envs/babyllama
diff --git a/docs/chapter2/code/requirements.txt b/docs/chapter2/code/requirements.txt
deleted file mode 100644
index 0bc857d..0000000
--- a/docs/chapter2/code/requirements.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-datasets==2.14.5
-torch==2.1.2
\ No newline at end of file
diff --git a/docs/chapter2/models/models_download.sh b/docs/chapter2/models/models_download.sh
index ed3de8a..1bb56da 100644
--- a/docs/chapter2/models/models_download.sh
+++ b/docs/chapter2/models/models_download.sh
@@ -2,5 +2,4 @@
# GPT-2
-huggingface-cli download --resume-download openai-community/gpt2 --local-dir docs/chapter2/models/GPT-2
-
+huggingface-cli download --resume-download openai-community/gpt2 --local-dir docs/chapter2/models/GPT-2
\ No newline at end of file